
Notes from the Field: Lessons from io.Intelligence Implementations
io.Intelligence Lead Consultant Ivan Pidov shares what actually happens when you take these tools into client engagements – including a $1,000 bill in two hours, and the patterns that keep coming up across every implementation.
Thanks, everyone. My name is Ivan Pidov. I'm Lead Consultant in the io.Intelligence business unit, where I work with clients mostly in early-stage engagements, scoping POCs and proofs of value for applications and keeping expectations grounded, because sometimes things can get messy. I'm here to talk to you about some of the lessons we learned during the past year, when we were working with several clients using the tools that Kalin just demonstrated: how we AI-enabled and interop.io-enabled a couple of applications, and what the good parts and the bad parts were.
So why does it matter? First, this is based on our real experience and what we found in the wild doing real implementations, not just something that we read about. These are lessons that were, as Kalin said, expensive and painful. Second, some of these patterns are repeatable between clients. It's not a one-off thing that we observed in one specific instance. It applies to multiple projects, and it will probably apply to you if it's not something that you've already hit. And finally, there's a cool story about how we managed to burn through one thousand dollars in about two hours, but I'll get back to this in a bit.
So first, I want to explain what we envisioned for one specific use case and how we wanted to build it. What we have is a unified workspace, an io.Connect workspace. On the left side is a business-level application that users are currently using on a daily basis. It's in production. Users are very familiar with it: how to navigate it, how to extract data, and what to look for. Then on the right side, we wanted to put a chat agent where users can ask questions about the application or about the data set: about financial transactions, about, let's say, deposit workflows, stuff like this.
And there are a few reasons why we wanted to design it this way, and why we wanted to build a front-end implementation, not a back-end implementation. We could easily just hook into the back-end APIs and give the model access to them, but that's not the optimal approach in our opinion. First, we are a desktop integration platform. That's what we understand. That's what we are good at. We felt it was natural to leverage our experience there. Second, this brings visibility to the user. Because, as I said, if you just hook an API and ask the model a question, it will go fetch the data from the API and think for a while, but the user will have no way to verify that the answer is actually grounded in something true. And for users, it's not easy to trust these results without the ability to verify them. Finally, as an added benefit, you get access control for free. Since the model and the agent are running in the context of the user, you don't have to worry about a separate authentication layer, because you inherit the permissions of the user. The model and the agent are only able to execute what the user has access to. So these are the three main reasons why we decided to go this way.
And now I want to play a short demo that can actually show you how this looked in reality. I'm not as brave as Kalin, so I have a video. I'm not going to trust the Anthropic API to behave. And I'll try to narrate what's going on on the screen. As I said, on the left side, we have the business application, and on the right side, we have the chat UI. The whole flow starts with a prompt by the user. In this case, it's a balance sheet decrease of approximately nine million from today versus the most recent month-end for non-interest deposits, and the prompt asks to identify the underlying deposit drivers of this change. You just send this, and then the model starts thinking. First, we gave the model tools that it can use.
The tools were registered by the application on the left side. So initially, it called the knowledge lookup tool, where it tries to understand what workflow it should execute based on the user prompt. The other tools give it the ability to navigate the pages and extract data from the tables and charts that you can see. So it immediately navigates to the deposit movement page automatically. Then it filters on non-interest deposits and tries to understand the reason and the driving factor behind this change. As you can see, it comes up with a proposition and an assessment of what is going on. And let me just pause it. In the end, it suggests the next steps that it can execute to help the user with the query. The first action it suggests is to drill into account level for checking businesses. The second is to cross-check for internal product migration. Then: review the five defining business accounts in Customer 360, which is a different report in the same application, or compare to prior January patterns, and so on.
So in this case, I just sent one, which was the first suggestion. It will drill into the account-level report, which is another report it navigates to automatically, pass the context for the specific account it wants to investigate, and then from there go to the 360 overview to understand what's driving this money movement. As you can see, it is using the same tools that we defined in the application. Some of the tools give access to the data set. Some of the tools give the ability to navigate to different pages. I'll quickly skip over this because it's doing the same thing over and over for the same client. Here, it navigated to the 360 report. And in the end, let me just go here. In the end, when it has gathered all of the information, the conclusion is that this movement is probably not critical. This is something normal for this specific account, and there's nothing to worry about.
So it suggests the next steps, which are to analyze different accounts or to send a proactive email to this user, or to the account manager for this user, which it automatically generated and prepared in Outlook, where you can just send it to the account manager. So this is what we did. And it worked. But there were more than a couple of bitter lessons that we learned, and this is what I'm going to focus on for the next couple of slides.
The application on the left is built on top of Power BI Embedded. In order for the agent to be able to use this data, we had to, as I said, expose the Power BI API as tools and register them. This is what we did. Then, whenever the user has a question, the agent calls the tools and analyzes the responses. It can either send the answer back to the user, or it can go into this tool-calling loop over and over again until it has enough information. This worked well with dummy data and mocked data. But there were issues once we started pushing real data into the implementation. With the real data, it turns out that some of these tables that you saw had about thirty thousand rows, which was about seven megabytes, I think. And these seven megabytes we were sending as messages to the model over and over again with each request. We did it because, instinctively, I think, the models are capable: they can work with this data, they can analyze it. And this is true, because the results were impressive, if I can say so. The analytical work that the models were doing was correct, but this led to issues with rate limiting and throttling. The only way to understand that you're rate limited when you're using the Anthropic APIs is by observing the slowness on the screen. They don't tell you when you're rate limited. They don't tell you when you're throttled. The dashboards that they have are kind of useless.
So it was a bad user experience. But, yeah, we found out the hard way that you shouldn't send all of your data to the model. You should apply some filtering, some paging, and some aggregation logic on your side. And when I say on your side, I mean the tools should do this. You shouldn't expect the agent to do it. It can do it. It's just suboptimal and expensive to do it this way.
The second lesson is pretty much the same principle, just applied to the system prompt, not to the data set. Instinctively, you'd think that if you give the agent all of the human knowledge that you have, and you cram all of the workflows, all of the business logic, and all of the domain knowledge into the system prompt, then the agent should be able to figure out everything you throw at it, because it already has access to this in the system prompt. But that's not actually the case. The more context you have and the more information there is in the system prompt, the easier it is for the model to get lost: to either over-apply a rule that's not applicable to the task at hand, or to get confused, go off track, and execute a different workflow. The other reason is that you're sending this system prompt with every single user message. It doesn't matter whether it is relevant to the current user message at all. You're still sending the system prompt. You're still consuming tokens, and the agent still has to read through everything regardless of whether or not it's applicable. So the way to mitigate this is to move the business logic and your domain knowledge to a separate layer. In our case, we used RAG with a vector database and gave the agent tools to query the database.
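To make the first lesson in this part concrete (filter, page, and aggregate inside the tool rather than handing the model the full table), here is a minimal Python sketch. It is not the implementation from the talk: the row shape, the field names (`account`, `deposit_type`, `change`), and the functions `summarize_changes` and `get_page` are all hypothetical, and a plain list of dictionaries stands in for the data extracted from the report.

```python
from collections import defaultdict
from typing import Any

# Hypothetical row shape; a real tool would read rows from the report instead.
Row = dict[str, Any]


def summarize_changes(rows: list[Row], deposit_type: str, top_n: int = 3) -> dict[str, Any]:
    """Filter and aggregate inside the tool; return a small summary, not raw rows."""
    totals: dict[str, float] = defaultdict(float)
    matched = 0
    for row in rows:
        if row["deposit_type"] != deposit_type:
            continue  # filtering happens in the tool, not in the model
        matched += 1
        totals[row["account"]] += row["change"]
    # Rank accounts by the size of their net movement, largest first.
    ranked = sorted(totals.items(), key=lambda item: abs(item[1]), reverse=True)
    return {
        "deposit_type": deposit_type,
        "rows_matched": matched,
        "net_change": sum(totals.values()),
        "top_movers": [{"account": a, "change": c} for a, c in ranked[:top_n]],
    }


def get_page(rows: list[Row], page: int, page_size: int = 100) -> dict[str, Any]:
    """Return one bounded page of rows plus the metadata needed to request more."""
    start = page * page_size
    return {
        "page": page,
        "total_rows": len(rows),
        "rows": rows[start:start + page_size],
        "has_more": start + page_size < len(rows),
    }


# Hypothetical sample data, only for illustration.
sample_rows: list[Row] = [
    {"account": "A-100", "deposit_type": "non-interest", "change": -6_000_000.0},
    {"account": "A-200", "deposit_type": "non-interest", "change": -2_500_000.0},
    {"account": "A-100", "deposit_type": "non-interest", "change": -1_000_000.0},
    {"account": "A-300", "deposit_type": "non-interest", "change": 500_000.0},
    {"account": "B-900", "deposit_type": "savings", "change": 4_000_000.0},
]

summary = summarize_changes(sample_rows, "non-interest", top_n=2)
first_page = get_page(sample_rows, page=0, page_size=2)
```

With a design along these lines, the model sees a few hundred bytes of summary and asks for individual pages only when it needs row-level detail, so the size of each tool result is bounded by `top_n` and `page_size` rather than by the size of the table.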
So whenever a user asks a question about, let's say, deposit movements, the agent can query the workflow for deposit movements and apply only this specific information to the query, instead of sending everything every time the user has a question.
And it's kind of a repeatable pattern: the underlying issue is the same, too much data. In our case, it was too big a system prompt, too much of a data set that the models had to work with, and an unfortunate bug where the model kept executing the same tool over and over again. This tool happened to return, as I said, six megabytes of data. And we got to about one thousand dollars in about two hours. This was for a single application, for a single user. We got lucky. Not a token-maxed company. No. No. We are not. Not yet. One of the capital leader work out there. Yeah. But it wasn't a good thing. It just made me famous. So, yeah, we got lucky, honestly, because we managed to catch this in time and stop it. Just last week, I think, there was a report about a company burning half a billion dollars in a month. And the fix is something that we applied, but they didn't: you should apply limits and usage caps at the user level and the application level, not at the company level. We have a company-wide policy, but a thousand dollars is not going to hit our company-wide policy. So you get these rogue applications, and a company-wide policy doesn't protect you from them. So you need to apply limits, as I said, at the application level and the user level. One way to do this is by implementing an LLM gateway, which gives you much better control and much better visibility over who's using what and how much. It allows you to limit usage instead of relying on the vendor dashboards and what the vendors give you out of the box.
The next one, when we talk about cost, is probably counterintuitive and may seem not to make sense.
Traditionally, if you're building a POC or an application and you're still not sure how it will end up, whether or not it will be useful in production, the instinct is to not invest a lot of money into it, because in the end it could be throwaway code. And traditionally, this works. But in the AI world, this kind of doesn't work. On one side, you're still building your application: your system prompts, your tools, your user prompts, and so on. On the other side, you have a cheap model. When you combine these two and something doesn't work, you don't know the reason, because you're trying to debug and reason between: is it my implementation? Is it my code? Or is the model just not capable enough? So what we did is we started with the most capable model at the time, which was, I believe, Opus 4.7. We got the implementation. We got the workflows. And we were happy with the implementation that we had. So we used this as the baseline and tried to optimize down, to see what is the least capable and cheapest model that can still give us the same results. In the end, we ended up using Sonnet 4.5, which brought the cost down to, yeah, it was five times cheaper. We experimented with some small language models. The results were not satisfactory, but this is also something that you can use, especially if you have trivial tasks like summarizing an email or a meeting transcript. You don't need to pay for Opus. You can run small language models on a server, and it will be a lot cheaper than depending on the most expensive model.
This next one is probably obvious, but I still want to mention it. We started by sharing the same API key for all of the users. And when I say users, I mean developers and people involved in the POCs. This makes sense in the beginning because it's easier to manage. It's less administrative work. You just share the same key across the team.
Unfortunately, when things start moving into production and you start trying to debug or understand an issue, this is not a good way to do it, because you don't have visibility into which user is burning the tokens or which application is burning the tokens. It becomes very hard to understand the underlying cause of a bug. So issue separate keys for every user and for every application, because this way it's a lot easier to track usage.
And finally, this is my favorite part, and this is a good lesson, one that was learned in a positive way. Do a hackathon. We've been doing these implementations over the years, and usually we do them asynchronously over email. We start with a discovery meeting where people are excited. Everybody has some ideas. And when you go to the implementation, things start moving slowly because you're doing it asynchronously over email. Some decisions take days to be resolved. By the end, you've probably spent two or three months implementing something. Half the people are no longer interested in the POC. Half of the people no longer remember why they're doing it. And you get a working product, but it doesn't have the wow factor. In this case, we did a three-day hackathon. We went into the hackathon having had only one discovery meeting before it. We didn't have a clear idea of what we were going to build in the end. But we gathered in the same room the technical people, the SMEs, and the stakeholders. And by noon on the first day, we had a very clear idea of what application we wanted to build and how we wanted to build it. I was able to start building the application, deploy quickly to the team to test, and iterate very fast on the implementation. So by day three, we had a working prototype, the one that you just saw on screen. We went from an application that is not Interop-enabled to one that is Interop- and AI-enabled.
And you saw how it behaved. And, yeah, then I just want to summarize the main things. Do front-end AI, because this builds trust with users. It's easier. It's faster to do, and you get access control for free. Shape the context. Don't rely on the model to understand and filter through all of the data, because this is, first, slow and, second, expensive. Guard costs per application: apply per-application caps and per-user caps. Start with the best model. Don't cheap out on the model initially. I understand that it's expensive in the long term. But if you start with the cheapest model first, it will probably end up more expensive than starting with the best model. Issue separate access keys to separate users. And move fast, because for the past year or so, the bottleneck is no longer writing the code for your application. The bottleneck is making the decisions and actually deciding what you want to build. And yeah. Thank you.
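The per-user and per-application caps recommended in the talk can be sketched roughly as follows. This is an in-process Python stand-in, not an LLM gateway and not any vendor's API: the `UsageGuard` class, the `run_requests` helper, the cap values, and the fixed cost per request are all hypothetical, and a real setup would take the cost of each request from actual token usage.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a user or an application has used up its cap."""


class UsageGuard:
    """Tracks spend per user and per application (hypothetical helper)."""

    def __init__(self, user_cap_usd: float, app_cap_usd: float) -> None:
        self.user_cap_usd = user_cap_usd
        self.app_cap_usd = app_cap_usd
        self.spent_by_user: dict[str, float] = defaultdict(float)
        self.spent_by_app: dict[str, float] = defaultdict(float)

    def check(self, user: str, app: str) -> None:
        """Call before each model request; refuse once either cap is reached."""
        if self.spent_by_user[user] >= self.user_cap_usd:
            raise BudgetExceeded(f"user cap reached for {user}")
        if self.spent_by_app[app] >= self.app_cap_usd:
            raise BudgetExceeded(f"application cap reached for {app}")

    def record(self, user: str, app: str, cost_usd: float) -> None:
        """Call after each model request with what that request cost."""
        self.spent_by_user[user] += cost_usd
        self.spent_by_app[app] += cost_usd


def run_requests(guard: UsageGuard, user: str, app: str,
                 cost_per_request_usd: float, attempts: int) -> int:
    """Simulate a runaway loop of model calls; return how many went through."""
    completed = 0
    for _ in range(attempts):
        try:
            guard.check(user, app)
        except BudgetExceeded:
            break  # the loop is cut off here instead of running for hours
        guard.record(user, app, cost_per_request_usd)
        completed += 1
    return completed


# Hypothetical caps: 20 USD per user, 50 USD per application.
guard = UsageGuard(user_cap_usd=20.0, app_cap_usd=50.0)
completed = run_requests(guard, "user-a", "deposit-analysis",
                         cost_per_request_usd=5.0, attempts=1000)
```

Because the check runs before a request and the cost is recorded after it, spend can overshoot a cap by at most one request. The point of the sketch is only that a looping tool call for one user in one application is stopped by its own cap, long before a company-wide limit would notice.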
Theory is one thing. Production is another. In this session from the London 2026 Developer Community AI Meet-Up, Ivan Pidov – Lead Consultant in the io.Intelligence business unit – shares hard-won lessons from real client implementations of AI-enabled financial desktop applications.
The talk walks through a live demo of a deposit analysis workflow: a chat agent embedded alongside a business application that navigates reports automatically, extracts and cross-references data, and surfaces actionable next steps – including a drafted email to the account manager. Then comes the honest part: what broke, what it cost, and what to do differently.
What you’ll learn
- Why front-end AI integration beats direct API access – for user trust, data verifiability, and access control you inherit for free
- Why sending full data sets to the model is a mistake: the real-world story of burning $1,000 in two hours from a looping tool call returning 6MB of data per request
- How to manage context properly: filtering and aggregating data in your tools, and using RAG to serve domain knowledge selectively rather than cramming it all into the system prompt
- Why you should start with the most capable model, establish a working baseline, then optimise down – not the other way around
- How per-user and per-application token caps (not just company-wide limits) protect you from runaway costs
- Why a three-day hackathon outperformed months of async email-driven implementation – and how to structure one
Who this is for
Engineers, architects, and technical leads scoping or running AI agent implementations in capital markets – particularly anyone moving from POC to production for the first time.
