Code Mode in io.Intelligence
io.Intelligence Engineering Lead Kalin Kostov introduces Code Mode – a new approach to agent execution that replaces large tool libraries with LLM-generated code, reducing token usage, improving reliability, and keeping sensitive data on-device.
Alright. Hi again. There is a very interesting concept that has been around called code mode, which I would like to share with you today: what the concept is, why it exists, what our take on it is, and what you can expect from us.
So first, tools are great. LLMs on their own, as you know very well, return text. Text. That's it. They cannot do anything. They cannot work on Excel spreadsheets. They cannot work on applications. Nothing. They cannot even search the web. All of that is done through tool calling. That's how agents get formed, and agents perform the work that we are interested in.
So tools are amazing, except that they suck. And they suck because you want to have a very capable system. You want to have capable agents, and for that you give them a lot of tools. It goes with it. But having a lot of tools in your system is a problem, and it doesn't really matter if they're really small atomic tools or composite tools. Because as with everything in software, as soon as something proves to be useful, we try to push it to be more useful. So we want more capabilities. Great, you can open a spreadsheet. Now I want you to create a spreadsheet for me. Now I want you to interpret this complex spreadsheet for me. Now I want you to turn a spreadsheet into a presentation, all of that stuff. So we introduce more and more tools.
And very soon the first result of this situation is tool overload. With each request to the LLM we provide all of the tools available. That increases the context window, and there is just so much to choose from.
Another problem is sequence fragility. We're not interested anymore in simple tasks. We want to achieve complex tasks that actually drive some return on investment. And that often requires calling tools in a specific sequence.
Tool one needs to execute, then tool two, then three, then four, and the LLM needs to make the correct choice every step of the way. It needs to choose to call the correct tool at the correct time, provide the correct arguments, and then proceed correctly using the output of the tool. And you can see that any little glitch in this chain can break the whole flow.
The last problem is context bloat. This is something that Ivan talked about, one of the lessons learned. Just a tool returning all the clients from a database will very quickly overflow the context. But the context window gets full not only because of inefficient tool design; it also gets quickly flooded by all the descriptions of the tools. Every tool comes with an input schema and an output schema. And when you have a complex system, in order for the LLM to have a fighting chance to compose those tools into a flow, it needs more and more complicated system instructions or skills or whatever it is. And everything, skills, tools, servers, everything goes into the context window. The bigger the context window is before you even send the first prompt, the more you set yourself up for failure, because that introduces context rot and needle-in-the-haystack problems.
So what are the current solutions today? The first one is going away from really small tools. Using our tools as an example: instead of having a tool that can just start an application, like calling one API method, we have composite tools. A composite tool can start one or more applications, it can compose them into a workspace, and then it's going to call interop methods. Just more work done in a single tool. This reduces the overall number of tools, but increases the amount of instructions that the LLM needs in order to use the composite tool correctly. [unclear]
So in essence, what you achieve with this is that you improve the situation, but you're really not solving the underlying issue. You just move the burden around.
So here comes code mode. This was first brought to the scene by Cloudflare, in an article that they published. It was picked up by Anthropic, which did an experiment of their own. It turns out that LLMs are good at writing code. So why don't we let them write code? And what they observed is a vast increase in overall efficiency.
The idea is the following. Instead of having a lot of tools to choose from, the LLM understands an API. And the LLM can write a code snippet that executes all the tools it needs in sequence or in parallel, iterates over them in a loop, and conditionally does something with the output of each tool. Basically a script. So instead of going through N turns to the LLM, you go once to get the code and once to return the result of the code. And that's it. The LLM writes the code. The program executes the code. It calls the APIs, which are the tools that do the work. And then you get a specific answer to your query. That's the idea of code mode.
Internally, generally speaking, it works like this. We have just two tools available to your LLM. Not hundreds, just two. One of them describes the API. The other one executes the code. Once the MCP server connects, it understands all of the tools, let's say one hundred. But instead of publishing all of those tools to the LLM, it publishes just those two: describing the API and executing the code. Then the LLM knows what kind of snippet it can write. So it writes the snippet, ships it back to the agent, and the agent executes this code by calling the tools. It looks something like this.
Now, as a high-level overview, it turns from this: we have many tool choices. This is the current mode. A lot of tools, and the payload of each tool is very low.
And we have a failure point at every step. In essence, the LLM needs to read the full list. It always has access to the full list. Pick the correct tool, return the payload, pick a following tool, choose the parameters, filter the data, summarize. And this is a very simple flow.
With code mode, this translates into having two tools with a filtered shape and far fewer failure points. It happens like this. The LLM inspects the API map and generates Python, or whatever language it is. Internally this gets executed: it calls each of the APIs, does something with the results of each API call, loop, branch, filter, whatever it is, and returns the answer. That's it.
We looked at this approach and found it extremely interesting, especially because in our io.Connect world we have tools, but we have also had methods for a much longer time than that. And a lot of our clients have hundreds of applications, many hundreds of methods, intents, FDC3 intents, and all of that stuff. A lot of capability. What if we can communicate this capability so that the LLM can write a snippet to access it in a much richer way?
So what we did is leverage a small controlled Python sandbox, which we're able to run on the desktop and in the browser. It's a very restricted sandbox. It's able to execute Python code without having access to any Python libraries. It cannot download Python libraries. It cannot execute even native Python libraries. It can only use the basic Python functions. But it has access to very well specified outside functions, which are the tools that we have. This, like I said, works on the desktop and in the browser. On the desktop we have the Python runtime working as a desktop process. In the browser we compiled it into WebAssembly modules and we have loaders.
Just like you're used to with io.Connect Browser, where everything runs nicely and smoothly without installation on the machine, the same is valid for the controlled Python sandbox. It doesn't require any extra runtime. The sandbox solutions today require either spinning up local Docker containers, which takes more resources, takes time, and increases setup complexity, or communicating with cloud-based sandboxes, which costs more money, takes more time, and makes you dependent on bandwidth. And of course, our solution is fully io.Connect aware and plugs nicely into the system.
This is the high-level overview, but I want to show you in practice. So just like before, I'll continue living on the edge, because it seems to be working for me. I'll show you a sneak peek of our code mode. It's very much in progress, so it's kind of a teaser at this point, but I hope you get the gist of it.
We have io.Connect Desktop running here. I'll start up the rest of my services. And this one. Alrighty. First off, I'm going to open up the React version of io.Assist. Is it the correct one? Yes, it's the correct one. I have nothing here to explain to you; you've seen this a few times already. This is io.Assist, and the React version is identical to the Angular version. But I'm using this one because it is connected to our MCP server, which is integrated into the desktop as a Node process, and it works just as expected. It exposes the system tools like we know, and in addition there is get clients, which is defined by a service application running in io.Connect Desktop. This function is designed to be called and return all the clients that I have access to. Very bad idea, but it works for our situation.
What I want to do is this: I know the name of one of my clients. She's called Amelia Reid. But I want to find her adviser. "Can you please help me find out who the adviser for Amelia Reid is?" So let's fire this up and see how it goes. Very happy to do it.
Great. So we have the adviser for Amelia Reid: M. Carter. Well, that's true, but that's not what we're interested in. What we're interested in is what happened with the tool. Get clients got called, and I'm interested in the response. As you can see, in the response we dumped all the clients that we know. Well, we don't have many clients, which is understandable in this situation, but you can imagine how this can run into many megabytes, like Ivan said. For our demo here, this will work just fine. We're returning all of the clients because this is what this tool does: it returns all the clients.
So we get one turn to the LLM with the initial prompt. The LLM comes back with a response saying, okay, in order for me to get the job done, I want you to call get clients. We call get clients. This tool, which is an interop method underneath, returns all of the clients. Then this collection of data, together with the previous messages, the system prompt, and the tools, goes back to the LLM, saying: this is the response from the tool call. And then the LLM reasons and picks out M. Carter from this list.
Now, an interesting thing I should point out is that the LLM didn't iterate over the list in a loop like we would do. It does the reasoning in the neural network. That's one of the reasons why the famous example, how many R's are in "strawberry", fails: because it's not coding it. It's using LLM magic. So that's what happened. We got the result, but that can easily lead to problems, like Ivan said.
So I want to try the same thing using code mode. I'm going to open up this application, because it's going to serve my client list here. And I'm going to open the Angular io.Assist. I'm using the Angular io.Assist because it's configured to connect to a different MCP server, which has different tools. It doesn't have our system tools. It doesn't even have get clients. It has list Python definitions, get Python definitions, and execute Python code.
These are the tools that we've defined currently in our sandbox, which is designed to be used within our MCP SDK or within AI web. The idea of those tools is that first, the list tool returns all of the tools available in the system in terms of names and descriptions, and omits the schemas, which tend to be the heaviest part. So the LLM, once it gets the first prompt, can understand what API it's working with and which tools it's going to need. Then the LLM calls get definitions to learn the schemas of just the tools it's going to need, so that it can then call execute Python code with the Python code it generated for us to execute. Those are the three steps the LLM needs to follow. Always. They don't increase with the number of tools. They're constant. Always three steps, which is very easy to manage with the system prompt.
So let's see if it's going to work. The same prompt. It lists the definitions and finds that there is a get clients function. Now it gets the full schema and it's going to try to run the code. The first time it failed because the code didn't return the result. Like I said, this is an early teaser and we're fine-tuning the system. The second time, it found the information. It's M. Carter, correctly identified. We can see here in the log of this application, which is why I opened it, how it was fetching the clients. It fetched them twice because it ran the code twice, since the first time it didn't get any result.
So let's see what actually happened. First it called list Python definitions, which lists all the functions that we have: user greetings, get clients, all of the other stuff. Basically, those are all the tools that would otherwise have been broadcast directly to the LLM. This time we just package them as an API, without the schemas. Then the LLM decided that it's going to need get clients.
So we give it more information about the get clients function. It doesn't require parameters. It's going to return a list of blah blah blah, and it's going to return it in this format. Now the LLM has everything that it needs. It understands exactly those parts of the tools that it needs and nothing more. So it calls the final tool, which is execute Python code. The first attempt failed because, like I said, we're fine-tuning it at this point. It didn't return the value. So it decided to try again.
And the LLM wrote this code. Let me grab this code real quick, because even for me it's not really readable. I'm going to do a very pragmatic thing: "Format this Python for me." So in essence, this is the code that we got. It's a snippet. It calls get clients, saves the result, and defines an output variable. If the call to the API was a success, it gets the clients, iterates over them, and finds Amelia by first name and last name. If Amelia was found, it compiles the output and gets the adviser property. This is how we would do it. And LLMs are much better at doing it this way compared to the other approach that I showed previously. This is why it was able to get Amelia from the list and return the correct result.
Yes. The LLM created a string of code, which it sends to us. That's why we have execute Python code. And we execute the code using the Python interpreter that I talked about. In this case, the Python interpreter is on the desktop as a Node process. If I run this in the browser, it's going to be in the browser platform as a WebAssembly module. So we run the code, the code calls the tool, the tool returns the results, the code does the iteration, and then what we return to the LLM is just this bit. The LLM never sees the whole list of clients, never sees the logic. The LLM doesn't do the magic computation anywhere. The computation happens on device. And the LLM sees only this. What does it mean?
It means two things. Number one, if the flow is more complicated than this simple hello-world example, then chaining calls, conditionally executing them, doing stuff with their responses in terms of filtering and next steps, and branching out all become extremely easy for the LLMs. Because, like we saw from the Norman and Sons guys, LLMs are very capable of writing code. They can write whole applications, so they're more than capable of writing somewhat complex scripts that call APIs. So number one, this will vastly increase the success rate of chaining multiple tools to execute a complex flow.
And number two, like I showed with this example, the list of clients, that data, never leaves the machine and never goes to the LLM. The LLM doesn't need to reason on top of it, which removes any chance of hallucinations. Because, like I said, in the previous example that list of clients goes to the LLM. The LLM needs to reason about who the adviser is, and that can lead to hallucinations very easily. In this case, there are no hallucinations. It's a very well-defined script which gets the adviser for Amelia Reid.
And the final result of this is token usage. So let's look at the token usage. In the first example, we burned 27,393 tokens using the traditional approach. In the second run, we used basically a third of that. So this is a considerable saving, but it's even better than that. The difference here comes from not dumping a lot of system instructions and a lot of tool definitions into the context. We save them because the LLM is working with three tools. It's very easy to give the LLM specific instructions for those three tools. The rest is code logic that we execute. So the savings here represent not dumping tool definitions.
But if that collection of clients were vast, megabytes like in Ivan's case, then we would see 50x or 100x reductions in tokens, because that collection never goes to any LLM. And we don't need to design tens of tools for filtering and so on and so forth, hoping that the LLM will call them in a chain. All of that happens with a single script. So this filtering and extracting of the adviser happened with one tool, one real tool call.
So that is the power of code mode, which we're very excited to bring to you very soon. And like I said, it will work in the browser and on the desktop. It's not going to require anything from you in terms of special infrastructure. It's not going to require any Docker enablement on the system. It's not going to require cloud connections or remote sandboxes. Just like anything with io.Connect: npm install the package and call the factory function. So with that, I'll hand it back to Bob to carry on.
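The script the LLM generated in the demo is described in the talk but not reproduced in the transcript. A minimal sketch of what such a script might look like follows. It assumes a hypothetical sandbox function `get_clients()` that returns a dictionary with `success` and `clients` keys, and client records with `firstName`, `lastName`, and `adviser` fields. These names, the response shape, and the second sample client are illustrative assumptions, not the actual io.Connect API. Only Python built-ins are used, in keeping with the restricted sandbox described in the session.

```python
# Hypothetical stand-in for the sandbox-provided function. In the demo the
# real function is backed by an interop method; this stub and its data
# shape are assumptions made for illustration only.
def get_clients():
    return {
        "success": True,
        "clients": [
            {"firstName": "John", "lastName": "Doe", "adviser": "A. Example"},
            {"firstName": "Amelia", "lastName": "Reid", "adviser": "M. Carter"},
        ],
    }


def find_adviser(first_name, last_name):
    """Return only the adviser for one client, never the full client list."""
    response = get_clients()
    output = {"found": False, "adviser": None}
    if response.get("success"):
        # Deterministic loop over the data, instead of LLM reasoning over it.
        for client in response.get("clients", []):
            if client["firstName"] == first_name and client["lastName"] == last_name:
                output = {"found": True, "adviser": client["adviser"]}
                break
    return output


# Only this small dictionary would travel back to the LLM.
result = find_adviser("Amelia", "Reid")
print(result)
```

The point of the sketch is the shape of the data flow: the full client collection stays inside the script, and only the small `result` dictionary is returned.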
As AI agents grow more capable, the standard tool-calling model starts to break down. More tools mean larger context windows, more fragile execution sequences, and more data flowing to the LLM than necessary. Code Mode is interop.io’s answer to that problem.
In this session from the London 2026 Developer Community AI Meet-Up, Kalin Kostov explains what Code Mode is, why it exists, and what to expect from io.Intelligence’s implementation – including a live demo.
What you’ll learn
- Why traditional tool-calling architectures hit a ceiling at scale: tool overload, sequence fragility, and context window bloat
- How Code Mode works: instead of choosing from hundreds of tools, the LLM receives just two – one to inspect the API, one to execute code – and writes a Python script that handles the rest
- Why LLM-generated code outperforms chained tool calls for complex, multi-step workflows
- How Code Mode keeps sensitive data on-device: the LLM never sees raw result sets, only the filtered output of code executed locally
- Token usage in practice: a side-by-side comparison showing roughly a 3x reduction versus the traditional approach – with even greater savings at scale
- How the controlled Python sandbox runs in both browser (via WebAssembly) and desktop, with no Docker, no remote sandbox, and no extra infrastructure required
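The way Code Mode narrows the tool surface can be sketched in a few lines. The demo in the session exposes three tools (list definitions, get definitions, execute code); the sketch below mirrors that flow with a toy registry. Every name here (`REGISTRY`, `list_definitions`, `get_definitions`, `execute_code`, the `result` variable convention) is a hypothetical stand-in, and plain `exec()` is used only to show the shape of the call. It is not a security boundary and is not how the actual sandbox works.

```python
# Hypothetical function exposed to generated scripts; data is made up.
def get_clients():
    return [
        {"name": "Amelia Reid", "adviser": "M. Carter"},
        {"name": "John Doe", "adviser": "A. Example"},
    ]


# Toy registry standing in for the tools an MCP server would know about.
REGISTRY = {
    "get_clients": {
        "description": "Return all clients the current user can access.",
        "schema": {"parameters": {}, "returns": "list of {name, adviser}"},
        "fn": get_clients,
    },
}


def list_definitions():
    """Step 1: names and descriptions only; schemas are left out."""
    return [
        {"name": name, "description": entry["description"]}
        for name, entry in REGISTRY.items()
    ]


def get_definitions(names):
    """Step 2: full schemas, but only for the functions the LLM asked about."""
    return {name: REGISTRY[name]["schema"] for name in names}


def execute_code(source):
    """Step 3: run the generated script and return only its `result` value.

    Caution: exec() with a trimmed scope is NOT a real sandbox. It is used
    here purely to illustrate the call pattern.
    """
    scope = {"__builtins__": {"len": len, "range": range}}
    scope.update({name: entry["fn"] for name, entry in REGISTRY.items()})
    exec(source, scope)
    return scope.get("result")


# What an LLM-generated script might look like after steps 1 and 2.
generated_script = """
result = None
for client in get_clients():
    if client["name"] == "Amelia Reid":
        result = client["adviser"]
        break
"""

print(list_definitions())
print(get_definitions(["get_clients"]))
print(execute_code(generated_script))
```

However many functions the registry holds, the LLM still sees the same three entry points, and the raw client list never appears in anything returned to it.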
Who this is for
Engineers and technical leads building or evaluating AI agent workflows on io.Connect – whether you’re designing a new system or looking to improve the reliability and efficiency of an existing one.