TL;DR: Codeium provides in-line fill-in-the-middle autocomplete suggestions, which no other AI code assistant does, including GitHub Copilot.
Code is precise and unforgiving. Add an argument to a function header or a field to a database schema, and odds are you need to go through a bunch of call sites to keep the rest of the code in sync. The most common promise of AI-powered autocomplete is that developers can zoom through this mind-numbing, repetitive work. But what if you try to add an argument to a function so you can pass it through to a helper function with a newly modified signature? Well, GitHub Copilot wasn't of much use:
If you tried Amazon CodeWhisperer or Tabnine, you would have seen the same thing. Or rather, you would not have seen any suggestions at all.
What’s going on? It turns out that these tools only give you suggestions when there is no meaningful text after the cursor on the current line (ignoring single closing braces or colons)! This is fine if you are writing new code, but it is completely useless when you are trying to make any of these common edits to existing code. In fact, for most developers, these very edits are precisely the most mind-numbing work!
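To make that gating behavior concrete, here is a minimal sketch of the kind of check that would produce it. This is our guess at the observable behavior, not any vendor's actual code: the function name `would_suggest` and the exact set of ignorable characters are assumptions made for illustration.

```python
# Hypothetical sketch of a "suffix-gated" autocomplete trigger.
# Assumption: a suggestion is only offered when the text after the cursor on
# the current line is empty or a single closing bracket or colon.
_IGNORABLE_SUFFIXES = {"", ")", "]", "}", ":"}


def would_suggest(line: str, cursor: int) -> bool:
    """Return True if a suffix-gated assistant would offer a completion.

    `line` is the current line of code and `cursor` is the 0-based column
    of the cursor within that line.
    """
    text_after_cursor = line[cursor:].strip()
    return text_after_cursor in _IGNORABLE_SUFFIXES


if __name__ == "__main__":
    call_site = "result = helper(x, y)"
    # Cursor right after `x`: ", y)" follows, so nothing would be suggested.
    print(would_suggest(call_site, call_site.index(", y")))
    # Cursor inside empty parentheses: only ")" follows, so a suggestion appears.
    print(would_suggest("result = helper()", len("result = helper(")))
```

Under this check, every edit in the middle of an existing call site falls on the "no suggestion" side, which matches what we saw.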
So is this just a simple oversight by these tools? Well, not exactly. This scenario where there is text on both sides of the cursor (on the same line) and we want autocomplete suggestions is a special case of the fill-in-the-middle task called in-line fill-in-the-middle, or in-line FIM for short. And it is not easy to get in-line FIM to work well enough to be helpful rather than distracting.
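As a rough illustration of that definition, the sketch below splits a buffer at the cursor and flags the in-line case. The names `FimContext` and `split_at_cursor` are hypothetical and used only for this example; a real editor integration would also handle tokenization and context limits.

```python
from dataclasses import dataclass


@dataclass
class FimContext:
    prefix: str      # everything before the cursor
    suffix: str      # everything after the cursor
    is_inline: bool  # True when the cursor's line has text on both sides


def split_at_cursor(source: str, offset: int) -> FimContext:
    """Split `source` at character `offset` and detect the in-line FIM case."""
    prefix, suffix = source[:offset], source[offset:]
    # Portions of the cursor's own line, before and after the cursor.
    line_before = prefix.rsplit("\n", 1)[-1]
    line_after = suffix.split("\n", 1)[0]
    is_inline = bool(line_before.strip()) and bool(line_after.strip())
    return FimContext(prefix, suffix, is_inline)


if __name__ == "__main__":
    source = "def run(a, b):\n    return helper(a)\n"
    # Cursor just inside the call's parentheses: "a)" follows on the same line.
    offset = source.index("helper(") + len("helper(")
    print(split_at_cursor(source, offset).is_inline)
    # Cursor at the end of the first line: nothing follows on that line.
    print(split_at_cursor(source, source.index("\n")).is_inline)
```

Generic FIM only needs the prefix and suffix; the `is_inline` flag is what separates the harder same-line case from filling in whole lines.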
We wrote about generic FIM in a previous post and we strongly recommend reading that first for full context on how to train an LLM with FIM awareness.
For in-line FIM, we have the same task: we mask out subsets of contiguous tokens within a single line so that the model learns to repair these shorter sections. At inference time, we produce many suggestions on every keystroke, and if we get suggestions that don’t include newline tokens, we can identify them as in-line FIM suggestions. The reason for this last point is that we have noticed that an in-line FIM suggestion spanning multiple lines does more harm than good, since such suggestions are more likely to be hallucinations than short, useful repairs.
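Both halves of that description can be sketched in a few lines. This is an illustrative toy under stated assumptions, not our training or serving code: characters stand in for tokenizer tokens, and the helper names `make_inline_fim_example` and `keep_inline_suggestions` are invented for this example.

```python
import random
from typing import List, Tuple


def make_inline_fim_example(line: str, rng: random.Random) -> Tuple[str, str, str]:
    """Mask a random contiguous span inside one line.

    Returns (prefix, middle, suffix), where `middle` is the non-empty masked
    span a model would be trained to reconstruct from `prefix` and `suffix`.
    Assumption: characters are used here in place of real tokens.
    """
    if not line or "\n" in line:
        raise ValueError("expected a single non-empty line")
    start = rng.randrange(len(line))               # first masked character
    end = rng.randrange(start + 1, len(line) + 1)  # one past the last masked character
    return line[:start], line[start:end], line[end:]


def keep_inline_suggestions(candidates: List[str]) -> List[str]:
    """Keep only non-empty candidates that contain no newline."""
    return [c for c in candidates if c and "\n" not in c]


if __name__ == "__main__":
    rng = random.Random(0)
    prefix, middle, suffix = make_inline_fim_example(
        "    return helper(a, timeout=timeout)", rng
    )
    print(repr(prefix), repr(middle), repr(suffix))
    print(keep_inline_suggestions(["timeout=timeout", "a)\n    log(a", ""]))
```

The newline filter is deliberately blunt: it throws away any candidate that tries to spill onto another line, which is where we saw the most hallucinated repairs.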
So what is likely going on with these AI code assistants? In the original post, we showed that Tabnine didn’t have generic FIM capabilities in the first place. Amazon CodeWhisperer at face value does, but something fundamental must have gone wrong in how its FIM was built, since the suggestions are quite poor, error-prone, and often do not reference post-cursor context.
But what about GitHub Copilot? It has reasonable FIM suggestions for filling in lines and using future context, so why is in-line FIM not enabled? Honestly, we are not sure. Copilot has normal FIM suggestions, and it seems like there shouldn’t be anything stopping it from exposing in-line FIM suggestions. Our guess is that they tried to enable in-line FIM but the performance wasn’t good enough, maybe from a lack of in-line FIM-like training examples or some other training knobs. All we know is that, for now, Copilot has not enabled in-line FIM in production. That likely means either there is something up with the underlying model, which they are unwilling to retrain, or their FIM handling capabilities are inherently worse.
At Codeium, we are obsessed with generating high-quality AI code suggestions wherever we can. So naturally, we made sure we built high-quality in-line FIM. What are some of the things you can do with Codeium now?
As you can see, adding in new variables or arguments and incorporating them into existing code is a breeze. You can even completely delete arbitrary sections and Codeium will repair them for you:
There are probably even more places where in-line FIM ends up saving the day, and it is pretty fun to play around with. Think about all of the other times you need to add code in the middle of a line: adding type annotations, refactoring or translating code, changing APIs, etc. With Codeium, unlike with other tools, AI will actually accelerate these edits. Let us know where you find these magical moments!
In-line FIM is one of many tricky modeling problems that we have faced while developing Codeium, but as a company laser-focused on generative AI for code, we are not distracted by other use cases or products and will keep pushing to solve these particular problems. In-line FIM is one recent example of a capability that only Codeium has, which signals that we are moving to the cutting edge of innovation, not just catching up to other products. We’re excited to implement more novel solutions and improvements that no one has productionized yet. We hope you are too.