
Be Careful Where You Send Your Code

Written by
Anshul Ramachandran

Disclaimer: This post does mention Codeium’s Enterprise offering, but most of the content is a general take on code security with respect to third-party dev tooling.

March 20. ChatGPT users are able to see the titles and first messages of other users’ conversations.1 All those people pasting in code snippets from their private repos to provide context for their questions? Yeah, this isn’t great.

Well, at least that was patched relatively quickly and was totally a one-off when it comes to security problems for dev tools, right?

March 23. GitHub.com’s RSA SSH private key was exposed in a public GitHub repo.2 Well, shoot. While GitHub tells users not to push secret keys or other sensitive information, incidents like this don’t instill confidence in its code security. And the suggested fix, replacing the locally stored fingerprint with the new one? That’s a perfect feeding ground for a man-in-the-middle attacker if someone isn’t careful about verifying the value.
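To make that verification step concrete, here is a minimal Python sketch. It assumes you have a public key line in the usual `<key-type> <base64-blob> [comment]` form and a fingerprint copied from the provider’s official documentation over HTTPS. The function names `ssh_sha256_fingerprint` and `matches_published` are made up for illustration and are not part of any library. The sketch computes the SHA256 fingerprint in the form OpenSSH displays it and compares it with the published value.

```python
import base64
import hashlib


def ssh_sha256_fingerprint(public_key_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of a public key line.

    Expects "<key-type> <base64-blob> [comment]", as found in a .pub file.
    For ssh-keyscan or known_hosts output, drop the leading host field first.
    """
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    # The second field is the base64-encoded key blob; hash its raw bytes.
    key_blob = base64.b64decode(fields[1], validate=True)
    digest = hashlib.sha256(key_blob).digest()
    # OpenSSH prints the digest as base64 with the trailing "=" padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def matches_published(public_key_line: str, published_fingerprint: str) -> bool:
    """True only if the key's fingerprint equals the published one exactly."""
    return ssh_sha256_fingerprint(public_key_line) == published_fingerprint.strip()
```

The important part is where `published_fingerprint` comes from: it has to be read from a channel you already trust, not from the same connection that handed you the key. `ssh-keygen -lf <file>` prints the same kind of fingerprint if you would rather not script it.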

And it’s not as if these incidents are rare and just happened to coincide: the internet is full of examples of security incidents, even on a platform as mature and advanced as GitHub.3 So a serious question to ask yourself about any external dev tool is: what guarantees do I get about the security of my code, the most important IP that I have?

This post is not meant to bash OpenAI or GitHub. These things happen at all companies! Small companies don’t have all of the processes in place, and larger companies have a lot more people who could introduce issues (high error rate × low quantity vs. low error rate × high quantity). We use OpenAI and GitHub as examples only because of how much hype (and FOMO) currently surrounds ChatGPT and Copilot.

The irony is that the more products improve and grow in popularity, the more attention they receive from hackers and bad actors. And while GitHub has had well over a decade to shore up its security arm, the same cannot be said of every code-LLM company. With a massive race toward LLM-powered dev tooling products, corners will unfortunately be cut, since coming second to market could be disastrous for revenue potential.

OK, this just sounds like a bunch of doom and gloom. What am I trying to get at? How do you still get the benefits of AI without exposing yourself to guaranteed security risk?

The simplest answer: instead of worrying about where you send your code, just don’t send your code anywhere. Self-host the solution. Even GitHub Enterprise has a “GitHub Enterprise Server” option, which gives you peace of mind because your data doesn’t leave your premises, at the cost of only a relatively small hardware investment.

We believe that this is going to be the next chapter in the AI-for-coding space (and a differentiator for new entrants): startups will create a self-hosted option, and enterprises will be willing to pay an upfront cost of one or two GPUs to know that their private data never leaves their premises.

At Codeium, we are betting on this wave through our Enterprise offering, which includes an option to fully self-host code completion for your org. That means no one ever has access to any of your code or data, including us at Codeium. You get the most advanced AI-powered code acceleration toolkit and peace of mind at the same time.