“I see that Codeium has a self-hosted deployment option. We value security, but no thanks. Sounds like too much, we will just go with your SaaS option.”
We hear this so often that we realized we needed to write a blog post diving into the various reasons people cite as “too much” and why they are mostly rooted in misconceptions about how our self-hosted generative AI solution works. Too difficult to set up, too expensive, too generic. All of these are reasons why a self-hosted solution might be “too much,” and these opinions exist for good reason, but we have done a lot of really interesting work to address the concerns, reducing “too much” to “a reasonable bit more.”
Unlike almost every other generative AI tool on the market, we have both SaaS and self-hosted enterprise solutions. Therefore, unlike other tools, we have no incentive to convince you that one is better than the other - we will make money either way. That being said, our goal as a trusted partner is for a company’s decision between SaaS and self-hosted to come down to its security posture, not to misconceptions about difficulty or cost.
Reason #1: Too Difficult to Set Up, Update, and Maintain
Why this opinion exists
This is the number one reason we hear for avoiding self-hosting. SaaS solutions are great because of how easy it is to get started, and because they require zero thought about updates and maintenance, since those happen transparently in the background. It does not help that the few generative AI tools that do have self-hosted solutions often require many integrations with your existing infrastructure and take multiple days to get up and running. Repeating this process for every update only compounds the overhead.
On top of this, self-hosted generative AI solutions are novel because they require provisioning machines with GPUs! Not only are most competitors inflexible about GPU specs, making setup harder, but most companies don’t have preexisting in-house expertise in GPU server maintenance.
The Codeium solution
Our self-hosted solution is Codeium running on compatible GPU-equipped hardware provisioned within the company’s tenant, either in a Virtual Private Cloud (VPC) or in an on-prem data center. We deploy in a self-contained, containerized manner, with the only required integration to existing systems being your Identity & Access Management (IAM) provider. That’s it! For a single-node system, you deploy via a single Docker Compose application (no Kubernetes required to run Codeium!), while for multi-node systems (once you get into the many thousands of developers), we do have a Helm chart for a Kubernetes deployment. On average, our customers take just two to three hours to spin up Codeium within their tenant. An update becomes just applying a new set of Docker images, which is so smooth that many of our customers update to the latest and greatest multiple times a week.
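To make the shape of such a deployment concrete, here is a minimal Docker Compose sketch of what a single-node, GPU-backed, IAM-integrated setup could look like. Every service name, image path, and environment variable below is a hypothetical placeholder for illustration, not Codeium’s actual artifact names or configuration:

```yaml
# Hypothetical single-node sketch - names and images are illustrative only.
services:
  inference:
    image: registry.example.com/acme/inference:latest  # placeholder image
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia    # reserve the node's GPU for inference
              count: 1
              capabilities: [gpu]
  api:
    image: registry.example.com/acme/api:latest        # placeholder image
    ports:
      - "443:8443"
    environment:
      # The single external integration point: your IAM/SSO provider.
      SSO_METADATA_URL: "https://idp.example.com/metadata"  # placeholder
```

Under this kind of setup, an update is just pulling newer image tags and re-running `docker compose up -d`, which is why the upgrade path stays lightweight.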
On the second point, you don’t need any in-house GPU expertise, as you will have support from our expert team in the rare case something goes wrong. For context, we run Codeium’s SaaS tiers (individual and teams) on the same infrastructure that we ship to our self-hosted customers, and there are hundreds of thousands of active developers on that instance. The largest self-hosted deployment of Codeium will always be Codeium’s own SaaS tiers, so you can trust that the infrastructure we ship to you is very robust (otherwise we would just be fighting fires all day for our SaaS tiers).
Reason #2: Too Expensive of a Hardware Commitment
Why this opinion exists
Let’s face it - GPUs are expensive! Anyone who hears about the tens of millions of dollars poured into training these LLMs, and about the scarcity of such hardware, will easily conclude that investing in any hardware to run these applications is just not worth it right now. Perhaps you have even tried to spin up a machine with a GPU to deploy an open-source model and realized just how little utilization you are getting from that GPU, making the hardware-cost-per-user calculus even bleaker. Then you go to other self-hosted solutions and they also ask for a large hardware investment, often for performance that is worse than SaaS competitors’. The costs don’t seem to add up.
The Codeium solution
Before we dive in, a quick aside on Codeium’s background: the team’s expertise lies in large-scale machine learning infrastructure. Before Codeium, we were building GPU virtualization and optimization software for some of the largest ML workloads, such as autonomous vehicle simulations. We were supporting tens of thousands of GPUs on the public cloud (in our customers’ VPCs), and decided to use our industry-leading infrastructure as a launching pad toward building an end-to-end LLM application, which ended up being Codeium.
This means we actually have a multi-year head start over all of our competitors on the infrastructure side! Self-hosting infrastructure that runs on GPUs? We’ve done it for much more complex workloads. Optimizing GPU workloads to save on hardware costs? We’ve built a business doing exactly that. Batching, quantization, model parallelism, even fused kernels? We’ve implemented them all, which other self-hosted generative AI tool providers have not, and which you likely don’t have for your own self-hosted open-source LLMs.
There is often conflation between the hardware requirements for training and for inference. You don’t need a large cluster for inference, and if you have software as GPU-optimized as Codeium’s, you don’t need many GPUs at all. Our rule of thumb is that we can support ~500 developers on a single A100 or equivalent (e.g., an L40S). If you have a geographically distributed workforce that codes at different times of day, you can probably fit even more without any degradation. For example, in an Azure VPC, you can get a machine with an Nvidia A100 GPU for around $2200/mo without any commitments, which comes out to low single-digit dollars per developer per month.(1) To get all of the security and personalization wins of self-hosting? Not bad at all. The economics become even better with purchased hardware on-prem.
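The arithmetic above is simple enough to spell out. Using the post’s own figures (roughly $2,200/month for one A100-class machine, supporting roughly 500 developers), the per-developer cost works out as follows:

```python
# Back-of-envelope cost per developer for a self-hosted deployment,
# using the approximate figures from this post.
monthly_gpu_cost = 2200    # USD/month for one A100-class cloud instance
developers_per_gpu = 500   # rule-of-thumb capacity per GPU

cost_per_dev = monthly_gpu_cost / developers_per_gpu
print(f"${cost_per_dev:.2f} per developer per month")  # $4.40 per developer per month
```

Even if your utilization only reaches half the rule-of-thumb capacity, the cost stays in single-digit dollars per developer per month.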
Reason #3: Too Generic to Fit Unique Needs for Hosted Infra
Why this opinion exists
Any organization that has been around for a while has probably built up unique infrastructure with requirements and conventions that a self-hosted solution of any form might not play nicely with out of the box. Here is a small smattering of reasons, courtesy of our own customers, why a self-hosted AI coding assistant might not work out of the box:
- We have developers around the world, so we need to set up multiple servers in multiple regions, which sounds much more difficult.
- We cannot download your Docker images from your public container registry due to our security policies. Or even better, our developers cannot download your IDE extensions from the public marketplaces.
- We work in AWS GovCloud where there are no A100s, just V100s (a whole different GPU architecture). Or, we have a bunch of RTX 4090s or T4s lying around, what is your GPU flexibility like?
- We don’t use GitHub for source code management. We use Bitbucket, Gitlab, Perforce, Gerrit, etc.
As you weigh self-hosted vs SaaS, all of these add up and you start asking yourself: sure, it might be hard to convince Infosec to adopt a SaaS solution, but is it really harder than trying to make a self-hosted solution compatible with our existing infrastructure?
The Codeium solution
As you probably guessed, we have heard all of these, and we have solved all of them already. Just some of the things we have done: we let you pull our images and push them into an internal container registry, we provide downloadable versions of our extensions, and we work directly with read-access tokens from all source code management tools. We also have experience working across a very broad range of GPU SKUs, so we will never force you into a very specific configuration - we are quite adept at working with what you have.
The point here is that we understand these issues and have spent multiple years simplifying the deployment process and working in unique environments. Believe us, we’ve seen almost everything, and if there is something truly unique about your setup, we are the team to address it.
Of course, all of this is talk. The best feedback we can get on our deployment process is from our actual enterprise customers:
- “Oh wow, this was way simpler than we expected.”
- “This went much more smoothly than with your competitors.”
- “I really like working with the Codeium team. If there is an issue, I can just ask people who know the solution, and they respond quickly.”
Now, of course there are caveats to self-hosting, just as there are with SaaS. There is often a bit of a lag between when features are available on SaaS and when they have been properly packaged for self-hosting. The licensing cost of self-hosting is a bit higher than SaaS, as it requires more effort and support from our end, and the hardware cost, while optimized, only really starts making sense at a couple hundred developers, where it amortizes nicely. The self-hosted solution is still more work and more expensive than SaaS, but the point we are trying to make is that the gap is just not as large as general opinion might suggest.
We don’t want to say self-hosting is the right solution for everyone. In fact, our SaaS plan is probably the right plan for many companies. But that doesn’t mean it is the right plan for every company. If you self-host your code for security reasons, you should probably use our self-hosted product. There’s no point in compromising your security posture just to benefit from generative AI. Especially if you are in a highly secure and compliant industry, such as defense, finance, or healthcare, a self-hosted solution is likely the one that actually matches your requirements.
And if you are ever unsure which plan is right for you, we are always happy to talk - contact us! They are both great.
(1) The GPU is the expensive part; all of the other requirements (RAM, storage, etc.) are quite minimal. Network costs are also very small, since only small amounts of textual data are transmitted on any inference call.