Codeium: Truly Enterprise-Ready Code Assistant

Written by
Michael Li

The public discourse about generative AI tools focuses on quality. Which LLMs are being used? What data is being used for training and what data is available at inference time via RAG? What is the experience for the end user? How fast is the system? How do the outputs stack up in quality against competitors?

These are all important questions, but they all address whether a tool is valuable. An orthogonal, yet equally important, question is whether a tool is usable. Enterprise readiness is a real consideration that is often overlooked in new product development (if something is not valuable, what is the point of making it usable?), but these considerations end up requiring a lot of engineering to make a product work in the most complex environments. A core aspect of our mission is that we want Codeium to be valuable to every developer, even those who work in intricate enterprise environments, which means we take enterprise readiness disproportionately seriously compared to other AI tools today.

We have rolled out many enterprise readiness features recently, but this post will highlight indexing access controls, subteam analytics, and audit logging capabilities.

Indexing Access Controls

While there are a few notable examples of large companies where every developer has access to all of the internal code, most enterprises have enabled Role-Based Access Control (RBAC) on their codebases to restrict access to particular subsets of employees. This can be for a host of reasons, ranging from leak mitigation to hard requirements imposed by the confidentiality level of the projects being worked on (e.g., defense contracts).

A massive value add of Codeium is its context awareness engine, which is able to ingest, preprocess, and index existing code in order for the AI, at inference time, to pull in the most relevant snippets for the task at hand. This need is based on the distributed nature of code logic across files, and has led to a 38% increase in the amount of code being accepted by our users, via a combination of increased acceptance rates and longer accepted suggestions.
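The ingest-index-retrieve loop described above can be sketched at a toy scale. This is only an illustration of the general technique (here, naive token-overlap scoring over whole files); Codeium's actual engine uses far more sophisticated chunking and ranking:

```python
# Toy sketch of retrieval-based context awareness: index code "chunks"
# (one per file here), then score them against a query and return the
# most relevant ones. Purely illustrative, not Codeium's real engine.
import re

def tokenize(text):
    # Split into identifier-like tokens, lowercased
    return set(re.findall(r"[A-Za-z_]\w*", text.lower()))

def build_index(files):
    # files: {path: source text}
    return {path: tokenize(src) for path, src in files.items()}

def top_snippets(index, query, k=2):
    q = tokenize(query)
    # Rank files by token overlap with the query
    scored = sorted(index, key=lambda p: len(index[p] & q), reverse=True)
    return scored[:k]

files = {
    "auth.py": "def verify_token(token): ...",
    "billing.py": "def charge_card(card): ...",
}
index = build_index(files)
print(top_snippets(index, "how do I verify a token?", k=1))  # → ['auth.py']
```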

Because relevant code context often lives in external repositories, not just the one that the developer is currently working in, we built remote indexing and multirepo context awareness for our hybrid and self-hosted customers in order to generate even higher quality suggestions. While valuable, this raises questions around enterprise readiness that did not exist when a developer could only use the index corresponding to the repository they currently have checked out. It would be a violation of RBAC if a developer who does not have access to a repository X were able to use features like @-mentions or context pinning to pull in code snippets from X while working in a repository Y that they do have access to.

This is why we built indexing access controls, purely for enterprise readiness. Enterprises can reuse (or define) the RBAC user groups that live in their existing Identity & Access Management (IAM) provider, and set access to these centrally managed remote indices to mirror the rules they have already set on the corresponding repositories. This works with IAM providers such as Azure AD and Okta, or anything that supports the SCIM protocol. Indexing access groups can also be created and managed manually via API.
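The core invariant is simple: a developer may only query a remote index if some IAM-derived group grants them access to the corresponding repository. A minimal sketch of that check, with group and repository names purely illustrative:

```python
# Hypothetical sketch of indexing access controls: IAM groups (e.g., as
# synced via SCIM) gate which remote repository indices a developer may
# pull context from. Names below are illustrative, not a real schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessGroup:
    name: str
    members: frozenset  # user ids in the group
    repos: frozenset    # repositories whose indices the group may use

def can_use_index(user, repo, groups):
    """A remote index is usable only if some group grants the user access."""
    return any(user in g.members and repo in g.repos for g in groups)

groups = [
    AccessGroup("defense-eng", frozenset({"alice"}), frozenset({"repo-x"})),
    AccessGroup("all-eng", frozenset({"alice", "bob"}), frozenset({"repo-y"})),
]
assert can_use_index("alice", "repo-x", groups)    # alice is in defense-eng
assert not can_use_index("bob", "repo-x", groups)  # bob's @-mention of repo-x is blocked
```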

We had already built Codeium YAML files that can be checked into the root of a repository to define, similar to gitignore files, which parts of the repository a company does and does not want Codeium to reason over for context awareness (for any reason: old legacy code, particularly sensitive code containing PII, etc.). Indexing access controls complement this by restricting the useful context to the people who should be allowed to leverage it.
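For a sense of what such a file looks like, here is a hypothetical illustration; the field names and exact syntax are assumptions rather than the documented schema, but the gitignore-style pattern matching is as described above:

```yaml
# Hypothetical root-level Codeium config scoping context awareness.
# Field names are illustrative, not the documented schema.
ignore:
  - legacy/**          # old legacy code not worth reasoning over
  - internal/pii/**    # particularly sensitive code containing PII
  - vendor/**          # third-party code
```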

Subteam Analytics

The topic of access controls is a facet of a larger reality about any decently sized organization: the organization isn't just one big blob of homogeneous developers. At the risk of sounding simplistic, developer organizations have teams. Most organizations measure health across teams, whether via team-based KPIs, velocity, code quality, or other metrics. To match those expectations, and also because it is valuable insight for internal admins of Codeium deployments, we have built subteam analytics throughout our analytics dashboards.

A company can again leverage its IAM groups, and analytics can be sliced by any of these groups to see the number of completions, types of chat messages, and other Codeium usage statistics, as well as generic insights that Codeium can provide, such as language and IDE breakdowns. A subteam in Codeium is a deliberately generic concept, so a company can group developers in any way it wants and get metrics for them (e.g., all data scientists, or all employees in the US). This can surface insights such as Codeium's effectiveness on different projects, or whether developers in a particular geographic region are experiencing infrastructure-related latency degradation.
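Conceptually, this slicing is a filter-then-aggregate over raw usage events. A hedged sketch, where the event fields and subteam names are illustrative rather than Codeium's real analytics schema:

```python
# Sketch of subteam analytics: slice raw usage events by any
# IAM-derived grouping of users. Fields and names are illustrative.
from collections import Counter

events = [
    {"user": "alice", "kind": "completion_accepted", "ide": "vscode"},
    {"user": "bob",   "kind": "completion_accepted", "ide": "jetbrains"},
    {"user": "alice", "kind": "chat_message",        "ide": "vscode"},
]

# Subteams are arbitrary groupings, e.g., mirrored from IAM groups
subteams = {"data-science": {"alice"}, "us-employees": {"alice", "bob"}}

def slice_by_subteam(events, members):
    # Count event types for users belonging to the given subteam
    return Counter(e["kind"] for e in events if e["user"] in members)

print(slice_by_subteam(events, subteams["data-science"]))
# → Counter({'completion_accepted': 1, 'chat_message': 1})
```

The same pattern extends to any dimension on the event (IDE, language, repository), which is what lets a dashboard slice by arbitrary groups.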

Audit Log

How would we know whether buggy code that is later found to cause an outage came from AI? Our customers recognize that Codeium is built for human-in-the-loop workflows, so buggy code is as much the responsibility of the developer as it is of the AI, which everyone knows will never be completely perfect. That said, it is still important to verify that the adoption of AI is not creating a massive spike in defective code, and more generally, many enterprises need to track what code has come from AI over time, whether for regulatory audit reasons or to further analyze how AI is being adopted. Whatever the reason, tracking AI-generated code is an important enterprise readiness feature that isn't necessarily tied to value-to-end-developer.

Of course, for SaaS, we guarantee zero data retention, so we cannot store accepted suggestions. But for our hybrid and self-hosted deployments, where any derived data is stored within the customer's tenant, we have built out audit log functionality. To an in-tenant database, the audit logger logs every accepted autocomplete suggestion (the suggestion text and the location where it was accepted: repository, file, line number) at the time of acceptance, as well as every chat conversation (input and output text, plus the currently open repository). While we are currently unable to track whether accepted code was later edited or deleted, we are the first AI code assistant to natively provide a trail of what AI-generated code may have entered your company's codebase. In case of audit needs, one can query the database as desired.
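To make the "query the database as desired" step concrete, here is a minimal sketch using SQLite; the table schema is an assumption based on the fields listed above (suggestion text, repository, file, line, timestamp), not Codeium's actual schema:

```python
# Hypothetical sketch of querying an in-tenant audit log. The schema is
# an assumption based on the fields described, not the real one.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE accepted_suggestions (
        accepted_at TEXT,
        repository  TEXT,
        file        TEXT,
        line        INTEGER,
        suggestion  TEXT
    )
""")
db.execute(
    "INSERT INTO accepted_suggestions VALUES (?, ?, ?, ?, ?)",
    ("2024-05-01T12:00:00Z", "payments", "charge.py", 42, "return total * 1.08"),
)

# Audit query: which AI-generated lines landed in a given repository?
rows = db.execute(
    "SELECT file, line, suggestion FROM accepted_suggestions WHERE repository = ?",
    ("payments",),
).fetchall()
print(rows)  # → [('charge.py', 42, 'return total * 1.08')]
```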

These are just a subset of the features and capabilities that we have built for enterprise readiness recently: one-click installers, pinning of sub-repo context items in remote repositories, integrations with private Vertex AI endpoints, and more. And this is on the strong foundation of our self-hosted deployment, attribution filtering and logging, not training on non-permissively licensed code, and more. We support many production enterprise deployments of Codeium (each with thousands of engineers) across some of the largest companies in the world, so we are one of very few AI tools that have experienced the challenges of deploying at scale. We have iterated on these learnings to build a product that is not just incredibly valuable, but fully usable by any enterprise.

