Amazon CodeWhisperer is out of beta. We tried it. Spoiler: it isn’t good.

Written by
Codeium Team
Amazon CodeWhisperer is a disappointing solution.

Disclaimer: We develop the AI-powered coding toolkit, Codeium, but do not focus on Codeium in this post (just a short plug at the end!). This is an objective assessment of Amazon CodeWhisperer given our expertise in the AI-for-software-development space.

After a lengthy beta period, Amazon finally released CodeWhisperer publicly, which signals another major tech company entering the LLMs-for-code space, directly competing against GitHub Copilot (Microsoft). Touted as free unlimited autocomplete for individuals, this seemed like a great alternative at face value, so we tested it, and our assessment is clear. Amazon CodeWhisperer has significant shortcomings in every aspect: a limited feature set and limited availability, very poor suggestion quality and postprocessing, unstable latency, and clear non-permissive license violations that bypass its reference checks.

About Amazon CodeWhisperer

When we did a head-to-head analysis of leading AI code assistants a few months ago, we did not include Amazon CodeWhisperer as it wouldn’t be fair to assess a beta product. But now that it is publicly available, we will similarly analyze the features and usability (latency + quality).

  • Price: The big pro is that it is free for individuals, which is only matched by Codeium.
  • Capabilities: Amazon CodeWhisperer provides single-line and multi-line autocomplete suggestions, which matches the capabilities provided by Tabnine and GitHub Copilot. However, both Codeium and Replit Ghostwriter provide additional functionality on top of autocomplete: Codeium has free Chat, while Replit Ghostwriter has chat as part of its paid plan.
  • Languages: Amazon CodeWhisperer has a much more limited set of languages than any of the other products.
  • IDEs: Amazon CodeWhisperer is only available on VSCode, JetBrains, and select AWS IDEs. This is an incredibly limited set, missing Vim, Neovim, Emacs, Web notebooks + IDEs, and more.
  • Enterprise plan: Amazon CodeWhisperer’s enterprise plan is simply a seat management tool, like GitHub Copilot. It does not actually provide self-hosting and other data security guarantees such as what you get from Codeium for Enterprises and Tabnine’s Enterprise plan.

Overall, Amazon CodeWhisperer is not the clear leader on any axis: functionality, price, or availability.

Usability Tests

Even if Amazon CodeWhisperer is limited in functionality and availability, it might still be a useful product if it had better quality and latency than other available tools. As mentioned in the original head-to-head test, this is a more qualitative assessment, but still worthwhile.

We will start with the simplest quality test from the head-to-head post: writing a function and tests for adding a list of numbers in JavaScript.

This was supposed to be easy, so we were surprised to see a number of experience issues:

  • For the function itself, Amazon CodeWhisperer was too aggressive in suggesting a function that adds two numbers (Codeium and GitHub Copilot waited for the ...nums to be typed), tried using a braced body rather than an inline arrow function, and didn’t suggest the full function body after the arrow (it waited until nums. was typed).
  • There were errors when merging suggestions with existing code, requiring us to go back and edit the tabbed results to actually get something that looked correct.
  • It did not suggest easy syntactic completions like the closing ); after each test.

Since this wasn’t great, instead of going to the harder, more niche examples in the post, we thought we would just test another very common task first: writing a graph topological sort algorithm in Python. This algorithm appears hundreds, if not thousands, of times online, so this should be a very simple task.

Where do we start? Here’s a list of frustrating moments where our expectations of results from the AI (built from experience using GitHub Copilot, Codeium, Tabnine, and others) did not match the provided suggestions:

  • Starts by just adding a TODO and empty return rather than actual useful code.
  • Does not generate any code at relatively common places, such as completing an iteration (for g in graph), and when it does, the suggestion doesn’t make sense (for graph in graph) and has errors such as additional spaces (for g raph in graph).
  • The latency is very variable, even for very short suggestions. It is often unclear whether no suggestion is being generated at all or whether the latency is just bad.
  • Amazon CodeWhisperer gives a suggestion using a method dfs that does not actually exist. We still wanted to see if it could implement the actual code, so we kept trying to trigger suggestions for the visit node logic.
  • After Amazon CodeWhisperer finally does produce a reasonable suggestion, it starts rambling uncontrollably and completely deviates from the algorithm.
  • We then tried adding a docstring, at which point it ignored everything that already existed and put in a proper topological sort algorithm. None of the other products needed this docstring prompting to properly suggest the code.
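For readers who want a picture of what a reasonable result for this task looks like, here is a minimal sketch of a depth-first topological sort in Python. It is not the code from the test session, and it is not output from any of the tools discussed. The function name topological_sort, the inner visit helper, and the dict-of-lists graph representation are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List


def topological_sort(graph: Dict[Hashable, List[Hashable]]) -> List[Hashable]:
    """Return the nodes of a directed acyclic graph in topological order.

    `graph` maps each node to the list of nodes it points to. This input
    format is an assumption; the original test's format is not shown.
    Raises ValueError if the graph contains a cycle.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[Hashable, int] = {}
    order: List[Hashable] = []

    def visit(node: Hashable) -> None:
        status = state.get(node, UNVISITED)
        if status == DONE:
            return
        if status == IN_PROGRESS:
            # We reached a node that is still on the current DFS path.
            raise ValueError("graph contains a cycle")
        state[node] = IN_PROGRESS
        # Nodes that appear only as neighbors are treated as having no edges.
        for neighbor in graph.get(node, []):
            visit(neighbor)
        state[node] = DONE
        # A node is appended only after everything it points to.
        order.append(node)

    for node in graph:
        visit(node)

    # Reverse the post-order so each node comes before its successors.
    order.reverse()
    return order
```

Called on a small graph such as {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}, the function returns an ordering in which "a" comes before "b" and "c", and both come before "d". The sketch uses recursion for brevity, so a very deep graph could hit Python's recursion limit; an iterative variant with an explicit stack would avoid that.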

By this point, it was pretty clear to us that Amazon CodeWhisperer is still far behind other tools in terms of usability. We continued to try using it for our own work to see whether it fared better in a production setting than in one-off tests, but we can only report equally poor, if not worse, performance. One of our engineers even received comment suggestions in languages other than English, although the codebase contains only English! We suspect significant issues during dataset preparation before model training.

To be clear, Amazon CodeWhisperer could theoretically still improve, but we at Codeium know how much engineering and how many model iterations it took to generate reasonable suggestions and merge them cleanly with the existing code so that the experience is seamless and magical. It also does not inspire confidence that these issues persisted despite an extended beta.

Licensing Assessment

There could still be one redeeming feature of Amazon CodeWhisperer. As we analyzed earlier, GitHub Copilot clearly emits non-permissively licensed code, and its post-generation filters do not actually work. Amazon CodeWhisperer claims to be able to detect such code snippets and provide references to them if they are generated. If Amazon CodeWhisperer either (a) does not train on non-permissively licensed code or (b) is able to accurately catch such generations, then it would have a leg up on GitHub Copilot (although it would still only be equivalent to Codeium and Tabnine, neither of which even trains on non-permissively licensed code).

We will refer the reader to our previous blog post for an explanation of non-permissive licenses, why they matter, and how we tested GitHub Copilot with an LGPL-licensed code block. We took the same LGPL-licensed code block from that post and tested Amazon CodeWhisperer with it.

Well then, clearly not a good look, but what specifically can we tell from this?

  • Amazon CodeWhisperer has clearly trained on non-permissively licensed code. There is nothing in the context to suggest using these variable names or macros like CS_CSC.
  • At no point is Amazon CodeWhisperer’s Reference Log triggered, which is their solution for detecting potentially copied non-permissively licensed code. Their filters are even worse than those of GitHub Copilot - Amazon CodeWhisperer happily produces non-permissively licensed code without any modifications and still cannot detect it.
  • The suggestions are still very poor. Amazon CodeWhisperer starts repeating the same return statement, is unable to close a brace, adds spaces where it shouldn’t (such as inside variable names and inside the square brackets used for indexing), and for some reason starts to produce completely incorrect code in comments.

Concluding Thoughts

When Amazon CodeWhisperer finally came out of beta, we were excited - of course it means more competition, but the space is moving quickly and we at Codeium believe that we can and should learn from everyone else in order to create the absolute best product. The fact that it was touted as free for individuals was also great - so far we have been the only ones pushing to democratize access to this technology for all developers!

Unfortunately, the reality is that Amazon CodeWhisperer is an inferior coding assistant in every aspect, especially when compared to Codeium and GitHub Copilot.

It can definitely improve, but unlike other LLM products, a code assistant like this isn’t simply an API or model invocation wrapper. Even putting aside the entire space of modeling improvements, there is a significant amount of engineering work needed to guarantee good latency, merge in suggestions cleanly without being distracting, integrate into any IDE that a developer would need, and build capabilities beyond simple autocomplete. We’ve learned this firsthand at Codeium, and are continuing to push the boundaries of what is possible.

As the shortest of plugs, we don’t want you to take our word for Codeium’s superior performance - give it a shot yourself in your IDE of choice (it’s free and takes just a couple of minutes to install!).