The Effect of Generative AI on the Human-Tool Interface

Written by
Anshul Ramachandran

Many thanks to Nick Moy, Matt Li, Varun Mohan, and others, for discussions that helped solidify this mental model.

“Generative AI is this generation’s defining technology, it will completely transform everything.” My friends know that I usually scoff at these grandiose VC claims. After all, there seems to be a transformative technology every other year or so, most of which haven’t really panned out (I’m looking at you, AR/VR and self-driving cars). I don’t get impressed by promising initial growth or get caught up in hype. So when I told my friends that I actually agreed with this statement for generative AI, they naturally questioned why.

I won’t point at numbers or quote tech influencers - I will just look at the technology from first principles. Specifically, I am excited by how tech-based tools, which are the true catalysts for knowledge-based work, can be up-leveled in ways not possible pre-generative AI. To understand why, we first need to understand how humans do work and how tools fit in (what I call the Human-Tool Interface). Then we can explicitly call out which parts of this interface are affected by generative AI, and how that enables the whole interface to transform in a way not possible beforehand.

Since our work is building Codeium, a generative AI tool for software developers, our examples will come from that world, but the reasoning will remain the same across any vertical. So let us start from scratch.

How Humans Do Things

We need to first build a very simple model of how humans do knowledge work. To start, humans have a stateful brain:

Brain.

The brain has access to some set of internal models that the human has learned over time, which it can use to complete tasks (ex. “this is the algorithm for quick sort” or “this is how to read a stack trace”). The brain also has access to some existing knowledge bases that it can pull information and context from (ex. a codebase or system architecture documentation). The brain essentially applies these internal models to existing knowledge to complete tasks.

Add models and knowledge.

This is the reasoning half. The other half is the action half, and this is a lot simpler. The human performs actions on some surface (ex. a code editor or a webpage).

Add surfaces.

And that is a simple abstraction of how humans do knowledge work! What is creating a slide deck summarizing a report? Using existing slide-making skills (internal model) to display some information extracted from said report (existing knowledge) in a presentation tool like PowerPoint (surface). What is writing an outbound sales email? Filling in some tried-and-tested outbound email structure (internal model) with information about the outbound target (existing knowledge) in an email text editor (surface). What makes a human powerful is this brain that connects these three together.

The Human-Tool Interface

So… why add a tool? Well, there are inefficiencies in how humans do work. The brain is fast and powerful, but it has limited memory for state and is often poor at knowledge retrieval. In addition, any individual human only knows a subset of all potential “models” to complete various tasks. This is the big difference between a junior and a senior employee - a senior employee has seen more things and so has built a larger internal repository of models that can be used to complete a wider variety of tasks. Tools are meant to help humans compensate for these inefficiencies.

Adding a Tool

To start, a tool has a separate “brain.” A tool makes internal decisions and can, in theory, maintain stateful memory, just like a human brain:

Add tool brain.

Theoretically, a tool can have access to the same public knowledge as the human, or at least a very large subset of it. In the software world, this obviously includes the existing codebases, but could also include documentation or tickets or other context that a human would use to complete tasks:

Add tool connection to knowledge.

Tools have a different set of models than humans, which they can access to complete tasks. These might just be the codified version of how a human completes a task (ex. this is the code for quick sort), but as we will soon discuss, they can also be learned models:

Add tool models.

Finally, the tool has “interactions” with humans at the surface level. This is where the human and tool actually communicate with each other. Via these interactions, the human can convey internal state from their brain to the tool, while the tool can relay information back to the human. In many cases, tools actually come with new surfaces to have interactions on, such as PowerPoint the tool requiring PowerPoint the surface, the latter being useless without the tool. But this is not necessary - tools can repurpose existing surfaces and simply create new interactions:

Add interactions.

This is the human-tool interface, an abstraction that will help us understand why generative AI, unlike any past technology, uniquely revolutionizes knowledge work, as well as crystallize what kind of work needs to be done to create the best AI-powered tools. But first, let us walk through some examples of tools (or features of tools) and reframe them in the human-tool interface.
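To make the abstraction a bit more concrete before the examples, here is a minimal sketch in code (Python, with illustrative names of my own choosing - not a real framework): a brain applies models to knowledge and keeps some state, a surface is where actions land, and a tool wires its own brain to a surface through an interaction.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "model" maps (task, context) -> result. Purely illustrative.
Model = Callable[[str, str], str]

@dataclass
class Brain:
    """Applies internal models to available knowledge to complete tasks."""
    models: Dict[str, Model]                 # ex. "quick sort", "read a stack trace"
    knowledge: List[str]                     # ex. codebase, architecture docs
    state: Dict[str, str] = field(default_factory=dict)  # limited stateful memory

    def complete(self, task: str, model_name: str) -> str:
        context = "\n".join(self.knowledge)
        return self.models[model_name](task, context)

@dataclass
class Surface:
    """Where actions land: an editor, a terminal, a webpage."""
    name: str

    def act(self, action: str) -> None:
        print(f"[{self.name}] {action}")

@dataclass
class Tool:
    """A tool has its own brain and meets the human through an interaction on a surface."""
    brain: Brain
    surface: Surface
    interaction: str  # "flow", "command", or "discover"

    def interact(self, human_input: str, model_name: str) -> None:
        result = self.brain.complete(human_input, model_name)
        self.surface.act(result)

# Toy usage, mirroring the slide-deck example above.
summarize: Model = lambda task, context: f"slides about {task} ({len(context)} chars of source material)"
deck_tool = Tool(
    brain=Brain(models={"summarize": summarize}, knowledge=["...full text of the report..."]),
    surface=Surface("PowerPoint"),
    interaction="command",
)
deck_tool.interact("the quarterly report", "summarize")
```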

Examples of the Human-Tool Interface

Let’s start simple. What is the terminal? It is a tool that lets developers abstract a bunch of operating-system-level actions, filling in the gaps in a developer’s internal set of models on how to directly interface with the OS. The tool brain is pretty simple: it just passes the human’s command to an internal model that knows how to map it to OS-level commands and calls an execution engine. It comes with a surface, which is a terminal window, and the human-tool interaction is “command”-like - the terminal performs an action as commanded by the developer and populates some text back into the surface to convey the result:

Example: Terminal
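As a toy illustration of how thin this tool brain is (my own caricature, not how any real shell works), the entire “brain” is a pass-through from the human’s command to an execution engine, with the result written back onto the surface:

```python
# Toy sketch of the terminal as a "command"-style tool:
# the brain passes the human's command straight to an execution engine
# and writes the result back onto the terminal surface.
import subprocess

def terminal_brain(command: str) -> str:
    # "Model": map the human's command to an OS-level invocation and execute it.
    completed = subprocess.run(command, shell=True, capture_output=True, text=True)
    return completed.stdout or completed.stderr

if __name__ == "__main__":
    # The surface: text typed in, text printed back out.
    print(terminal_brain("ls"))
```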

Let us take this one step further. What is the explore panel in VSCode, which you can use to find keywords across your repository? This is a tool that helps fill in the gaps in a developer’s limited knowledge-retrieval abilities and finite internal state (we cannot remember where every keyword appears in a repository). The brain is a little more complex - it retrieves the codebase as knowledge and passes the input and codebase to an underlying “grep” model. Also, the side panel in VSCode already existed, so this tool repurposes an existing surface, and this time the human-tool interaction is “discover”-like:

Example: Explore Panel
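Here is a toy version of that “grep” model, sketched only to show how little the tool brain does beyond retrieval over the codebase-as-knowledge (a simplification, not the actual VSCode implementation):

```python
# Toy sketch of a "discover"-style keyword search over a repository:
# the brain retrieves the codebase (knowledge) and hands the query to a grep-like model.
from pathlib import Path
from typing import Iterator, Tuple

def search_repo(root: str, keyword: str) -> Iterator[Tuple[str, int, str]]:
    for path in Path(root).rglob("*.py"):          # the retrieved knowledge
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if keyword in line:                    # the "grep" model
                yield str(path), lineno, line.strip()

# The side panel (surface) would render these hits for the developer to explore.
for hit in search_repo(".", "TODO"):
    print(hit)
```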

How about something even more complex? What is Intellisense? It is a tool that helps accelerate a developer by smartly retrieving and suggesting entities from around the codebase, such as methods of a class, so that a developer doesn’t have to manually reference or remember the exact naming or details of existing, reusable code. The brain is a lot more complex now - it has retrieved the codebase (the knowledge) ahead of time and used a model to preprocess it into a trie data structure, which it then keeps as state. Repurposing the editor surface, the human-tool interaction is now “flow”-like: the developer isn’t triggering the tool, but rather the brain smartly recognizes when it should call a retrieval model on its internal data structure state to generate suggestions, which are then populated back into the editor:

Example: Intellisense
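To make the ahead-of-time preprocessing concrete, here is a minimal sketch of the idea (a simplification, not a real Intellisense engine): index symbols from the codebase into a trie once, keep it as state, and serve prefix lookups from it as the developer types.

```python
# Minimal sketch of Intellisense-style completion:
# preprocess codebase symbols into a trie (kept as state),
# then serve prefix lookups as the developer types ("flow"-style interaction).
from typing import Dict, List

class Trie:
    def __init__(self) -> None:
        self.children: Dict[str, "Trie"] = {}
        self.is_symbol = False

    def insert(self, symbol: str) -> None:
        node = self
        for ch in symbol:
            node = node.children.setdefault(ch, Trie())
        node.is_symbol = True

    def suggest(self, prefix: str) -> List[str]:
        node = self
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results: List[str] = []
        stack = [(node, prefix)]
        while stack:
            cur, text = stack.pop()
            if cur.is_symbol:
                results.append(text)
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
        return results

# Ahead-of-time: index symbols extracted from the codebase.
index = Trie()
for symbol in ["read_file", "read_config", "render_page"]:
    index.insert(symbol)

# At edit time: the brain notices the developer typed "rea" and suggests completions.
print(index.suggest("rea"))   # ['read_file', 'read_config'] (order may vary)
```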

In fact, these are the three main kinds of interactions that we have noticed across all tools:

  • Flow: Both developer and tool know what the result needs to be and have the ability to do so, but the tool helps accelerate the developer by being faster.
  • Command: Developer knows what the result needs to be, but may be unable to do it themselves or may face too much overhead in doing so. That being said, the tool does not know how to complete the task without some instruction from the human.
  • Discover: Developer doesn’t know exactly what the result needs to be and is in more exploration mode, and the tool can help nudge them in the right direction, either passively or with instruction.

Why Generative AI is a Revolution

Hopefully by now we have some appreciation of the human-tool interface abstraction, and if you’ve made it this far, you might be asking the obvious question - what the heck does this have to do with generative AI?

It comes down to the limitation of current tools - current tools have high precision but low recall over the action space. What does this mean? Current tools have models that let them do a very particular task, or maybe a set of very particular tasks, very well, but they generalize terribly. This means there is a large part of the space of potential tasks that current tools could handle if someone took the time to code up the relevant model (welcome to the world of application software), but that remains untouched - not necessarily because the technology to write such tools and models doesn’t exist, but because no one has had the time to make them.

Generative AI changes that. Now we have models with high recall at pretty impressive precision across the action space, and any tool can access these models. This doesn’t mean we throw out current models - it doesn’t make sense to start using AI for tasks where existing models already have high precision (please don’t replace a terminal’s execution engine with an AI). But it does mean that we can rapidly enable new tools - instead of starting with a poor model that needs time and effort to improve, any tool can start with a model that is pretty good. And a whole new class of tools will crop up to leverage this recall.

Examples of Human-GenAI Tool Interfaces

Take GitHub Copilot, arguably the first real productionized GenAI application. In reality, the autocomplete functionality serves the exact same purpose as Intellisense, but because there is a smarter, more generic model via LLMs, the brain is actually significantly dumber. All Copilot’s brain does is take state from the editor surface - the current file, cursor position, and other open tabs - pull some snippets of this code, and pass it all into the LLM. And this is wildly useful, which goes to show that a proven interaction with a better model, even with a very dumb brain and incredibly poor use of knowledge (no preprocessing of the codebase here), can add lots of value. But we can also immediately infer that Intellisense should take priority over Copilot whenever it is triggered, since Intellisense is very high precision in the places where it is relevant.
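A rough sketch of that kind of “dumb brain” (a caricature, not Copilot’s actual implementation; llm_complete is a stand-in for whatever completion API the tool calls): gather the text around the cursor plus snippets from open tabs, and hand it all to the LLM.

```python
# Caricature of a Copilot-style brain: no codebase preprocessing,
# just editor state (current file, cursor, open tabs) packed into an LLM prompt.
from typing import List

def llm_complete(prompt: str) -> str:
    """Placeholder for the actual LLM call."""
    return "# ...model-suggested completion..."

def build_prompt(current_file: str, cursor: int, open_tabs: List[str],
                 snippet_chars: int = 500) -> str:
    # State from the editor surface: text before the cursor in the current file...
    prefix = current_file[max(0, cursor - snippet_chars):cursor]
    # ...plus snippets from other open tabs.
    neighbor_snippets = [tab[:snippet_chars] for tab in open_tabs]
    return "\n\n".join(neighbor_snippets + [prefix])

def suggest_completion(current_file: str, cursor: int, open_tabs: List[str]) -> str:
    return llm_complete(build_prompt(current_file, cursor, open_tabs))

print(suggest_completion("def add(a, b):\n    ", cursor=18,
                         open_tabs=["def sub(a, b):\n    return a - b"]))
```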

Or what was ChatGPT, really? Chatbots have existed for years, but often for very specific domains with a predesigned set of questions and answers (think of any automated customer success flow, either on a website or on the phone). Yes, these have very high precision if you want the answer to exactly what the chatbot has an answer for, but generic questions with nuance? That’s a high-recall problem. ChatGPT had a smarter, more generic model and a nontrivial brain that kept the conversation history as state and used it to prompt future turns (I talked about this UX-to-abstract-state breakthrough in a separate post). But if you think about it, when it launched, ChatGPT didn’t do any knowledge retrieval and had a pretty suboptimal surface, in that it was a completely new webpage rather than being integrated directly into applications that could use a chatbot, whether company websites or even IDEs for software developers. A more capable model and a little bit of brains, and we got the fastest-growing consumer software product in history. That is the power of generative AI in revolutionizing the human-tool interface.
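The conversation-history state is easy to picture; here is a minimal, generic sketch (not OpenAI’s implementation, and fake_llm stands in for the real model call) where the brain’s only job is to carry the running message list into every new turn:

```python
# Minimal sketch of a chat brain: the only "state" is the message history,
# which is re-sent to the model on every turn so replies stay in context.
from typing import Dict, List

def fake_llm(messages: List[Dict[str, str]]) -> str:
    """Placeholder for a real LLM call."""
    return f"(reply to {messages[-1]['content']!r}, with {len(messages)} messages of context)"

def chat() -> None:
    history: List[Dict[str, str]] = []
    while True:
        user_input = input("you> ")
        if user_input in {"quit", "exit"}:
            break
        history.append({"role": "user", "content": user_input})
        reply = fake_llm(history)
        history.append({"role": "assistant", "content": reply})
        print("bot>", reply)

if __name__ == "__main__":
    chat()
```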

Where GenAI Tools Will Go

The human-tool interface also explains what comes next. Pretty simply, the most powerful GenAI-based tools will be the ones that develop all axes of the human-tool interface toward their maximum potential for the given application:

  • Most capable: the best models, which obviously means best LLMs for the application
  • Most knowledgeable: hooked into all of the knowledge that a human would have available, so that the tool can provide higher quality, more personalized, and more grounded results
  • Most intuitive: the best interactions on surfaces that make sense, so humans seamlessly get the maximum value from the tool
  • Smartest: the most advanced brain that reasons about state, knowledge, and models, including existing high-precision models

Note that the LLM is just one of these axes, even though it gets an outsized amount of public focus. This outsized focus on the LLM, ignoring the work required on all of the other axes, is why many companies think it is reasonable to build something in-house using APIs or open-source models. They are often proven wrong very quickly.

You would think that companies building AI applications would focus on all of the axes, but even in just the AI devtool space, we see a bunch of folks focusing on only a subset of the axes today:

  • OpenAI: building the most capable models, but perhaps not the best for code, and not investing in the brains or surfaces
  • GitHub Copilot: great product team building intuitive interactions with the GitHub source code management (SCM) system, and perhaps even some knowledge retrieval if your code is hosted on github.com (GitHub SaaS), but they are obviously missing other SCM tools and they don’t own their own models to make them more capable
  • Sourcegraph Cody: utilizing past experience with code search to improve on the knowledgeable axis, but the brain and models are not as capable due to high latency, and they are missing out on the intuitive axis with limited IDE support

The reality is that a tool is only as powerful as its weakest link, which is why at Codeium we have invested in all of the axes:

  • Most capable: proprietary LLMs trained from the ground up, custom-built for code applications and backed by industry-leading infrastructure for low latency
  • Most knowledgeable: hooks into any SCM, full (and multi) repository codebase awareness via embedding vector databases, finetuning options in self-hosted plans
  • Most intuitive: autocomplete and chat, lots of intuitive workflows and connections between the various surfaces that Codeium hooks into (see previous post on AI UX), and support for more IDEs than any other tool
  • Smartest: this is the one that really differentiates Codeium from the rest. We have built incredibly advanced systems that optimize what knowledge we pull from the repository to fit into the LLM’s limited context length (the sketch below gives a flavor of that general problem), and we have building blocks to understand and incorporate user intent proactively into the AI’s actions
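To give a flavor of that context-budgeting problem in general terms (this is a generic sketch, not Codeium’s system): score candidate snippets from the repository for relevance, then pack the best ones into a fixed token budget.

```python
# Generic sketch of context budgeting: score candidate snippets for relevance
# to the task, then greedily pack the best ones into a fixed token budget.
from typing import Callable, List

def pack_context(snippets: List[str], score: Callable[[str], float],
                 token_budget: int) -> List[str]:
    ranked = sorted(snippets, key=score, reverse=True)
    chosen: List[str] = []
    used = 0
    for snippet in ranked:
        cost = len(snippet.split())   # crude stand-in for a real token count
        if used + cost <= token_budget:
            chosen.append(snippet)
            used += cost
    return chosen

# Toy relevance score: word overlap with the task description.
task = "fix the pagination bug in the search endpoint"
def overlap(snippet: str) -> float:
    return len(set(task.lower().split()) & set(snippet.lower().split()))

candidates = [
    "search endpoint pagination handler",
    "footer rendering helper",
    "pagination utilities for list views",
]
print(pack_context(candidates, overlap, token_budget=6))
# -> ['search endpoint pagination handler']
```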

This is why we call Codeium the modern coding superpower - the most powerful AI devtool.

Conclusion

Calling something the “human-tool interface” might sound a little gimmicky, but it allows us to understand what makes a good tool, why generative AI is changing the space and identity of tools, and how the best generative AI tools will differentiate themselves from the rest. If you find this interesting, we write a lot about specific axes of the human-tool interface (particularly with respect to software development and the Codeium product) on our blog.
