Mapping the GenAI landscape

Written by Chris Robert | Apr 7, 2024 11:34:44 AM

Since OpenAI’s ChatGPT release officially ushered in our current era of All AI All the Time (AAIATT for short), there’s been a nonstop flurry of developments. The emerging field has become flooded with a cacophony of new terminology, all of which comes off as really confusing to outsiders. I’ve put together a simple conceptual framework for some of the most common terms being thrown about, and I thought I’d share that here (also here in this slide deck). If it’s useful, great!

GenAI and LLMs

In this framework (pictured above), I’ve tried to map the text-based generative AI landscape. Generative AI (GenAI) is all about AI systems that help you generate content, like text, images, or video. Here, I’m focused on GenAI for text, built on large language models (LLMs). ChatGPT is one example, and new LLMs are being released more or less daily at this point.

I’ll run through the map counter-clockwise from the top-left (and do see these slides if you want a quicker, more visual version).

Chatbots

First, chatbots are text interfaces like ChatGPT that allow you to interact directly with an LLM. There are loads of chatbots out there now, including many fun and customizable ones like Poe and character.ai. These generally have a keyboard interface — where you type your question or comment and the chatbot responds in text — but increasingly chatbots include text-to-speech and speech-recognition tech that allows you to just converse with them verbally.

AI assistants

AI assistants are essentially chatbots that are focused on a particular subject or task. So, for example, I’ve worked on AI assistants for survey research and M&E as well as AI assistants for SurveyCTO users. You can use a generic chatbot like ChatGPT for lots of different things, but increasingly you have AI assistants that are better trained to help you with very specific tasks.

Copilots

Copilots are essentially AI assistants that are embedded within a single product, generally to help you use that product. GitHub Copilot, for example, helps programmers write code, embedded directly into the tools they use to develop software. Microsoft is also building copilots into Word, Excel, and the rest of its Office suite (though, confusingly, they also have Microsoft Copilot, which I guess is meant to be a copilot for, well, everything). Google and basically everybody else are doing the same.

What distinguishes copilots is not only that they’re embedded within products, but that they typically take advantage of the integration to make the AI more useful to you. So GitHub Copilot can see all of my code and directly make changes for me. Copilot in PowerPoint can create slides. Etc. The key thing is that it saves you a bunch of copying and pasting, allowing the AI to interact directly with the tools you’re using.

AI-powered product functionality

Now, the thing is that you don’t always interact directly with the AI. It can be powering product functionality in ways that don’t look at all like chatbots. If you’re analyzing qualitative data in Atlas.ti, for example, you can have it automatically code your data for you, and it’ll use an LLM behind the scenes, but you won’t see any of the back-and-forth with the LLM. The product basically talks to the AI for you, using it to perform tasks on your behalf. Meeting summaries from tools like Zoom, Teams, or Meet are other examples where the product uses the LLM on your behalf, and you might not even be aware.
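
If it helps to see the pattern, here's a rough Python sketch of a product quietly talking to an LLM for you. Everything in it is made up for illustration: the codebook, the `fake_llm` stand-in (a real product would call an actual LLM here), and the `auto_code` function. It is not how Atlas.ti or any other product actually does it.

```python
from typing import Callable, Dict, List

# Hypothetical codebook; a real project would define its own codes.
CODES = ["cost", "access", "quality"]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call: picks the first code named in the excerpt."""
    excerpt = prompt.rsplit("Excerpt:", 1)[-1].lower()
    for code in CODES:
        if code in excerpt:
            return code
    return "other"


def auto_code(excerpts: List[str], llm: Callable[[str], str] = fake_llm) -> Dict[str, str]:
    """Build one prompt per excerpt, send it to the LLM, and return only the codes."""
    coded = {}
    for excerpt in excerpts:
        # The user never sees this prompt; the product writes it for them.
        prompt = f"Assign one code from {CODES} to this text.\nExcerpt: {excerpt}"
        coded[excerpt] = llm(prompt)
    return coded


if __name__ == "__main__":
    print(auto_code(["The cost was too high", "Nice weather today"]))
```

The user only ever sees the coded excerpts that come back. The prompts and the raw LLM responses stay inside the product.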

Agents

Agents are AI systems that use LLMs in a multi-step, semi-autonomous way, in order to achieve some sort of objective. So Tavily AI, for example, helps you to execute desk research. You tell it what you want, like a one-page report on some arcane subject, and it’ll figure out the appropriate sequence of steps, execute those steps, and then give you the result. Generally, it’ll Google to find appropriate content, read, summarize, and analyze that content, and then put everything together for you.

What generally distinguishes agents is that they plan and execute multi-step sequences, in service of some objective. People are already giving agents access to lots of tools and unleashing them on increasingly complex tasks, which they break down into bite-sized tasks and set about executing, step by step. (This is one area where developments are particularly exciting — and unsettling.)
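
To make the plan-then-execute idea a bit more concrete, here's a toy Python sketch. All of the names are hypothetical, and the plan and the tools are hard-coded stand-ins: a real agent would ask an LLM to come up with the plan and would call real search and summarization tools. This is not how Tavily AI or any particular agent framework works.

```python
from typing import Callable, Dict, List

State = Dict[str, str]


def plan(objective: str) -> List[str]:
    """Break the objective into ordered steps (hard-coded; an LLM would do this)."""
    return ["search", "summarize", "write_report"]


def search(state: State) -> None:
    # Stand-in for a real web search.
    state["sources"] = f"pages found for '{state['objective']}'"


def summarize(state: State) -> None:
    # Stand-in for an LLM call that reads and summarizes the sources.
    state["summary"] = f"summary of {state['sources']}"


def write_report(state: State) -> None:
    # Stand-in for an LLM call that drafts the final report.
    state["report"] = f"Report on {state['objective']}: {state['summary']}"


# The tools the agent is allowed to use, looked up by step name.
TOOLS: Dict[str, Callable[[State], None]] = {
    "search": search,
    "summarize": summarize,
    "write_report": write_report,
}


def run_agent(objective: str) -> str:
    """Plan the steps, execute them in order, and return the final result."""
    state: State = {"objective": objective}
    for step in plan(objective):
        TOOLS[step](state)  # each step reads and updates the shared state
    return state["report"]


if __name__ == "__main__":
    print(run_agent("solar pumps"))
```

The loop in `run_agent` is the part that matters: the agent decides on a sequence of steps, then works through them one at a time, with each step building on what the previous ones produced.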

Infrastructure

Just a few final notes on the AI infrastructure at the heart of all of these systems:

First, it’s really important that these systems be monitored for safety, efficacy, and learning. All of these systems should have (a) a secure (and generally anonymous) way to log interactions as well as (b) both automated and semi-automated systems for reviewing those interactions.

They should also take great care to secure both interactions and storage, to safeguard privacy.
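
Here's a minimal Python sketch of what anonymous logging plus a first automated review pass might look like. The function names, the email-only redaction, and the list of flag terms are all assumptions I've made up for illustration. A real system would need much broader redaction of personal information, encrypted storage, and human review of whatever gets flagged.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

# Simple pattern for email addresses; real redaction would cover far more.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def pseudonymize(user_id: str, salt: str) -> str:
    """Replace a user ID with a salted hash so logs don't name the user."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]


def redact(text: str) -> str:
    """Strip email addresses from text before it is logged."""
    return EMAIL_PATTERN.sub("[email]", text)


def log_record(user_id: str, prompt: str, response: str, salt: str) -> str:
    """Build one JSON log line for an interaction, with identifiers removed."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": pseudonymize(user_id, salt),
        "prompt": redact(prompt),
        "response": redact(response),
    }
    return json.dumps(record)


def needs_review(record_json: str, flag_terms=("password", "diagnosis")) -> bool:
    """Automated first pass: flag interactions that mention sensitive terms."""
    record = json.loads(record_json)
    text = (record["prompt"] + " " + record["response"]).lower()
    return any(term in text for term in flag_terms)


if __name__ == "__main__":
    line = log_record("alice", "Mail me at a@b.org about my password", "Sure", "s3cret")
    print(line)
    print("flag for review:", needs_review(line))
```

Note that a salted hash is pseudonymous rather than truly anonymous: anybody who has the salt can link records back to a user, so the salt needs protecting too.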

In the rush to adopt exciting new technologies, these things can be easy to neglect — but only at our own peril!

Again: see here for the slides. And please comment below if you have any suggestions or feedback you’d like to share!

Note: all screenshots by the author and all images created with OpenAI's ChatGPT (DALL-E).
