Hey friends,
"Change is the only constant" is an adage I love and live by.
I've always held my opinions loosely and am forever ready to change my mind. I love experimenting and trying new things; I'm rarely married to an idea and find it rather easy to move on.
So why am I bringing this up?
Well, after resisting the idea for the longest time, I decided a few months ago to start building the databeats chat on Slack. I chose to keep it invite-only and made it accessible to early believers who supported the project with their time and money.
Today, I'm opening it up and counting on you to help build this thing into a calm, supportive learning space for anyone keen to learn about data and AI.
If you're in, please complete this short survey to receive your invite!
Back to today's post.
If you're a long-time subscriber, you know that I've stayed away from writing about AI, not only because I had so much more to write about but also because it takes time to learn a new piece of technology and form opinions.
Now I'm ready; I hope you are too!
What is an AI Agent? Definitions, Inputs, and Outputs
Permanent link to this post to bookmark and read later →
This is Part 1 of the series titled Understanding AI Agents, powered by Aampe.
Agentic tools (both software and hardware that leverage AI agent technologies) are transforming every industry. It's not something that will happen in the near future; some industries, particularly automotive, have been experiencing it for years.
Contrary to what the name suggests, a self-driving car doesn't drive itself; it is driven by an agentic system that performs a series of interconnected and complex tasks based on real-time data.
The terms "AI Agent" and "Agentic", on the other hand, have only become popular in the last 12–18 months due to the general availability of Generative AI and the commodification of LLMs.
One reason agents and large language models (LLMs) have become inextricably linked is that prominent figures like Andrew Ng have only referenced the notion of agentic systems in the context of LLMs. LangChain's Harrison Chase even defines an agent as "a system that uses an LLM to decide the control flow of an application".
However, as "AI Agent" enters the mainstream, it's important to highlight that not all agentic systems need to interact with an LLM; agents can also be powered by other AI technologies and methodologies such as computer vision and reinforcement learning.
This is the first post in this series for semi-technical folks (like me) to gain foundational knowledge about this emerging tech that's already changing how software and hardware are built and used.
The goal of this post is to offer clear definitions and examples of various types of agents based on the inputs they work with and the outputs they produce.
Agent and Agentic Definitions
Letās begin with the definitions.
To be an agent is to intentionally make things happen by oneās own actions.
Agentic is derived from the concept of agency in the social sciences; it refers to a quality or state of being characterized by agency: the capacity to act independently, make choices, and exert influence on one's environment or circumstances.
I'd like to quote Schaun (my colleague and data science extraordinaire) who offered much-needed clarity: "Every agent takes an input and produces an output. The input defines the context in which the agent is supposed to respond, and the output is the response itself. The agent is taught to recognize which responses are appropriate given which context."
Keeping that in mind, here's how I'd define an agent:
An agent is a software component that can learn to process an input and produce an output, autonomously, within specified boundaries.
What does it mean to act autonomously? Well, given a certain input, an agent can inject variation or randomness into the output rather than produce the same output every time.
In Schaun's words, "For agents, inconsistent output or randomness is a feature, whereas with traditional software it's a bug."
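To make that concrete, here's a toy sketch of the contrast (the greeting scenario and all response templates are invented for illustration): a traditional function returns the same output every time, while a minimal "agent" samples from a bounded set of learned responses.

```python
import random

def traditional_greeting(name: str) -> str:
    # Traditional software: same input, same output -- randomness would be a bug.
    return f"Welcome back, {name}!"

class GreetingAgent:
    """A toy agent that samples from learned responses within set boundaries."""

    def __init__(self, responses):
        self.responses = responses  # the "specified boundaries"

    def respond(self, name: str) -> str:
        # Agent: same input can yield different outputs -- randomness is a feature.
        template = random.choice(self.responses)
        return template.format(name=name)

agent = GreetingAgent([
    "Welcome back, {name}!",
    "Good to see you again, {name}.",
    "Hey {name}, we missed you!",
])

print(traditional_greeting("Sam"))  # identical on every call
print(agent.respond("Sam"))         # varies from call to call
```

The variation here is trivially random; a real agent learns which response fits which context, but the input/output shape is the same.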
As for the input, it can be text (as in the case of LLM-based or, more broadly, language-based agents) but doesn't have to be.
To reiterate Schaun's statement, "The input defines the context in which the agent is supposed to respond."
Therefore, if the input is, say, behavioral data (event data), the agent processing and acting upon the input doesn't need to interact with an LLM to produce a personalized recommendation.
Instead, the agent can reference an inventory of pre-approved items that are properly tagged with information that enables the agent to pick an item (a product or a piece of content) that is likely to be relevant based on the input data.
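A minimal sketch of that idea, with no LLM anywhere in the loop (the inventory items, tags, and event categories below are all made up):

```python
import random

# A pre-approved, tagged inventory -- the agent's "specified boundaries".
INVENTORY = [
    {"item": "running-shoes-guide", "tags": {"fitness", "footwear"}},
    {"item": "trail-mix-bundle",    "tags": {"fitness", "snacks"}},
    {"item": "office-chair-review", "tags": {"workspace", "furniture"}},
]

def recommend(events: list[dict]) -> str:
    """Pick an item whose tags overlap with categories the user interacted with."""
    seen = {e["category"] for e in events}            # context from behavioral data
    candidates = [i for i in INVENTORY if i["tags"] & seen]
    pool = candidates or INVENTORY                    # fall back to full inventory
    return random.choice(pool)["item"]                # variation, within bounds

events = [
    {"event": "view",  "category": "fitness"},
    {"event": "click", "category": "footwear"},
]
print(recommend(events))  # a fitness- or footwear-tagged item
</```

The tags do the heavy lifting: the richer the tagging on the inventory, the more relevant the pick, and the whole thing runs on event data alone.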
Agent Inputs and Outputs
To work with a data set of any type, we must first know where the data originates and what types of data to expect in order to plan the next step.
Knowing the data type is particularly important when exploring an agentic system. Why?
Because "the input defines the context in which the agent is supposed to respond."
Now let's look at the various input types agents can work with and the outputs they can create.
Text as Input
Given a set of instructions along with specified parameters in natural language, a language-based agent can perform various tasks; here are some examples:
An agent can create and execute a workflow or send a series of predefined emails to users who meet certain criteria.
An agent can perform some analysis on a given set of data and generate reports on a predefined schedule.
The level of autonomy is subjective. In cases where the output is strictly based on a set of predefined rules, there's no agent at work; the autonomy is stripped away when the system has to adhere to strictly defined rules.
Audio as Input
My understanding is that there are two distinct types of agents that can work with audio inputs:
Agents that simply execute the instructions, similar to how agents execute workflows based on natural language inputs. In essence, these agents are similar to those that work with text as input.
Agents that perform workflows based on the metadata of the audio input. For example, an agent that's supposed to first recognize a voice and then execute a workflow based on the result of the previous step. Or an agent that sends out alerts based on keywords or tone. Or one that analyzes audio to detect diseases.
Image and Video as Input
I'd like to list some identifiable agentic workflows that process images and videos:
Perform an image search, find matching products, and offer recommendations by factoring in the user's preferences.
Identify an object in a video feed, run it through algorithms to detect its authenticity, and create an authenticity score.
Recognize gestures and perform a workflow if the agent detects some form of threat.
It's worth mentioning that agents can also generate a visual or text output, but doing so is merely a step in an agentic workflow.
Sensor Data as Input
Some examples of sensor data are location, temperature, and humidity (environmental) and heart rate, retinal scans, and glucose levels (biometric). Below are some agentic workflows based on sensor data as input:
An autonomous vehicle deciding optimal routes using location and weather data
A system sending alerts or shutting down the power grid based on seismographic data
A smart health monitor notifying emergency services and sharing the user's real-time location after a sudden spike in heart rate
Behavioral Data as Input
The most common type of behavioral data is clickstream data from apps that's stored as events. There are so many great use cases for agentic workflows that produce outputs based on behavioral data; here are some that come to mind:
Identifying at-risk customers based on their product usage and offering a time to talk with their customer success manager
Detecting a fraudulent actor, interrupting a transaction, and sending out alerts in real-time
Delivering a personalized message to a user via the channel with the highest propensity for engagement (email, push, in-app, etc.) and at the time the user is likely to take action (based on past engagement behavior)
Evaluative vs. Generative Agency
As we were figuring out the best way to communicate the difference between LLM-based and behavior-based agents, Schaun came up with the idea of grouping agents based on the mode they operate in.
An agentic system can operate in two different modes: Evaluative and Generative.
In the evaluative mode, an agent evaluates the input based on pre-learned patterns and representations, or everything the system already "knows". LLM-based agents like ChatGPT and Midjourney operate in the evaluative mode, where context and queries reference that learned-in-the-past distillation of what the agent was trained on.
In the generative mode, an agent acts on an input and then revises what it already "knows" in light of the new information it receives based on the action. Aampe's agents operate in this mode, where the context and queries output new actions that change the learned representation itself.
Further, LLM-based agents are evaluative in general but generative in individual sessions. That's why we can tell ChatGPT that it got something wrong, upon which it tries to generate a new output.
On the other hand, Aampe's agents are generative in general but evaluative in individual sessions. The agents continue to revise their understanding, but when it's time to make a decision, they optimize the output by referencing what they already know.
To summarize, agentic systems can:
Evaluate inputs given pre-learned representations (LLM-based)
Actively revise their learned representations given the success of outputs (Behavior-based)
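Here's a toy way to picture the two modes side by side (the scenario, option names, and learning rate are all invented): evaluating an option against a frozen learned representation versus acting, observing the outcome, and revising the representation itself.

```python
class Agent:
    def __init__(self):
        # The "learned representation": a belief about each option's value.
        self.weights = {"morning": 0.5, "evening": 0.5}

    def evaluate(self, option: str) -> float:
        # Evaluative mode: reference what is already "known"; nothing changes.
        return self.weights[option]

    def act_and_learn(self, option: str, engaged: bool, lr: float = 0.1):
        # Generative mode: the outcome of the action updates the representation.
        target = 1.0 if engaged else 0.0
        self.weights[option] += lr * (target - self.weights[option])

agent = Agent()
print(agent.evaluate("morning"))            # 0.5 -- frozen knowledge
agent.act_and_learn("morning", True)        # user engaged with a morning message
print(round(agent.evaluate("morning"), 2))  # 0.55 -- the representation moved
```

An LLM-based agent spends almost all of its life in `evaluate`; a behavior-based agent like the ones described above keeps cycling through `act_and_learn`.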
It makes perfect sense to me now, but when I first heard this delineation, I asked Schaun, "Isn't it confusing for people to read that LLM-based agents are Evaluative rather than Generative?"
Here's what he said:
"The irony is that Generative AI is generative only within-session. It grows and adapts as long as you're feeding in additional prompts, but as soon as you start a new session, it loses all of that growth. Generative AI can adapt in the short term but has no long-term memory unless you invest in constant fine-tuning, which nobody does."
Did you find my definition helpful? Have thoughts on how to improve it?