Hey friends,
"Change is the only constant" is an adage I love and live by.
I've always held my opinions loosely and am forever ready to change my mind. I love experimenting and trying new things, I'm rarely married to an idea, and I find it rather easy to move on.
So why am I bringing this up?
Well, after resisting the idea for the longest time, a few months ago, I decided to start building the databeats chat on Slack. I chose to keep it invite-only and made it accessible to early believers who supported the project with their time and money.
Today, I'm opening it up and counting on you to help build this thing into a calm, supportive learning space for anyone keen to learn about data and AI.
If you're in, please complete this short survey to receive your invite!
Back to today's post.
If you're a long-time subscriber, you know that I've stayed away from writing about AI, not only because I had so much more to write about but also because it takes time to learn a new piece of technology and form opinions.
Now I'm ready, and I hope you are too!
What is an AI Agent? Definitions, Inputs, and Outputs
This guide is powered by Aampe.
Agentic tools, meaning both software and hardware that leverage AI agent technologies, are transforming every industry. This isn't something that will happen in the near future; some industries, particularly automotive, have been experiencing it for years.
Unlike what the name suggests, a self-driving car doesn't drive itself but is driven by an agentic system that performs a series of interconnected and complex tasks based on real-time data.
The terms "AI Agent" and "Agentic", on the other hand, have only become popular in the last 12 to 18 months due to the general availability of Generative AI and the commodification of LLMs.
One reason agents and large language models (LLMs) have become inextricably linked is that prominent figures like Andrew Ng have only referenced the notion of agentic systems in the context of LLMs. LangChain's Harrison Chase even defines an agent as "a system that uses an LLM to decide the control flow of an application".
However, as "AI Agent" enters the mainstream, it's important to highlight that not all agentic systems need to interact with an LLM. Agents can also be powered by other AI technologies and methodologies such as computer vision and reinforcement learning.
This is the first post in a series for semi-technical folks (like me) to gain foundational knowledge about this emergent tech that's already changing how software and hardware are built and used.
The goal of this post is to offer clear definitions and examples of various types of agents based on the inputs they work with and the outputs they produce.
Agent and Agentic Definitions
Let's begin with the definitions.
To be an agent is to intentionally make things happen by one's own actions.
Agentic is derived from the concept of agency in social sciences; it refers to a quality or state of being characterized by agency: the capacity to act independently, make choices, and exert influence on one's environment or circumstances.
I'd like to quote Schaun (my colleague and data science extraordinaire) who offered much-needed clarity: "Every agent takes an input and produces an output. The input defines the context in which the agent is supposed to respond, and the output is the response itself. The agent is taught to recognize which responses are appropriate given which context."
Keeping that in mind, here's how I'd define an agent:
An agent is a software component that can learn to process an input and produce an output autonomously, within specified boundaries.
What does it mean to act autonomously? Well, given a certain input, an agent can inject variation or randomness in the output rather than produce the same output every time.
In Schaun's words, "For agents, inconsistent output or randomness is a feature whereas with traditional software it's a bug."
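To make the contrast concrete, here's a minimal Python sketch. Everything in it (the `GreetingAgent` class, the `rule_based_greeting` function, the sample greetings) is made up for illustration and doesn't come from any real product. It only shows the "variation within boundaries" part of the definition, not the learning part.

```python
import random


def rule_based_greeting(context):
    """Traditional software: the same input always yields the same output."""
    return {"morning": "Good morning!", "evening": "Good evening!"}[context]


class GreetingAgent:
    """Toy agent: varies its output, but only within approved responses."""

    def __init__(self, allowed_responses, seed=None):
        # The "specified boundaries": the only responses the agent may return.
        self.allowed_responses = allowed_responses
        self.rng = random.Random(seed)

    def act(self, context):
        # The input (context) narrows the options; the agent picks among them.
        return self.rng.choice(self.allowed_responses[context])


ALLOWED = {
    "morning": ["Good morning!", "Rise and shine!", "Morning! Coffee first?"],
    "evening": ["Good evening!", "Hope your day went well."],
}

agent = GreetingAgent(ALLOWED)
outputs = {agent.act("morning") for _ in range(50)}
print(outputs)  # several different greetings, all from the approved list
```

The rule-based function can never surprise you. The toy agent can, but it can never step outside the list it was given.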
Talking about the input, it can be text (as in the case of LLM-based or, more broadly, language-based agents) but doesn't have to be.
To reiterate Schaun's statement, "The input defines the context in which the agent is supposed to respond."
Therefore, if the input is, say, behavioral data (event data), the agent processing and acting upon the input doesn't need to interact with an LLM to produce a personalized recommendation.
Instead, the agent can reference an inventory of pre-approved items that are properly tagged with information that enables the agent to pick an item (a product or a piece of content) that is likely to be relevant based on the input data.
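Here's a rough sketch of what picking from a tagged inventory could look like, with no LLM involved. The inventory, the event shape, and the overlap-based scoring are all assumptions I made up for the example; a real system would likely use something more sophisticated.

```python
from collections import Counter

# Hypothetical inventory of pre-approved, tagged content.
INVENTORY = [
    {"id": "article-101", "tags": {"sql", "beginner"}},
    {"id": "article-202", "tags": {"dbt", "modeling"}},
    {"id": "article-303", "tags": {"sql", "modeling"}},
]


def recommend(events, inventory):
    """Pick the item whose tags overlap most with the user's recent events."""
    # Count how often each tag shows up in the behavioral data.
    interest = Counter(tag for event in events for tag in event["tags"])

    def score(item):
        # Tags the user never touched count as zero.
        return sum(interest[tag] for tag in item["tags"])

    return max(inventory, key=score)["id"]


# Hypothetical event data for one user.
events = [
    {"name": "page_view", "tags": ["sql"]},
    {"name": "page_view", "tags": ["modeling"]},
    {"name": "search", "tags": ["modeling"]},
]

print(recommend(events, INVENTORY))  # article-303
```

The input here is event data, the output is a pre-approved item, and the tags are what connect the two.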
Agent Inputs and Outputs
To work with a data set of any type, we must first know where the data originates. And we must know what types of data to expect to plan the next step.
Knowing the data type is particularly important when exploring an agentic system. Why?
Because "the input defines the context in which the agent is supposed to respond."
Now let's look at the various input types and the outputs agents can create from them.
Text as Input
Given a set of instructions along with specified parameters in natural language, a language-based agent can perform various tasks; here are some examples:
An agent can create and execute a workflow or send a series of predefined emails to users who meet certain criteria.
An agent can perform some analysis on a given set of data and generate reports on a predefined schedule.
The level of autonomy is subjective. But when the output is determined entirely by a set of predefined rules, there's no agent at work, because strictly defined rules leave no room for autonomy.
Audio as Input
My understanding is that there are two distinct types of agents that can work with audio inputs:
Agents that simply execute spoken instructions, similar to how agents execute workflows based on natural language inputs. In essence, these agents are similar to those that work with text as input.
Agents that perform workflows based on the metadata of the audio input. For example, an agent that's supposed to first recognize a voice and then execute a workflow based on the result of that step. Or an agent that sends out alerts based on keywords or tone. Or one that analyzes audio to detect diseases.
Image and Video as Input
I'd like to list some identifiable agentic workflows that process images and videos:
Perform an image search, find matching products, and offer recommendations by factoring in the user's preferences
Identify an object in a video feed, run it through algorithms to detect its authenticity, and create an authenticity score
Recognize gestures and perform a workflow if the agent detects some form of threat
It's worth mentioning that agents can also generate a visual or text output, but doing so is merely a step in an agentic workflow.
Sensor Data as Input
Some examples of sensor data are location, temperature, and humidity (environmental) and heart rate, retinal scans, and glucose levels (biometric). Below are some agentic workflows based on sensor data as input:
An autonomous vehicle decides optimal routes using location and weather data
A system sends alerts or shuts down the power grid based on seismographic data
A smart health monitor notifies emergency services and shares the user's real-time location based on a sudden spike in heart rate
Behavioral Data as Input
The most common type of behavioral data is clickstream data from apps that's stored as events. There are so many great use cases for agentic workflows that produce outputs based on behavioral data; here are some that come to mind:
Identifying at-risk customers based on their product usage and offering times to talk to their customer success manager
Detecting a fraudulent actor, interrupting a transaction, and sending out alerts in real time
Delivering a personalized message to a user via the channel with the highest propensity for engagement (email, push, in-app, etc.) and at the time the user is likely to take action (based on past engagement behavior)
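The last example can be sketched in a few lines of Python. This is a deliberately naive version: "propensity" is just the past engagement rate per channel and hour, and the `history` records and field names are invented for illustration. A production system would need to handle sparse data and exploration, which this sketch ignores.

```python
from collections import defaultdict


def best_channel_and_hour(history):
    """Return the (channel, hour) pair with the highest past engagement rate."""
    sent = defaultdict(int)
    engaged = defaultdict(int)
    for record in history:
        key = (record["channel"], record["hour"])
        sent[key] += 1
        engaged[key] += int(record["engaged"])
    # Engagement rate = messages engaged with / messages sent, per pair.
    return max(sent, key=lambda key: engaged[key] / sent[key])


# Hypothetical engagement history for one user.
history = [
    {"channel": "email", "hour": 9, "engaged": False},
    {"channel": "email", "hour": 9, "engaged": True},
    {"channel": "push", "hour": 18, "engaged": True},
    {"channel": "push", "hour": 18, "engaged": True},
    {"channel": "in_app", "hour": 12, "engaged": False},
]

print(best_channel_and_hour(history))  # ('push', 18)
```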
Evaluative vs. Generative Agency
As we were figuring out the best way to communicate the difference between LLM-based and behavior-based agents, Schaun came up with the idea of grouping agents based on the mode they operate in.
An agentic system can operate in two different modes: Evaluative and Generative.
In the evaluative mode, an agent evaluates the input based on pre-learned patterns and representations, or everything the system already "knows". Agents built on pre-trained generative models, like ChatGPT (an LLM) and Midjourney (an image model), operate in the evaluative mode, where context and queries reference that learned-in-the-past distillation of what the agent was trained on.
In the generative mode, an agent acts on an input and then revises what it already "knows" in light of the new information it receives based on the action. Aampe's agents operate in this mode, where the context and queries output new actions that change the learned representation itself.
Further, LLM-based agents are evaluative in general but are generative in individual sessions. That's why we can tell ChatGPT that it got something wrong, and it will try to generate a new output based on that feedback.
On the other hand, Aampe's agents are generative in general but evaluative in individual sessions. The agents continue to revise their understanding, but when it's time to make a decision, they optimize the output by referencing what they already know.
To summarize, agentic systems can:
Evaluate inputs given pre-learned representations (LLM-based)
Actively revise their learned representations given the success of outputs (Behavior-based)
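Here's a toy Python sketch of the two modes as I understand them. It is not how Aampe's agents or any LLM actually work; the `ChannelAgent` class, the starting estimates, and the simulated user who only engages with push are all assumptions for illustration. The only difference between the two agents is whether outcomes are allowed to change what the agent "knows".

```python
class ChannelAgent:
    """Toy agent that picks the channel with the best estimated engagement."""

    def __init__(self, channels, learns_from_outcomes):
        # Learned representation: every channel starts at a 1-in-2 estimate.
        self.stats = {c: {"successes": 1, "attempts": 2} for c in channels}
        self.learns_from_outcomes = learns_from_outcomes

    def estimate(self, channel):
        s = self.stats[channel]
        return s["successes"] / s["attempts"]

    def choose(self):
        # Deciding is always evaluative: reference what is already known.
        return max(self.stats, key=self.estimate)

    def observe(self, channel, engaged):
        if not self.learns_from_outcomes:
            return  # evaluative mode: the representation stays frozen
        # Generative mode (in this post's sense): revise the representation.
        self.stats[channel]["attempts"] += 1
        self.stats[channel]["successes"] += int(engaged)


def simulate(agent, rounds=5):
    """Simulated user who only ever engages with push messages."""
    for _ in range(rounds):
        channel = agent.choose()
        agent.observe(channel, engaged=(channel == "push"))
    return agent.choose()


frozen = ChannelAgent(["email", "push"], learns_from_outcomes=False)
adaptive = ChannelAgent(["email", "push"], learns_from_outcomes=True)
frozen_choice = simulate(frozen)
adaptive_choice = simulate(adaptive)
print(frozen_choice, adaptive_choice)  # email push
```

Both agents start with identical knowledge. The frozen one keeps sending email forever, while the adaptive one notices that email isn't working and switches to push.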
It makes perfect sense to me now, but when I first heard this delineation, I asked Schaun, "Isn't it confusing for people to read that LLM-based agents are Evaluative rather than Generative?"
Here's what he said:
"The irony is that Generative AI is generative only within-session. It grows and adapts as long as you're feeding in additional prompts, but as soon as you start a new session, it loses all of that growth. Generative AI can adapt in the short term but has no long-term memory unless you invest in constant fine-tuning, which nobody does."
Did you find my definition helpful? Have thoughts on how to improve it?