This guide is part one of the series on understanding customer data for product analytics; it was originally published on the Amplitude blog.
A typical conversation about data often brings up the privacy practices of big tech. The fact that these companies gather too much data, along with growing concerns over the opacity of their data policies, has given birth to stringent privacy laws such as the EU’s GDPR and California’s CCPA.
Privacy laws, together with browsers making third-party cookies obsolete, are making companies more accountable and forcing them to take a hard look at their data collection practices. As a result, a snowball effect is taking place: companies are embracing transparency and creativity while trying to stay compliant, and awareness about data is increasing among individuals.
Customer data is the centerpiece that enables personalization and automation at scale. It provides context on the user as well as on the user’s behavior when using a product.
Customer data is best described when broken into the following two types:
User data: It provides context on a user and their traits, and is also referred to as entity data (user being the entity)
Interaction data: It provides context on how the user interacts with a product and is also referred to as event data, behavioral data, or product-usage data
Customer data is also gathered when users interact with your brand outside of your core product experience via secondary data sources or third-party tools used for advertising, engagement, and support, to name a few. However, this guide focuses on customer data that comes from a primary or first-party data source — a website, app, smart device, or a combination of these — and comprises entity data and event data.
P.S. The “customer” in “customer data” includes free users of a paid product as well as users who pay with personal data to use a product.
Entity data
Entity data includes personally identifiable information (PII) such as name, email, and phone number, as well as other details such as age, country, and preferences.
It is often referred to as user data since a user is the main entity or object. It comprises user properties or user attributes, each of which stores information or traits about a user.
Entity data is stored in tables where columns represent user properties like name and email, while each row represents a user. One of the properties acts as an identifier and has to contain a unique value for each row (user).
In the table above, email can act as an identifier by ensuring no two users have the same email. However, it is a better practice to assign a unique ID to each user since an email address can change but the user_id remains fixed.
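As a rough illustration of why a dedicated identifier is more robust than email, the Python sketch below models a user table as rows keyed by user_id. The class names, the property names other than name and email, and the sample values are all hypothetical and chosen only for this example; they are not taken from the table in the original guide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class User:
    """One row of entity data: each field is a user property."""
    user_id: str  # hypothetical stable identifier; unique per row
    name: str
    email: str    # can change over time, so it is a weaker identifier
    country: str


class UserTable:
    """A toy in-memory table that allows only one row per user_id."""

    def __init__(self) -> None:
        self._rows: Dict[str, User] = {}

    def insert(self, user: User) -> None:
        # Reject a second row with the same identifier.
        if user.user_id in self._rows:
            raise ValueError(f"duplicate user_id: {user.user_id}")
        self._rows[user.user_id] = user

    def update_email(self, user_id: str, new_email: str) -> None:
        # The row is looked up by user_id, so changing the email
        # does not change how the user is identified.
        self._rows[user_id].email = new_email

    def get(self, user_id: str) -> User:
        return self._rows[user_id]


users = UserTable()
users.insert(User("u_001", "Ada", "ada@example.com", "UK"))
users.insert(User("u_002", "Lin", "lin@example.com", "SG"))
users.update_email("u_001", "ada@newdomain.example")
```

Because every lookup goes through user_id, the email update leaves the user’s identity intact. A real system would typically enforce this with a primary-key constraint in a database rather than in application code.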
Accounts or groups as entities
A group of users, or an account, is also an entity with distinct attributes. In B2B SaaS products, accounts are generally referred to as organizations or workspaces.
From a hierarchical point of view, accounts are groups that comprise users. Thus, the data about an account or group comprises group properties that store information about an account such as the subscription type or the number of users. If accounts are known as organizations, the associated properties should be referred to as organization properties.
It is common to gather data about both users and groups at the same time. This is, once again, particularly true for B2B SaaS tools where a user is part of an account or organization with multiple users.
How do you collect entity data?
Entity data about users is gathered when users share data, either directly or indirectly.
Users share data directly when they input details in a form, respond to an email or a survey, or when they interact with conversational interfaces like chatbots and voice bots.
On the other hand, users share data indirectly when they use a product. When listening to music on Spotify, a user shares data about their music preferences including genres, artists, and even specific songs they like. Similarly, when a user creates reports on Amplitude, they share data about the type of reports they find useful.
Since Amplitude is a B2B SaaS tool where multiple users are part of an organization, the number of reports created under an organization is data associated with an organization and not a particular user. Hence, in this case, Organization is another entity, number_of_reports is an organization property, and the value of this property changes when any user in an organization creates or deletes a report.
It is important not to confuse entity data that changes as a result of product usage (number of reports) with event data that is generated when a user interacts with a product (report created).
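One way to picture the difference is the Python sketch below, in which the organization property holds the current state while each event is a separate, timestamped record. The event names "Report Created" and "Report Deleted", the track helper, and the IDs are assumptions made for illustration; this is not how Amplitude or any particular tool implements it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class Organization:
    """Entity data: the current state of an organization."""
    organization_id: str
    number_of_reports: int = 0  # organization property


@dataclass(frozen=True)
class Event:
    """Event data: a record of one action at one point in time."""
    name: str
    user_id: str
    organization_id: str
    timestamp: datetime


event_log: List[Event] = []


def track(name: str, user_id: str, org: Organization) -> None:
    """Record the event, then update the organization property it affects."""
    event_log.append(
        Event(name, user_id, org.organization_id, datetime.now(timezone.utc))
    )
    if name == "Report Created":
        org.number_of_reports += 1
    elif name == "Report Deleted":
        org.number_of_reports -= 1


org = Organization("org_42")
track("Report Created", "u_001", org)
track("Report Created", "u_002", org)
track("Report Deleted", "u_001", org)
```

After these three calls, the event log contains three records (event data), while number_of_reports holds a single value reflecting the latest state (entity data).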
Event data
An event refers to a unique action performed by a user while interacting with a product, and the data generated in the process is called event data or interaction data.
Clicks and hovers on the web, taps and swipes on mobile, and text or voice commands on chat and voice interfaces — all such interactions are actions performed by a user or events that take place inside an app.
Event data enables you to understand user behavior and is therefore often referred to as behavioral data. Additionally, event data enables you to take action on the data, or activate it, in the external tools where it is made available.
A common use case is event-based contextual messaging (in-app or email), where a campaign is triggered when a certain event X takes place, or when a certain event Y doesn’t take place within a specified timeframe after X takes place. The possibilities are endless.
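The second trigger rule (X happened, but Y did not follow in time) can be sketched in Python as shown below. The function name, the tuple layout of the events, and the sample event names are assumptions for illustration only; engagement tools usually expose this kind of rule through their own campaign builders.

```python
from datetime import datetime, timedelta


def users_to_message(events, x, y, window, now):
    """Return user_ids whose first `x` is at least `window` old and who
    did not perform `y` within `window` after that first `x`.

    `events` is an iterable of (user_id, event_name, timestamp) tuples.
    """
    # Find the earliest occurrence of event X for each user.
    first_x = {}
    for user_id, name, ts in events:
        if name == x and (user_id not in first_x or ts < first_x[user_id]):
            first_x[user_id] = ts

    # Collect users who performed Y inside the window that follows X.
    did_y_in_time = set()
    for user_id, name, ts in events:
        if name == y and user_id in first_x:
            if first_x[user_id] <= ts <= first_x[user_id] + window:
                did_y_in_time.add(user_id)

    # Only message users whose window has fully elapsed without Y.
    return sorted(
        user_id
        for user_id, ts in first_x.items()
        if now >= ts + window and user_id not in did_y_in_time
    )


t0 = datetime(2024, 1, 1, 12, 0)
events = [
    ("u_001", "Add to Cart", t0),
    ("u_001", "Complete Payment", t0 + timedelta(minutes=10)),
    ("u_002", "Add to Cart", t0),
    ("u_003", "Add to Cart", t0 + timedelta(minutes=50)),
]
recipients = users_to_message(
    events,
    x="Add to Cart",
    y="Complete Payment",
    window=timedelta(hours=1),
    now=t0 + timedelta(hours=1, minutes=30),
)
```

In this sample, only u_002 qualifies: u_001 completed the payment within the hour, and the one-hour window for u_003 has not yet elapsed at the time of the check.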
Event data comprises three key elements:
The action or the event that took place
The timestamp or the precise date and time when the event took place
The state or all other properties associated with the event (known as event properties)
Add to Cart, Buy Now, and Complete Payment are all actions or events. The exact moment when an event takes place is recorded as a timestamp.
The properties that provide more context about the event Add to Cart could be user_id, product_id, price, and quantity, all of which describe the state of the event.
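To make the three elements concrete, here is a minimal Python sketch of an Add to Cart event. The Event class, the to_payload helper, and the sample values are hypothetical; the exact field names and payload shape vary from one tracking tool to another.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class Event:
    name: str            # the action that took place
    timestamp: datetime  # when it took place
    properties: Dict[str, Any] = field(default_factory=dict)  # the state

    def to_payload(self) -> str:
        """Serialize the event to JSON, with the timestamp in ISO 8601."""
        return json.dumps(
            {
                "event": self.name,
                "timestamp": self.timestamp.isoformat(),
                "properties": self.properties,
            }
        )


add_to_cart = Event(
    name="Add to Cart",
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    properties={
        "user_id": "u_001",
        "product_id": "p_123",
        "price": 19.99,
        "quantity": 2,
    },
)
payload = json.loads(add_to_cart.to_payload())
```

Storing the timestamp in UTC and serializing it in ISO 8601 is a common convention that keeps events comparable across time zones, though it is a design choice rather than a requirement.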
How do you collect event data?
Collecting event data requires you to create a tracking plan specifying the events to track and the associated properties for each event. Then you get your data engineering team to implement the tracking plan using either of the following:
CDI (customer data infrastructure) or CDP (customer data platform)
Custom tracking service
Once event tracking is implemented, the collected event data is made available in the configured destinations (product analytics and engagement tools), and a copy of this data is typically stored in a data warehouse.
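A tracking plan can be as simple as a mapping from each event to the properties it is expected to carry. The Python sketch below shows one hypothetical way to express such a plan and check incoming events against it. The TRACKING_PLAN structure, the validate function, and the order_id and amount properties are assumptions for illustration, not the format used by any specific CDI or CDP.

```python
# Hypothetical tracking plan: event name -> expected properties and types.
TRACKING_PLAN = {
    "Add to Cart": {
        "user_id": str,
        "product_id": str,
        "price": float,
        "quantity": int,
    },
    "Complete Payment": {
        "user_id": str,
        "order_id": str,
        "amount": float,
    },
}


def validate(event_name, properties):
    """Return a list of problems; an empty list means the event conforms."""
    if event_name not in TRACKING_PLAN:
        return [f"unplanned event: {event_name}"]

    expected = TRACKING_PLAN[event_name]
    problems = []
    # Every planned property must be present and have the planned type.
    for prop, expected_type in expected.items():
        if prop not in properties:
            problems.append(f"missing property: {prop}")
        elif not isinstance(properties[prop], expected_type):
            problems.append(
                f"wrong type for {prop}: expected {expected_type.__name__}"
            )
    # Properties that are not in the plan are flagged as well.
    for prop in properties:
        if prop not in expected:
            problems.append(f"unexpected property: {prop}")
    return problems


ok = validate(
    "Add to Cart",
    {"user_id": "u_001", "product_id": "p_123", "price": 19.99, "quantity": 2},
)
incomplete = validate(
    "Add to Cart",
    {"user_id": "u_001", "product_id": "p_123", "price": 19.99},
)
unplanned = validate("Page Scrolled", {})
```

A check like this, run before events are forwarded to destinations, is one way to catch inconsistent or unplanned events early. The type check here is deliberately strict (an integer price would be rejected), which a real plan might relax.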
Learn more about event data or behavioral data collection tools in this guide.
What data to track vs how to track it
While it is good to know about the event-tracking process, as a data-led professional, you should focus on what to track rather than how to do it.
Why so?
I won’t disagree if you argue that defining what data is to be tracked and the tracking process itself are equally important. However, these two activities should ideally be owned by different people and, depending on the size of your organization, maybe even by different teams that collaborate closely.
To engineer or not to engineer
Typically, a data engineer takes care of implementing the tracking and collaborates with product and growth teams to decide which tools and technologies to use. The company stage, the scope of work and rework, available resources, priorities, and several other factors influence this decision.
Many companies, however, leave the entire tracking process to the data or engineering team, keeping marketing and product folks completely out of the loop. Doing so invariably results in data that is inaccurate, inconsistent, and often redundant because too many events are tracked just for the sake of tracking.
Deciding what to track is simply not an engineer’s job, and expecting engineers to know which data other teams wish to use, and how, is, well, disastrous.
Putting customer data into action
Now that you know what customer data is and what your role is in the process of gathering it, the next step is to be able to answer the following questions:
What do event and entity data look like in the context of customer data?
How do you decide which events to track and what data to gather?
Lucky for you, each question above links to a guide in the 5-part series on customer data.
Once you have answers to the above, you will have a clear understanding of how to create a data tracking plan and will be able to do the following with confidence:
Lead the implementation of event-driven analytics and engagement tools
Gather clean and consistent customer data and overcome challenges that crop up along the way
Ask the right questions of your data in order to better understand user behavior
Identify opportunities to collect and act upon data to elevate the customer experience
Build better products, provide better experiences, and have better conversations
Lastly, it is incredibly useful to have a good understanding of the various data types before you begin working on your tracking plan. Whenever that is, this guide on data types will be helpful.
Now move on to part 2 to understand the purpose of collecting event data.