Data -> Insights -> Action
How to derive insights and drive action using a CDI or a Data Warehouse
In this guide, I am covering the process and tooling to combine event data from first-party sources with object data from third-party sources to analyze the data (to derive insights) and activate the data (to drive action).
Event data from your website and apps (primary data sources) helps analyze user behavior.
Object data from sales, marketing, advertising, and support tools (secondary data sources) helps analyze user engagement.
So why combine event data with object data?
A combination of the two helps measure the impact of engagement activities on user behavior.
This is best explained with an example as follows:
You lead growth for a company that sells a workflow automation tool.
You look at events on a product analytics tool like Mixpanel or Amplitude to analyze the user journey from sign-up to activation and beyond.
You also use event-based engagement tools like Customer.io or Intercom for lifecycle emails and Userflow or AppCues for in-app onboarding to get the user to perform desired actions and reach the aha moment.
P.S. That’s me during my time at Integromat. ☝️
However, to measure the impact of your engagement activities, you’d want to see if a particular campaign (emails or in-app prompts asking a user to, say, create a workflow) actually made the user perform the desired action (of creating a workflow).
While this sounds like a no-brainer, such analyses require you to combine event data from your app with data from your engagement tools.
You can also go a step further and personalize the engagement campaigns using a combination of this data. Here’s a quick example before I jump into the process and the tooling:
Your lifecycle email campaigns are running on Customer.io
Your in-app onboarding campaigns are running on Userflow
You now wish to personalize the email campaigns based on the completion of specific steps of the onboarding campaign
Once again, to do this, you’d need to combine event data from your app with object data from Userflow (or your onboarding tool) and make this data available on Customer.io (or your email tool).
There are two distinct approaches to combining event data and object data for the purpose of analysis and activation.
1. Using a Customer Data Infrastructure (CDI) tool
Customer Data Infrastructure, by definition, is a software system that collects customer data from various data sources (both primary and secondary) and integrates with a host of third-party tools where data is sent.
CDI tools can also handle identity resolution and offer the ability to build user audiences via a CDP add-on.
Being a ready-made solution, a CDI has some limitations as follows:
Support for third-party data sources is limited and you might have to build your own integrations with certain tools that don’t exist as data sources.
Almost always, combining data from disparate sources requires data transformation and since a CDI is meant to be used by less-technical users, the transformation capabilities are limited.
Lastly, the biggest limitation of a CDI is that you need to adhere to a fixed data model. You cannot model the data as per the data model of your app — many-to-many relationships between accounts and users, for example — or that of external tools.
While this might make a CDI sound like a deal-breaker, there are actually many use cases and reasons to adopt a CDI anyway — most companies don’t have the resources and the talent to build an in-house solution by stitching multiple standalone solutions.
2. Using a Data Warehouse
A Data Warehouse is essentially a database designed for the purpose of analytics and is used to store data from all possible data sources — first-party apps, third-party tools, as well as production databases.
Due to the affordability and ease of setting up a data warehouse, popular cloud data warehousing solutions (like Snowflake, Google BigQuery, and AWS Redshift) are experiencing rapid adoption.
It’s pretty straightforward to sync data from all your data sources using purpose-built data collection tools – I won’t go into the details as this was covered in the last two issues.
Once the data is in the data warehouse, you need to write SQL queries in the warehouse to combine and transform data from multiple sources. This process is referred to as data transformation and it enables building custom data models which are then synced to third-party tools for analysis and activation.
While data can be transformed solely within the warehouse itself, purpose-built data transformation tools like dbt and Trifacta (acquired by Alteryx) bring a lot of flexibility and automation to this process.
Either way, once the data has been prepared, it needs to be synced from the warehouse to the tools used for data analysis and data activation.
In terms of analysis, all Business Intelligence (BI) tools (like Looker, Mode, or Tableau) and some Product Analytics tools (like Amplitude and Indicative) have native integrations with data warehouses.
However, in order to activate the data in the warehouse, you need to use yet another integration tool to sync the data from the warehouse to third-party tools used for sales, marketing, advertising, and support. And this is where Reverse ETL tools (like Hightouch and Census) come into play — enabling you to extract modeled data from the warehouse and load the data into third-party tools.
In a nutshell, unlike the CDI approach, the warehouse approach requires more time and more resources both in terms of talent and tools. And while this approach brings complete flexibility, it is only recommended in the presence of a dedicated data team that, at the very least, comprises a data engineer and a data analyst.
If you found this post helpful, do share it and help the community grow!