Composable CDP vs. Packaged CDP: An Unbiased Guide Explaining the Two Solutions In Detail
It's time to talk about what makes a CDP a CDP
Produced in collaboration with Glenn Vanderlinden (Co-founder of Human37), this guide is part one of the series titled CDP Beats.
The CDP: such a freaking beast, isn't it?
I think it's a little bit like Hydra in Greek mythology, the water monster that would grow two heads every time one of its heads would be chopped off.
Every attempt to kill the CDP has made it stronger, has more people talking about it, and has more vendors claiming that they are, in fact, a CDP in disguise. The CDP is officially antifragile.
I've personally been fascinated by the CDP. Over the last 3 years, I've spent a ridiculous amount of time writing about the CDP and keeping tabs on its evolution from packaged to composable. If you've followed the composable CDP vs packaged CDP chatter, you've surely heard both sides of the argument and don't need another opinion piece explaining why one approach is better than the other.
I believe it's time for an unbiased guide that offers a complete breakdown of the CDP into its components, which, like Hydra's heads, keep increasing in number.
This guide aims to help people make CDP buying decisions based on a clear understanding of the various components of a CDP, the purpose of each component, and which components are required to find the most efficient path to putting data to work before it becomes stale or unusable.
The one thing that we won't get into is cost because cost is very subjective. Most comparisons of the composable vs packaged approach focus on the licensing cost of the software, leaving aside other line items that need to be considered irrespective of the approach: people cost, opportunity cost (of a slow or poor implementation), or the cost of data decay.
I'd like to begin by getting definitions out of the way.
CDP Definition
The rise of the data warehouse led to the emergence of reverse ETL in late 2020, followed by the notion that a combination of these two technologies has made it viable for companies to build (or, more accurately, assemble) a Customer Data Platform on top of the data warehouse.
This is how the idea of a composable CDP emerged in early 2021 and gained momentum in 2022.
But what exactly is a composable CDP? Is it an architecture? Is it an approach? Is it a set of integrated tools? Or is it a productized solution like a packaged CDP?
If you Google "Composable CDP", you'll find that none of the articles offers a concise definition of this term.
Let's change that.
Firstly, what is a Packaged CDP?
A Packaged Customer Data Platform (CDP) is an all-in-one productized solution with capabilities to collect and store data from multiple sources, transform and unify the data, resolve identities, build audiences, and sync data to downstream destinations. Additionally, some packaged CDPs also offer tools to define data quality rules, implement data governance protocols, and comply with privacy regulations.
There are two key considerations here:
A Packaged CDP needs to store a copy of the data it collects in order to resolve identities (ID resolution) and build unified user profiles. However, the ID resolution methodology used (probabilistic or deterministic) varies from vendor to vendor.
A Packaged CDP vendor usually allows companies to build their own packages by combining core capabilities and add-on tools.
What is a Composable CDP then?
A Composable Customer Data Platform (CDP) is a set of integrated tools that are assembled using open-source or proprietary software to perform some or all functions of a Packaged CDP.
There are two key considerations here:
A Composable CDP has some or all capabilities of a Packaged CDP, depending on how it is composed or assembled
A Composable CDP is assembled using open-source software, managed solutions of open-source software, or proprietary SaaS tools
Now that the definitions are out of the way, let's dig deeper into the various components that a CDP comprises.
CDP Components
One of the key challenges with the term "Customer Data Platform" is that it has been used and misused by a variety of software vendors in a variety of different contexts. Many vendors have even positioned a product feature as a CDP, just because that feature allows users to manage customer data that has been ingested into that product.
I'd like to list a few caveats before offering a thorough rundown of each CDP component:
Not every Packaged CDP vendor offers all of these components
Several established CDP vendors offer additional capabilities or components
Within each component, the specific capabilities might differ from vendor to vendor
You don't necessarily need all of these components to compose a CDP
Let's get into it.
1. Behavioral Data Collection: Customer Data Infrastructure or CDI
A CDI is a purpose-built tool that offers a set of SDKs to collect behavioral data or event data from first-party data sources.
Your core product (web apps, mobile apps, smart devices, or a combination), powered by proprietary code, is a first-party data source, and behavioral data helps understand how your product is used and identify points of friction.
This data is a prerequisite for a CDP and without this data, a CDP is, well, not a CDP.
Behavioral data from your first-party data sources serves as the foundation for a CDP.
There are two key considerations here:
The CDI capability of a Packaged CDP is able to sync data directly to third-party tools downstream, without the need to store a copy of the data in your own data warehouse
Standalone CDIs (such as Snowplow) support the data warehouse as the primary destination and offer fewer third-party destination integrations than the CDI component of packaged CDPs
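To make "behavioral data" a little more concrete, here is a minimal sketch of the kind of event a CDI SDK might emit. The `build_track_event` helper and its field names (`anonymous_id`, `user_id`, `properties`, and so on) are hypothetical and chosen for illustration; they are not any vendor's actual schema.

```python
import json
import uuid
from datetime import datetime, timezone


def build_track_event(event, properties, anonymous_id, user_id=None):
    """Build a behavioral event payload (hypothetical schema, for illustration)."""
    return {
        "message_id": str(uuid.uuid4()),  # lets downstream systems de-duplicate
        "type": "track",
        "event": event,
        "anonymous_id": anonymous_id,     # available before the user signs up
        "user_id": user_id,               # filled in once the user is identified
        "properties": dict(properties),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


event = build_track_event(
    "Checkout Started",
    {"cart_value": 49.0, "currency": "USD"},
    anonymous_id="anon-123",
)
print(json.dumps(event, indent=2))
```

Whether a payload like this goes straight to third-party tools or lands in the warehouse first is exactly the difference between the two considerations above.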
To know more about CDI capabilities and vendors (some of which are part of larger CDP offerings), here you go.
P.S. While I have been a huge proponent of the term CDI, in retrospect, I believe "Customer" should be replaced with "Audience" since the data that's collected isn't just about customers. In fact, data collection is initiated long before a user or organization becomes a customer. If Audience over Customer resonates, you'll enjoy reading this post.
2. Data Ingestion: ELT (or ETL)
A standalone ELT/ETL solution is purpose-built to extract all types of data from a growing catalog of secondary data sources (third-party tools) and load the data into cloud data warehouses.
Secondary data sources include third-party tools that users interact with directly or indirectly: tools used for authentication, payments, in-app experiences, support, feedback, engagement, and advertising.
There are two key considerations here:
A Packaged CDP that offers ELT capabilities (source integrations with third-party tools) first ingests the data into its own data store, and can additionally sync the data to a data warehouse via destination integrations.
The ELT capabilities of Packaged CDP vendors are very limited in comparison to purpose-built ELT solutions. If you need to get data into a CDP from a source not natively supported by the CDP vendor, you'd have to build your own pipeline or use an ELT tool to send the data to a warehouse and then sync it back to the CDP using the source integrations offered by CDP vendors.
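As a rough illustration of the extract, load, then transform order, here is a minimal sketch. The in-memory SQLite database stands in for a cloud data warehouse, and `extract_payments`, the sample records, and the table names are all made up for the example; a real ELT tool would handle pagination, retries, and schema changes.

```python
import sqlite3


def extract_payments():
    """Stand-in for an API call to a third-party payments tool (hypothetical data)."""
    return [
        {"id": "pay_1", "customer_email": "ada@example.com", "amount_cents": 4900},
        {"id": "pay_2", "customer_email": "ada@example.com", "amount_cents": 1500},
        {"id": "pay_3", "customer_email": "bob@example.com", "amount_cents": 2000},
    ]


def load_raw(conn, rows):
    """Load the records as-is; transformation happens later, inside the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_payments "
        "(id TEXT PRIMARY KEY, customer_email TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO raw_payments "
        "VALUES (:id, :customer_email, :amount_cents)",
        rows,
    )
    conn.commit()


def transform(conn):
    """The 'T' in ELT: build a modeled table with SQL on top of the raw data."""
    conn.execute("DROP TABLE IF EXISTS revenue_by_customer")
    conn.execute(
        "CREATE TABLE revenue_by_customer AS "
        "SELECT customer_email, SUM(amount_cents) AS total_cents "
        "FROM raw_payments GROUP BY customer_email"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for the warehouse
load_raw(conn, extract_payments())
transform(conn)
revenue = dict(
    conn.execute("SELECT customer_email, total_cents FROM revenue_by_customer")
)
print(revenue)
```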
If you'd like to explore the offerings of popular ELT vendors, here you go.
3. Data Storage/Warehousing
As already mentioned, Packaged CDP vendors store a copy of the data they collect in an internal data store or warehouse. Customers can additionally send a copy of the data to their own data warehouse or data lake via destination integrations.
The data warehouse, as you already know, is the core component of a Composable CDP: the centerpiece to which all other components connect.
There are two key considerations here:
The data warehouse has historically been used to store relational data from third-party tools and visualize that data using a BI tool. Therefore, to assemble a Composable CDP, even companies that already have a warehouse in place need to ingest behavioral data from their first-party sources using a CDI.
A Packaged CDP can be used alongside a data warehouse. In fact, it's becoming increasingly common for customers of packaged CDPs to store a copy of their data in their own warehouse for future use. Additionally, companies are embracing a hybrid approach where they leverage a Packaged CDP's out-of-the-box capabilities for certain use cases while also assembling a Composable CDP for advanced use cases that rely on custom data models.
4. Identity Resolution and Profile API
Identity resolution is the process of unifying user records captured across multiple sources. It requires a set of identifiers (IDs) that are used to match and merge user records originating across sources, allowing businesses to get a comprehensive view of each user or customer.
Identity resolution has several use cases but it primarily helps with personalization and privacy efforts.
There are two key considerations here:
A Packaged CDP offers out-of-the-box identity resolution capability and builds unified user profiles. CDP customers can then sync these unified profiles to a data warehouse or to third-party tools using the available APIs. Also, as mentioned early on, a CDP vendor uses either the probabilistic or the deterministic methodology to resolve identities.
In the composable approach, companies have to manage identity resolution in their own data warehouse by writing the unification code using SQL. Due to the flexibility afforded by this approach, the analyst can use whichever ID resolution methodology works best based on the available data points.
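To show what deterministic matching means in practice, here is a minimal sketch that merges records sharing any exact identifier value. It is written in Python with a union-find structure purely for illustration; in a composable setup this logic would typically live in SQL in the warehouse, as described above. The identifier names (`anonymous_id`, `email`, `user_id`) and the sample records are assumptions.

```python
from collections import defaultdict


def resolve_identities(records):
    """Group record indexes that share any identifier value (deterministic matching)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups short
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (identifier name, value) -> index of first record carrying it
    for idx, record in enumerate(records):
        for key, value in record.items():
            if value is None:
                continue
            if (key, value) in first_seen:
                union(idx, first_seen[(key, value)])
            else:
                first_seen[(key, value)] = idx

    profiles = defaultdict(list)
    for idx in range(len(records)):
        profiles[find(idx)].append(idx)
    return sorted(profiles.values())


records = [
    {"anonymous_id": "anon-1", "email": None, "user_id": None},
    {"anonymous_id": "anon-1", "email": "ada@example.com", "user_id": None},
    {"anonymous_id": None, "email": "ada@example.com", "user_id": "u-42"},
    {"anonymous_id": "anon-9", "email": "bob@example.com", "user_id": None},
]
print(resolve_identities(records))  # [[0, 1, 2], [3]]
```

The first three records end up in one profile because they are linked transitively: record 0 and record 1 share an anonymous ID, and record 1 and record 2 share an email. A probabilistic approach would instead score likely matches on fuzzier signals, which this sketch does not attempt.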
5. Visual Audience Builder (and Data Modeling)
Another prerequisite for a CDP, a visual audience builder is precisely what it sounds like: a drag-and-drop interface to build audiences or segments by combining data from various sources.
Under the composable approach, this capability is offered by Reverse ETL tools, now being referred to as Data Activation tools.
There are two key considerations here:
A Packaged CDP automatically creates the underlying data models on top of the data it stores, allowing non-data teams to build audiences without any dependencies. However, these models are rigid and customers cannot build custom models as per their specific business needs.
A Reverse ETL/Data Activation tool requires data teams to build and expose data models (using SQL) on top of the data that's in the warehouse to further enable non-data teams to build audiences using the visual audience builder. This approach gives businesses complete flexibility over their models and the ability to incorporate custom entities.
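Behind the drag-and-drop interface, an audience is essentially a set of conditions evaluated against a data model. The sketch below assumes a hypothetical condition format of `(field, operator, value)` tuples and a small list of modeled user rows; it is not how any particular tool stores its audience definitions.

```python
import operator

# Operators a visual builder might expose (an illustrative subset).
OPERATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
}


def build_audience(rows, conditions):
    """Return the rows that satisfy every condition (conditions are AND-ed)."""
    return [
        row
        for row in rows
        if all(OPERATORS[op](row[field], value) for field, op, value in conditions)
    ]


# Rows as they might come out of a data model exposed by the data team.
users = [
    {"user_id": "u-1", "plan": "free", "sessions_30d": 12},
    {"user_id": "u-2", "plan": "free", "sessions_30d": 2},
    {"user_id": "u-3", "plan": "pro", "sessions_30d": 30},
]

active_free = build_audience(
    users, [("plan", "eq", "free"), ("sessions_30d", "gte", 10)]
)
print([user["user_id"] for user in active_free])  # ['u-1']
```

The flexibility trade-off described above comes down to who controls the shape of `users`: the vendor, in a packaged CDP, or your own data team, in the composable approach.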
P.S. I believe there needs to be a better term to describe this category of tools since Reverse ETL is just a feature and Data Activation is a use case that can also be fulfilled using a Packaged CDP.
6. Reverse ETL
As you already know, Reverse ETL refers to the process of moving data from the data warehouse to downstream destinations, typically third-party tools, though a destination can also be an internal database.
Companies have been building Reverse ETL pipelines for a while; however, the usage of the term "Reverse ETL" picked up only after the productization of Reverse ETL in early 2020 (I first heard the term in August 2020 from Boris Jabes, the founder of Census).
It's 2023 and now Reverse ETL is a feature or component of the CDP.
There are two key considerations here:
A Packaged CDP's capability to move data to downstream destinations, often referred to as orchestration, is essentially Reverse ETL where data is moved from the CDP's own data warehouse instead of the customer's warehouse. Today, most Packaged CDPs also support the customer's data warehouse as a data source.
In the composable approach, companies that like to build everything in-house can build their own pipelines, or leverage Packaged Reverse ETL that is offered by Data Activation tools (like Census or Hightouch) as well as some CDIs (like RudderStack).
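For teams weighing the build-it-yourself route, here is a minimal sketch of the core loop of a Reverse ETL pipeline: query the warehouse, work out which rows changed since the last run, and send only those downstream. SQLite stands in for the warehouse, and `sync_to_destination`, the `user_traits` table, and the `outbox` list (a stand-in for a destination's API) are all hypothetical.

```python
import sqlite3


def sync_to_destination(conn, query, last_synced, send):
    """Send only rows that are new or changed since the previous sync.

    `last_synced` maps primary key -> row as last sent; `send` is a stand-in
    for a destination API client (hypothetical).
    """
    sent = 0
    for row in conn.execute(query):
        key = row[0]  # assumes the first selected column is the primary key
        if last_synced.get(key) != row:
            send(row)
            last_synced[key] = row
            sent += 1
    return sent


conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for the warehouse
conn.execute(
    "CREATE TABLE user_traits (email TEXT PRIMARY KEY, plan TEXT, ltv_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO user_traits VALUES (?, ?, ?)",
    [("ada@example.com", "pro", 6400), ("bob@example.com", "free", 2000)],
)

outbox = []  # stands in for the third-party tool receiving the data
state = {}   # sync state carried over between runs
query = "SELECT email, plan, ltv_cents FROM user_traits ORDER BY email"

first_run = sync_to_destination(conn, query, state, outbox.append)
conn.execute("UPDATE user_traits SET plan = 'pro' WHERE email = 'bob@example.com'")
second_run = sync_to_destination(conn, query, state, outbox.append)
print(first_run, second_run)  # 2 1
```

A production pipeline would also need to handle deletions, rate limits, retries, and per-destination field mappings, which is much of what the packaged Reverse ETL products take off your plate.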
7. Data Quality
An underrated albeit important component, Data Quality (DQ) helps companies ensure that the data powering their CDPs is not funky. DQ tools help companies maintain the validity, accuracy, consistency, freshness, and completeness of data, amongst other things.
Data Quality is a very wide category with a plethora of tools to find issues and maintain the quality of different types of data. However, behavioral data is the foundation of a CDP where one needs tools to ensure that the data is valid, accurate, and fresh.
There are two key considerations here:
A Packaged CDP typically offers data quality features to run tests against the behavioral data that it collects. It also offers the ability for teams to collaboratively build tracking plans.
In the composable approach, the DQ component can either come from the CDI tool or a separate DQ solution (like Great Expectations) that can, at the very least, validate the incoming data.
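As a rough idea of what validating incoming behavioral data against a tracking plan can look like, here is a minimal sketch in plain Python. The `TRACKING_PLAN` structure and the `validate_event` helper are hypothetical; they do not reflect the API of Great Expectations or of any CDI vendor.

```python
# A tracking plan maps each expected event to its required properties and types.
TRACKING_PLAN = {
    "Checkout Started": {"cart_value": float, "currency": str},
}


def validate_event(event, plan=TRACKING_PLAN):
    """Return a list of violations; an empty list means the event is valid."""
    name = event.get("event")
    if name not in plan:
        return [f"unplanned event: {name!r}"]
    errors = []
    properties = event.get("properties", {})
    for prop, expected_type in plan[name].items():
        if prop not in properties:
            errors.append(f"missing property: {prop}")
        elif not isinstance(properties[prop], expected_type):
            errors.append(f"wrong type for {prop}: expected {expected_type.__name__}")
    return errors


good = {
    "event": "Checkout Started",
    "properties": {"cart_value": 49.0, "currency": "USD"},
}
bad = {"event": "Checkout Started", "properties": {"cart_value": "49"}}

print(validate_event(good))  # []
print(validate_event(bad))
```

This covers validity only. Accuracy and freshness checks need reference data and timestamps, so they are left out of the sketch.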
8. Data Governance and Privacy Compliance
Another extremely important yet underrepresented component of a CDP is the ability to set up governance checks and compliance workflows.
It's fair to say that this is something that businesses need anyway, irrespective of whether they use a CDP or not. However, if a business does use a CDP, whether packaged or composed, they need to ensure a few things such as:
Data collection is initiated only after a user has provided consent for data to be collected for specific purposes such as marketing or analytics
Only the data that's needed in a third-party tool is sent to that specific destination. For example, PII such as email address is sent to a third-party tool only after the end user has provided explicit consent to receive emails that are sent using that third-party tool
If a user opts out of data collection, no further data about that user should be collected across first-party and third-party sources.
If a user wishes to be forgotten (GDPR) or wants to opt out of their data being sold (CCPA), erasure requests must be sent to the third-party tools downstream where their data was sent earlier
Internal team members should be able to access sensitive data or PII only if there's a need for them to access that data, with granular role-based permissions
These are just some of the key capabilities of the Governance and Compliance component of a CDP, and as you can tell, it's not trivial to build this in-house.
There are two key considerations here:
The Governance and Compliance capabilities of Packaged CDPs vary significantly and only the leading CDP vendors offer comprehensive toolkits.
In the composable approach, one can leverage some of these capabilities offered by some of the CDI vendors or integrate standalone purpose-built tools for Governance and Compliance.
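Two of the requirements listed earlier in this section, consent per purpose and sending each destination only the data it needs, can be sketched as a small filter. The `DESTINATIONS` mapping, the purpose names, and the `payload_for` helper are assumptions made for the example, not a description of any vendor's consent model.

```python
# Each destination declares the purpose it serves and the fields it needs.
DESTINATIONS = {
    "email_tool": {"purpose": "marketing", "fields": {"email", "first_name"}},
    "analytics_tool": {"purpose": "analytics", "fields": {"user_id", "plan"}},
}


def payload_for(destination, profile, consents):
    """Return only the fields a destination needs, or None without consent."""
    config = DESTINATIONS[destination]
    if config["purpose"] not in consents:
        return None  # no consent for this purpose, so nothing is sent
    return {key: value for key, value in profile.items() if key in config["fields"]}


profile = {
    "user_id": "u-42",
    "email": "ada@example.com",
    "first_name": "Ada",
    "plan": "pro",
}
consents = {"analytics"}  # this user agreed to analytics but not marketing

print(payload_for("email_tool", profile, consents))      # None
print(payload_for("analytics_tool", profile, consents))  # {'user_id': 'u-42', 'plan': 'pro'}
```

Erasure requests and role-based access are harder to reduce to a few lines, which is part of why this component is not trivial to build in-house.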
Conclusion
I sincerely hope that you now have a better understanding of what makes a Packaged CDP different from a Composable CDP and which approach is better to serve your organization's needs.
If you decide to assemble a Composable CDP, you definitely need a capable data team that can stitch all the requisite components together, which can indeed be a lot of work. Is there a business opportunity here? I think so.
Like it or not, the CDP is a beast and like Hydra, this beast continues to grow heads. We haven't even touched upon more recent developments that will slowly but surely find ways to conspire with the beast: things like streaming data infrastructure, zero-party data, and of course, AI.
In part 2 of this series, Glenn and I discuss why the conversation about Composable vs. Packaged CDP is about culture and philosophy as much as it's about technology. Check it out!