Answering Burning Questions: Where will the data be consumed?
Data collected to answer burning questions
Last week’s post covered data from external sources needed to answer a burning question.
Today’s post is a continuation of last week’s and covers some of the challenges I faced during my time at Integromat as we figured out the peculiarities of the tools where data was to be consumed and acted upon.
Let’s get into it.
Where will the data be consumed?
Knowing early on where product data (first-party data) will be consumed, and for what purpose, certainly aids the data collection process. More importantly, it helps teams figure out the limitations and idiosyncrasies of external destinations that need to be accounted for before data is sent to them.
Most activation tools in early 2020 didn’t support Account as an entity and were unable to ingest account-level data; they only supported User as an entity, so events and properties could only be associated with users (and not with accounts or workspaces).
Supporting account-level data wasn’t straightforward even for the vendors that understood its value. Every business is unique, so proper support meant that a vendor had to let its customers (SaaS companies like Integromat) mirror their application data models in the vendor’s product (the activation tool) by defining the relationships between users and accounts (typically many-to-many for SaaS apps).
This is a fascinating topic and I’ll go deeper into application data models in a future post. But I wanted to bring it up now because Customer.io’s lack of support for Account as an entity (in 2020) was a significant challenge for us at Integromat.
Because it didn’t make sense to trigger emails based on a user’s aggregate activity across multiple accounts, we had to figure out a workaround to ensure that emails were only triggered based on a user’s account-level activity – the events they performed inside a unique account on Integromat.
Here’s what we did: Instead of a user_id being associated with multiple account_ids (which would have been ideal but wasn’t supported by Customer.io), each user had a distinct_id which was a concatenation of their user_id and an account_id. Therefore, users who were part of multiple accounts on Integromat had multiple (distinct) profiles on Customer.io, one for each Integromat account they belonged to. As a result, there were more user profiles on Customer.io than the number of unique users we served, directly impacting our Customer.io bill.
Also, account properties (traits of an account) had to be sent to Customer.io as user properties. This workaround wasn’t elegant but it was the best possible recourse at the time.
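The workaround described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names, the ":" separator, the "account_" prefix, and the sample data are all hypothetical, and no Customer.io API calls are shown.

```python
def make_distinct_id(user_id: str, account_id: str, sep: str = ":") -> str:
    """Build one profile identifier per (user, account) pair.

    The separator is an assumption; any character that cannot appear
    in either ID would work.
    """
    return f"{user_id}{sep}{account_id}"


def build_profile(user: dict, account: dict) -> dict:
    """Flatten account traits into user properties for a user-only tool."""
    profile = dict(user)  # user traits are copied as-is, including user_id
    for key, value in account.items():
        # Prefix account traits so they cannot collide with user traits.
        name = key if key == "account_id" else f"account_{key}"
        profile[name] = value
    profile["id"] = make_distinct_id(user["user_id"], account["account_id"])
    return profile


def profiles_for(user: dict, accounts: list) -> list:
    """Return one profile per account the user belongs to."""
    return [build_profile(user, account) for account in accounts]


# Hypothetical user who belongs to two accounts.
user = {"user_id": "u1", "email": "a@example.com"}
accounts = [
    {"account_id": "a1", "plan": "free"},
    {"account_id": "a2", "plan": "pro"},
]
profiles = profiles_for(user, accounts)
```

One user in two accounts yields two profiles, which is exactly why the profile count (and the bill) grows faster than the number of unique users.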
Today, things are very different and Customer.io now supports Account as an entity, making it easy to associate users with accounts. Moreover, support for account-level data has become table stakes for activation tools catering to SaaS companies. However, it’s worth mentioning that even in 2019, some tools were able to ingest account-level data.
Mixpanel, for instance, offered the feature as an add-on (called Group Analytics) which enabled us to sync account properties and specify the account_id as an event property (which acted as an additional identifier). As a result, we were able to go beyond user-level analyses and derive account-level insights. This was particularly important for us because it was common for one user (with a distinct user_id) to be part of multiple accounts on Integromat; therefore, it made more sense to look at a user’s activity at an account level rather than in aggregate (across multiple accounts).
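To illustrate why an account_id on every event matters, here is a minimal sketch in plain Python (not Mixpanel's API). The event name and the sample events are made up for the example.

```python
from collections import Counter

# Hypothetical events; each one carries the account it happened in.
events = [
    {"user_id": "u1", "account_id": "a1", "event": "scenario_created"},
    {"user_id": "u1", "account_id": "a1", "event": "scenario_created"},
    {"user_id": "u1", "account_id": "a2", "event": "scenario_created"},
    {"user_id": "u2", "account_id": "a2", "event": "scenario_created"},
]

# Aggregate view: activity per user across all of their accounts.
per_user = Counter(e["user_id"] for e in events)

# Account-level views: activity per (user, account) pair and per account.
per_user_account = Counter((e["user_id"], e["account_id"]) for e in events)
per_account = Counter(e["account_id"] for e in events)
```

In the aggregate view, u1 looks like a single user with three events; the account-level view shows two events in a1 and one in a2, which is the distinction that matters when deciding what to trigger for which account.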
Userflow, which I’d chosen after evaluating all (and testing several) of the onboarding tools available at the time, was just getting off the ground. Sebastian (the founder) offered to build support for accounts as an entity so that we could send account-level data into Userflow and trigger product tours based on a user’s account-level activity. It was a win-win: he prioritized my needs and, in exchange, I tested the product thoroughly and gave him a ton of feedback.
Do we really need the data today?
I hope that these stories make it evident that understanding the relationship between data sources (where data originates) and destinations (where data is consumed) is crucial for folks who are serious about driving data-powered growth.
Moreover, knowing where a piece of data will be consumed before it is collected has a direct impact on the data collection process – it not only helps the data team make the data available in the format supported by the respective destinations, but it also helps the growth team know beforehand what to expect when working with data in external activation tools (and come up with workarounds sooner rather than later).
This also acts as a reminder that collecting data without context is a recipe for disaster: you end up with lots of data, but using it to drive good outcomes is next to impossible.
Lastly, with so many considerations regarding data collection, it becomes really important to ask “Do we really need the data today?” Only if the answer is a resounding “yes” does it make sense to proceed with collection.
The series on answering burning questions is now concluded. Explore the full series here.
This was originally published on the databeats learning hub.