Now that you have successfully deployed DataHub in your organization, it’s time to make the most of the platform by rolling it out to your stakeholders.
We know this can be a daunting task, so we reached out to members of the DataHub Community to hear how other folks have successfully introduced the tool within their companies.
Read on to learn 5 concrete steps you can take to launch DataHub within your organization.
#1: Educate (& Deprecate) Around Data Catalogs
If DataHub is your first data catalog, take the time to educate your stakeholders about catalogs and their utility in the data stack. Help people understand the problems DataHub is meant to address and how you envision it fitting into their day-to-day workflows. The DataHub Blog is a great resource to help you get started.
If you have a previous data catalog, create a deprecation plan:
Set and communicate hard deadlines for the eventual deprecation well in advance, with multiple stages where you'll:
Stop adding new users
Disable UI
Disable API
Fully deprecate
Pair the deprecation with other onboarding techniques (champions, email campaigns, persona targeting). Make use of banners in your old tool & Slack announcements to increase awareness around the deprecation.
For example, the Data Discovery Team at one of our partners established both hard and soft deprecation deadlines:
Soft: If a user visited the old tool, it redirected them to DataHub. However, they were able to bypass the redirect if they were determined.
Hard: Their data team replaced the core functionality of the old catalog's backend with DataHub and tore down the previous infrastructure, disabling the old tool from that date on.
#2: Enlist Champions & Shift Left
Early in your rollout, take the time to identify and partner with highly-motivated stakeholders in your organization to serve as champions, and team up with them to address their common pain points via DataHub.
Sample Use-Cases for Different Champion Personas
Not sure which stakeholders to engage? Look for members of your organization who have recurring workflows and/or responsibilities that could be improved by adopting DataHub.
For example, Data Engineers regularly change/update schemas that may have unintended consequences on downstream dependencies. By leveraging DataHub’s Impact Analysis feature, they can start to proactively communicate breaking changes to downstream data consumers.
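The same downstream lookup that powers Impact Analysis in the UI is reachable programmatically. As a sketch (assuming a DataHub instance at `datahub_url`, a personal access token, and the `searchAcrossLineage` GraphQL query exposed at `/api/graphql`), a Data Engineer could list downstream dependents of a dataset before shipping a schema change:

```python
import json
from urllib import request

def downstream_lineage_query(dataset_urn: str, count: int = 50) -> dict:
    """Build a GraphQL payload asking DataHub for downstream dependents
    of a dataset -- the relationship Impact Analysis surfaces in the UI."""
    return {
        "query": """
        query ($urn: String!, $count: Int!) {
          searchAcrossLineage(
            input: {urn: $urn, direction: DOWNSTREAM, query: "*", start: 0, count: $count}
          ) {
            total
            searchResults { entity { urn type } }
          }
        }""",
        "variables": {"urn": dataset_urn, "count": count},
    }

def fetch_downstreams(datahub_url: str, token: str, dataset_urn: str) -> dict:
    """POST the query to DataHub's GraphQL endpoint and return the JSON reply."""
    req = request.Request(
        f"{datahub_url}/api/graphql",
        data=json.dumps(downstream_lineage_query(dataset_urn)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Wiring a call like this into a CI check on schema-changing pull requests turns "proactively communicate breaking changes" into an automated step rather than a habit.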
Here’s a breakdown of common use cases and personas to consider when you’re looking for DataHub champions:
Common use cases to target when searching for DataHub champions
Once you have identified your DataHub Champions and the targeted use cases to solve with DataHub, schedule 1 on 1 time for tests and progress checks to ensure they are empowered to get the most from the tool. Draw from their journey for examples, “aha! moments”, and key learnings you can broadcast to your wider audience, and partner with them to onboard their immediate teammates & teams to replicate the workflows.
For example, here’s how Tim Bossenmaier, Data Engineer at inovex, described the two key personas they targeted with their rollout:
Data Stakeholders

Whether they are business analysts, data analysts, or data scientists, we want to provide everyone who works with data with all the information they need in DataHub. For this reason, we pay special attention to the correctness of dataset schemas, the provision of schema descriptions, and correct lineage.

Analysts

Anyone who consumes data in a downstream way, mainly via reports, dashboards, etc. For them, it is very important that all KPIs are clearly defined in the glossary and linked to the appropriate entities in DataHub.
Tim’s team shared learnings and demonstrated DataHub’s features to a select group of these users via weekly sprint reviews during their rollout.
Shift Left: Capture Metadata at its Source
Meet your DataHub Champions where they are. It’s highly likely that they are already capturing documentation, annotation, quality tests, and more in their existing tools and workflows.
Whenever possible, “Shift Left” by capturing this rich context at its source and sending it to DataHub. This will remove points of friction for adoption and will empower your Champions to focus on generating high-quality metadata in the tools and environments they already use. You can learn more about the power of Shift Left on the DataHub Blog.
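As a concrete illustration of Shift Left, suppose your engineers already document models and columns in a dbt `schema.yml`. A small sketch like the one below (the document shape follows dbt's conventions; the function name and output shape are our own) flattens those descriptions so an ingestion job can forward them to DataHub, instead of asking anyone to re-document datasets in a second place:

```python
def dbt_descriptions(schema_doc: dict) -> dict:
    """Flatten a parsed dbt schema.yml document (e.g. the result of
    yaml.safe_load) into {model_name: {description, columns}} so the
    docs engineers already wrote can ride along into the catalog."""
    out = {}
    for model in schema_doc.get("models", []):
        out[model["name"]] = {
            "description": model.get("description", ""),
            "columns": {c["name"]: c.get("description", "")
                        for c in model.get("columns", [])},
        }
    return out
```

In practice, DataHub's built-in dbt ingestion source already captures these descriptions for you; the sketch just makes the principle tangible: the metadata is authored once, at its source, and flows downstream automatically.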
#3: Broadcast the Rollout
Establish a designated Slack/Teams channel for the rollout where you can post announcements and troubleshoot issues. Announce and link to the channel in the relevant company and team-wide channels.
Create a regular email campaign where you inform users of the state of the rollout and drive adoption with hooks that draw people in:
Link out to interesting & relevant discoveries in DataHub
Communicate timelines for the rollout & deprecations
Include materials from our blog and YouTube channel, or make your own to help users understand DataHub’s usefulness in the specific context of your org.
Speak to personas & value-adds with featured quotes from your champions.
Be sure to link out to DataHub at every opportunity, on every surface you can find:
GitHub READMEs/PRs,
Slack/Teams,
emails,
app banners,
PagerDuty notifications,
birthday cards,
memes.
OK, so maybe not memes. But everything else!
DataHub’s job in your organization is to provide helpful context and visibility; the wider you broadcast links, the better a job it will do, and the more people will understand what it’s for and what they can do with it.
#4: Regular Onboarding Workshops & Office Hours
Schedule regular onboarding workshops and announce them in email updates and common channels. Prioritize making users’ lives easier, and ensure they walk away with net new knowledge. You can lean on your champions to find a story that will stick with users.
One of our partners found that the governance conversation was especially compelling, and DataHub was the perfect solution to this common problem.
Schedule regular office hours and broadcast that you are available for troubleshooting. Target presentations and demos at engineering/learning weeks and internal meetups to increase awareness of the rollout.
A sample week in our proposed onboarding program.
#5: Defining Success: Establish Goalposts, Owners, and KPIs
Before you start onboarding users, agree on what success looks like for your rollout by creating goalposts:
Establish success metrics and a timeline for their expected values.
Assign owners that are accountable for these metrics as KPIs.
Create goalposts both for your stakeholders and for the team responsible for onboarding.
Always keep in mind: what is the key goal DataHub will help you accomplish?
An example of onboarding team goalposts:
Champions identified & onboarded for each team by X date.
20 weekly active users (WAUs) by X date, 40 by Y date, and 100 by Z date.
5 onboarding workshops held by X date, with 100 total attendees.
90% of old catalog’s traffic moved to DataHub by 2 weeks before the deprecation.
An example of stakeholder/end-user goalposts to set and encourage governance expectations:
60% of assets have ownership by X date.
Glossary terms added to all domains by X date.
All lineage populated for the Data Platform Team by end of the quarter.
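Goalposts like these are easiest to hold owners accountable for when they are computed, not eyeballed. A minimal sketch of the ownership-coverage KPI above (the `Asset` shape is our own; in practice you would populate it from DataHub's search or GraphQL APIs):

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    urn: str
    owners: list = field(default_factory=list)
    domain: str = ""

def ownership_coverage(assets: list) -> float:
    """Fraction of assets with at least one owner -- the
    '60% of assets have ownership by X date' goalpost as a metric."""
    if not assets:
        return 0.0
    return sum(1 for a in assets if a.owners) / len(assets)
```

Running a snapshot of this number on a schedule and posting it to your rollout channel keeps the KPI visible to both the onboarding team and the stakeholders who own it.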
In order for people to feel ownership of entities in DataHub, the introduction of dedicated roles can be helpful. Tim shared his expectations of a “data steward” in DataHub as an example:
“Since we don’t want to manage all this data centrally, we have introduced the role of data stewards. We plan to have one data steward per area, who will then be responsible for keeping the KPI definitions in the glossary up to date and contacting teams when data appears to be corrupted or a KPI appears to be miscalculated.
Data stewards have special permissions that distinguish them from regular users. This is also the user group we are currently focusing on the most and for whom we are offering the introductory workshop sessions. We hope they will help us spread and establish DataHub throughout the organization.”
Have one team take point
A key to success for one of our partners was its Data Discovery Team, the sole team responsible for discovering and ingesting the company’s data stack into DataHub. The team worked with technical stakeholders to identify new ingestion sources to bring into DataHub and owned the development of any custom functionality required to onboard a new team.
The Data Discovery Team also started utilizing Great Expectations beyond DataHub’s out-of-the-box integration. They leverage DataHub’s built-in Great Expectations features to profile the data, and they have additionally rolled out Great Expectations as a stand-alone tool to drive data quality across the organization. Keeping both of these efforts under the same team will make it easier to converge them in the future.
With one team running point, it’s easy to create KPIs around what portion of your org’s data stack is represented in the catalog. This creates incentives for quick integration that don’t exist when responsibility for ingestion falls to owners who don’t yet see the catalog as part of their daily workflow.
Bonus: Common Pitfalls
Don’t focus on too small a use-case: it’s a data party, and everyone’s invited.
Don’t start with too little data: ingest everything you can find that adds value.
Don’t just grab anything: review the sources you’re ingesting to prevent friction from poorly curated or blank datasets.
Don’t wait until the metadata is perfect: encourage users to fill in gaps and take ownership.