Hack Your Way to Data Quality with a “Metadatathon”

PayPal Product Manager Vaidehi Sridhar probably wouldn’t call herself a mastermind, but she did execute on a genius idea to improve data quality across PayPal’s sprawling ecosystem.

Even better, Sridhar and her team laid the groundwork for PayPal to federate data governance for its decentralized teams—as well as automate many types of tedious governance tasks.

The genius idea? A metadata hackathon, or “Metadatathon.”

A Data Quality Catch-22


Most organizations, especially large enterprises, recognize that a metadata platform is a foundational element of any advanced data ecosystem. To fill this requirement, PayPal selected Acryl Cloud, the fully managed SaaS offering based on DataHub, the open-source modern data catalog and metadata platform.

There was one slight problem. Like all large companies, PayPal had a share of data assets that lacked useful documentation and context. And without rich, descriptive metadata and documentation, even the most powerful data catalog and metadata platform is starved for information, a bit like Sherlock Holmes attempting to solve a mystery without knowing the facts of the case.

Moving the Needle on Metadata


Sridhar saw PayPal’s Metadatathon as a way to (wait for it) hack away at this deficiency. And it worked!

In just 15 days, PayPal’s teams documented and added metadata to almost 3,000 datasets and nearly 90,000 columns, a 20% increase in documentation and metadata.

The Metadatathon also helped acquaint teams with the power of Acryl Cloud, tripling monthly active users (from 400 to 1,200) in just two months.

By any criteria, it was a runaway success.

So how did Sridhar and her team pull this off? Let’s find out!

Metadatathon Organizing 101

The success of PayPal’s Metadatathon owes everything to the hard work of Sridhar and her team, who had to figure out the logistics of hosting and scaling a distributed hackathon across 20 internal teams.

At a minimum, this required streamlining data access for hundreds of self-service users. But Sridhar’s team also had to establish clear criteria PayPal could use to determine a winner of the Metadatathon.

Because what’s a hackathon without a winner?

Here’s how Sridhar and her team went about it.

1. Strong top-down support and commitment from PayPal’s leadership.

Like it or not, initiatives of this kind always require strong, visionary backing from leaders. In PayPal’s case, leadership recognized that the documentation gap was a company-wide issue, and lent active support to the Metadatathon. This was communicated down the chain of command from the top, so team members would clearly understand the value and necessity of participating.

“Our leadership team was a great support because they understood that this is a company-wide problem that cannot be just solved by one person or one team.” – Vaidehi Sridhar

2. Interview and gather feedback from users.

PayPal started by identifying and prioritizing its most-queried and most searched-for data assets. This was just low-hanging fruit, however.

The company wanted to hear directly from the teams that either couldn’t find the data they needed when they needed it, or couldn’t easily use it, because it lacked adequate documentation.

The goals were to (1) better understand the search and discovery experience; (2) identify hidden or unmet user needs; and (3) formalize a set of criteria PayPal could use to improve the usability of data, starting with guidelines and standards for documenting and enriching data.

3. Set clear boundaries.

Sridhar and her team set boundaries for both the duration of the Metadatathon—15 days, from start to finish—and how success would be defined.

The goal wasn’t to document and enrich all of PayPal’s data sources and assets, but to make a measurable dent. To this end, she and her team decided what to prioritize based on feedback from users, along with their analysis of the most popular queries and most searched-for data assets.

“We narrowed down our scope to certain critical data sets for which we wanted to get documentation as phase one, because we … are imagining this crowdsourcing event to be a continuous activity,” she explains. “You cannot get everything done the first time itself.”
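To make the prioritization step concrete, here is a minimal Python sketch of ranking undocumented datasets by popularity. It is an illustration under stated assumptions, not PayPal’s actual method: the `DatasetStats` fields, the dataset names, the per-dataset query and search counts, and the weights (searches count double here) are all hypothetical. In practice the inputs would come from query logs and catalog search analytics.

```python
from dataclasses import dataclass


@dataclass
class DatasetStats:
    name: str
    query_count: int  # hypothetical: queries run against the dataset in some window
    search_hits: int  # hypothetical: catalog searches that surfaced the dataset
    has_docs: bool    # whether the dataset already has a description


def prioritize(datasets, top_n, query_weight=1.0, search_weight=2.0):
    """Return the names of the top_n undocumented datasets by weighted popularity."""
    # Already-documented datasets are out of scope for this round.
    candidates = [d for d in datasets if not d.has_docs]

    def score(d):
        return query_weight * d.query_count + search_weight * d.search_hits

    # Highest score first; break ties by name so the ordering is deterministic.
    ranked = sorted(candidates, key=lambda d: (-score(d), d.name))
    return [d.name for d in ranked[:top_n]]


# Hypothetical example inputs.
stats = [
    DatasetStats("payments.txn", query_count=500, search_hits=40, has_docs=False),
    DatasetStats("risk.scores", query_count=120, search_hits=90, has_docs=False),
    DatasetStats("ops.logs", query_count=900, search_hits=5, has_docs=True),
    DatasetStats("mktg.events", query_count=50, search_hits=10, has_docs=False),
]
print(prioritize(stats, top_n=2))  # ['payments.txn', 'risk.scores']
```

Note that `ops.logs` never makes the list despite the most queries, because it is already documented. Weighting searches above raw query volume reflects the idea that a dataset people look for in the catalog but struggle to understand is where documentation pays off first; the weights themselves are a tunable guess.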

4. Anticipate common questions, offer real-time resources.

Sridhar’s team used information gleaned from interviews to compile a list of frequently asked questions. For example, users wanted to know if they could attach image files to documentation in Acryl Cloud. (Yes.)

“These questions … [were] easy for me to answer because Acryl had all of these [features] already available” via its user interface, Sridhar explains. Her team also held office hours and led demos during the hackathon:

“We had multiple demos with the teams, we had office hours, and we also had help from Acryl, which was patient enough to answer all our questions.” – Vaidehi Sridhar

5. Enable access for one and all.

Teams would need to be able to collaborate in documenting and/or adding context to data assets, including both tables and columns in source databases and derived datasets.

To permit access and collaboration at this scale, PayPal leveraged Acryl Cloud’s support for Domains to temporarily organize prioritized data sources and assets into a single domain, called “Metadatathon,” that it could govern using broad-based access controls.

This enabled teams to freely document and enrich metadata without the delays that standard access controls and access request/approval workflows would have introduced.

6. Track and audit contributions.

PayPal used Acryl Cloud’s Timeline API to track which team members edited what—and when. Tracking at this level was essential not just for transparency and accountability, but also for determining winners based on their actual contributions.
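A scoring step along these lines can be sketched in Python. This is a hypothetical example, not PayPal’s implementation: it assumes change records have already been pulled from the Timeline API and flattened into dicts with `actor`, `category`, and `target` fields (illustrative names, not the API’s response schema), and that a separate actor-to-team mapping exists. The set of counted categories is also an assumption. Repeated edits by the same person to the same field count once, so nobody can inflate a score by re-saving the same description.

```python
from collections import Counter

# Hypothetical, simplified change records. A real pipeline would build these from
# Timeline API responses; these field names are illustrative, not the API schema.
changes = [
    {"actor": "alice", "category": "DOCUMENTATION", "target": "payments.txn.amount"},
    {"actor": "alice", "category": "DOCUMENTATION", "target": "payments.txn.amount"},  # re-save
    {"actor": "alice", "category": "TAG", "target": "payments.txn"},
    {"actor": "bob", "category": "DOCUMENTATION", "target": "risk.scores.model_id"},
    {"actor": "bob", "category": "OWNER", "target": "risk.scores"},  # not a counted category
    {"actor": "carol", "category": "DOCUMENTATION", "target": "risk.scores"},
]

# Hypothetical mapping of contributors to competing teams.
ACTOR_TEAM = {"alice": "payments", "bob": "risk", "carol": "risk"}

# Change categories that count toward the contest (an assumption for this sketch).
COUNTED = {"DOCUMENTATION", "TAG", "GLOSSARY_TERM"}


def tally(changes, actor_team, counted=COUNTED):
    """Count distinct (actor, category, target) contributions per actor and per team."""
    by_actor = Counter()
    by_team = Counter()
    seen = set()
    for change in changes:
        if change["category"] not in counted:
            continue
        key = (change["actor"], change["category"], change["target"])
        if key in seen:  # same person, same kind of edit, same field: count it once
            continue
        seen.add(key)
        by_actor[change["actor"]] += 1
        by_team[actor_team.get(change["actor"], "unassigned")] += 1
    return by_actor, by_team


by_actor, by_team = tally(changes, ACTOR_TEAM)
print(by_actor.most_common())
print(by_team.most_common())
```

Raw counts like these are only a starting point; as the review phase of the event shows, quality and usefulness still need a human check before anyone is crowned a winner.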

7. Lean into experiential learning.

PayPal collected feedback from teams before, during, and after the Metadatathon event. On top of the uses described above, this feedback was also used to provide a basis for future action—from developing guidelines for maintaining documentation to improving communication and collaboration among cross-functional teams to surfacing improvements that would be incorporated into DataHub and Acryl Cloud.

“We got a lot of feedback from our users and we are working very closely with Acryl and making sure [these suggestions] are all part of their roadmap, which we will eventually see.” – Vaidehi Sridhar

8. It’s all about community.

Participation wasn’t restricted to PayPal’s internal teams. The company sought and received help from both the DataHub community and Acryl, the vendor behind Acryl Cloud.

Prior to the Metadatathon, Acryl helped PayPal solve several thorny issues, including how to organize and facilitate access to data and how to audit contributions. Acryl experts were also on hand during the hackathon to offer live, real-time assistance.

“Without their support, it would not have been possible to accomplish this success.” – Vaidehi Sridhar

Measuring Success

There was one final step.

After the Metadatathon concluded, Sridhar tasked a team of technical experts and data stewards with reviewing the newly added documentation. Their responsibility was to ensure the accuracy and relevance of the new contributions and to prune incorrect or redundant entries.

This team relied on DataHub’s Timeline API to track edits and identify the most valuable contributions, which it used to evaluate the competing teams and individuals. Crucially, experts also sought feedback from users.

“After the event, our team spent a lot of time reviewing all this information and getting a sign-off from the users that what they are reading is actually good and useful for them,” Sridhar explains.

Only after reviewing both technical data and feedback from users did Sridhar’s team name the winners of the Metadatathon.

In a sense, this approach incorporated critical elements of the software development lifecycle (like quality assurance and user acceptance testing) to ensure the new contributions were both technically sound and, more importantly, useful to users.

Thorough documentation and rich metadata aren’t created in 15 days, which is why Sridhar envisions the Metadatathon as a rolling event: a periodic means, first, of chipping away at documentation debt and, second, of improving and enriching the quality of PayPal’s data.

By any conceivable metric, she says, the first Metadatathon was a huge success.

“We wanted accountability. I wanted more and more people to start embracing DataHub, with background around it.” – Vaidehi Sridhar

“One of the major objectives behind this hackathon was also to spread awareness, to start bringing more and more people to come to start using DataHub.” – Vaidehi Sridhar

Ready to Get Hacking?

Interested in organizing a Metadatathon of your own? Watch the webinar to catch Sridhar’s full presentation:

PayPal’s Data Journey: Driving Increased Data Awareness and Governance at Scale
