Metadatathon – Why Metadata Matters

By: Maggie Hays

04.16.24


    Online payments powerhouse PayPal had a pretty common problem.

    Its teams couldn’t find the data they needed to do their work, either because it was siloed across separate environments, wasn’t properly documented, or lacked lineage and other essential metadata.

    So Vaidehi Sridhar, product manager at PayPal, came up with a clever solution: a hackathon, but for documentation and metadata. The goal of the “Metadatathon” she organized at PayPal was to crowdsource the labor involved in documenting and adding rich context to the company’s distributed data assets.

    “Lack of documentation was one of the major problems most of our users called out,” Sridhar explains.

    Two Sides of the Same Coin

    The Metadatathon would not only deliver a transformed data discovery experience for PayPal’s users; it would also improve data quality while simplifying some of the more tedious aspects of data governance for Sridhar and her team. Any way you looked at it, it was a win-win.

    After all, data discovery and data quality are two sides of the same coin. You can’t discover data if it isn’t well described, and you can’t use data unless it is both well documented and of high quality.

    In a very real sense, then, description and documentation are integral elements of data quality.


    One Platform for Data Discovery, Observability, and Governance

    PayPal had standardized on Acryl Cloud, a fully managed SaaS offering based on the open-source DataHub project, to provide easy-to-use data discovery and collaboration capabilities for its teams.

    But on top of enabling a rich, self-serve discovery experience, Acryl Cloud also provided automated discovery and classification capabilities, best-in-class data observability features, and built-in support for data contracts and data products. Combining all of this in a single platform gave Sridhar and her team the capabilities they needed to monitor and maintain data quality across PayPal’s sprawling data ecosystem, and to better understand, manage, and govern that ecosystem.

    But first, PayPal’s users needed to discover on their own what a modern data catalog and metadata platform like Acryl Cloud could do for them. “One of the major objectives behind this hackathon was also to spread awareness, to start bringing more and more people to come to start using [Acryl Cloud],” she said.

    Laying a Solid Foundation for Data Discovery

    For PayPal to achieve these goals, it would first need to enrich its data assets with meaningful and consistent documentation and metadata. This was necessary for several reasons, including:

    • “Good” documentation and rich metadata would equip data leaders and practitioners with the knowledge required to understand PayPal’s data sources, their structures, and the data management and governance processes that should apply to them.
    • Lineage metadata is especially critical, not just for regulatory compliance, problem resolution, impact analysis, and change management, but also for the work of data practitioners—especially PayPal’s data scientists—who need to understand the provenance and history of data assets, along with their dependencies on or relationships to other data assets.
    • “Good” documentation also includes information about ownership, which helps accelerate problem resolution and makes it easier to coordinate on issues of data use and governance.
    • “Good” documentation also encompasses field notes, annotations, and other collateral, including diagrams or drawings. This content aids data leaders, practitioners, and consumers in understanding, properly using, and (when necessary) modifying data assets.
    • Together, documentation and metadata provide essential information and context about the proper handling of data; its sensitivity; pertinent usage, sharing and movement restrictions; retention period (if any); and the appropriate procedures for destroying it.
    • Finally, “good” documentation and rich metadata promote both reusability and reproducibility, which are important not just for operationalizing and maintaining PayPal’s production analytics and ML/AI solutions, but for demonstrating fairness, ethical alignment, and compliance.

    An Object Lesson in Why Metadata Matters

    As Sridhar saw it, the Metadatathon would deliver other, “soft” benefits, too, like fostering community and accountability among PayPal’s cross-functional teams. This was all part of her master plan.

    First, she anticipated, the Metadatathon would provide an object lesson in the value of good documentation and rich metadata. Participants would get to witness this in real time, with assets they’d tagged and documented becoming discoverable in Acryl Cloud, automatically profiled and classified.

    Plus, they would receive feedback and encouragement from peers on other teams, who would be able to discover, explore, and understand their data assets. This hands-on experience would make what had been an abstract concept—“metadata management”—concrete and actionable for PayPal’s teams.

    Second, there was the practical, utilitarian aspect: like a barn raising or a Habitat for Humanity build, the Metadatathon would accomplish in days what would take a dedicated team months, or longer, to complete. PayPal could rapidly populate Acryl Cloud with rich, contextual metadata as teams created or improved the documentation for their own data assets. Acryl Cloud would ingest and catalog this metadata, and the metadata would in turn provide a firm foundation both for responsible data discovery and usage and for metadata-driven data management and governance.

    Third, and arguably most importantly, there was a social or communal aspect: as with a Habitat for Humanity build, PayPal’s Metadatathon enlisted people in a collective, concerted effort to accomplish a specific goal. Teams were empowered to step away from their day-to-day tasks and responsibilities to work toward this common goal. This collective effort would nurture a sense of community and shared purpose and concretely demonstrate the importance of well-documented, richly contextual data assets, while also facilitating cross-functional collaboration and knowledge sharing among teams. By working together in the Metadatathon, teams would bridge silos, setting a precedent (and laying a foundation) for collaborative projects and initiatives across PayPal.

    Moving the Needle on Data Quality, Too

    There was one other dividend, too: the Metadatathon would surface data quality issues, both known and unknown. From the perspective of both self-service discoverers and governance leaders, poorly documented data sources or datasets are ipso facto poor-quality data. So are unlabeled datasets. So are datasets that lack detailed lineage metadata.

    Using Acryl Cloud to search for, discover, and explore datasets and other assets would also surface incomplete, inconsistent, corrupted, and stale data. And users would get a hands-on feel for how discovery, data quality, documentation, and rich metadata are bound up with one another.

    Basically, data assets that lack documentation and rich metadata aren’t:

    Traceable. You can’t trace a dataset back to its origins, so you can’t understand what was done to it, by whom or what, and for what purposes. But you also can’t identify at what point it became corrupted, inconsistent, or stale. Correcting problems with datasets entails tedious, laborious reverse engineering.

    Accountable. Without clear documentation of a dataset’s ownership or custodianship, it’s difficult to determine who created it in the first place, and who is responsible for maintaining it.

    Repairable. When you don’t have information about the structure, source, lineage, or freshness of a dataset, “fixing” it is like solving a puzzle without a reference image. Much more difficult, actually.

    Reliable. The data in the dataset might actually be reliable, but you don’t know this. And it’s nearly impossible to establish its reliability without documentation or metadata describing the circumstances of its creation, its conditioning and preparation, and (not least) its purpose or intended use.

    An Exciting Start

    Metadata hacking isn’t a one-and-done thing. Sridhar anticipates holding a rolling series of Metadatathons to maintain and improve PayPal’s documentation. A modern data catalog and metadata platform like Acryl Cloud makes this much easier by integrating automatically with PayPal’s data sources and automatically monitoring datasets, charts, workbooks, dashboards, and other assets.

    © 2026 Acryl Data, Inc.