• Unify fragmented metadata into a central data catalog
  • Automate metadata capture and lineage via Airflow pipeline integration
  • Centralize discovery and governance of datasets across diverse systems
  • Enable safe, scalable access control decoupled from platform-specific IAM
  • Support cross-team data reuse and external sharing with clear provenance

The Topline

Challenge
Fragmented data ecosystem inherited through acquisitions slowed product development, hindered data discoverability, and created costly data duplication and access challenges

Solution
Adopted DataHub Cloud to implement a flexible, lineage-rich metadata control plane aligned to Foursquare’s vision for a scalable, developer-first data platform

Impact
Accelerated time-to-discovery and access (from days to minutes), improved developer productivity, automated metadata lineage and access control, enabled safe cross-team data reuse, and laid the groundwork for geospatial metadata innovation at scale

Note: This story was originally published January 2025.

Challenge

Foursquare’s evolution into a B2B SaaS leader in geospatial intelligence came with a heavy data burden. Through multiple acquisitions, they inherited a sprawling and fragmented ecosystem:

  • Multiple metastores
  • Diverse orchestration systems (Luigi, Airflow, homegrown tools)
  • Various compute engines and storage environments

This lack of standardization created a ripple effect across the business. Teams struggled with discovery, duplicated datasets proliferated, and inconsistent lineage slowed release cycles and hampered productivity.

“There was a lot of fragmentation. But the bigger thing that drove us to take a step back was because of a lack of good discovery. It slowed down the release cycles and it ultimately hindered developer productivity and our ability to ship things faster.”

— Vikram Gundeti, CTO, Foursquare

The scale of Foursquare’s operations only amplified the issue. With billions of place records, ad impressions, and transactions processed daily, the lack of lineage and access control became a critical bottleneck.

Solution

Foursquare reimagined their data platform around a simple but powerful question: What should the ideal developer journey look like?

The team mapped out a comprehensive developer journey that addressed each critical touchpoint in the data lifecycle:

  1. Automated data ingestion: Metadata entries are generated on ingest, ensuring immediate discoverability
  2. Centralized discovery: A self-service catalog with tags and structured properties enhances exploration
  3. Simplified data access: Unified, compute-agnostic permissions replace complex IAM configurations
  4. Flexible exploration: Developers can use their preferred tools for querying and analysis
  5. Tracked processing: Pipelines are tracked for visibility into lineage and transformations
  6. Provenance transparency: Lineage is captured automatically to show the data’s journey
  7. Data monitoring: Developers can view key metrics like freshness and volume over time
  8. Controlled sharing: The platform distinguishes internal, cross-team, and external customer datasets
  9. Built-in governance: Admins can audit permissions and enforce compliance based on data type
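
To make a few of these steps concrete, here is a minimal sketch in Python of steps 1, 2, and 6: a metadata entry is created at ingest time, datasets are discoverable by tag, and provenance is derived by walking upstream edges. Everything in it is an assumption made for illustration. The `Catalog` and `DatasetEntry` classes, the field names, and the dataset names are hypothetical, and the in-memory dictionary merely stands in for a real metadata service. It is not DataHub's API and not Foursquare's implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetEntry:
    """Hypothetical catalog record; field names are illustrative only."""
    name: str
    owner: str
    tags: set = field(default_factory=set)
    upstreams: set = field(default_factory=set)
    registered_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class Catalog:
    """Toy in-memory catalog standing in for a real metadata service."""

    def __init__(self):
        self._entries = {}

    def ingest(self, name, owner, tags=(), upstreams=()):
        """Step 1: create the metadata entry at ingest time."""
        missing = sorted(u for u in upstreams if u not in self._entries)
        if missing:
            raise KeyError(f"unknown upstream datasets: {missing}")
        entry = DatasetEntry(name, owner, set(tags), set(upstreams))
        self._entries[name] = entry
        return entry

    def search(self, tag):
        """Step 2: discover datasets by tag, sorted by name."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)

    def provenance(self, name):
        """Step 6: collect every ancestor by walking upstream edges."""
        seen = set()
        stack = list(self._entries[name].upstreams)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._entries[current].upstreams)
        return seen


# Hypothetical dataset names, used only to exercise the sketch.
catalog = Catalog()
catalog.ingest("raw_visits", owner="ingest-team", tags={"geospatial"})
catalog.ingest(
    "clean_visits",
    owner="places-team",
    tags={"geospatial", "curated"},
    upstreams={"raw_visits"},
)
catalog.ingest("visit_report", owner="analytics", upstreams={"clean_visits"})
```

Because the entry is written inside `ingest`, a dataset cannot exist in this toy catalog without metadata, which is the property step 1 describes. A production catalog would also need persistence, access checks, and handling for re-ingestion, all of which are omitted here.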

Foursquare then evaluated multiple vendors to determine which solution could bring their vision of an ideal developer journey to life. They selected DataHub for its flexibility and support for non-native compute systems:

“The flexibility of plugging in different compute engines and other things were very important to us. And DataHub seemed like the only offering that provided that flexibility.”

— Vikram Gundeti, CTO, Foursquare

Foursquare aligned their teams on Airflow as the orchestration backbone, partnering with Astronomer to simplify pipeline development and reduce the burden of hosting. By tightly integrating DataHub with Airflow, they captured metadata at the source, as part of the pipelines themselves, which accelerated developer adoption of the control plane.
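
The idea of capturing metadata where the pipeline runs can be illustrated with a small, library-free Python sketch. It assumes a hypothetical `track_lineage` decorator and an in-memory `LINEAGE_LOG` list in place of a real metadata service. Neither is part of Airflow or DataHub, and the actual integration between the two works differently. The sketch only shows the general pattern of declaring inputs and outputs next to the task so that lineage is recorded as a side effect of running it.

```python
import functools

# Hypothetical stand-in for a metadata service; a real system would send
# these events to a catalog rather than append them to a list.
LINEAGE_LOG = []


def track_lineage(inputs, outputs):
    """Record a lineage event each time the wrapped task succeeds."""

    def decorator(task):
        @functools.wraps(task)
        def wrapper(*args, **kwargs):
            result = task(*args, **kwargs)
            # Emit only after the task returns, so a failed run
            # (one that raises) leaves no lineage edge behind.
            LINEAGE_LOG.append(
                {
                    "task": task.__name__,
                    "inputs": list(inputs),
                    "outputs": list(outputs),
                }
            )
            return result

        return wrapper

    return decorator


# Hypothetical task and dataset names, for illustration only.
@track_lineage(inputs=["raw_visits"], outputs=["clean_visits"])
def clean_visits(rows):
    """Drop rows that have no venue identifier."""
    return [row for row in rows if row.get("venue_id")]
```

Calling `clean_visits(rows)` returns the filtered rows and appends one event to `LINEAGE_LOG`. Keeping the declaration beside the task definition means developers do not have to document lineage in a separate step, which is the adoption benefit the paragraph above describes.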

“Ask yourself: What is the first customer experience or the developer experience you would want to fix? And how do you do that in a sustainable way? That’s what led to our adoption of DataHub. And that’s what I would recommend anyone doing.”

— Vikram Gundeti, CTO, Foursquare

Impact

With DataHub Cloud, Foursquare turned a patchwork of data tools into a coherent, lineage-driven control plane.

Key outcomes included:

  • Improved data user productivity by reducing time-to-discovery and access from days to minutes
  • Streamlined developer journey from ingestion to discovery to reuse
  • Automated metadata capture embedded at the source of truth
  • Established fine-grained access control for sensitive data without over-reliance on AWS IAM
  • Improved visibility into upstream/downstream dependencies
