• Airflow logo

    Airflow

    Airflow is an open-source data orchestration tool used for scheduling, monitoring, and managing complex data pipelines.

  • Apache Hudi Logo

    Apache Hudi

    Apache Hudi is an open-source data lake framework that provides ACID transactions, efficient upserts, time travel queries, and incremental data processing for large-scale datasets.

  • Athena logo

    Athena

    Athena is a serverless interactive query service that enables users to analyze data in Amazon S3 using standard SQL.

  • Azure AD logo

    Azure AD

    Azure AD is a cloud-based identity and access management tool that provides secure authentication and authorization for users and applications.

  • BigQuery logo

    BigQuery

    BigQuery is a cloud-based data warehousing and analytics tool that allows users to store, query, and analyze large datasets quickly and efficiently.

  • DataHub logo in color

    Business Glossary

    A DataHub-provided source for ingesting glossary metadata: a comprehensive list of business terms and definitions used within an organization.

  • Clickhouse logo

    ClickHouse

    ClickHouse is an open-source column-oriented database management system designed for high-performance data processing and analytics.

  • DataHub logo in color

    CSV

    A DataHub-provided ingestion source for enriching metadata supplied in CSV format.

  • Dagster logo

    Dagster

    Dagster is a next-generation open-source orchestration platform for the development, production, and observation of data assets.

  • Databricks logo

    Databricks

    Databricks is a cloud-based data processing and analytics platform that enables data scientists and engineers to collaborate and build data-driven applications.

  • Acryl Data logo light

    DataHub

    Integrate your open-source DataHub instance with DataHub Cloud or other on-prem DataHub instances.

  • DBT Logo

    dbt

    dbt is a data transformation tool that enables analysts and engineers to transform data in their warehouses through a modular, SQL-based approach.

  • Delta Lake logo large

    Delta Lake

    Delta Lake is an open-source data lake storage layer that provides ACID transactions, schema enforcement, and data versioning for big data workloads.

  • DataHub logo in color

    Demo Data

    Demo Data is a data tool that provides sample data sets for demonstration and testing purposes.

  • Druid logo

    Druid

    Druid is an open-source data store designed for real-time analytics on large datasets.

  • Elastic Search logo

    Elasticsearch

    Elasticsearch is a distributed, open-source search and analytics engine designed for handling large volumes of data.

  • Feast logo

    Feast

    Feast is an open-source feature store that enables teams to manage, store, and discover features for machine learning applications.

  • DataHub logo in color

    File

    A DataHub-provided ingestion source for single files.

  • DataHub logo in color

    File Based Lineage

    File Based Lineage is a data tool that tracks the lineage of data files and their dependencies.

  • Glue logo

    Glue

    Glue is a data integration service that allows users to extract, transform, and load data from various sources into a data warehouse.

  • Great Expectations logo

    Great Expectations

    Great Expectations is an open-source data validation and testing tool that helps data teams maintain data quality and integrity.

  • Hive logo

    Hive

    Hive is a data warehousing tool that facilitates querying and managing large datasets stored in the Hadoop Distributed File System (HDFS).

  • Hive logo

    Hive Metastore

    Hive Metastore (HMS) is a service that stores metadata for Hive, Presto, Trino, and other services in a backend relational database management system (RDBMS).

  • Iceberg logo

    Iceberg

    Iceberg is an open-source table format for managing and querying large-scale analytic datasets.

  • DataHub logo in color

    JSON Schemas

    JSON Schema is a specification used to define the structure, format, and validation rules for JSON data.

  • Kafka logo gray

    Kafka

    Kafka is a distributed streaming platform that allows for the processing and storage of large amounts of data in real time.

  • Kafka logo gray

    Kafka Connect

    Kafka Connect is an open-source data integration tool that enables the transfer of data between Apache Kafka and other data systems.

  • DataHub logo in color

    LDAP

    LDAP (Lightweight Directory Access Protocol) is a protocol for accessing and managing distributed directory information services over an IP network.

  • Looker web logo

    Looker

    Looker is a business intelligence and data analytics platform that allows users to explore, analyze, and share data insights in real-time.

  • Maria logo

    MariaDB

    MariaDB is an open-source relational database management system that is a fork of MySQL.

  • metabase logo

    Metabase

    Metabase is an open-source business intelligence and data visualization tool that allows users to easily query and visualize their data.

  • mssql logo

    Microsoft SQL Server

    Microsoft SQL Server is a relational database management system designed to store, manage, and retrieve data efficiently and securely.

  • microsoft teams logo

    Microsoft Teams

    Send notifications to Teams channels on updates to entities in DataHub.

  • ML Flow logo

    MLflow

    MLflow is an open-source platform for managing the end-to-end machine learning lifecycle.

  • mode logo

    Mode

    Mode is a cloud-based data analysis and visualization platform that enables businesses to explore, analyze, and share data in a collaborative environment.

  • mongodb logo

    MongoDB

    MongoDB is a NoSQL database that stores data in flexible, JSON-like documents, making it easy to store and retrieve data for modern applications.

  • mysql logo

    MySQL

    MySQL is an open-source relational database management system that allows users to store, organize, and retrieve data efficiently.

  • nifi logo

    NiFi

    NiFi is a data integration tool that allows users to automate the flow of data between systems and applications.

  • Okta logo

    Okta

    Okta is a cloud-based identity and access management tool that enables secure and seamless access to applications and data across multiple devices and platforms.

  • Openapi logo

    OpenAPI

    OpenAPI is a specification for building and documenting RESTful APIs.

  • oracle logo

    Oracle

    Oracle is a relational database management system that provides a comprehensive and integrated platform for managing and analyzing large amounts of data.

  • postgres logo

    Postgres

    Postgres is an open-source relational database management system that provides a powerful tool for storing, managing, and analyzing large amounts of data.

  • powerbi logo

    PowerBI

    PowerBI is a business analytics service by Microsoft that provides interactive visualizations and business intelligence capabilities with an interface simple enough for end users to create their own reports and dashboards.

  • Prefect logo

    Prefect

    Prefect is a modern workflow orchestration tool for data and ML engineers.

  • presto logo

    Presto

    Presto is an open-source distributed SQL query engine designed for fast and interactive analytics on large-scale data sets.

  • protobuf logo

    Protobuf Schemas

    Protobuf Schemas is a data tool used for defining and serializing structured data in a compact and efficient manner.

  • pulsar logo

    Pulsar

    Pulsar is a real-time data processing and messaging platform that enables high-performance data streaming and processing.

  • redash logo

    Redash

    Redash is a data visualization and collaboration platform that allows users to connect and query multiple data sources and create interactive dashboards and visualizations.

  • redshift logo

    Redshift

    Redshift is a cloud-based data warehousing tool that allows users to store and analyze large amounts of data in a scalable and cost-effective manner.

  • s3 logo

    S3 Data Lake

    S3 Data Lake is a cloud-based data storage and management tool that allows users to store, manage, and analyze large amounts of data in a scalable and cost-effective manner.

  • Sage Maker logo

    SageMaker

    SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.

  • Salesforce logo

    Salesforce

    Salesforce is a cloud-based customer relationship management (CRM) platform that helps businesses manage their sales, marketing, and customer service activities.

  • SAP HANA logo

    SAP HANA

    SAP HANA is an in-memory data platform that enables businesses to process large volumes of data in real-time.

  • Slack logo

    Slack

    Send notifications to Slack channels on updates to entities in DataHub.

  • Snowflake logo

    Snowflake

    Snowflake is a cloud-based data warehousing platform that allows users to store, manage, and analyze large amounts of structured and semi-structured data.

  • Spark logo

    Spark

    Spark is a data processing tool that enables fast and efficient processing of large-scale data sets using distributed computing.

  • SQL Alchemy logo

    SQLAlchemy

    SQLAlchemy is a Python SQL toolkit that provides a set of high-level APIs for connecting to relational databases and performing SQL operations.

  • Superset logo

    Superset

    Superset is an open-source data exploration and visualization platform that allows users to create interactive dashboards and perform ad-hoc analysis on various data sources.

  • Tableau software logo

    Tableau

    Tableau is a data visualization and business intelligence tool that helps users analyze and present data in a visually appealing and interactive way.

  • Teradata logo

    Teradata

    Teradata is a data warehousing and analytics tool that allows users to store, manage, and analyze large amounts of data in a scalable and cost-effective manner.

  • Trino logo

    Trino

    Trino is an open-source distributed SQL query engine designed to query large-scale data processing systems, including Hadoop, Cassandra, and relational databases.

  • Vertica logo

    Vertica

    Vertica is a high-performance, column-oriented, relational database management system designed for large-scale data warehousing and