Using DataHub for ML Development


In the past decade, machine learning has supercharged our ability to draw useful insights from data. It also lets us use that data to build intelligent, automated tooling. Whether you’re identifying fraudulent transactions or training large language models to power NLP tools, DataHub gives you the flexibility to accelerate your workflow.

Machine Learning Lead

A machine learning lead needs to manage everything in the ML stack, from the training data to the models. Throughout the stack, a lead needs to verify that the right data is used for each task and that the data is reliable and high-quality. DataHub simplifies many ML-related workflows by exposing relevant metadata about datasets and how they are connected.

How can I ensure my ML features exclude PII data?

DataHub lets us easily see whether datasets include Personally Identifiable Information (PII). The feature table view in DataHub shows which features are included in a dataset, so we know whether any of them contain PII. On top of that, we can link glossary terms to the dataset to indicate that it contains PII. Filtering out this data lets us ensure that the models we build comply with data privacy policies.
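To make the filtering step concrete, here is a minimal sketch in Python. It does not call DataHub’s API; it assumes you have already pulled each feature’s glossary terms out of DataHub into plain Python objects. The `Feature` class, the `split_by_pii` helper, the term name `Classification.Sensitive`, and the feature names are all hypothetical, so swap in whatever your own glossary uses.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical glossary term that marks PII; use the term your glossary defines.
PII_TERM = "Classification.Sensitive"


@dataclass
class Feature:
    """A feature plus the glossary terms attached to it (assumed already exported)."""
    name: str
    glossary_terms: Set[str] = field(default_factory=set)


def split_by_pii(features: List[Feature], pii_term: str = PII_TERM) -> Tuple[List[str], List[str]]:
    """Return (safe, flagged) feature names, preserving input order."""
    safe = [f.name for f in features if pii_term not in f.glossary_terms]
    flagged = [f.name for f in features if pii_term in f.glossary_terms]
    return safe, flagged


# Illustrative data only, not taken from a real dataset.
features = [
    Feature("user_email", {"Classification.Sensitive"}),
    Feature("days_since_signup"),
    Feature("avg_session_minutes"),
]

safe, flagged = split_by_pii(features)
print("Safe to train on:", safe)
print("Excluded as PII:", flagged)
```

Keeping the flagged list around, rather than silently dropping those features, makes it easier to review what was excluded and why.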


Are my features built on reliable sources?

DataHub’s lineage tracking makes it simple to check whether your features are built on reliable sources. Once you’ve located the relevant part of the data stack, DataHub lets you view the entire lineage of the data, all the way back to the original source. You can then see every original data source that a feature is built on and check whether each one is up to date and reliable.
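The idea of walking lineage back to the original sources can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DataHub’s own API: it assumes the upstream relationships have already been exported into a plain dictionary, and the asset names and the `original_sources` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical upstream map: each asset points to the assets it is built from.
UPSTREAMS: Dict[str, List[str]] = {
    "feature.engagement_score": ["table.engagement_daily"],
    "table.engagement_daily": ["table.raw_clicks", "table.raw_sessions"],
    "table.raw_clicks": [],
    "table.raw_sessions": [],
}


def original_sources(asset: str, upstreams: Dict[str, List[str]]) -> List[str]:
    """Walk upstream from `asset` and return the assets that have no upstreams."""
    seen = set()
    sources = set()
    stack = [asset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        parents = upstreams.get(node, [])
        if not parents:
            sources.add(node)  # nothing upstream, so this is an original source
        stack.extend(parents)
    return sorted(sources)


roots = original_sources("feature.engagement_score", UPSTREAMS)
print(roots)
```

With the list of original sources in hand, you can check each one for freshness and ownership before trusting the feature built on top of it.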


Acryl Data and the DataHub community are adding even more features over time to magnify the positive impact that your data can have, and we’d love for you to be part of the DataHub community! Want to get involved? Come say hello in our Slack, check out our GitHub, and watch a recording of our June Town Hall to learn about the latest in DataHub.
