
How We Implemented Internationalization in DataHub
DataHub is an extensible data catalog that enables data discovery, data observability and federated governance to help tame the complexity of your data ecosystem.
In today’s era of exploding data and information production, Public Administration possesses a wealth of data from diverse sources. To perform thorough analyses that inform public agents’ decision-making, robust big data infrastructures are essential for efficient data processing.
The state of Santa Catarina exemplifies this with its BoaVista Platform, developed by CIASC (Center for Informatics and Automation of the State of Santa Catarina S.A.) as a comprehensive big data and intelligence infrastructure. In this landscape, effective metadata management is crucial: it prevents knowledge loss and promotes data sharing among users.
Currently, the BoaVista Platform lacks a data catalog that can be used by end-users to discover data from other public agencies. To address this, we began searching for open-source data catalog alternatives and identified DataHub as the best option.
However, DataHub does not support internationalization, so all User Interface (UI) labels and messages are in English. To solve this issue, we explored some alternatives but ultimately decided to develop the feature ourselves. Because DataHub is an open-source platform, we also formulated a strategy to ensure that the feature could be merged into the main project, contributing back to the community.
In this article, I will describe how this process unfolded and the strategies we developed to build this feature.
Lack of Internationalization (i18n)
Currently, the DataHub UI is only available in English. For widespread adoption, especially in a governmental context, having the UI in the native language is crucial.
After selecting DataHub as our data catalog platform, we began exploring options for implementing internationalization. However, we found no substantial information in DataHub’s roadmap or GitHub repository suggesting that this feature would be integrated into the platform anytime soon.
While there were some feature requests and initiatives related to internationalization, given our need to deploy the platform quickly, we decided to develop the internationalization feature ourselves.
Development Strategy: Step by Step
1. Corporate GitLab Repository
Our first step was to establish a dedicated repository in our corporate GitLab. This provided a centralized location for managing the development process, tracking progress, and ensuring that all team members had access to the latest codebase.
2. Forking the Project at the v0.13.2 Tag
We made a fork of the original DataHub project at a specific tag named v0.13.2. This tag served as a stable starting point, allowing us to begin our customization efforts.
3. Integrating the Internationalization Service
Next, we incorporated an internationalization service into the project. This was a critical step in enabling the UI to support multiple languages, making the platform more accessible to non-English-speaking users.
Because the DataHub frontend is developed in ReactJS, we looked for libraries that could make the internationalization process easier. We chose to adopt i18next and react-i18next.

4. Leveraging a Base Project from GitHub
To guide our development, we referenced an existing project on GitHub: NaicheD/datahub. Although this project provided a solid structural framework, we chose not to use it from the start due to its significant divergence from the main branch of DataHub.
5. Using ChatGPT for Initial Translations
To expedite the localization process, we used ChatGPT to perform the initial translations of the existing project files into Brazilian Portuguese. This allowed us to quickly generate translations that could later be refined as needed.
6. Publishing the Internationalized Version for Internal Validation
Once the internationalization feature was implemented, we published this version for review by the internal team at CIASC. Their feedback was essential in ensuring the quality and usability of the new multilingual interface.
7. Creating the “feature/github-integration” Branch
To manage the internationalization feature separately, we created a new branch called feature/github-integration. This branch housed all changes related to i18n, making it easier to track and manage.
8. Updating the Branch with the Latest GitHub Version
We kept our feature/github-integration branch in sync with the latest updates from the main branch on GitHub. This ensured that our work remained compatible with the most recent version of DataHub and minimized potential integration issues.
9. Forking the Project on GitHub with the Latest Version
Finally, we created a fork of the project on GitHub using the most current version of DataHub. This allowed us to either merge the internationalization feature back into the main project or maintain it as a separate branch, depending on the needs of the community and our ongoing development goals.
Feature Development
The development of the internationalization feature in the project was carried out in three main steps:
- Inclusion of the internationalization library for React: We integrated i18next and react-i18next into the project to manage translations and language switching.
- Creation of JSON files for each language: We built JSON files containing all the necessary keywords for translation.
- Replacement of hard-coded labels and messages: We systematically replaced all hard-coded labels and messages in the code with their respective keywords, injected by the translation service.
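When each language has its own JSON file, it can help to check that no key is missing from a given language. The following is a minimal sketch of such a check and is not part of the implementation described here; the `TranslationTree` type, the `flattenKeys` and `missingKeys` helpers, and the English sample strings are all hypothetical.

```typescript
// Nested translation data: leaves are strings, branches are objects.
interface TranslationTree {
  [key: string]: string | TranslationTree;
}

// Flatten a nested tree into dotted keys such as "domain.setDomain".
function flattenKeys(tree: TranslationTree, prefix: string = ""): string[] {
  let keys: string[] = [];
  for (const name of Object.keys(tree)) {
    const value = tree[name];
    const path = prefix === "" ? name : `${prefix}.${name}`;
    if (typeof value === "string") {
      keys.push(path);
    } else if (typeof value === "object") {
      keys = keys.concat(flattenKeys(value, path));
    }
  }
  return keys;
}

// Keys present in the reference language but absent from the target language.
function missingKeys(reference: TranslationTree, target: TranslationTree): string[] {
  const targetKeys = flattenKeys(target);
  return flattenKeys(reference).filter((key) => targetKeys.indexOf(key) === -1);
}

// Hypothetical sample data, not the project's real files.
const en: TranslationTree = {
  domain: { setDomain: "Set Domain", idDomain: "Domain ID" },
};
const ptBr: TranslationTree = {
  domain: { setDomain: "Definir Domínio" },
};

// Lists the keys that still need a Brazilian Portuguese translation.
const untranslated = missingKeys(en, ptBr);
```

A check like this could run in a test or a pre-commit step, so that newly added labels are not forgotten in one of the languages.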
The JSON configuration files followed the same standards proposed by the base project, with the following directory structure:

Each translation file (translation.json), located in the respective language directory, contains labels and messages translated based on their context within the UI. For example, files related to the “Domain” scope begin with the “domain” key, as shown in the code snippet below:
"domain": {
  "idDomain": "ID do domínio",
  "confirmRemovedDomainTitle": "Confirmação de remoção de domínio",
  "confirmRemovedDomain": "Tem certeza de que deseja remover este domínio?",
  "setDomain": "Definir Domínio"
  ...
}
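To make the role of the scope key concrete, here is a minimal sketch of how a dotted key such as `domain.setDomain` can be resolved against this nested structure. It is a simplified illustration rather than i18next's actual implementation; the `lookup` helper and the `TranslationTree` type are hypothetical names. Returning the key itself when nothing is found mirrors i18next's default behavior for missing translations.

```typescript
// Nested translation data: leaves are strings, branches are objects.
interface TranslationTree {
  [key: string]: string | TranslationTree;
}

// Walk a dotted key such as "domain.setDomain" through the nested tree.
// Returns the key itself when no string is found, so gaps stay visible in the UI.
function lookup(tree: TranslationTree, key: string): string {
  let node: string | TranslationTree | undefined = tree;
  for (const part of key.split(".")) {
    if (typeof node !== "object") {
      return key; // reached a leaf (or nothing) before the key was consumed
    }
    node = node[part];
  }
  return typeof node === "string" ? node : key;
}

// Sample entries taken from the "domain" scope shown above.
const translation: TranslationTree = {
  domain: {
    idDomain: "ID do domínio",
    setDomain: "Definir Domínio",
  },
};

const label = lookup(translation, "domain.setDomain"); // "Definir Domínio"
const missing = lookup(translation, "domain.unknown"); // "domain.unknown"
```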
For more generic labels and messages, such as those related to deletion or user action confirmations, we used keys whose text contains placeholders that are replaced dynamically. The following code snippet illustrates this scenario:
"crud": {
  ...
  "addWithName": "Adicionar {{name}}",
  "createWithName": "Criar {{name}}",
  "deleteWithName": "Excluir {{name}}",
  ...
  "doYouWantTo": {
    "deleteContentWithThisName": "Tem certeza de que deseja excluir este {{name}}?",
    "removeTitleWithName": "Você quer remover {{name}}?"
  },
  ...
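The `{{name}}` placeholders are filled in at render time. As a rough sketch of the idea, the hypothetical `interpolate` helper below substitutes placeholders in a template string; it is a simplified stand-in for the interpolation that i18next performs and is not the library's code.

```typescript
// Replace every {{placeholder}} in a template with the matching value.
// Placeholders without a value are left untouched so they are easy to spot.
function interpolate(template: string, values: { [name: string]: string }): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) => {
    const value = values[name];
    return value === undefined ? match : value;
  });
}

// Templates taken from the "crud" scope shown above.
const deleteLabel = interpolate("Excluir {{name}}", { name: "domínio" });
// "Excluir domínio"

const confirmation = interpolate(
  "Tem certeza de que deseja excluir este {{name}}?",
  { name: "domínio" }
);
// "Tem certeza de que deseja excluir este domínio?"
```

Keeping the entity name out of the translated string means one entry such as `deleteWithName` can serve every entity type instead of requiring a separate key per entity.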
Thanks to the libraries mentioned earlier, the language in which the application renders labels and messages can be set by adjusting the language option in the configuration file (here, fallbackLng). In the example below, the project is configured to use the interface translated into Brazilian Portuguese (pt_br).
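import i18n from 'i18next';
import { initReactI18next } from 'react-i18next';

// `resources` maps each language code to its translation.json content
// and is assumed to be defined elsewhere.
i18n
  .use(initReactI18next) // bind i18next to React
  .init({
    fallbackLng: 'pt_br', // language used when no other language is resolved
    interpolation: {
      escapeValue: false, // React already escapes rendered values
    },
    resources,
  });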
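To illustrate why setting the fallback language is enough to render the UI in Brazilian Portuguese, the sketch below shows the general idea of fallback resolution: when no translation exists for the requested language, the fallback language is used instead. This is a simplified model with flat keys, not i18next's actual resolution logic; the `Resources` type, the `translate` helper, and the sample data are hypothetical.

```typescript
// Flat key-to-text maps per language (hypothetical sample data).
interface Resources {
  [language: string]: { [key: string]: string };
}

const sampleResources: Resources = {
  pt_br: { "domain.setDomain": "Definir Domínio" },
};

// Try the requested language first, then the fallback language.
// If neither has the key, return the key itself.
function translate(
  resources: Resources,
  key: string,
  language: string,
  fallbackLanguage: string
): string {
  for (const candidate of [language, fallbackLanguage]) {
    const bundle = resources[candidate];
    const text = bundle === undefined ? undefined : bundle[key];
    if (text !== undefined) {
      return text;
    }
  }
  return key;
}

// No "en" bundle is loaded, so the pt_br fallback is used.
const setDomainLabel = translate(sampleResources, "domain.setDomain", "en", "pt_br");
// "Definir Domínio"
```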
Finally, the most “tedious” task was replacing hard-coded terms throughout the codebase with the corresponding terms from the translation file. Although monotonous, this step was essential for delivering the feature. The code below exemplifies this process:
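As a rough illustration of this kind of replacement (not the project's actual code), the sketch below contrasts a hard-coded label with one resolved through a translation function. In a React component, the `t` function would typically come from the `useTranslation` hook of react-i18next; here a hypothetical stand-in keeps the example self-contained. The function names, the English label, and the `messages` map are assumptions made for the example.

```typescript
// Shape of a translation function, similar in spirit to the `t` function
// that react-i18next provides to components.
type Translate = (key: string, values?: { [name: string]: string }) => string;

// Before: the label is hard-coded in English.
function deleteButtonLabelBefore(entityName: string): string {
  return `Delete ${entityName}`;
}

// After: the label is looked up by key, so it follows the active language.
function deleteButtonLabelAfter(t: Translate, entityName: string): string {
  return t("crud.deleteWithName", { name: entityName });
}

// Hypothetical stand-in for the translation service, for demonstration only.
const messages: { [key: string]: string } = {
  "crud.deleteWithName": "Excluir {{name}}",
};

const t: Translate = (key, values = {}) => {
  let text = messages[key];
  if (text === undefined) {
    return key; // unknown keys are returned as-is
  }
  for (const name of Object.keys(values)) {
    text = text.split(`{{${name}}}`).join(values[name]);
  }
  return text;
};

const before = deleteButtonLabelBefore("Domínio"); // "Delete Domínio"
const after = deleteButtonLabelAfter(t, "Domínio"); // "Excluir Domínio"
```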

After several weeks of work and many commits, we were able to deliver the first stable version translated into Brazilian Portuguese to the internal CIASC team. An example of the project’s homepage can be seen in the following image.

And now, how do we merge into the main project?
To contribute this new feature to the main DataHub project, we devised a strategy to do so as efficiently as possible. First, we rolled back to the commit immediately before the updates that introduced the internationalization feature. The command executed was as follows:
git reset --soft <COMMIT_HASH>
Because a soft reset moves the branch pointer while keeping all later changes staged, this approach allowed us to validate the number of files changed for this new feature, as well as the number of conflicts with the updated main branch of the DataHub project. In total, 335 files were modified, and 27 conflicts were encountered with the updated project. The image below illustrates this scenario.

The strategy for resolving conflicts was to retain all updates from the main DataHub fork while incorporating only the developed internationalization layer. This approach preserved the existing functionalities already included in the main branch and simply added the new feature.
Once this was done, a Pull Request was created from the fork to the main DataHub project. Fingers crossed!🤞

Acknowledgments
I would like to extend a special thanks to
for their invaluable support and contributions to this project. Your collaboration has made a significant impact, and I am deeply grateful.
Conclusion
Implementing internationalization in DataHub was a challenging but rewarding experience. By following a structured development strategy and leveraging both internal and external resources, we were able to create a more inclusive platform that can be used by a broader audience. We’re proud to contribute this feature back to the open-source community and look forward to seeing how it will benefit others.
Happy Coding!
References
- CIASC Homepage: https://www.ciasc.sc.gov.br/
- BoaVista Homepage: https://www.ciasc.sc.gov.br/boavista/
- DataHub: https://github.com/datahub-project/datahub
- DataHub i18n: https://github.com/luizhsalazar/datahub/tree/feature/i18n-support
- JS Library react-i18next: https://react.i18next.com/
- Pull Request URL: https://github.com/datahub-project/datahub/pull/11207