Automatic data reports in multiple languages with Python, Gettext and Localazy

Word documents may seem a bit old school, but there are still cases when generating them is the shortest route to getting a report to its readers. And whenever automation is discussed, Python is a tool of choice.

But everything’s all fun and games until other languages show up.

In theory, preparing another language version does not seem complicated. After all, it is enough to translate individual labels and descriptions into another language and then hardcode them in the next version of the script. In practice, this simple approach is time-consuming and error-prone, especially when there is a constant need to modify some messages. Not to mention the difficulty of working with non-technical translators using this method.

This article shows how to save time translating a Python-generated data report with Gettext and Localazy. I’ll show you how to extract messages from the source code (Gettext), how to store them in separate files, and how to make them available in a collaborative translation tool (Localazy). You will also learn how to deal with difficulties in translating f-strings.

🤔 Why?

Imagine that you work in the analytics team at an international company that has 3 branches in different countries (England, Germany, Poland).

Illustrative Map

You have created an interesting descriptive report, the results of which can significantly improve the performance of each branch.

Data Report Example: the report is generated in DOCX format for easy email distribution

You want it to reach the widest possible audience, but your company does not work exclusively in English. As your report is generated automatically once a day, having it translated manually into national languages every time is not an option.

But if you can automate report generation, why not do the same with translation?

▶️ Project start

In the beginning, your project consists of a single main script and a dataset in CSV format.

.py and .csv files

The script is quite self-explanatory. Using the data loaded from the CSV file, it calculates various statistics, generates a bar chart and packs everything into a Word document. The usage of the script is simple. All you need to do is specify the city and the date.

Command Line Picture

Note: This project is aimed at teaching localization of Python applications, so I’m skipping aspects like validating input variables or being interesting 😆

We could translate the messages into other languages and use if statements to switch between them according to the user’s choice, but this approach would be tedious and error-prone. Instead, we will first extract all strings from the code so that we can work on them separately from the business logic.

📝 Gettext

GNU gettext is a universal set of tools for producing multilingual messages and a framework for working with translated message strings. It supports many programming languages, including Python: the gettext module ships with the Python standard library. The best thing about gettext is that it helps us seamlessly extract text messages into separate files.

As our report is prepared for data from London, Warsaw and Berlin, we will prepare English, Polish and German language versions. First, we need to prepare the directory structure.

mkdir -p locales/{de,pl}/LC_MESSAGES

Then, we should extract the messages from the code.

xgettext -d base -o locales/base.pot report.py

Or, alternatively:

/Library/Frameworks/Python.framework/Versions/3.8/share/doc/python3.8/examples/Tools/i18n/pygettext.py -d base -o locales/base.pot report.py

To find the pygettext.py file, you can use the command locate pygettext.py.

Note: if you use the plain xgettext command, you will have to edit the resulting file and replace the CHARSET placeholder in the header so that it reads “Content-Type: text/plain; charset=UTF-8\n”

Either command will generate a base.pot file in the locales folder, containing the strings taken from the report.py file.

Unfortunately, as you can see, the generated base.pot does not contain any strings. To fix this, we need to modify report.py by marking the messages for translation.

After generating the base.pot again, two strings appear in it.

After this paragraph, the project should look like this:

https://github.com/fischerbach/python_localization_tutorial/tree/002-gettext

Use this repository: https://github.com/fischerbach/python_localization_tutorial The branches correspond to the steps discussed in this article.

Project branches overview

🎉 First translations

Now let’s prepare the first translations. Copy and rename the base.pot into each language folder:

cp locales/base.pot locales/de/LC_MESSAGES/base.po
cp locales/base.pot locales/pl/LC_MESSAGES/base.po

Let’s modify the individual language files:

To use translation in our program, we need to generate the MO files. MO files are binary data files that are parsed by the Python gettext module and used in the program.

msgfmt -o locales/de/LC_MESSAGES/base.mo locales/de/LC_MESSAGES/base
msgfmt -o locales/pl/LC_MESSAGES/base.mo locales/pl/LC_MESSAGES/base

Now we can modify the script to generate reports in different languages.

From now on, we will also pass the appropriate translation function to the generate_report function.
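Here is a sketch of how the language switch might look, assuming the directory layout created earlier and the domain name base. The get_translator helper and the body of generate_report are made up for illustration. gettext.translation() looks for locales/<lang>/LC_MESSAGES/base.mo, and fallback=True makes it return the original strings when no catalog is found instead of raising an error.

```python
import gettext


def get_translator(lang, localedir="locales", domain="base"):
    """Return a gettext-style function for the requested language."""
    translation = gettext.translation(
        domain, localedir=localedir, languages=[lang], fallback=True
    )
    return translation.gettext


def generate_report(city, date, _):
    # Stand-in body: the real function builds the whole Word document.
    return _("Report for {city} on {date}").format(city=city, date=date)


# The language code would normally come from the command line.
_ = get_translator("pl")
print(generate_report("Warsaw", "2019-01-04", _))
```

Passing the translation function as an argument keeps generate_report free of global state, so the same process could produce reports in several languages.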

Let’s test:

python3 report.py sales.csv Warsaw 2019-01-04 pl

Data Report Example 2

The sentence in the middle paragraph has been replaced! If we change the last parameter to de, we get a version in German.

Data Report Example 3

One sentence in the report is translated, so it’s time for the rest. The procedure is the same. Every time a string with a message appears in the source code, wrap it in the translation function: _('This is a string'). Once this is done, generate the POT file again, copy it to the locales folder of each language, translate it and generate the binary files.

🥺 But hey, it was supposed to be easier

So in the next iteration of our solution, we create two helper scripts (generate_po.sh and generate_mo.sh). You have all the changes here:

https://github.com/fischerbach/python_localization_tutorial/tree/004-gettext-generators
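The helpers in the repository are shell scripts. As a rough illustration of the kind of loop such a helper automates, here is a Python sketch for the MO step. It is an assumption about the approach, not a copy of generate_mo.sh; the function names are hypothetical, and it relies on the locales/<lang>/LC_MESSAGES/base.po layout used in this article.

```python
import subprocess
from pathlib import Path


def msgfmt_commands(locales_dir=Path("locales"), domain="base"):
    """Build one msgfmt command per language folder that holds a PO file."""
    commands = []
    for po in sorted(Path(locales_dir).glob(f"*/LC_MESSAGES/{domain}.po")):
        mo = po.with_suffix(".mo")  # base.po -> base.mo in the same folder
        commands.append(["msgfmt", "-o", str(mo), str(po)])
    return commands


def compile_all(locales_dir=Path("locales"), domain="base"):
    """Run msgfmt for every language; requires GNU gettext to be installed."""
    for command in msgfmt_commands(locales_dir, domain):
        subprocess.run(command, check=True)
```

Calling compile_all() after every translation update regenerates all binary files in one go, and adding a new language only requires creating its folder.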

Let’s check one of the language files:

https://github.com/fischerbach/python_localization_tutorial/blob/004-gettext-generators/locales/pl/LC_MESSAGES/base.po

As you can see, even a relatively simple report can result in a fairly significant number of labels to translate. Additionally, with each change, we would have to take care of merging the changes by using msgmerge. But we will use something better.

🚀 Integrating Localazy

Localazy is an awesome piece of software that makes the usually awful translation experience bearable and even almost enjoyable. It supports many frameworks and file formats and provides CLI tools for build automation. My favourite features are the possibility of cooperative translation and automagic management of changes in translated files.

Localazy phrases example

So let’s integrate our report with Localazy. First, create a Localazy account and install Localazy CLI. Then, create a new application.

Localazy New App Screen

Set the App Type to Private. If your app does not contain sensitive data, you could safely leave it Public, but we will set it to Private for reasons described in the “f-strings problem” section of this article. Then, select POT files from the available file formats.

Localazy File Formats

You will see a template configuration file, localazy.json. Copy it to the project’s main folder.

Localazy Upload strings screen

Remember to modify the locales folder path. Go to your app on Localazy and add some new languages.

Localazy add languages

Now you can generate the PO files again and upload them to Localazy:

bash generate_po.sh
localazy upload

Localazy CLI

After a while, you will see a list of phrases to translate in each language of your application.

Localazy Polish language phrases

And the cherry on top: a machine translation comes with each phrase.

Localazy Translate Screen

Once all the translations have been accepted or created, you can download them into your application and re-generate binary MO files:

localazy download
bash generate_mo.sh

Localazy CLI 2

Let’s check the report in Polish:

Report in Polish

As someone with some understanding of Polish, I’d say it’s quite acceptable. Finally, let’s address one more issue.

⚠️ f-strings problem

The project uses f-strings quite extensively. Unfortunately, we cannot use them as arguments of the _() function: the extraction tools may fail on them or skip them, and even if they did not, an f-string is interpolated before _() receives it, so the resulting text would not match any msgid in the catalog. The problem can be solved by changing f-strings to .format() calls or string concatenation. But I like f-strings, and generating text reports is indeed a model case for using them. Fortunately, there is a workaround.

https://gist.github.com/fischerbach/993e6fab4caf67af6c63281fe3cb8b67

We just wrap the f-string in a function that evaluates it. However, there are potential risks associated with using the eval function, because it runs the code contained in the string. This is why we made the application in Localazy private, so as not to run unfiltered code from users.
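A minimal sketch of the idea follows; it may differ in detail from the gist linked above, and the helper name f and the sample message are assumptions made for illustration. The message is marked as an ordinary string, so it can be extracted and translated, and only after translation is it evaluated as if it had been written as an f-string.

```python
import gettext

_ = gettext.gettext  # stand-in; the report would use the per-language function


def f(template, **names):
    """Evaluate a translated template as if it were an f-string literal.

    WARNING: eval() runs every expression found inside the braces, so the
    template must come from a trusted source.
    """
    # repr() yields a valid quoted literal; prefixing it with "f" turns it
    # into f-string source code, which eval() then executes with `names`
    # as the only visible variables.
    return eval("f" + repr(template), {}, names)


city, total = "Warsaw", 1234.5
print(f(_("Total sales in {city}: {total:.2f}"), city=city, total=total))
```

Passing the variables explicitly limits what a template can see, but it does not make eval() safe: a translated string can still contain arbitrary expressions.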

Localazy Placeholders

Another limitation is that every time you need to change the expression inside the string, you’ll need to update your .po file as well. However, thanks to Localazy, it is effortless to do so.

🤓 Takeaways

As you can see, the duo of Gettext and Localazy is a flexible solution to localization problems. Each addresses different sources of workload and they complement each other wonderfully.

The f-string issue remains to be solved, especially in the context of community translations. It is also worth considering what to do with the labels that appear in the dataset.

Thank you for reading. I hope you enjoyed reading as much as I enjoyed writing this for you.

If you would like to share feedback or simply say ‘hello’, you can connect with me: https://www.linkedin.com/in/rafalrybnik/

If you enjoyed reading this, you’ll probably enjoy my other articles too: https://fischerbach.medium.com