How do I internationalize my Django app?
This is a how-to post I was recently asked to write for Excella Consulting’s blog.
Internationalizing a website is hard. If you have a simple site and are working with a mature web framework, you might not have much trouble. But if your web app is dynamic, complex, and interfaces tightly with other technologies, then you need to plan your technical approach more carefully. This spring, I helped one of our Washington, DC clients use Django to launch their first multilingual site, targeted at approximately 14 million users. No small feat, but in the end we succeeded in creating a unique experience for our new audience without adding needless complexity to the app.
Here are a few key lessons I picked up along the way:
Django packages from the community work like magic … but at a price
Before we began development, we identified the django-modeltranslation package as a great tool for our project. Modeltranslation finds model fields that you designate in a special file and automagically adds fields for other languages to your database and your admin views. This approach has great appeal because it scales well: all it takes to add a new language to your app is a quick addition to your settings.py file.
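To make the "automagic" part a little more concrete, here is a minimal plain-Python sketch of the naming idea: each designated field gets one extra field per configured language, suffixed with the language code. This is not modeltranslation's own code. The function name `localized_field_names` and the sample field names are hypothetical, and the suffix convention shown is an assumption based on how the package names its generated fields.

```python
def localized_field_names(field_names, language_codes):
    """Map each translatable field to its per-language field names.

    Hypothetical helper for illustration only; the real package does this
    work for you when it patches your models.
    """
    names = {}
    for field in field_names:
        # Hyphens are not valid in Python identifiers, so "pt-br" becomes "pt_br".
        names[field] = [f"{field}_{code.replace('-', '_')}" for code in language_codes]
    return names


generated = localized_field_names(["question", "answer"], ["en", "es"])
for original, localized in generated.items():
    print(original, "->", ", ".join(localized))
```

Adding a third language code to the list grows every translatable field by one, which is why a one-line settings change is enough to add a language.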
The problems show up when you need to reference those translated fields in other Python files. In our case, we needed to add them to our Solr index using django-haystack. Because the translated fields are never actually declared in the models, we couldn’t get haystack to see them or index them. That was a dealbreaker, so we left modeltranslation behind and decided to manually add the new fields to our models.
Keeping your content management strategy in mind is key
Throughout development, we assumed that we wanted to keep Spanish and English content in the same model. It made sense because we knew 95% of our content would exist in both languages and splitting them apart would have required us to duplicate a few fields across models.
Our problem was that we didn’t take our content management processes into account. After launch, it was true that 95% of our content existed in both languages, but each language had its own content approval process. We hit issues because our combined model used a single field to determine if a question should be added to our search indexes, but often the English content would be ready for indexing and the Spanish content wouldn’t. This forced our content authors to choose between rushing one language’s content to production or removing the other from our site until both were ready.
Ultimately, we decided to split the two languages into separate models with an abstract base class, a change that would have been easy to make at the beginning.
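The shape of that split can be sketched without Django at all. The example below uses plain dataclasses as a stand-in for an abstract base model with one concrete model per language. The class names, field names, and the `indexable` helper are all hypothetical and are not our actual models. The point is only that each language's record carries its own approval flag.

```python
from dataclasses import dataclass


@dataclass
class QuestionBase:
    """Fields shared by every language (stand-in for an abstract base model)."""
    slug: str
    text: str
    approved: bool = False  # one flag per record, so one flag per language

    language = None  # plain class attribute, overridden by each subclass


@dataclass
class EnglishQuestion(QuestionBase):
    language = "en"


@dataclass
class SpanishQuestion(QuestionBase):
    language = "es"


def indexable(questions):
    """Return only the records whose own approval flag is set."""
    return [q for q in questions if q.approved]


english = EnglishQuestion(slug="late-fees", text="What is a late fee?", approved=True)
spanish = SpanishQuestion(slug="late-fees", text="¿Qué es un cargo por pago tardío?")

# The English record can be indexed while the Spanish one waits for approval.
print([q.language for q in indexable([english, spanish])])
```

With a single combined record there is nowhere to store "English is ready, Spanish is not", which is exactly the bind our content authors were in.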
Django plays great with Unicode, but the rest of your stack might not
Django has lots of great features (and even better documentation) around how it treats Unicode, which are crucial when working with non-ASCII characters in other languages. The few times we did run into Unicode issues within the app itself, they were pretty easy to track down.
Things get trickier as you move to higher environments. In our case, non-ASCII characters were garbled when passing through Tomcat on the way to our Solr server. It was a hard bug to find, but an easy one to fix.
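If you have never seen this kind of garbling, it helps to reproduce it in isolation. The sketch below assumes one common cause, which may or may not match your setup: the client percent-encodes the search term as UTF-8, and something downstream decodes those bytes as ISO-8859-1 (Latin-1). The sample search term is made up for illustration.

```python
from urllib.parse import quote, unquote

search_term = "información"

# quote() percent-encodes the UTF-8 bytes of the string by default.
encoded = quote(search_term)
print(encoded)

# Decoding with the same charset round-trips cleanly.
decoded_as_utf8 = unquote(encoded, encoding="utf-8")
print(decoded_as_utf8)

# Decoding the same bytes as Latin-1 turns each multi-byte character
# into two unrelated characters.
decoded_as_latin1 = unquote(encoded, encoding="latin-1")
print(decoded_as_latin1)
```

If the search terms arriving at your index look like the last line, check the URI decoding charset of every hop between Django and the search server.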
Once we got to our staging server, we learned that our firewall was configured to block high-bit characters, which prevented us from using any accented characters in our search terms. Fixing this required a non-trivial change to the firewall, which meant a lot of discussion and review. Luckily for us, the security and infrastructure teams did a really great job handling it.
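"High-bit" here means any byte with its top bit set, that is, a value of 0x80 or above. Every non-ASCII character encoded as UTF-8 consists entirely of such bytes. A small sketch, using a hypothetical helper name and made-up search terms, shows which terms would trip a rule like that and so are worth including in a smoke test for each new environment:

```python
def high_bit_bytes(term):
    """Return the UTF-8 bytes of `term` that have the top bit set."""
    return [byte for byte in term.encode("utf-8") if byte >= 0x80]


for term in ["nino", "niño", "préstamo"]:
    flagged = high_bit_bytes(term)
    status = "contains high-bit bytes" if flagged else "plain ASCII"
    print(term, "->", status, [hex(b) for b in flagged])
```

Running a handful of accented search terms through every environment early would have surfaced our firewall rule well before staging.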
Thinking ahead pays
The moral of the story is simple: internationalizing a complex web application is hard, and though certain tools within the community make it look easy, it requires more forethought and planning than you might expect. Our project ended up a big success, but if you are considering internationalizing your Django app soon, I hope you can avoid some of the bumps we hit along the way.