If you were to read a story about male and female historians translated by Google, you might be forgiven for overlooking the women in the group. The phrase “vier Historikerinnen und Historiker” (four female and male historians) is rendered as “cuatro historiadores” (four male historians) in Spanish, with similar results in Italian, French and Polish. Female historians are simply removed from the text.
In an experiment, I translated 11 occupations from one gender-inflected language to another. I analyzed 440 translation pairs to and from German, Italian, Polish, Spanish and French. Together, these languages are natively spoken by three in four citizens of the European Union.
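The article does not spell out how the 440 pairs break down. One reading that matches both this total and the 40 translations per occupation reported for "shop assistant" is that each occupation was tested in a masculine and a feminine form across every ordered pair of the five languages. The short Python sketch below checks that arithmetic; the two-forms-per-occupation factor is an assumption, not something the author states.

```python
from itertools import permutations

LANGUAGES = ["German", "Italian", "Polish", "Spanish", "French"]
N_OCCUPATIONS = 11
GENDERED_FORMS = 2  # assumption: one masculine and one feminine form per occupation

# Ordered (source, target) pairs, since translations run in both directions.
directions = list(permutations(LANGUAGES, 2))

per_occupation = len(directions) * GENDERED_FORMS
total = per_occupation * N_OCCUPATIONS

print(len(directions), per_occupation, total)
```

Under that assumption, 20 translation directions times two forms gives 40 translations per occupation, and 11 occupations give 440 in total.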
Fitting the stereotypes
In many cases, Google changed the gender of the word in a grossly stereotypical way. “Die Präsidentin” (the female president) is rendered as “il presidente” in Italian, although the correct translation is “la presidente”. “Der Krankenpfleger” (the male nurse in German) becomes “l’infirmière” (the female nurse) in French.
In my list, shop assistant was the occupation Google translated best, with 33 correct translations out of 40. From French to Spanish, for instance, “la vendeuse” was correctly translated to “la vendedora” and “le vendeur” to “el vendedor”.
Errors are not systematic, showing that they can be fixed. “Kierowniczka” (Polish for female director) was correctly translated in all four target languages, whereas “die Chefin”, “la capa”, “la jefa” and “la cheffe” were wrongly translated to their masculine forms. (When Google correctly translated a feminine occupation, it was often because the target language’s word was not gender-inflected. For instance, “l’insegnante” in Italian designates both a female and a male teacher.)
The experiment’s code and data are available online.
This experiment might not reflect what Google Translate shows when translating web pages or longer texts. In some cases, especially when nearby words contain feminine forms, Google correctly translates gender-inflected forms.
Stereotypes sneak into translations because Google optimizes translations for English.
A Google spokesperson told AlgorithmWatch that “translating between language pairs requires high volumes of bilingual data that often don’t exist for all language pairs. The way to enable these translations is by using a technique called ‘bridging’. Language bridging in translation means that to translate from X to Y a third language is introduced (E) based on the existence of bilingual data to translate X to E and then E to Y. The most common language used as bridge is English.”
“The majority of nouns in English are gender-neutral: so, when translating the feminine term for ‘nurse’ from a gender-inflected language to English, the gender is ‘lost’ in the translation to the bridging language,” the Google spokesperson added.
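The mechanism the spokesperson describes can be illustrated with a toy sketch in Python. The lookup tables below are hypothetical stand-ins built from the examples in this article; they are not Google's data, model or API. The point is only that once two source forms map to the same English pivot word, the second step has no way to recover the original gender and falls back on a single default.

```python
# Toy lexicons: hypothetical illustration only, not Google's data or system.
DE_TO_EN = {
    "die Präsidentin": "the president",   # feminine source form
    "der Präsident": "the president",     # masculine source form
    "der Krankenpfleger": "the nurse",    # masculine source form
}
# Assumption: the pivot-to-target step returns one default form per English
# word. The defaults mirror the outputs reported in the article.
EN_TO_IT = {"the president": "il presidente"}
EN_TO_FR = {"the nurse": "l'infirmière"}


def bridge(phrase, source_to_pivot, pivot_to_target):
    """Translate via a pivot language: source -> pivot -> target."""
    pivot = source_to_pivot[phrase]
    return pivot, pivot_to_target[pivot]


for phrase, target_table in [
    ("die Präsidentin", EN_TO_IT),
    ("der Präsident", EN_TO_IT),
    ("der Krankenpfleger", EN_TO_FR),
]:
    pivot, result = bridge(phrase, DE_TO_EN, target_table)
    print(f"{phrase} -> {pivot} -> {result}")
```

In this sketch the feminine and masculine German forms of "president" produce the same Italian output, because the gender information is already gone at the English step.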
Several experts I talked to agreed that the community of researchers working on machine translation was not very concerned about non-English languages. Only in May 2020 did the Association for Computational Linguistics, a large professional body, tell reviewers for its annual conference that they could not reject a paper solely because it was about a language other than English.
In 2018, Google introduced a feature that alerted users that some words could be gender-specific when translating from English.
However, it is unclear whether such efforts were made in earnest. More than two years after the changes were deployed, the single word “developer” is correctly translated into French in both the masculine form, “le développeur”, and the feminine, “la développeuse”. But “the developer” translates only to “le développeur”, and every sentence I tried was translated into the masculine, including the phrase “the developer is a woman”.
In my experiment, 182 translations out of 440 turned out to be wrong. The vast majority of the errors involved feminine forms converted to their masculine equivalents. Sixty-eight of the wrong translations were marked as “verified” by Google.
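To put these counts in proportion, the shares can be computed directly from the figures above. This is a small Python calculation on the article's own numbers; the variable names are purely illustrative.

```python
total_pairs = 440      # translation pairs analyzed
wrong = 182            # translations found to be wrong
wrong_verified = 68    # wrong translations carrying the "verified" label

error_rate = wrong / total_pairs
verified_share = wrong_verified / wrong

print(f"Share of wrong translations: {error_rate:.1%}")            # 41.4%
print(f"Share of wrong ones marked verified: {verified_share:.1%}")  # 37.4%
```

In other words, roughly two in five translations were wrong, and more than a third of those carried the “verified” label.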
The Google spokesperson declined to explain precisely how the “verified” label was awarded. “We mark translations as ‘verified’ when they’ve been reviewed by several volunteers in the Google Translate Community and these volunteers agree the translation is correct”, they said. “We are improving our detection of low-quality contributions with automated scoring methods and periodic knowledge checks.”
My experiment raised other issues. “Le chef” (the boss, in French) was translated to “der Führer” in German, a word meaning “the guide” and very strongly linked to the Nazi era. The translation was marked as verified.
But Google reassured me that no extremist group had infiltrated the “Google Translate Community” to spread far-right language. “In this specific case, [the error] is due to the ‘bridging’ process”, the spokesperson said. “If you do a translation for ‘le chef’ from French to English we get ‘leader’. If you then translate ‘leader’ from English to German you get ‘Führer’”.
Google Translate is not just another translation service. It is a feature that Europeans can hardly escape.
Since an update in April 2019, Google Chrome prompts users to instantly translate web pages. Anyone visiting a website in a foreign language is asked to choose between the original and the Google-translated version, even if the website offers an official translation in the user’s preferred language. (Google cannot detect websites that provide an official translation and “errs on the side of helpfulness by offering a translate option in all circumstances”, the spokesperson said. They also said users could turn off the translation prompt.)
Approximately 250 million, or one in two, citizens of the European Union use an Android phone. Unless they manage to bypass the system’s blocks (by “rooting” their device), they cannot remove Google Chrome. It is likely that many of them use Google Translate, perhaps unwittingly.