The BAMF’s controversial dialect recognition software: new languages and an EU pilot project
"Pretty much hopeless", says a computational linguist from the center where BAMF buys its training data.
For several years, Germany has been the only country to use automated language and dialect recognition in asylum procedures, purportedly to help authorities verify a person’s claims about where they are from. According to the privacy advocacy group European Digital Rights, this is a technology in line with AI claiming to predict someone’s sexual orientation or political beliefs and should be banned. (The software also makes mistakes and, despite promises that it is only used for “clues”, is known to have had a strong influence on some people’s asylum cases.)
Now, an inquiry by MP Clara Bünger reveals that, since July 2022, the software has also been used to recognize Farsi, Dari and Pashto (in addition to Iraqi Arabic, Maghrebi Arabic, Levantine Arabic, Gulf Arabic and Egyptian Arabic). Not everyone is convinced by this new development. The American computational linguist Mark Liberman, while emphasizing he is no expert on Persian languages, told us he is skeptical about training a machine to classify spoken Farsi, Dari and Pashto. (Liberman is the director of the Linguistic Data Consortium at the University of Pennsylvania, where the German Federal Asylum Agency, or BAMF, gets the majority of the training data for its software.)
Critics have derided the idea of using voice recognition to detect a person's accent in order to determine their nationality as essentialist pseudoscience. But while BAMF calls its tool “dialect recognition software” (DIAS), it also sometimes describes it as “language biometry.” Which is it? In fact, Liberman writes to AlgorithmWatch, “the difficulty for BAMF's desired solution is that there's not really a well-defined difference between ‘language’ and ‘dialect.’”
In Europe, the “national language”, or “standardized language”, was ushered in with the development of nation states. Before that, people would start getting harder to understand once you traveled 100 km from home. But even today, you can have two Spanish speakers, for example, who will not understand each other's “local tongue”. And a person who lives in the Italian Abruzzo region and speaks school-taught “standard Italian” will sound completely different from someone from the same region who speaks Abruzzese Occidentale.
“My understanding is that the ‘standardization’ of Dari and Pashto is even less strongly established, and so the idea of establishing bright lines separating vernacular spoken (Iranian) Persian, Dari, and Pashto is probably pretty much hopeless,” writes Liberman.
Still, several EU countries are keen on Germany’s “innovative technical solutions”, which is what the BAMF promised other EU migration agencies at a conference in 2020. BAMF’s proposal then: “a pilot project with several European countries” to integrate DIAS into one common language testing procedure for asylum seekers. Now, the interior ministry confirms that the countries taking part are Austria, Finland, Norway, Sweden, Greece, Switzerland, and Lithuania. BAMF representatives have already traveled to Norway and Switzerland to demonstrate how DIAS works.
MP Clara Bünger told AlgorithmWatch and netzpolitik.org: “I seriously doubt that BAMF’s dialect recognition software is an appropriate means to get valid indications about the identity and origins of asylum seekers.” She also criticized a “culture of distrust at BAMF”, where “asylum seekers are suspected of systematically making wrong statements about their identity and origins”, and demanded more “training for BAMF employees” rather than “fallible and expensive technical solutions.”