Microsoft Copilot is a chatbot that Microsoft developed based on the large language model (LLM) GPT-4, which combines generative AI and Bing search engine features. It formulates responses by collecting search results and summarizing the findings with sources for users.
In a collaborative investigation, the nonprofits AlgorithmWatch and AI Forensics tested the quality of election information that Bing Chat provides at scale. Over a three-month period, from August to October 2023, researchers systematically prompted Bing Chat about the October 2023 federal elections in Switzerland and the state elections in the German states of Hesse and Bavaria. The queries covered topics that voters might realistically search for before an election, such as how to vote, which candidates are running, and the latest election polls.
Voters should not rely on Bing’s chatbot as a source of information
Researchers found that nearly 30% of the chatbot’s answers contained factual errors, including but not limited to: false polling numbers, outdated candidates, and wrong election dates. Even more concerning, the chatbot fabricated controversies about candidates. This issue was found to be systemic and persisted across time, countries, and languages (with prompts conducted in English, German, and French).
“It’s time we discredit referring to these mistakes as ‘hallucinations’. Our research exposes the much more intricate and structural occurrence of misleading factual errors in general-purpose LLMs and chatbots.”
— Riccardo Angius, Applied Math Lead and Researcher, AI Forensics
Generative AI undermines trust in institutions
Factually incorrect and fabricated stories pose a risk to the reputations of the cited news outlets and candidates. For example, the chatbot attributed incorrect polling numbers to trusted news sources, even when those sources had reported the numbers correctly.
Made-up narratives about candidates are linked to reliable sources, creating an illusion of validity. This could further sow distrust in news media. The chatbot’s false information might also influence voters’ opinions about candidates.
Microsoft’s Bing Chat is an unreliable source of information during elections. More than that, it can pollute the information ecosystem by misquoting reliable sources and fabricating stories.
Microsoft seems unable or unwilling to fix the problem
On the company’s blog, Microsoft has announced measures to protect information integrity during elections, but these fall short. First, the company promises to provide voters with “authoritative election information” through Bing; yet while the chatbot might cite reliable sources, it misquotes those sources in its answers. Second, Microsoft has promised to help candidates and campaigns maintain better control over the narratives around them. However, the chatbot itself is a source of false narratives.
“Our research shows that malicious actors are not the only source of misinformation; general-purpose chatbots can be just as threatening to the information ecosystem. Microsoft should acknowledge this, and recognize that flagging the generative AI content made by others is not enough. Their tools, even when implicating trustworthy sources, produce incorrect information at scale.”
— Salvatore Romano, Senior Researcher, AI Forensics
Moreover, before the initial report was published in October 2023, Microsoft Deutschland received a set of prompts that had returned incorrect answers. Microsoft acknowledged the need for accurate election information and stated that it had improved Bing Chat to base responses on top search results. But in a follow-up assessment, AI Forensics and AlgorithmWatch found little progress. Despite some corrections, Bing Chat continued to fabricate stories about candidate controversies and provided incorrect information about Swiss candidates and their cantons.
Regulation is needed to rein in Big Tech
Our findings indicate a lack of adequate safeguards in Microsoft’s chatbot and the underlying model GPT-4. By introducing generative AI to the public without the necessary safety measures in place, tech companies risk undermining people’s access to reliable information. Search engines are especially vulnerable in this regard, as they hold great power in ranking information and are one of the main access points to information on the internet.
“Until now, tech companies have introduced societal risks without having to fear serious consequences. Individual users are left to their own devices in separating fact from AI-fabricated fiction.”
— Clara Helming, Senior Policy and Advocacy Manager, AlgorithmWatch
Governments should take societal risks stemming from big AI applications seriously – by introducing and enforcing rules that hold the power of Big Tech in check.
The EU’s Digital Services Act (DSA), a law introduced in 2022 to regulate digital platforms, requires operators of search engines with more than 45 million users within the EU to carry out so-called risk assessments and to develop mechanisms to mitigate the risks posed by their services. The European Commission has designated Microsoft Bing as such a very large online search engine. The law explicitly names negative effects on the integrity of elections and the spread of misinformation as “systemic risks.” In a statement responding to the report’s findings, the EU Commission said it considers the information highly relevant to the DSA and reserves the right to take further action.
The EU is also in the final stages of passing the AI Act, an ambitious rulebook that aims to regulate AI across all sectors. It foresees obligations for general-purpose AI systems and their underlying models, with more comprehensive obligations for high-impact models that pose systemic risks to society.
“It remains to be seen how these provisions can effectively tackle such AI models’ negative impact on public debate. The AI Act still has to prove its legal force by preventing Big Tech companies from working their way around the rules.”
— Angela Müller, Head of Policy & Advocacy, AlgorithmWatch