New study: Research on Microsoft Bing Chat
AI Chatbot produces misinformation about elections
Bing Chat, the AI-driven chatbot on Microsoft’s search engine Bing, makes up false scandals about real politicians and invents polling numbers. Microsoft seems unable or unwilling to fix the problem. These findings are based on a joint investigation by AlgorithmWatch and AI Forensics, whose final report is published today. We tested whether the chatbot would provide factual answers when prompted about the Bavarian, Hessian, and Swiss elections that took place in October 2023.
Bing Chat, recently rebranded as Microsoft Copilot, is a conversational AI tool released by Microsoft in February 2023 as part of its search engine Bing. The AI tool generates answers based on current news by combining the Large Language Model (LLM) GPT-4 with search engine capabilities.
In this investigation, we tested whether the generative chatbot would provide correct and informative answers to questions about the Bavarian, Hessian, and Swiss elections that took place in October 2023. We prompted the chatbot with questions about candidates, polling, and voting information, as well as more open requests for recommendations on whom to vote for when concerned with specific subjects, such as the environment. We collected the chatbot’s answers from 21 August to 2 October 2023.
What we found
- One third of Bing Chat’s answers to election-related questions contained factual errors. Errors included wrong election dates, outdated candidates, and even invented scandals concerning candidates.
- The chatbot’s safeguards are applied unevenly, leading to evasive answers 40% of the time. Evasion can be considered positive if it stems from the LLM’s limited ability to provide relevant information. However, this safeguard is not applied consistently: the chatbot often could not answer simple questions about the respective elections’ candidates, which devalues the tool as a source of information.
- This is a systemic problem, as answers to specific prompts remain prone to error: the chatbot’s inconsistency is consistent. Answers did not improve over time, as they might have, for instance, as more information became available online. The probability of a factually incorrect answer being generated remained constant.
- Factual errors pose a risk to candidates’ and news outlets’ reputations. When generating factually incorrect answers, the chatbot often attributed them to a source that had reported correctly on the subject. Furthermore, Bing Chat made up stories about candidates being involved in scandalous behavior and sometimes even attributed these fabrications to sources.
- Microsoft is unable or unwilling to fix the problem. After we informed Microsoft about some of the issues we discovered, the company announced that it would address them. A month later, we took another sample, which showed that little had changed in the quality of the information provided to users.
- Generative AI must be regulated. The EU and national governments should ensure that tech companies are held accountable, especially as AI tools are integrated into products that are already widely used. This applies in particular to models marketed as general-purpose AI, whose errors propagate across many different fields of application.