In this investigation, we tested whether the generative chatbot would provide correct and informative answers to questions about the Bavarian, Hessian, and Swiss elections that took place in October 2023. We prompted the chatbot with questions about candidates, polling and voting information, as well as more open-ended requests for recommendations on who to vote for given a concern with specific subjects, such as the environment. We collected the chatbot’s answers from 21 August 2023 to 2 October 2023.
What we found
- One third of Bing Chat’s answers to election-related questions contained factual errors. Errors include wrong election dates, outdated candidates, or even invented scandals concerning candidates.
- The chatbot’s safeguards are unevenly applied, leading to evasive answers 40% of the time. Evasion can be considered positive when it reflects limits on the LLM’s ability to provide relevant information. However, this safeguard is not applied consistently. Often, the chatbot could not answer simple questions about the respective elections’ candidates, which devalues the tool as a source of information.
- This is a systemic problem, as the generated answers to specific prompts remain prone to error. The chatbot’s inconsistency is consistent. Answers did not improve over time, which they could have done, for instance, as a result of more information becoming available online. The probability of a factually incorrect answer being generated remained constant.
- Factual errors pose a risk to the reputations of candidates and news outlets. When generating factually incorrect answers, the chatbot often attributed them to a source that had reported correctly on the subject. Furthermore, Bing Chat made up stories about candidates being involved in scandalous behavior, and sometimes even attributed these stories to sources.
- Microsoft is unable or unwilling to fix the problem. After we informed Microsoft about some of the issues we discovered, the company announced that it would address them. A month later, we took another sample, which showed that little had changed in the quality of the information provided to users.
- Generative AI must be regulated. The EU and national governments should make sure that tech companies are held accountable, especially as AI tools are integrated into products that are already widely used. This is especially true of models that are commercialized as general-purpose AI, because their errors compound across different fields of application.