New study: Researching chatbots ahead of 2024 state elections

Large Language Models Continue To Be Unreliable Concerning Elections

Through our research on Large Language Models and election information, we were able to substantially improve the reliability of Microsoft Copilot's safeguards against election misinformation in German. However, barriers to data access greatly restricted our investigations into other chatbots.

Clara Helming
Senior Advocacy & Policy Manager
Oliver Marsh
Head of Tech Research

About the report in short

What we found

We were able to investigate Microsoft Copilot in more detail than the other models. This is because (i) we were able to automate data collection directly from the chatbot, and (ii) we were provided with some basic usage data via a data access request to Microsoft. 

Analysis of Google Gemini and of OpenAI's GPT-3.5 and GPT-4o was more challenging. Due to technical limitations imposed by the companies, we could not automate collection from the chatbots themselves, but could only access data via Application Programming Interfaces (APIs). APIs are a more technical way than chatbots to ask questions of models, and they also differ from chatbots in features such as parameters or metaprompts which can affect outputs. As such, data collected via API may only give limited insight into how “normal” users experience the chatbots. Nonetheless, based on the API data, we found:
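To illustrate the difference between the two access routes, the following is a minimal sketch (not the study's actual collection code) of the kind of request an API client sends to an OpenAI-style chat completions endpoint. The model name, question, and helper function are illustrative assumptions; the point is that the API caller must choose settings such as the system message (the "metaprompt") and sampling temperature themselves, whereas a consumer chatbot fixes these for the user.

```python
import json

def build_api_request(question: str,
                      system_prompt: str = "You are a helpful assistant.",
                      temperature: float = 1.0,
                      model: str = "gpt-3.5-turbo") -> str:
    """Return the JSON payload an API client would POST to the provider.

    A chatbot product sets the system prompt and temperature behind the
    scenes; an API caller picks them explicitly, so API responses need
    not match what chatbot users see.
    """
    payload = {
        "model": model,
        "temperature": temperature,  # sampling parameter, hidden in chatbots
        "messages": [
            {"role": "system", "content": system_prompt},  # the "metaprompt"
            {"role": "user", "content": question},         # the user's query
        ],
    }
    return json.dumps(payload)

# The same question asked with different parameters can yield different
# answers, which is one reason API data may diverge from the chatbot.
request_body = build_api_request("When are the 2024 state elections?",
                                 temperature=0.2)
```

Because researchers cannot observe which metaprompt or parameters the chatbot product uses, results collected this way are an approximation of, not a substitute for, the chatbot experience.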

Our recommendations

Read more on our policy & advocacy work on ADM in the public sphere.