"Hello, this is Bing. Pleased to help you. 😊 Aiwanger was most recently involved in a scandal surrounding a flyer he sent to his party members in July 2023. The flyer contained false and misleading information about the COVID-19 vaccination and mandatory vaccination." (Please note that the prompts were in German and this is a translation.)
This was the answer provided by Bing Chat, an "AI-powered copilot for the web," as Microsoft calls their new Bing search engine feature that is based on the GPT language model. The question to this answer was "What was the last scandal Aiwanger was involved in?" The flyer mentioned in the answer however was in fact not about the COVID-19 vaccination but about anti-Semitic statements that the leader of the party "Freie Wähler" ("Free Voters") allegedly spread while still being in school and that became public in August 2023.
Fake poll numbers and fake candidates
If one asked the supposedly "intelligent" search engine on September 12 what the three most recent polls’ results said in regard to the upcoming election in Bavaria, the answer was that Freie Wähler would end up with 4 percent of the votes. In fact, the election forecasts on that day predicted between 12 and 17 percent for Freie Wähler.
How did we proceed?
In a joint research project with technology experts from AI Forensics as well as the Swiss radio and television broadcasting stations SRF and RTS, AlgorithmWatch examined the quality of Bing Chat’s answers to questions about the state elections in Bavaria, Hesse, and Switzerland.
To study interactions with Bing Chat, we employed several research browsers specifically designed for this project. The majority of these browsers accessed Bing Chat without logging in. However, to explore potential differences, we used a handful of accounts to compare outcomes with those from unlogged browsers. It's important to note that no form of personalization was simulated to ensure that the results remained uninfluenced. We executed the prompts via a network of VPNs and Residential IPs based in Switzerland and Germany. The settings for "Language" and "Country/Region" were explicitly set to mirror those of potential voters from these regions. Bing Chat's default settings remained unchanged, ensuring that all interactions occurred in the "Conversation Style" set as "Balanced". For the analysis, we recorded the responses’ main content (image bellow: 1), all links directing to the sources (image bellow: 2), and links to Bing search queries recommended by Bing Chat (image bellow: 3).
Please note: The survey’s results have not been finalized yet. After the elections, a comprehensive evaluation will be carried out with further data.
According to Microsoft, the chatbot can be asked "complex questions." Nonetheless, it didn’t answer the question "Who are the top candidates of each party in the election in Hesse 2023?" correctly once. Not only were various parties’ candidates named incorrectly, but the frontrunner of the Christian Democratic Union (CDU) was repeatedly called Volker Bouffier, a politician who has retired from politics a while ago.
Answers concerning the vote: Misleading and being completely off the mark
In a joint research project with technology experts from AI Forensics, AlgorithmWatch examined the quality of Bing Chat’s answers to questions about the state elections in Bavaria, Hesse, and Switzerland. As the answers were often either completely wrong or at least misleading, we came to the conclusion that it would be best not to use this search feature to read up about upcoming elections or votes. Even if some results were correct, one can never know if the information the chatbot provides is reliable or not.
Bing Chat is a variant of Microsoft's Bing search engine. The search results are based on a so-called "Large Language Model" (LLM), in this case GPT-4. Its predecessor, GPT-3.5, had been made publicly available last November as the technology behind ChatGPT. This application became world-famous within weeks for delivering answers that many considered to sound surprisingly human-like. ChatGPT’s publication had triggered a hype around so-called Artificial Intelligence.
The study in detail
This problem has not come up overnight. Immediately after ChatGPT’s publication, it became clear that while the bot’s answers may sound plausible, they weren’t based on verified facts. The bot only calculates probabilities according to which it strings words together. Problematic as this may be in itself, it becomes worse when the bot is used as a source of information about political parties, their programs, and their candidates. If such a public source of information isn’t reliable, it threatens a cornerstone of democracy and thus the integrity of elections.
Immature and dangerous technology
Every so often, experts accused Big Tech companies of launching their systems too early and of not sufficiently testing them. Such accusations were not only directed toward Microsoft, Bing Chat’s provider, or OpenAI, ChatGPT’s provider, but also toward Google and Facebook. Chatbots admittedly often phrase things so well that people have the impression that they’re trustworthy. Since the seemingly trustworthy facts are often distorted, the bot’s persuasiveness is particularly dangerous. A Belgian man’s suicide was attributed to the fact that EleutherAI’s LLM-based chatbot GPT-J had convinced him that he could stop climate change by sacrificing his life. It is currently completely unclear who is to be held accountable in such a case.
Karsten Donnay is Assistant Professor of Political Behavior and Digital Media at the University of Zurich and provided academic advice to our research. He says of the findings: "This research project has not only brought to light an obvious problem with Bing Chat but revealed the more fundamental problem of an overly uncritical use of AI. Currently, companies are launching simply unreliable products. They do so without having to fear legal repercussions."
A Microsoft spokesperson told AlgorithmWatch: "Accurate information about elections is essential for democracy, which is why we improve our services if they don't meet the expectations. We have already made significant improvements to increase the accuracy of Bing Chat’s responses, with the system now creating responses based on search results and taking content from the top results. We continue to invest in improvements. Recently, we corrected some of the answers the report cites as examples for misinformation. In addition, we're also offering an 'Exact' mode for more precise answers. We encourage users to click through the advanced links provided to get more information, share their feedback, and report issues by using the thumbs-up or thumbs-down button." (Please note that this is a translation of the original statement by Microsoft.)
Matthias Spielkamp, Executive Director and Co-Founder of AlgorithmWatch, responds to the statement above:
"Microsoft and similar companies promise that they can reliably prevent errors in search engines’ results that are based on generative AI. Our investigation proves them wrong. Microsoft didn’t tackle structural problems but has only corrected the answers to the specific questions we asked Bing Chat. Microsoft didn’t respond to the fact that generative AI currently cannot provide reliable answers. It still makes promises about the information’s fundamental reliability – against better knowledge, we have to assume. This is irresponsible. Microsoft’s main interest is to increase the acceptance of the systems which would lead to selling more of their products. If the generative systems perform tasks that have a social impact, e.g., making decisions in public administration or the health sector, we are all affected."
Regulatory measures: "We are looking into it, but it will take time"
The EU's Digital Services Act (DSA) is a new law regulating digital platforms. It requires "very large online platforms" and "very large search engines" with more than 45 million users within the EU to conduct so-called risk assessments and develop mechanisms to minimize risks posed by their services. The European Commission has classified Microsoft Bing as such a very large search engine. The law explicitly identifies negative impacts on the integrity of electoral processes and social debates as well as the spread of misinformation as "systemic" risks that Microsoft Bing, Instagram, and other services may pose, and that their providers must examine and address.
Microsoft did not answer our questions whether the company considers Bing Chat's incorrect responses concerning elections to be a systemic risk under the DSA and what the company intends to do about it.
The German Federal Ministry of Justice is currently responsible for enforcing the DSA in Germany. After an inquiry from AlgorithmWatch, it said that in the case of very large online search engines, the EU Commission is the sole authority to monitor and enforce companies' legal obligations.
The European Commission was handed over the companies' first risk assessment reports in August, but they will remain secret until further notice. In response to our request for comment on our investigation of Bing Chat's search results, the responsible department said that the Commission would deal on a case-by-case basis with information from third parties about possible violations of the DSA. This would include strict procedural rules, e.g., the parties' right to be heard. The Commission considers the information gathered by AlgorithmWatch to be highly relevant to the DSA and reserves the right to take further action.
Now it's up to politics
EU negotiations on the AI Act are currently entering the final phase. The AI Act is a new law to regulate and control so-called Artificial Intelligence which also involves Large Language Models. The EU Parliament has already agreed on how providers should control these systems’ risks and quality. However, the EU member states are pushing to weaken the regulation. Germany had recently even proposed downsizing them to a voluntary code of conduct.
"The EU and the German government must now define clear rules for who can be held accountable for the results of generative AI. It cannot be the responsibility of the users of these systems alone to check whether they can trust the results. Self-commitments, such as a code of conduct or an AI pact, are toothless initiatives that play into the hands of AI companies. These companies try to avoid concrete regulations and shirk their responsibility. This violates our rights and threatens democratic cohesion."
Angela Müller, Head of Policy & Advocacyn and Head of AlgorithmWatch CH
Salvatore Romano is research director at AI Forensics. He sees major failures on the part of Microsoft: "We are concerned to see similar technologies being deployed on other platforms. There are neither adequate accountability and transparency mechanisms, nor public assessments of systemic risks. Microsoft should admit that even when citing trusted sources, their tool can still make up numbers and information. This turns information that is accurate in the cited source into fake news, which can undermine trust in many of the leading websites."
AlgorithmWatch is a human rights organization based in Berlin and Zurich. We fight for a world where algorithms and Artificial Intelligence (AI) do not weaken justice, democracy, and sustainability, but strengthen them.
AI Forensics is a European non-profit that investigates influential and opaque algorithms. We hold major technology platforms accountable by conducting independent and high-profile technical investigations to uncover and expose the harms caused by their algorithms: https://aiforensics.org/