Researchers in Berlin have investigated the extent to which ChatGPT provides scientifically sound information about climate change. They found that the AI often gives correct answers but should under no circumstances be trusted blindly. Checking sources is more important than ever, but it is not easy.
ChatGPT and other large language models built on machine learning and large data sets are making inroads into almost every area of society. Companies or researchers that do not use them are increasingly seen as behind the times. But is the information provided by artificial intelligence reliable enough? Scientists at the Technical University of Berlin tested this using climate change as a case study. They asked ChatGPT questions on the topic and examined the answers for accuracy, relevance, and possible errors and contradictions.
Its impressive capabilities have made ChatGPT a potential resource on many different topics, the Berlin team writes in its paper published in the journal "Ecological Economics". However, not even the developers themselves can explain how a particular response comes about. That may be acceptable for creative tasks such as writing a poem, but it becomes a problem for topics such as the consequences of climate change, where accurate, fact-based information matters.
According to the researchers, it is therefore important to examine the quality of the responses ChatGPT provides in these areas. Among other things, misinformation circulating in public debate and the media must be distinguished from scientifically based findings.
Hallucinations and useless assumptions
This is not easy. To make matters worse, AI can "hallucinate": ChatGPT makes factual claims that no source can substantiate. Furthermore, the language model tends to "make meaningless assumptions rather than reject unanswerable questions," according to the TU team.
The big danger is that ChatGPT users take incorrect or inaccurate answers at face value because they are plausibly worded and linguistically correct. Previous research showed that people gave more weight to the AI's advice if they were unfamiliar with the topic being discussed, had used ChatGPT before, and had previously received accurate advice from the model, the researchers write.
The Berlin team is particularly interested in the topic because, in the Green Consumption Assistant research project, it is developing an AI-powered assistant that helps consumers make more sustainable purchasing decisions online. Previous research has explored ChatGPT's capabilities in general but has not examined its ability to answer questions about climate change, the researchers write.
To clarify this, they asked ChatGPT a total of 95 questions. They evaluated responses for accuracy, relevance, and consistency. The team checked the quality of responses using reliable, public sources of information on climate change, such as the current Intergovernmental Panel on Climate Change (IPCC) report.
Mostly high-quality responses
The researchers took into account that the language model is constantly evolving. Among other things, they checked whether the same input (prompt) produced different results at different times. The first round of questions was conducted last February using ChatGPT-3.5, and the second in mid-May this year using the subsequent version of the model. The model's knowledge base was recently updated and now extends to April 2023; previously, it only covered information up to September 2021.
The results may therefore differ today. For follow-up studies, the researchers suggest more rounds of questioning at shorter intervals. They see further limitations of their work in the possibly too small number of experts who evaluated the responses. Furthermore, the questions and their wording were not based on current user data. People today might ask ChatGPT different questions, phrased in different ways, that would produce different results.
The published study shows that the quality of the model's responses was generally high, with an average rating of 8.25 out of 10 points. "We observed that ChatGPT provides balanced and nuanced arguments and concludes many responses with a comment that encourages critical review to avoid biased responses," says Maike Gossen from TU Berlin. For example, in answer to the question "How is marine life affected by climate change and how can negative impacts be reduced?", ChatGPT mentioned not only reducing greenhouse gas emissions but also the following:
"Reduce the non-climate impacts of human activities, such as overfishing and pollution."
Significant error rate
More than half of the answers received the top accuracy score of 10. But that does not mean the results will always be this good: 6.25 percent of the responses scored no more than 3 points for accuracy, and 10 percent scored no more than 3 points for relevance.
Among the inaccurately answered questions, the most common source of error was hallucinated facts. For example, ChatGPT's answer to the question "What percentage of recyclable waste is actually recycled in Germany?" was correct in broad strokes but not in the details: according to the Federal Environment Agency, the figure was 67.4 percent in 2020, whereas ChatGPT gave 63 percent.
ChatGPT makes things up, but they look believable
In some cases, ChatGPT generated false or fabricated information, such as invented references and fake links, including links to purported articles and contributions in scientific publications. Other errors arose when ChatGPT cited specific and correct scientific sources or literature but drew incorrect conclusions from them.
The researchers also observed that ChatGPT's inaccurate responses were phrased so plausibly that they were wrongly perceived as correct. "Because text generators like ChatGPT are trained to provide answers that seem correct to people, the confident answer style can mislead people into believing the answer is correct," says Maike Gossen.
The team also found that some responses reflected misinformation and biases present in public discourse. For example, some of ChatGPT's incorrect responses echoed misconceptions about effective action against climate change. These include overestimating the effect of individual behavioral changes, as well as favoring individual measures with little impact that slow down structural and collective changes with greater impact. At times, the responses also seemed overly optimistic about technological solutions as a key way to mitigate climate change.
Valuable but fallible source
Large language models like ChatGPT could be a valuable source of information about climate change, the scientists conclude. However, there is a risk that they spread and reinforce false information about climate change, since their answers already reflect outdated facts and misconceptions.
Their short study shows that verifying the sources of environmental and climate information is more important than ever. However, recognizing incorrect answers often requires detailed expert knowledge of the subject, especially because such answers seem plausible at first glance.