Scientific summaries produced by ChatGPT AI often mislead researchers


OpenAI’s new language model continues to raise concerns. After demonstrating that it can write essays persuasive enough to let students “cheat”, and that it can write its own scientific publication in just two hours, this AI poses a new ethical problem: it can create fake abstracts of research papers that scientists cannot always distinguish from the originals. This may compromise the integrity and accuracy of research.

Launched in November 2022, ChatGPT can generate realistic, intelligent-sounding text in response to user prompts, whatever the topic. To do this, it relies on a neural network that its designers trained on large amounts of human-written text. Language models like this one are now so sophisticated that their output is sometimes hard to distinguish from text written by a person.

A recent study posted on bioRxiv showed that it was “surprisingly difficult to tell the difference between the two” for human reviewers. “I’m very worried. If we are now in a situation where experts cannot determine what is true or not, we are losing a much-needed mediator to help us with complex issues,” said Sandra Wachter, who studies technology and regulation at the University of Oxford.



Original, coherent and convincing summaries

To assess how convincing these artificial texts are, a team led by Catherine Gao of Northwestern University in Chicago asked ChatGPT to generate abstracts for 50 medical research articles from five well-known journals (JAMA, New England Journal of Medicine, BMJ, Lancet and Nature Medicine). The request sent to the model was: “Write a scientific abstract for the article [title] in the style of [journal]”.

The generated abstracts were evaluated using an AI output detector and a plagiarism detector (which gives an originality score from 0 to 100%). The researchers also asked human reviewers to identify the ChatGPT-generated abstracts within a corpus of 25 abstracts.

The team also checked whether the format of the ChatGPT-generated abstracts met each journal’s requirements by comparing their headings and structure with those of the original abstracts, then compared the patient cohort sizes reported in the original and generated abstracts. To begin with, only 8 abstracts (16%) correctly used the journal-specific headings. The patient cohort sizes were of a similar order of magnitude in the original and the artificial abstracts. “It was impressive that with just a title and journal, ChatGPT was able to generate a superficially readable abstract with accurate themes and topic-specific patient cohort sizes,” the researchers write.

In addition, ChatGPT was able to show originality: almost all the generated abstracts were considered completely original by the plagiarism checker, with a median originality score of 100%. The original abstracts were subjected to the same check and showed a median originality score of 38.5%. In their case the source of the “plagiarism” was simply the article each abstract belongs to, which is quite normal for a research abstract.

Two-thirds of artificial summaries were detected by the AI detector and by humans

The AI output detector performed quite well: it assigned a high probability of artificial content to two-thirds of the abstracts generated by ChatGPT, while the original abstracts received a very low probability. However, the tool is not flawless: “Of the summaries generated, 17 (34%) scored below 50% from the AI output detector, including 5 below 1%,” the team reports.

All but one of the original summaries scored extremely low on the AI output detector. Most of the artificial summaries scored high, but 17 of them (34%) scored below 50%. © C. Gao et al.

Human reviewers performed similarly: they correctly identified 68% of the abstracts generated by ChatGPT as artificial and 86% of the original abstracts as written by humans. But this still means that 32% of the generated abstracts were taken to be human-written, and 14% of the original abstracts were mistaken for generated texts. The reviewers noted that the abstracts they judged to be generated were generally “perfunctory and vague.”

AI detection scores for the generated summaries did not differ between those that human reviewers identified as generated and those they did not. © C. Gao et al.
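For readers who want to check the arithmetic behind these figures, here is a minimal sketch in Python. It uses only the counts and percentages reported above; the variable names are illustrative, and treating the 50% detector score as the cut-off for "flagged as artificial" is an assumption made for this example, not something the study is confirmed to have done.

```python
def share_pct(count: int, total: int) -> float:
    """Return count/total as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Figures reported in the article (illustrative variable names).
GENERATED_TOTAL = 50           # abstracts generated by ChatGPT
DETECTOR_BELOW_50 = 17         # generated abstracts scoring below 50% on the detector

# Assumption: a score of 50% or more counts as "flagged as artificial".
detector_missed_pct = share_pct(DETECTOR_BELOW_50, GENERATED_TOTAL)
detector_flagged_pct = share_pct(GENERATED_TOTAL - DETECTOR_BELOW_50, GENERATED_TOTAL)

# Human reviewers: percentages reported in the article.
HUMAN_CAUGHT_GENERATED_PCT = 68    # generated abstracts correctly identified
HUMAN_KEPT_ORIGINAL_PCT = 86       # original abstracts correctly identified

# The error rates are the complements of the correct-identification rates.
human_missed_generated_pct = 100 - HUMAN_CAUGHT_GENERATED_PCT
human_misjudged_original_pct = 100 - HUMAN_KEPT_ORIGINAL_PCT

print(f"Detector: {detector_flagged_pct}% flagged, {detector_missed_pct}% missed")
print(f"Humans: {human_missed_generated_pct}% of generated abstracts missed, "
      f"{human_misjudged_original_pct}% of originals misjudged")
```

Under that assumption, the detector's 66% flag rate matches the "two-thirds" quoted above and sits very close to the 68% achieved by the human reviewers.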

Humans and AI output detectors can therefore identify most of the content generated by ChatGPT, but neither is infallible. The authors of the study are concerned about unethical uses of this technology. “Given its ability to create summaries with convincing numbers, it can be used to completely falsify research,” they note. “This may mean that research-based policy decisions are incorrect,” adds Sandra Wachter. The consequences are even greater in areas such as medical research, where misinformation can endanger people’s safety.

But at the same time, the researchers admit, the tool can also be seen as a welcome aid to “reduce the burden of writing and formatting” or to help scientists publish “in a non-native language”. Gao and her colleagues therefore suggest that when a text is produced this way, it should be clearly stated that it was written with ChatGPT, for example by listing it among the authors. However, the boundaries of the ethical and acceptable use of large language models to aid scientific writing remain undefined, they conclude.

Source: Nature
