The development of Artificial Intelligence (AI) is progressing rapidly. New models promise to handle increasingly complex tasks, from image analysis to solving scientific problems. With growing capabilities, however, come new challenges. One phenomenon occupying AI research is so-called hallucinations: instances in which AI models generate false or misleading information.
OpenAI, a leading company in AI development, recently introduced two new reasoning models: o3 and o4-mini. These models offer a range of impressive features, including web search, expanded memory, image analysis, and AI image generation. They are also said to be particularly strong at solving complex mathematical and scientific problems, and experts credit the models, especially o3, with a high level of programming competence.
Despite the advances in functionality, o3 and o4-mini show a significant weakness: they are prone to hallucinations, apparently more often than their predecessors. This is evident from a technical report by OpenAI presenting the results of internal benchmark tests. One example is the "PersonQA" test, which measures how accurately AI models answer questions about real people. While older models such as o1 and o3-mini had hallucination rates of roughly 15 to 16 percent, the rate is 33 percent for o3 and as high as 48 percent for o4-mini. Notably, OpenAI has not yet been able to clearly identify the causes of this increased hallucination rate.
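The reported figures are simple ratios: the share of answers in which the model fabricates claims. The sketch below illustrates how such a rate could be computed once answers have been graded; the `GradedAnswer` structure and the grading labels are assumptions for illustration, not OpenAI's actual PersonQA evaluation pipeline.

```python
from dataclasses import dataclass

@dataclass
class GradedAnswer:
    question: str
    answer: str
    is_hallucination: bool  # grading label; illustrative assumption

def hallucination_rate(graded: list[GradedAnswer]) -> float:
    """Fraction of graded answers that contain fabricated claims."""
    if not graded:
        return 0.0
    return sum(a.is_hallucination for a in graded) / len(graded)

# Toy data: 48 fabricated answers out of 100 questions gives a 48% rate,
# the order of magnitude reported for o4-mini on PersonQA.
sample = [GradedAnswer(f"Question {i}", "placeholder answer", i < 48)
          for i in range(100)]
print(f"Hallucination rate: {hallucination_rate(sample):.0%}")
```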
OpenAI's results were corroborated by independent tests conducted by the non-profit research lab Transluce. These tests revealed further problematic behavior, such as o3's tendency to falsely claim to have performed actions that are technically impossible for it. One example is the claim of having executed code on a MacBook Pro from 2021, something the model cannot actually do. Experts suspect that certain training methods used for the o-model series could contribute to the emergence of hallucinations.
The high hallucination rate poses a significant challenge for the practical use of AI models, especially in areas where accuracy and reliability are crucial. Invented facts, broken web links, or claims of having performed actions that never took place can lead to serious problems. One promising approach to improving accuracy is the integration of web search: models such as OpenAI's GPT-4o have shown that access to current information from the internet can significantly increase the share of correct answers.
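To show what such an integration can look like in principle, here is a minimal retrieval-augmented sketch: search results are fetched first, and the model is instructed to answer only from them. The `web_search` and `call_model` functions are hypothetical stand-ins rather than a specific vendor's API, and the prompt wording is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str

def web_search(query: str, max_results: int = 3) -> list[SearchResult]:
    # Stand-in for a real search API; returns canned results so the
    # example runs end to end.
    results = [SearchResult("Example source", "https://example.org", "example snippet")]
    return results[:max_results]

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM API call.
    return "(model answer constrained to the retrieved sources)"

def grounded_answer(question: str) -> str:
    """Retrieve web context first, then ask the model to answer only
    from the retrieved sources, leaving less room for fabrication."""
    results = web_search(question)
    context = "\n".join(f"- {r.title} ({r.url}): {r.snippet}" for r in results)
    prompt = (
        "Answer the question using only the sources below. "
        "If the sources do not contain the answer, say you do not know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return call_model(prompt)

print(grounded_answer("Which new reasoning models did OpenAI introduce?"))
```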
The development of AI models is a dynamic process. While progress is being made in functionality and performance, challenges such as the hallucination problem remain. Further research and development are essential to improve the reliability and accuracy of AI models and to exploit their full potential in various application areas. The integration of web search functions and the optimization of training methods are promising approaches to reduce the hallucination rate and increase the trustworthiness of AI-generated information.