February 17, 2025

ChatGPT Performs Comparably to Human Therapists in Study


Artificial Intelligence in Psychotherapy: ChatGPT Passes Turing Test in Study

A recent study is causing a stir: Participants could hardly distinguish between therapeutic responses from ChatGPT and human therapists. The AI even performed better than human experts in some areas.

The Turing Test in a Therapeutic Context

The Turing Test, devised by computer scientist Alan Turing, asks whether a machine can imitate human behavior so well that a person can no longer tell whether they are interacting with a machine or another human. In this study, the concept was applied to psychotherapy: 830 participants were asked to differentiate between responses written by ChatGPT and responses written by human therapists.

The research, published in PLOS Mental Health, shows that the participants performed only slightly better than pure chance. They correctly identified the responses of human therapists in 56.1 percent of cases and those of ChatGPT in 51.2 percent. The researchers examined 18 case studies from couples therapy and compared the responses of 13 experienced therapists with those of ChatGPT.
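To see why rates like 56.1 percent count as "only slightly better than chance", one can test the observed proportion against the 50 percent chance baseline. The sketch below uses a standard normal-approximation z-test for a proportion; note that the judgment counts are assumptions for illustration (the article reports the rates, not the number of judgments per condition).

```python
import math

def binomial_z_test(successes, n, p0=0.5):
    """Two-sided z-test of an observed proportion against chance p0
    (normal approximation to the binomial distribution)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)           # standard error under H0
    z = (p_hat - p0) / se
    # two-sided p-value from the standard normal CDF via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_hat, z, p_value

# Hypothetical counts: assume one judgment per participant (n = 830).
for label, rate in [("human-written responses", 0.561),
                    ("ChatGPT responses", 0.512)]:
    n = 830
    successes = round(rate * n)
    p_hat, z, p = binomial_z_test(successes, n)
    print(f"{label}: {p_hat:.1%} correct, z = {z:.2f}, p = {p:.4f}")
```

Under this assumed sample size, 56.1 percent is statistically distinguishable from chance while 51.2 percent is not, which matches the article's framing: participants had some signal for human-written responses but essentially guessed on ChatGPT's.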

Surprising Results: ChatGPT Convinces in Empathy and Cultural Competence

The study found that ChatGPT performed better than the human experts in some aspects of therapeutic quality. The AI scored higher in the areas of therapeutic alliance, empathy, and cultural competence. Several factors contributed to this strong performance. ChatGPT consistently generated longer responses with a more positive tone and used more nouns and adjectives. These characteristics made the responses appear more detailed and empathetic.

Bias Against AI Influences Perception

The research also uncovered an important bias: when participants believed they were reading AI-generated responses, they rated them lower, regardless of whether the responses were actually written by humans or by ChatGPT. The reverse also held: AI-generated responses received the highest ratings when participants mistakenly attributed them to human therapists.

Methodological Limitations and Outlook

The researchers acknowledge that their work has methodological limitations. The study was based on short, hypothetical therapy scenarios and not on real therapy sessions. It is also questionable whether the results from couples therapy are equally transferable to individual counseling.

Nevertheless, the researchers emphasize that mental health professionals need to understand these systems, as there is increasing evidence for the potential of AI in therapeutic settings and its likely future role in mental health care. They stress that responsible clinicians must carefully train and monitor AI models to ensure high standards of care.

AI in a Therapeutic Context: Further Research Findings

This study is not the first to demonstrate the capabilities of AI in advisory roles. Research from the University of Melbourne and the University of Western Australia has shown that ChatGPT provided more balanced, comprehensive, and empathetic advice on social dilemmas compared to human advisors, with preference rates between 70 and 85 percent.

A study from April 2023 found that people perceived AI responses to medical diagnoses as more empathetic and higher quality than those from doctors. ChatGPT has also demonstrated exceptional emotional intelligence, scoring 98 out of 100 on the standardized Levels of Emotional Awareness Scale (LEAS) test – far exceeding typical human scores of 56 to 59.

Despite these results, researchers from Stanford University and the University of Texas urge caution in the application of ChatGPT in psychotherapy. They argue that large language models lack a true "Theory of Mind" and cannot experience genuine empathy. They call for an international research initiative to develop guidelines for the safe integration of AI into psychology.