AI can pass the Turing Test in live chats, and the latest result lands with a chill. In a UC San Diego study, GPT-4.5 outperformed real participants at convincing judges there was a person on the other side.
The setup was harder to shrug off than a standard benchmark. Judges reacted to real-time exchanges rather than static prompts, then made a fast call based on conversation alone.
The unsettling part is how familiar the skill looks. The model didn’t need a body, a voice, or a biography. It only needed to sound like someone.
How did AI beat the human test
The study used a three-party version of the test. Judges chatted with a person and an AI model at the same time, then chose which one they thought was the human.
GPT-4.5 was identified as human 73% of the time when it was given a persona prompt. LLaMa-3.1-405B also crossed a striking line, getting picked as human 56% of the time with a persona prompt.
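Those numbers give the finding its bite. With only two candidates in each round, a judge guessing at random would pick the AI as the human about half the time, so 73% means judges chose GPT-4.5 over the actual person more often than not. The model didn’t merely avoid detection; it gave judges enough social cues to read it as the person in the chat.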
Why does this test still matter
The Turing Test is a decades-old way to ask whether a machine can imitate human conversation well enough to fool a person. In the classic version, an evaluator chats without seeing the participants, then tries to tell the human apart from the machine.
It has always been more of a cultural symbol than a clean measurement. Still, it remains the test people recognize when they want to know whether software can pass for one of us.

That makes the new result feel sharper. A chatbot doesn’t need consciousness, emotion, or self-awareness to create the impression that a real person is typing back. It only needs to be believable in the moment.
The risk shows up in ordinary places. Customer support, dating apps, social platforms, education, and political messaging all rely on quick judgments about identity, intent, and authenticity.
What should we watch next
The study stops well short of saying chatbots understand people. Its more practical finding is that some models can now perform personhood extremely well in short exchanges.
Clearer disclosure should become the next pressure point. When a bot can blend into casual conversation, users need stronger signals that they’re dealing with software, especially in places where persuasion or emotional vulnerability shapes the exchange.
The next fight is over labeling in chats where people make fast decisions about trust.






