According to Business Insider, NYU Stern professor Panos Ipeirotis ran an experiment using AI to administer and grade oral exams for his data science students. Concerned that brilliant-looking written assignments masked a lack of real understanding, he built an AI examiner using ElevenLabs’ speech tech. Over nine days, the system assessed 36 students in 25-minute sessions, with total compute costs of just $15. He then used a “council of LLMs”—Claude, Gemini, and ChatGPT—to grade the transcripts, with Claude acting as the chair to synthesize a final score. Ipeirotis claims the AI graded more consistently and fairly than humans and provided superior feedback. However, only a small minority of students preferred this method, with many finding it more stressful than written tests.
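The article doesn’t include Ipeirotis’s actual code, but the “council of LLMs” pattern is straightforward to picture. Here’s a minimal Python sketch of that grading flow under stated assumptions: the `query_llm` helper is a hypothetical stand-in for whichever provider SDKs you’d actually wire up (Anthropic, Google, OpenAI), and the rubric text and prompts are invented for illustration. Only the structure—three independent graders plus Claude as chair—comes from the article.

```python
# Hypothetical sketch of a "council of LLMs" grader: three models score the
# exam transcript independently, then a chair model synthesizes a final grade.
# query_llm() is a placeholder, NOT a real SDK call; the model labels and
# rubric wording are illustrative assumptions, not from the article.

GRADER_MODELS = ["claude", "gemini", "chatgpt"]  # the council
CHAIR_MODEL = "claude"  # per the article, Claude chairs the council

RUBRIC = (
    "You are grading an oral data science exam. Score the transcript "
    "from 0-100 for correctness, depth, and reasoning. Justify briefly."
)

def query_llm(model: str, prompt: str) -> str:
    """Placeholder: swap in the real client call for each provider."""
    raise NotImplementedError(f"wire up the {model} client here")

def grade_transcript(transcript: str) -> str:
    # 1. Each council member grades the same transcript independently.
    opinions = {
        model: query_llm(model, f"{RUBRIC}\n\nTranscript:\n{transcript}")
        for model in GRADER_MODELS
    }
    # 2. The chair sees all three opinions and reconciles them into one score.
    summary = "\n\n".join(f"--- {m} ---\n{o}" for m, o in opinions.items())
    chair_prompt = (
        "Three graders scored the same oral exam. Reconcile their "
        f"assessments into one final score and rationale.\n\n{summary}"
    )
    return query_llm(CHAIR_MODEL, chair_prompt)
```

One appealing property of the chair pattern, if this is roughly how it works: disagreement among graders becomes a signal rather than noise. A transcript where the three scores diverge wildly is exactly the one you’d want flagged for human review.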
The AI Examiner Is In
Here’s the thing: this experiment is a fascinating, pragmatic response to a massive problem. Students are using AI to write papers that sound expert but are hollow. Ipeirotis’s solution? Fight fire with fire. If AI can generate the essay, maybe AI can also be the stern professor grilling you on it. The scalability argument is powerful. Oral exams were the gold standard for centuries because they test real-time reasoning; we abandoned them because they’re labor-intensive. Now, for $15 in total compute, roughly 42 cents per student for a class of 36, you can assess everyone. That’s a game-changer, at least on paper.
But What Are We Really Measuring?
But let’s pump the brakes a bit. Is an AI-graded oral exam measuring understanding, or is it measuring a student’s ability to perform for an AI? These models are scoring patterns, keywords, and structures in the transcript, not comprehension itself. A savvy student might learn to “talk to the bot” rather than deeply engage with the material. And the stress factor is huge: only a small minority of students preferred this format, and many found it more stressful than written tests. If students are more anxious, does that create a better assessment environment, or just a different, potentially inequitable one? Some people freeze in live exams, AI or not.
The Wicked Problem of AI Assessment
This experiment highlights the core “wicked problem” universities face. As the article notes, faculty are totally divided: some see AI as a tool to master, others as pure cheating. Nobody agrees on what an “AI-proof” test even looks like. Ipeirotis’s method is one attempt, but it’s really just the next move in an arms race. It feels like we’re trying to out-engineer the problem. And what happens when students start using real-time AI *during* the oral exam? You can bet someone’s working on an AI co-pilot that whispers answers through an earpiece. The cat-and-mouse game continues.
A Glimpse of a Weird New Normal
So where does this leave us? I think this is a glimpse into a strange new educational normal. The middle ground—the take-home essay—might be dead. Assessments will likely polarize: either fully proctored, in-person written exams, or these kinds of AI-mediated interactive tests. The promise of hyper-personalized, AI-driven tutoring and examination is real. But the risk is that we optimize for what the AI can measure, not what’s most important for humans to learn. The tech is advancing faster than our pedagogy. Basically, we’re building the plane while flying it, and the passengers are getting airsick. It’s a bold experiment, but it’s far from a final answer.
