We are still in the early days of understanding the promise and peril of using generative AI in education. Very few researchers have evaluated whether students are benefiting, and one well-designed study showed that using ChatGPT for math actually harmed student achievement.
The first scientific evidence I’ve seen that ChatGPT can actually help students learn more was posted online earlier this year. It’s a small experiment, involving fewer than 200 undergraduates. All were Harvard students taking an introductory physics class in the fall of 2023, so the findings may not be widely applicable. But students learned more than twice as much in less time when they used an AI tutor in their dorm compared with attending their usual physics class in person. Students also reported that they felt more engaged and motivated. They learned more and they liked it.
A paper about the experiment has not yet been published in a peer-reviewed journal, but other physicists at Harvard University praised it as a well-designed experiment. Students were randomly assigned to learn a topic as usual in class, or stay “home” in their dorm and learn it through an AI tutor powered by ChatGPT. Students took brief tests at the beginning and the end of class, or their AI sessions, to measure how much they learned. The following week, the in-class students learned the next topic through the AI tutor in their dorms, and the AI-tutored students went back to class. Each student learned both ways, and for both lessons – one on surface tension and one on fluid flow – the AI-tutored students learned a lot more.
To avoid AI “hallucinations,” the tendency of chatbots to make up stuff that isn’t true, the AI tutor was given all the correct solutions. But other developers of AI tutors have also supplied their bots with answer keys. Gregory Kestin, a physics lecturer at Harvard and developer of the AI tutor used in this study, argues that his effort succeeded while others have failed because he and his colleagues fine-tuned it with pedagogical best practices. For example, the Harvard scientists instructed this AI tutor to be brief, using no more than a few sentences, to avoid cognitive overload. Otherwise, he explained, ChatGPT has a tendency to be “long-winded.”
The tutor, which Kestin calls “PS2 Pal,” after the Physical Sciences 2 class he teaches, was told to give away only one step at a time and not to divulge the full solution in a single message. PS2 Pal was also instructed to encourage students to think and give it a try themselves before revealing the answer.
Unguided use of ChatGPT, the Harvard scientists argue, lets students complete assignments without engaging in critical thinking.
Kestin doesn’t deliver traditional lectures. Like many physicists at Harvard, he teaches through a method called “active learning,” where students first work with peers on in-class problem sets as the lecturer gives feedback. Direct explanations or mini-lectures come after a bit of trial, error and struggle. Kestin sought to reproduce aspects of this teaching style with the AI tutor. Students toiled on the same set of activities and Kestin fed the AI tutor the same feedback notes that he planned to deliver in class.
Kestin provocatively titled his paper about the experiment, “AI Tutoring Outperforms Active Learning,” but in an interview he told me that he doesn’t mean to suggest that AI should replace professors or traditional in-person classes.
“I don’t think that this is an argument for replacing any human interaction,” said Kestin. “This allows for the human interaction to be much richer.”
Kestin says he intends to continue teaching through in-person classes, and he remains convinced that students learn a lot from each other by discussing how to solve problems in groups. He believes the best use of this AI tutor would be to introduce a new topic ahead of class – much like professors assign reading in advance. That way students with less background knowledge won’t be as behind and can participate more fully in class activities. Kestin hopes his AI tutor will allow him to spend less time on vocabulary and basics and devote more time to creative activities and advanced problems during class.
Of course, the benefits of an AI tutor depend on students actually using it. In past efforts, students often resisted using education technology and computerized tutors. In this experiment, the “at-home” sessions with PS2 Pal were scheduled and proctored over Zoom. It’s not clear that even highly motivated Harvard students will find it engaging enough to use regularly on their own initiative. Cute emojis – another element that the Harvard scientists prompted their AI tutor to use – may not be enough to sustain long-term interest.
Kestin’s next step is to test the tutor bot for an entire semester. He’s also been testing PS2 Pal as a study assistant with homework. Kestin said he’s seeing promising signs that it’s helpful for basic but not advanced problems.
The irony is that AI tutors may not be that effective at what we generally think of as tutoring. Kestin doesn’t think that current AI technology is good at anything that requires knowing a lot about a person, such as what the student already learned in class or what kind of explanatory metaphor might work.
“Humans have a lot of context that you can use along with your judgment in order to guide a student better than an AI can,” he said. In contrast, AI is good at introducing students to new material because you only need “limited context” about someone and “minimal judgment” for how best to teach it.
Contact staff writer Jill Barshay at (212) 678-3595 or [email protected].
This story about an AI tutor was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.