Education resource site EduRef has tried to find out by testing the system’s essay-writing skills. The company hired a panel of professors to create writing prompts for essays on US history, research methods, creative writing, and law. They fed the prompts to GPT-3, and also gave them to a group of recent college graduates and undergrad students. The anonymized papers were then marked by the panel, to test whether AI can get better grades than human pupils.
Some of the results could unnerve professors and excite unscrupulous students. But others showed GPT-3 still has a lot to learn.
GPT-3’s highest grades were B-minuses for a history essay on American exceptionalism and a policy memo for a law class. Its human rivals earned similar marks for their history papers: a B and a C+. But only one of three students got a higher grade than the AI for the law assignment.
GPT-3 also received a solid C for its research methods paper on COVID-19 vaccine efficacy, while the students got a B and a D. However, the AI’s creative writing abilities couldn’t match its technical skills. Its story received the model’s solitary fail, while the student writers’ grades ranged from A to D+.
Overall, GPT-3 showed an impressive grasp of grammar, syntax, and word frequency. But it failed to craft a strong narrative for the creative writing assignment. Project manager Sam Larson told TNW that this could be due to how GPT-3 recalls information.
Still, what GPT-3 lacked in craft it made up for in speed. The model spent between three and 20 minutes generating content for each assignment, while the humans took three days on average.
Assessing the assessment
EduRef stressed that the experiment was only an exploratory study. GPT-3’s outputs were lightly edited for length and repetition, although its content, factual information, and grammar were left untouched. In addition, the AI produced two papers for the history, research, and law assignments, and Larson then picked which ones to use. Larson said the creative writing task required additional human interference.
Larson, who is himself an academic, was nonetheless impressed by GPT-3’s performance. He hopes that this type of AI-generated content gives instructors and policy-makers pause for thought about how they quantify what makes a successful student. But students may be more interested in AI’s ability to lend them a devious helping hand.