To improve large-scale educational assessments, a student's test-taking behaviors should be disentangled from their ability, say researchers in a Policy Forum in the April 23 issue of Science. Both ability and these performance-oriented factors should be reported, they add. Otherwise, interventions targeted at individual students will be less successful, and country-level education rankings may not be fair.
"We claim that test-taking behaviors are relevant to scoring performance," said author Steffi Pohl, a professor in the department of education and psychology at Freie Universität in Berlin, Germany. "In later work life, it will surely matter whether you are fast or slow in solving your tasks and whether you are able to finish all of them. As international comparisons aim at measuring competences to meet real-life challenges, behavioral aspects related to how well these challenges can be met should be included."
Pohl and her colleagues say that the approach they propose would fundamentally change how performance scores are interpreted.
"With the previous way, we only know which level a country or a student achieved," said Pohl. "With our proposed approach, we can better understand due to which aspects of performance we observed a given result."
For example, a rather low score may reflect a lack of knowledge, a lack of speed, not wanting or not managing to respond to all tasks, or a combination of these aspects.
One of the better-known large-scale education tests used throughout the world today is the Programme for International Student Assessment, or PISA. Results from PISA assessments are used in education policy interventions as well as in comparison of competencies across nations.
Currently, reporting practices for tests like PISA do not account for the impact of performance-oriented behaviors among test takers. "Policy makers and educators have focused only on the overall performance score," said Pohl. And while there is a long history of researchers trying to understand test-taking behaviors, "these behaviors have often been seen as a nuisance that needs to be corrected," said Pohl, "not as valuable information on the performance of the student."
This practice threatens the fairness of country-scale rankings, especially as test-taking behavior seems to be a cultural issue. "In some countries, students work fast, usually responding to almost all tasks, while in other places, students take more time and some also skip a lot of questions," Pohl explained. "In the current reporting practice, this is not taken into account."
By not addressing performance-oriented behaviors, educators could miss opportunities to improve interventions for individual students, Pohl said. For example, if a student responds correctly to the items they attempt but does not manage to work on all items, a teacher does not necessarily need to build knowledge, as the student has shown they have it. Instead, a teacher who understands this behavior should focus on teaching time-allocation strategies.
A PISA Test Case
In an exercise that further illustrates the importance of disentangling performance-oriented factors, Pohl and her colleagues analyzed data on a 2018 PISA math literacy test taken by more than 7,000 students in Australia, more than 3,000 students in Switzerland, and more than 6,000 students in Italy.
"We chose these countries because each country has different strengths, so that helped to demonstrate how considering different aspects of test performance may impact conclusions," said Pohl.
The authors applied a model they had previously developed, called the speed-accuracy + omission (SA+O) model, which accounts not only for a student's ability but also for items not reached because of time limits and for item omissions.
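The SA+O model itself is a parametric item-response model estimated from response and timing data; the details are in the authors' earlier work. As a loose, hypothetical illustration of the kind of decomposition they argue for (not their actual model), one can contrast a single raw score with separate reports of accuracy on attempted items, omission rate, and time used:

```python
# Hypothetical illustration only: a descriptive decomposition of one
# student's test record. The actual SA+O model is a parametric
# item-response model, not this simple bookkeeping.

def decompose(responses, minutes_used):
    """responses: list with 1 (correct), 0 (incorrect), or None (omitted/not reached)."""
    attempted = [r for r in responses if r is not None]
    raw_score = sum(attempted)  # all that a single overall score reports
    accuracy = sum(attempted) / len(attempted) if attempted else 0.0
    omission_rate = (len(responses) - len(attempted)) / len(responses)
    return {
        "raw_score": raw_score,
        "accuracy_on_attempted": round(accuracy, 2),
        "omission_rate": round(omission_rate, 2),
        "minutes_used": minutes_used,
    }

# Two students with the same raw score but very different behavior:
careful = decompose([1, 1, 1, 1, None, None, None, None], minutes_used=60)
fast = decompose([1, 0, 1, 0, 1, 0, 1, 0], minutes_used=35)
print(careful)  # perfect accuracy, but half the items never reached
print(fast)     # every item attempted quickly, at lower accuracy
```

Under a single raw score the two students above look identical; reporting the components separately suggests very different interventions, which is the article's point.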
Based on their analysis, Italian and Swiss students showed higher ability than Australian students, but they took more time and responded to fewer tasks. Australian students took a more efficient approach, answering most of the items in rather little time while sacrificing some accuracy.
Under current approaches, analysts "would (mis-)interpret the scores as ability scores and would not be able to see that there is variation in test-taking behavior," the authors write in their Policy Forum. They would rank Switzerland and Italy higher than Australia, neglecting differences in time taken and items omitted.
"Disentangling and reporting different aspects of performance allow for — what we believe — a fairer comparison of groups," the authors write.
They propose the use of models like the SA+O model, though they note it is just one example. "There are other models that incorporate other aspects of test-taking behavior, such as guessing or quitting on the test," said Pohl. "The model should be chosen based on what types of test-taking behavior should be considered" in a given situation, she noted.
The authors noted that data from all countries participating in PISA are currently analyzed by the Educational Testing Service — the world's largest private nonprofit educational testing and assessment organization — in Princeton, New Jersey. "We can imagine that our proposed approach could be made part of these initial analyses," said Pohl. "This may open up a discussion on what we actually want to measure and what the PISA results tell us."