Why language is best assessed by real people


“Classroom decoration 18” by Cal America is licensed under CC BY 2.0

What is the most effective way to assess English learners’ proficiency?

It has become accepted in the field to rely on psychometric tests such as the iBT (Internet-Based TOEFL) and the IELTS for college and university admissions. Yet these tests, like most other language tests, are an artifice: a device placed between the student’s actual proficiency and direct observation of that proficiency by a real human being. Students complete a limited set of tasks on the test, and based on the results, an algorithm extrapolates their broader language abilities.

A TOEFL score report does not tell you a particular student’s English language ability; it tells you what a learner with that set of scores can typically do. In the case of the TOEFL, that description rests largely on multiple-choice answers and involves not a single encounter with an actual human being. On this basis, university admissions officers are expected to judge whether the student can handle the demands of extensive academic reading and writing, classroom participation, social interaction, written and spoken communication with university faculty and staff, SEVIS regulations, and many other aspects of the U.S. college environment. (Although the IELTS includes face-to-face interaction with an examiner, this interaction is highly structured and not very natural. TOEFL writing and speaking tasks are limited, artificial, and assessed by a grader who has only a text or sound file to work with.)

Contrast that with regular, direct observation of students’ language proficiency by a trained and experienced instructor over a period of time. The instructor can set up a variety of language situations involving different interlocutors, contexts, vocabulary, levels of formality, and communication goals. In an ACCET- or CEA-accredited intensive English program, such tasks are linked to documented learning objectives. By directly observing students’ performance, instructors can build a rich picture of each student’s proficiency and can comment specifically on each student’s strengths and weaknesses.

Consider this a call, then, for colleges and universities to enter into agreements with accredited intensive English programs to waive the need for a standardized test such as the TOEFL. Just as those colleges and universities don’t use a standardized test to measure the learning of their graduates, they should be open to accepting the good judgment of teachers in intensive English programs – judgment based on direct observation of individual learners rather than the proxy scores obtained by impersonal, artificial tests.

Goodhart’s Law and the Measurement of English Proficiency

Goodhart’s Law was first proposed by the British economist Charles Goodhart. In its popular formulation, it states that “when a measure becomes a target, it ceases to be a good measure.” Measurements are often used as a proxy for performance. For example, it is sometimes reported that in the Soviet Union, when the success of nail production was measured by the quantity of nails, factories produced many tiny nails; when it was measured by the weight of nails, they produced smaller numbers of large nails. The measure became the target, and gaming the system created the illusion of success.

In U.S. university admissions, the TOEFL is the most common measure of the English proficiency of international applicants. It’s easy to understand why the complexity of language proficiency needs to be reduced to a small set of numbers when large volumes of applications have to be evaluated. Unfortunately, TOEFL preparation is very often a clear example of Goodhart’s Law in action. Many students focus on attaining the required score rather than comprehensively developing the cognitive-academic language skills and cultural skills they need to succeed at a U.S. university, and this can leave them facing serious challenges. Once matriculated, as those students seek to earn good grades – a proxy measure for learning – they may wind up trying to game the system by plagiarizing, using online essay services, cramming at the last minute, or begging the instructor for a better grade.

Although it would present practical difficulties, it would serve everyone better – schools and students alike – if schools used a broader set of mechanisms to determine English proficiency. These might include evidence of English study (not just test preparation); interviews via Skype, phone, or in person; recorded presentations by applicants; synchronous online discussion groups; and reports from instructors in intensive English programs who have first-hand – not proxy – knowledge of the students’ English.