
The jury trial and the language test


I recently had the opportunity to serve on a jury in a case in which a young man was accused of ‘operating under the influence,’ i.e. driving a car while under the influence of alcohol. There was only one witness, the police officer who arrested the young man. After pulling him over, the officer had conducted what is known as a field sobriety test in order to establish that the young man was under the influence of alcohol. The test usually has three components:

  1. The horizontal gaze nystagmus test, in which a person’s eyes have to track a moving object.
  2. The walk-and-turn test, in which the person walks heel-to-toe for nine steps, pivots on the left foot and walks back heel-to-toe for nine steps.
  3. The one-leg stand test, in which the person stands on one leg and counts to 30.

Each test has its criteria for success and failure. For example, you may fail the one-leg stand test if you have to use your arms to balance yourself, or if you put your foot down before counting to 30.

In the jury room, we six jury members found that the prosecutor’s evidence was based almost entirely on the results of the field sobriety test, which the young man failed. We had been instructed by the judge that we should presume innocence and find the man not guilty unless the prosecution could prove ‘beyond reasonable doubt’ that the young man had been driving under the influence of alcohol.

We jury members were not permitted to do any research on the field sobriety test, and none of us had any information about how useful it was, though we knew that it is widely used. Most jurors had doubts about the evidence, but one juror believed we should base our verdict on the result of the field sobriety test.

I found myself explaining that we cannot assume that tests give us useful information, and I used the notions of validity and reliability to do so. In this case:

  1. Is the field sobriety test valid? Does it measure what it purports to measure? (Note that the officer conducted only the walk-and-turn test and the one-leg stand test.) Were there reasons other than being under the influence of alcohol why the young man – or anyone – might fail these tests? (It was claimed by the defense that the young man had ‘an issue’ with his left leg.)
  2. Is the field sobriety test reliable? Can we be sure another officer would have reached the same conclusion based on the young man’s performance on the test?

Given these questions, could we determine ‘beyond reasonable doubt’ that the young man had been under the influence of alcohol?
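
One way to make the reliability question concrete is to ask how often two raters agree beyond what chance alone would produce. The sketch below computes Cohen's kappa, a common statistic for this kind of inter-rater agreement. The two officers' ratings are hypothetical data invented purely for illustration; nothing like them was available in the case, and the function name `cohens_kappa` is likewise an assumption of this sketch.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability that two independent raters with these
    # label frequencies would happen to give the same label.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both raters use one identical label
        # throughout; this sketch reports 1.0 as a convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical data: two officers score the same ten recorded performances.
officer_1 = ["fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "fail", "pass"]
officer_2 = ["fail", "pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass", "pass"]

kappa = cohens_kappa(officer_1, officer_2)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.40"
```

In this invented example the officers agree on seven of ten performances, but because half of that agreement would be expected by chance, kappa is only 0.40. A test scored this inconsistently would be a shaky basis for a verdict, or for an admissions decision.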

I am writing about this because in language teaching, we tend to place a lot of trust in the tests we use, whether made by an individual teacher, a program, or an international testing organization. Any really useful test should help us determine whether the test taker has attained certain knowledge or is capable of performing certain functions in the language. Think about any test you know – the TOEFL, TOEIC, IELTS, or those home-grown tests used in your program. To what extent are you able to say ‘beyond reasonable doubt’ that the person taking the test has acquired certain knowledge or skills? We know for sure that plenty of students with high TOEFL scores are not well prepared linguistically to succeed on an American college campus, and this is a test that has a huge research effort behind it.

So I would ask that you advocate for more evidence than a single test to determine students’ language proficiency. Decision-makers such as college admissions staff should seek multiple sources of evidence that converge on a conclusion, not just a single test. They should have more than one person or entity reviewing students’ language ability. We should maintain a healthy doubt about the tests we use.

What was the verdict in this case? In the end, no matter what the truth was, we could not return a guilty verdict because the state had not proved beyond reasonable doubt that the young man had operated a motor vehicle under the influence of alcohol.

Naturally I rushed to my computer after the trial and googled ‘reliability of the field sobriety test.’ What do you think I found?

The Inadequacy of “ESL” for International Student Preparation

Wrapped up in the term ESL (English as a Second Language) is an assumption that language, above all, is what students need to succeed in an English-speaking environment. The same kind of assumption can be found in the name of the most popular standardized U.S. admissions test for international students, the TOEFL (Test of English as a Foreign Language). The CEFR (Common European Framework of Reference for Languages) lists levels of language proficiency by skill, and many ESL programs continue to organize their curricula on the basis of Listening, Speaking, Reading, and Writing skills. The field of SLA (Second Language Acquisition) is a major feeder discipline in ESL teacher preparation programs.

A focus on the acquisition of language skills gets us only so far if we are preparing an international student for academic work in an English-speaking setting. One thing among very many that this student needs to do is to read a text critically and offer an original, well-thought-out, supported, and argued response. The student may need to argue that response in class, and defend it against other points of view, in an assertive yet diplomatic manner. To be taken seriously, the student will need to behave in what is recognized as a normal and appropriate manner in that environment – and know when and how to revert to a more informal style when class ends. All of this goes far beyond language skills.

What this student needs to learn is what James Paul Gee in Social Linguistics and Literacies refers to as Discourse (with a capital D). Discourse “is composed of distinctive ways of listening/speaking and often, too, writing/reading coupled with distinctive ways of acting, interacting, valuing, feeling, dressing, thinking, believing with other people and with various objects, tools, and technologies, so as to enact specific socially recognizable identities engaged in specific socially recognizable activities” (p. 152). These are less language skills than “social practices into which people are apprenticed as part of a social group” (p. 76). As we move in different Discourse communities, we need to know how to play our part and be recognized as a legitimate member of each community. Discourses are mastered by “enculturation…into social practices through scaffolded and supported interaction with people who have already mastered the Discourse” (p. 168).

This helps us understand why any program of learning that reduces preparation to language skills is inadequate. Students need to learn the ways of interacting, believing, valuing, and effectively being in the academic Discourse community. University IEPs (intensive English programs) teach English for academic purposes, but they still largely identify as English language programs with language-based missions; their faculty members have degrees in teaching English, and their classes are often language skill-specific. They are often isolated from the rest of the campus, and therefore don’t allow for the kind of apprenticeship into the social practices of the campus that would make international students full members of the Discourse community.

In order to address this wider understanding of international student preparation:

  • Intensive English programs should ensure that their missions, their curricula and teaching, and their names encapsulate the full meaning of international student preparation – not simply ESL.
  • University administrations should make international student preparation a task for the whole university, supported by, but not the sole responsibility of, an intensive English program. The IEP’s efforts should be integrated into a campus-wide strategy for international student preparation.
  • Universities should not expect that simply raising the required TOEFL scores will improve international student outcomes – students need induction into the Discourse community, not just a higher TOEFL score.
  • ESL teacher preparation programs need to include coursework on social literacy and on preparing students to enter and successfully navigate their target Discourse communities.

Some of this has already been achieved. Many IEPs recognize their wider mission of orienting students into academic culture, and more recently, pathway programs have been structured to provide ESL support alongside credit-bearing classes that, in theory at least, offer an apprenticeship into the academic community. But there is a long way to go before the notion of Discourse communities drives international student preparation beyond the inadequacy of “ESL.”

Reference
Gee, J.P., Social Linguistics and Literacies, 5th Ed., Routledge 2015

Why language is best assessed by real people



What is the most effective way to assess English learners’ proficiency?

It has become accepted in the field to rely on psychometric tests such as the TOEFL iBT (the internet-based TOEFL) and the IELTS for college and university admissions. Yet these and most other language tests are an artifice, a device that is placed between the student’s actual proficiency and direct observation of that proficiency by a real human being. Students complete the limited set of tasks on the test, and based on the results, an algorithm makes an extrapolation as to their broader language abilities.

When you look at a TOEFL score report, it does not tell you that student’s English language ability; what it tells you is what a learner with that set of scores can typically do. And in the case of the TOEFL, this description is an evaluation that is based largely on multiple-choice answers and involves not a single encounter with an actual human being. Based on this, university admissions officers are expected to make an assumption about the student’s ability to handle the demands of extensive academic reading and writing, classroom participation, social interaction, written and spoken communications with university faculty and staff, SEVIS regulations, and multiple other demands of the U.S. college environment. (Although the IELTS includes face-to-face interaction with an examiner, this interaction is highly structured and not very natural. TOEFL writing and speaking tasks are limited, artificial, and assessed by a grader who has only a text or sound file to work with.)

Contrast that with regular, direct observation of students’ language proficiency by a trained and experienced instructor, over a period of time. The instructor can set up a variety of language situations involving variation in interlocutors, contexts, vocabulary, levels of formality, and communication goals. In an ACCET- or CEA-accredited intensive English program, such tasks are linked to documented learning objectives. By directly observing students’ performance, instructors are able to obtain a rich picture of each student’s proficiency, and are able to comment specifically on each student’s strengths and weaknesses.

Consider this a call, then, for colleges and universities to enter into agreements with accredited intensive English programs to waive the need for a standardized test such as the TOEFL. Just as those colleges and universities don’t use a standardized test to measure the learning of their graduates, they should be open to accepting the good judgment of teachers in intensive English programs – judgment based on direct observation of individual learners rather than the proxy scores obtained by impersonal, artificial tests.

Goodhart’s Law and the Measurement of English Proficiency

Goodhart’s Law was first proposed by the British economist Charles Goodhart. In essence, it states that “when a measure becomes a target, it ceases to be a good measure.” Measurements are often used as a proxy for performance. For example, it’s sometimes reported that in Soviet Russia, when the success of nail production was measured by quantity of nails, many tiny nails were produced. When it was measured using weight of nails, smaller numbers of large nails were produced. The measure became the target, and gaming the system created the illusion of success.

In U.S. university admissions, the TOEFL is the most common measure of the English proficiency of international applicants. It’s easy to understand why the complexity of language proficiency needs to be reduced to a small set of numbers when large quantities of applications have to be evaluated. Unfortunately, TOEFL preparation is very often a great example of Goodhart’s Law in action: many students focus on attaining the necessary score rather than comprehensively working on the cognitive-academic language skills and cultural skills they need to succeed at a U.S. university, and this can result in serious challenges for those students. Once matriculated, as those students seek to earn good grades – a proxy measure for learning – they may wind up trying to game the system by plagiarizing, using online essay services, cramming at the last minute, or begging the instructor for a better grade.

Although it would present practical difficulties, it would serve everyone better – schools and students – if the schools used a broader set of mechanisms to determine English proficiency. These might include evidence of English study (not just test prep); Skype, phone, and in-person interviews; recorded presentations by applicants; synchronous online discussion groups; and reports from instructors in intensive English programs who have first-hand – not proxy – knowledge of the students’ English.