Report on Test Design
1) Introduction
Test design is a complex process that requires test designers to weigh many factors.
The purpose of the test, the students who will take it, the macro-skills tested, the test
questions, the text types, and validity and reliability are just some of the factors that must be
considered in order to create a test that suits its intended candidates. The
purpose of this report is to explain in detail the design of an English proficiency test that
assesses whether international students at Western Sydney University The College have the
English proficiency needed to study at Western Sydney University. The
test will consist of general test-taking instructions and four sections, one for each of the
four macro-skills: listening, reading, writing and speaking. This report discusses the design of the test
questions, the scoring system, the choice of texts and future implications.
2) Designing an English proficiency test
2.1) Background to the English proficiency test designed
The purpose of an English proficiency test is to assess students’ English language ability
by testing their listening, reading, writing and speaking. Whether the students have taken
English language classes before is not the focus of the test. Rather, the test gives
students an opportunity to demonstrate their knowledge of English and their ability to use
it. Care must therefore be taken to design the test in a way that
gives every student an equal chance to complete it.
2.2) Testing Listening
The first section of the test will be the listening section. By testing listening on its own, the test
assesses students on the sub-skills that are specific to listening. These sub-skills can be
considered as operations, i.e. what students are expected to do as part of the listening
skill (Hughes, 2003). The operations can be divided into two types: informational and interactional.
Informational operations involve obtaining and understanding different
types of information, whereas interactional operations involve using that information and applying it in
a given context.
This test will contain the following informational operations: obtaining factual information,
understanding how requests are made, recognising and understanding excuses, recognising and
understanding comments, and following a sequence of events. The interactional
operations are: understanding expressions of agreement and disagreement, recognising and understanding
requests for clarification, recognising indications of understanding and recognising attempts to
persuade. All of these form part of the cognitive and metacognitive skills (Thompson & Rubin,
1996).
The listening section will have one text: a monologue. The monologue will report on Spain’s
reaction to Catalonia’s independence movement (Lynn, 2017). Students will be given forty-five minutes to complete the section and must answer the ten questions that follow the text.
The text will be delivered at a moderately fast speech rate, because a slow speech rate
does not guarantee higher scores than a moderately fast one. Griffiths (1990) found
no major differences in students’ listening performance between moderately fast and
slow speech rates.
To help students store and recall information from the text, five minutes of information transfer
will be given so that students can take notes while listening. For this test, note taking
is optional, which reflects the real-world situation of note taking in academic
settings. Note-taking skill should nevertheless have little impact on students’
performance, as the design of the listening section places little emphasis on memory and does
not take into account students’ previous note-taking learning or habits (Hale & Courtney,
1991).
Having described the text, the next step is to explain how the above operations are
tested. Common techniques for testing them include multiple choice and
short answer. The advantage of multiple-choice questions is that they offer a quick way of
checking whether students have chosen the correct answer. For scorers, they are easier to
mark because they are objective: there is only one right answer. Students
simply need to read each question and choose the most appropriate or correct answer, which
means that they know exactly what is required to complete the questions (Bailey, 1998;
Brown & Hudson, 2002).
The downside of multiple choice is that it places additional demands on students’ cognitive
ability. While students are listening to the text, they also need to remember the four
alternatives for a question and, after deciding on one alternative, they
need to remember another set of four alternatives for the next question. This increases
students’ cognitive burden, as short-term memory can only hold information for
a short time and, within this time, students need to take in new information (Mahdavi &
Bahrpeyma, 2016).
With this consideration in mind, questions 1 and 2 are multiple-choice questions
with four alternatives each, three of which are distractors (Appendix 2). Four alternatives are used to
minimise the likelihood that weaker-performing students use test-preparation techniques
to make a lucky guess, which is easier when a question has fewer alternatives (Aryadoust, 2012).
Care has been taken to keep the questions and alternatives short and simple for students to
understand. Without such care, questions and alternatives can force students to memorize
unimportant details, which increases their mental load (Shohamy & Inbar, 1991).
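The effect of the number of alternatives on lucky guessing can be illustrated with a little arithmetic. The following is a minimal sketch rather than part of the test design itself. It assumes that each multiple-choice question carries one mark and that a guessing student picks uniformly at random, neither of which is stated in this report, and the function names are hypothetical.

```python
from fractions import Fraction


def chance_of_guessing_all(num_questions: int, num_alternatives: int) -> Fraction:
    """Probability of answering every question correctly by blind guessing."""
    return Fraction(1, num_alternatives) ** num_questions


def expected_marks_from_guessing(num_questions: int, num_alternatives: int) -> Fraction:
    """Expected marks from blind guessing, assuming one mark per question."""
    return Fraction(num_questions, num_alternatives)


# Questions 1 and 2 of the listening section have four alternatives each;
# two and three alternatives are shown only for comparison.
for alternatives in (2, 3, 4):
    p_all = chance_of_guessing_all(2, alternatives)
    expected = expected_marks_from_guessing(2, alternatives)
    print(f"{alternatives} alternatives: P(both correct) = {p_all}, expected marks = {expected}")
```

Under these assumptions, moving from two to four alternatives lowers the chance of guessing both questions correctly from 1/4 to 1/16.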
Another test technique is short answer. As with multiple choice, the questions are short and
easy to understand, with clear instructions and two lines for each answer (Appendix 2, questions 3-5).
Unlike multiple-choice questions, short answer questions draw on students’ ability to
understand the content of the listening text and answer in their own words
(Hsiao-fang, 2004).
Gap filling is a more advanced technique in that the answers require students to understand
more than just informational operations. It encourages students to engage with the interactional
operations, as they need to pay attention to the whole text and process it in order to find
the answer. This reflects the need not only to recognize an expression or speech act but also to
understand the context surrounding it (Vandergrift, 1999). This can be seen
in questions 6-10, where the instructions are clear and require students to look for
answers rather than just give a unique response (Appendix 2).
2.3) Testing Reading
Testing reading is not an easy task. The reading section should be designed to elicit
students’ reading skills and to assess whether students have demonstrated
competent use of those skills.
Reading skills in English language proficiency tests are assessed through the different ways
of reading. Depending on the text and the purpose of reading, a reader may read a
text slowly to gain a comprehensive understanding of it by attending to the
vocabulary used, the grammatical structure and the main idea, and by making inferences from the text (Weir
& Khalifa, 2008; Khalifa & Weir, 2009). Reading a text quickly is used to obtain the main idea
of the text or to search specific sections of the text for a required piece of information
(Urquhart & Weir, 1998). These different styles and purposes of reading are reflected
in the reading operations in this test, namely: understanding the author’s main idea,
skimming, scanning for specific information, identifying words and expressions in their
context and identifying the author’s attitudes, viewpoints or implications (Behfrouz & Nahvi,
2013).
The reading text is a newspaper article in the form of a narration. In terms of topic,
the article will not be technical or specialized in nature. The style of the article
will combine formal and informal registers. In terms of length, the text will be around 345
words.
Keeping in mind the above test specifications, the test will use different test techniques to
assess students’ reading ability. One of these techniques is multiple choice (Appendix 3,
question 13). Multiple choice is easy to design but it is not the best test technique. It
tends to test superficial knowledge: students who lack complete
knowledge can compensate with partial knowledge or randomly successful
guesses. The alternatives in multiple-choice questions
may contain the same suffixes or words as the target word, which encourages students to use
partial knowledge to make successful guesses (Zhang, 2013). This is undesirable, as
it can inflate students’ scores on the reading section and lead to a less accurate measure of
their reading ability.
Short answer, on the other hand, is an excellent way of testing the five above-mentioned
reading operations. For a start, short answer questions can assess students’ ability in
expeditious reading. For instance, a question may ask students to skim through the text to
determine its structure (Appendix 3, questions 19 & 20). By skimming through the
text, stronger-performing students should be able to identify the flow of ideas in the text,
recognize the cohesive devices and understand the reasons behind the author’s sequencing of
ideas (Alderson, Percsich & Szabo, 2000).
For scanning, short answer questions will require students to read the text for a specific piece
of information. An example is question 12, which asks students to supply only
one word or phrase (Appendix 3).
Question 11 of the reading section asks students to understand words or expressions in their
context; its aim is to elicit students’ ability to understand pronominal
referents. The question requires students
to identify exactly what the pronoun ‘they’ refers to (Appendix 3).
Understanding the writer’s viewpoints, implications or attitudes is part of the careful
reading operation: students must additionally process the
information in the text and infer the situation or rationale behind it. The aim of this
operation is to assess students’ understanding of the context of the text, and it is used to
differentiate stronger-performing students from weaker-performing students. An
example is questions 14-16, which present three sentences. Students must write the word
‘fact’ or the word ‘opinion’ next to each sentence, and they must label all three
sentences correctly in order to get full marks (Appendix 3).
Gap filling is another technique that can elicit students’ command of the five above-mentioned operations. For instance, questions 17 and 18 ask students to fill in the gaps in a
summarised section of the text (Appendix 3). This is an effective way to test whether students have
understood the main idea of the text, and it helps to differentiate weaker-performing students
from stronger-performing students because, according to Yamashita (2003), stronger-performing
students used their understanding of the text more often than weaker-performing students did.
2.4) Testing Writing
The third section of this test is the writing section. Writing requires a different testing
technique from listening and reading. Multiple-choice, short answer and
gap-filling questions are some of the techniques used to elicit reading and listening skills,
but they are not suitable for testing writing.
Multiple-choice questions are a poor reflection of writing skills, short answer questions only
require students to write a couple of words or a sentence, and gap-filling
questions are limited in that the answers depend heavily on the immediate surroundings
of the gap (Alderson, 1979).
In light of these limitations, the best way to test writing is to ask
students to write, and there are three main factors that test designers need to bear in mind. First,
the tasks set in the test should be a representative sample of the kinds of writing that
students are expected to do. Second, the writing tasks should draw out students’
ability to write. Third, students’ writing samples should be scored in a reliable and valid manner
(Hughes, 2003). This test addresses all three factors.
Starting with the representative sample, the first step is to write a list of test specifications
with the following features:
a) Operations: describe; argue for and against a position
b) Types of text: the first task will be to describe a graph where students will need to write 150
words and the second task will be a discussion essay where students will need to write 250
words
c) Addressees of texts: native speaker and non-native speaker university lecturers
d) Topics: Any topic that is deemed academically worthy. No specialist knowledge required.
Relevant to the students.
e) Dialect and Style: Any standard variety of English. Formal register.
After listing the test specifications, the next step is to elicit a valid sample of writing ability.
For this test, there will be two separate tasks with different formats to help students become familiar
with different task types.
Ensuring validity and reliability in scoring is important so that the scores accurately reflect students’
true writing abilities. The scores themselves will also help The College decide whether students are
capable of undertaking study at Western Sydney University.
In terms of creating appropriate scales for scoring, the test will use analytic scoring rather
than holistic scoring. Analytic scoring takes more time to assess students’ writing ability,
but it gives a more detailed analysis of that ability by looking at the individual
features of good writing (Çetin, 2011). For this test, scoring will use a marking rubric with
four criteria and a score of 1-5 for each criterion. The four criteria are: grammar,
vocabulary, fluency and form. It is important to score each criterion separately,
even though this is not always an easy task. Ruegg, Fritz & Holland (2011) examined the
difficulties that scorers had when marking the essay section of the Kanda English Proficiency Test
(KEPT). The scorers used an analytic scoring system for that section and showed a
tendency to award the same scores for two of the criteria: grammar and vocabulary. This
reflected the fact that some errors are hard to assess, for instance the improper use of a
preposition in an idiomatic phrase. Scorers were unsure whether to treat the improper
preposition as an error in the phrase as a whole or simply as an error by itself.
Each score is more than just a number: the different levels indicate that students
with a higher score write with far fewer errors, have a better command of
the criterion and are more capable of writing complex sentences (Müller, 2015).
For each criterion, 1 is the lowest score and 5 is the highest. Each task will have its own
marking rubric, and the total score for the writing section is 40 (20 marks for each task).
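The arithmetic of the marking rubric can be sketched as follows. This is an illustrative sketch only: the four criteria, the 1-5 range and the totals come from this report, while the function names and the sample scores are hypothetical.

```python
CRITERIA = ("grammar", "vocabulary", "fluency", "form")
MIN_SCORE, MAX_SCORE = 1, 5


def task_total(scores: dict[str, int]) -> int:
    """Sum the four criterion scores for one task after validating them."""
    if set(scores) != set(CRITERIA):
        raise ValueError(f"expected exactly these criteria: {CRITERIA}")
    for criterion, score in scores.items():
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"{criterion} score must be between {MIN_SCORE} and {MAX_SCORE}")
    return sum(scores.values())


def section_total(task_scores: list[dict[str, int]]) -> int:
    """Add up the task totals for the whole writing section."""
    return sum(task_total(scores) for scores in task_scores)


# Hypothetical scores for one student (illustrative values, not real data).
graph_description = {"grammar": 4, "vocabulary": 3, "fluency": 4, "form": 5}
discussion_essay = {"grammar": 3, "vocabulary": 3, "fluency": 4, "form": 4}

print(task_total(graph_description))
print(section_total([graph_description, discussion_essay]))
```

With four criteria scored out of 5, the maximum for one task is 4 × 5 = 20, and two tasks give the section maximum of 40 stated above.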
2.5) Testing Speaking
The fourth and final section of this test is the speaking section. Like writing, speaking requires
a different test technique from listening and reading. Using
written texts to test speaking is inappropriate for two reasons. First, written texts are more suitable for testing students’
comprehension ability than their production ability. Second, there is the issue of authenticity: students can
read a transcript of a spoken text, but this does not guarantee that they are capable of
producing spoken text themselves.
To truly test students’ speaking ability, it is reasonable to ask them to speak, and this test
will assess students’ speaking skill by requiring them to complete two speaking tasks.
The test specifications for this section are as follows:
a) Informational Operations: Students should be able to provide personal information; describe
a sequence of events; give explanations; present an argument; provide required information;
draw conclusions; indicate attitude; and express preferences.
b) Interactional Operations: Students should be able to express purpose; express agreement;
express disagreement; elicit opinions; modify statements; attempt to persuade others; respond
to requests for clarification; and indicate understanding.
c) Managing interaction skills: Students should be able to initiate interactions, sustain
viewpoints, indicate turn taking, come to a decision and end interactions.
d) Types of text: Interview and discussion
e) Addressees: Two scorers
f) Topics: The first task will be on college life at The College, while the second task will
discuss whether Western Sydney University should implement a tri-semester academic
year.
g) Dialect: Standard Australian English
h) Style: Both informal and formal
Exposing students to both an interview and a discussion will give them an understanding of
the different features of the tasks and help them gain the confidence to show their
speaking ability (Issitt, 2008).
In terms of marking, this test will use an analytic scoring system in order to provide a
comprehensive assessment of students’ speaking ability. For each task, there will be a marking
scale with four criteria. Each criterion will be scored from 1-5, with 5 being the highest. The
four criteria will be: grammar, vocabulary, fluency and form. As with the writing marking scale,
each criterion will be scored separately, as the test aims to assess students’ knowledge
of and ability in the speaking skill and its sub-skills.
3) Conclusion
Designing a proficiency test is a long and difficult task with a long list of factors to consider:
not just the structure of the test itself but also the test techniques, the range of texts used, the operations,
the scoring system, and validity and reliability. Although modifications had to be made to suit the
purpose of this test, it is hoped that this test will shed light on how to design a
test for a specific purpose and assess students’ true English language ability.
4) References
Alderson, J. C. (1979). The cloze procedure and proficiency in English as a foreign
language. TESOL Quarterly, 13(2), 219-227.
Alderson, J. C., Percsich, R., & Szabo, G. (2000). Sequencing as an item type. Language
Testing, 17(4), 423-447. doi: 10.1177/026553220001700403.
Aryadoust, V. (2012). Differential item functioning in while-listening performance tests: The
case of the International English Language Testing System (IELTS) listening module.
International Journal of Listening, 26(1), 40-60. doi:
10.1080/10904018.2012.639649.
Australian Bureau of Statistics. (2017, February). Student (FTE) to teaching staff (FTE)
ratios 2001-2016 (no. 4221.0). Retrieved from
http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/4221.0Main%20Feature
s22016?opendocument&tabname=Summary&prodno=4221.0&issue=2016&num=&v
iew=
Bailey, K. M. (1998). Learning about language assessment: Dilemmas, decisions, and
directions. New York: Heinle & Heinle.
Behfrouz, B., & Nahvi, E. (2013). The effect of task characteristics on IELTS reading
performance. Open Journal of Modern Linguistics, 3(1), 30-39.
doi:10.4236/ojml.2013.31004.
Brown, J. D., & Hudson, T. (2002). Criterion-referenced language testing. Cambridge, UK:
Cambridge University Press.
Çetin, Y. (2011). Reliability of raters for writing assessment: Analytic-holistic, analytic-analytic,
holistic-holistic. Mustafa Kemal University Journal of Social Sciences
Institute, 8(16), 471-486.
Griffiths, R. (1990). Speech rate and NNS comprehension: A preliminary study in time-benefit
analysis. Language Learning, 40(3), 311-336.
Hale, G. A., & Courtney, R. (1991). Note taking and listening comprehension on the Test of
English as a Foreign Language. ETS Research Report Series, 34(1), i-26.
Hsiao-fang, C. (2004). A comparison of multiple-choice and open-ended response formats for
the assessment of listening proficiency in English. Foreign Language Annals, 37(4),
544-553.
Hughes, A. (2003). Testing for language teachers (2nd ed.). Cambridge: Cambridge
University Press.
Issitt, S. (2008). Improving scores on the IELTS speaking test. ELT Journal, 62(2), 131-138.
doi:10.1093/elt/ccl055.
Khalifa, H., & Weir, C. J. (2009). Examining reading: Research and practice in assessing
second language reading. Cambridge: UCLES/Cambridge University Press.
Lynn, B. (Reporter). (2017, October 11). Spain rejects mediation to resolve Catalonia crisis
[Audio Podcast]. VOA Learning English. Retrieved from
https://learningenglish.voanews.com/a/spain-rejects-mediation-talks-to-resolvecatalonia-crisis/4066150.html/
Mahdavi, Z. A., & Bahrpeyma, M. (2016). The relationship between short-term memory and
listening comprehension ability of IELTS test takers at different language proficiency
levels. International Journal of Research Studies in Language Learning, 6(3), 35-45.
Minear, T., & Harris, R. (2017, October 12). Education Minister Simon Birmingham
announces foreign students have to scrub up English skills to study in Australia.
Herald Sun. Retrieved from http://www.heraldsun.com.au/news/specialfeatures/news-in-education/education-minister-simon-birmingham-announcesforeign-students-have-to-scrub-up-english-skills-to-study-in-australia/newsstory/64d0f2ca0fd6852c39aab1ce1c2e4399
Müller, A. (2015). The differences in error rate and type between IELTS writing bands and
their impact on academic workload. Higher Education Research & Development,
34(6), 1207-1219. doi: 10.1080/07294360.2015.1024627.
Ruegg, R., Fritz, E., & Holland, J. (2011). Rater sensitivity to qualities of lexis in writing.
TESOL Quarterly, 45(1), 63-80.
Shohamy, E., & Inbar, O. (1991). Construct validation of listening comprehension tests: The
effect of text and question type. Language Testing, 8, 23–40.
Thompson, I., & Rubin, J. (1996). Can strategy instruction improve listening comprehension?
Foreign Language Annals, 29(3), 331-342.
Urquhart, A. S., & Weir, C. J. (1998). Reading in a second language: Process, product and
practice. Essex: Pearson Education Ltd.
Vandergrift, L. (1999). Facilitating second language listening comprehension: Acquiring
successful strategies. ELT Journal, 53(3), 168-176.
Weir, C. J., & Khalifa, H. (2008). A cognitive processing approach towards defining reading
comprehension. Cambridge ESOL: Research Notes, 31, 2-10.
Yamashita, J. (2003). Processes of taking a gap-filling test: Comparison of skilled and less
skilled EFL readers. Language Testing, 20(3), 267-293.
Zhang, X. (2013). The “I don’t know” option in the Vocabulary Size Test. TESOL Quarterly,
47(4), 790-811.