
Studies in Language Testing (SiLT)

Studies in Language Testing (SiLT) is a series of academic volumes edited by Professor Lynda Taylor and Dr Nick Saville. It is published jointly by Cambridge English and Cambridge University Press (CUP).

The series addresses a wide range of important issues and new developments in language testing and assessment, and is an indispensable resource for test users, developers and researchers. There are currently over 50 titles available; a full list of these, plus content summaries, is provided below. For a reader's overview of the series, including a thematic categorisation and extracts from reviews, please see this essay, kindly contributed to Cambridge English by a visiting professor, Xiangdong Gu: Download Studies in Language Testing Essay by Xiangdong Gu (PDF)

Copies of the volumes are available from booksellers or can be ordered direct from the Cambridge University Press website.

SiLT volumes

Volume 54 — On Topic Validity in Speaking Tests (Khabbazbashi 2021)

Front cover of Volume 54 - On Topic Validity in Speaking Tests (Khabbazbashi 2021)

On Topic Validity in Speaking Tests Nahal Khabbazbashi

Topics are often used as a key speech elicitation method in performance-based assessments of spoken language, and yet the validity and fairness issues surrounding topics are surprisingly under-researched. Are different topics ‘equivalent’ or ‘parallel’? Can some topics bias against or favour individuals or groups of individuals? Does background knowledge of topics have an impact on performance? Might the content of test taker speech affect their scores – and perhaps more importantly, should it? Grounded in the real-world assessment context of IELTS, this volume draws on original data as well as insights from empirical and theoretical research to address these questions against the backdrop of one of the world’s most high-stakes language tests.

This volume provides:

  • an up-to-date review of theoretical and empirical literature related to topic and background knowledge effects on second language performance
  • an accessible and systematic description of a mixed methods research study with explanations of design, analysis, and interpretation considerations at every stage
  • a comprehensive and coherent approach for building a validity argument in a given assessment context.

The volume also contributes to critiques of recent models of communicative competence that over-rely on linguistic features at the expense of more complex aspects of communication, arguing for an expansion of current definitions of the speaking construct that emphasises the role of the content of speech as an important – yet often neglected – feature.

This volume will be a valuable resource for postgraduate students, those working professionally in the field of speaking assessment such as personnel in examination boards, item writers and curriculum developers, and anyone seeking to better understand and improve the fairness and validity of topics used in assessments.

Download a free full PDF version of this volume

Volume 53 — Insights into Assessing Academic Listening: The Case of IELTS

Front cover of Volume 53 - Insights into Assessing Academic Listening: The Case of IELTS

Insights into Assessing Academic Listening: The Case of IELTS

Opening with an overview of studies that investigate the listening test component of the International English Language Testing System (IELTS), this volume proposes and illustrates a new line of enquiry for academic listening assessment: a better understanding of the cognitive processes underlying everyday listening events can provide a framework for recognising what is distinctive about the skill when applied to an English for Academic Purposes (EAP) or professional context. The outcome is a set of validation criteria against which a reviewer can measure the degree to which a given test represents the academic or professional experience; these criteria can be applied across the various features of a listening test and to the design of similar tests in this field.

The volume provides:

  • an up-to-date review of relevant literature on assessing academic listening
  • a clear and detailed specification of the construct of academic listening, with an evaluation of how this is used for assessment purposes
  • a consideration of the nature of academic listening in a digital age, and its implications for assessment research and test development

As test developers need to support score validity claims with a sound theoretical framework which guides their coverage of appropriate language ability constructs, this volume will be a rich resource for examination boards and other institutions, as well as researchers and graduate students in the field of language assessment, and teachers preparing students for IELTS or involved in EAP programmes.

Volume 51 — Research and Practice in Assessing Academic Reading: The Case of IELTS (Weir and Chan 2019)

Front cover of Volume 51 - Research and Practice in Assessing Academic Reading: The Case of IELTS (Weir and Chan 2019)

Research and Practice in Assessing Academic Reading: The Case of IELTS Cyril J Weir and Sathena Chan

This volume describes differing approaches to understanding academic reading ability that have emerged in recent decades and goes on to develop an empirically grounded framework for validating tests of this skill. The framework is then applied to the IELTS Academic Reading module to investigate a number of different validity perspectives that reflect the socio-cognitive nature of any assessment event. The authors demonstrate how a systematic understanding and application of the framework and its components can help test developers to operationalise their tests so as to fulfil the validity requirements for an academic reading test.

The book provides:

  • an up-to-date review of the relevant literature on assessing academic reading
  • a clear and detailed specification of the construct of academic reading and evaluation of how this is used for assessment purposes
  • a consideration of the nature of academic reading in a digital age and its implications for assessment research and test development.

The volume is a rich source of information on all aspects of testing academic reading ability. Examination boards and other institutions who need to validate their own academic reading tests in a systematic and coherent manner, or who wish to develop new instruments for measuring academic reading, will find it a useful resource, as will researchers and graduate students in the field of language assessment, and teachers preparing students for IELTS (and similar tests) or involved in English for Academic Purposes (EAP) programmes.

Volume 50 — Lessons and Legacy: A Tribute to Professor Cyril J Weir (1950-2018) (Taylor and Saville 2020)

Front cover of Studies in Language Testing – Volume 50

Lessons and Legacy: A Tribute to Professor Cyril J Weir (1950-2018) Edited by Lynda Taylor and Nick Saville

Written by a selection of his friends and collaborators, this volume pays tribute to the academic achievements of the late Professor Cyril J Weir. His passing in September 2018 leaves an eclectic legacy in the field of language testing and assessment, and the chapters contained herein, part of a series he guided and often wrote for, honour and illuminate his lessons.

Professor Weir’s chronicling of the history and evolution of language testing is reflected in chapters on his role in assessment reform and the origins of his socio-cognitive framework; authors also reflect on the impact of this model on test validity and validation. He was also a vital influence in putting these ideas into action, as reported in chapters on test operationalisation and the establishment of the Centre for Research in English Language Learning and Assessment (CRELLA).

By drawing on a rich range of voices in language assessment, from China to the UK to the Middle East, and from Professor Weir’s earliest to most recent collaborators, we illustrate the breadth and depth of his impact on language testing and assessment, and how his lessons continue to be relevant to the present day.

Volume 49 — Applying the Socio-cognitive Framework to the BioMedical Admissions Test: Insights from language assessment (Cheung, McElwee and Emery 2017)

Front cover of Studies in Language Testing – Volume 49

Applying the Socio-cognitive Framework to the BioMedical Admissions Test: Insights from language assessment Edited by Kevin Y F Cheung, Sarah McElwee and Joanne Emery

This volume takes a framework for validating tests that was developed in language testing, and applies it to an admissions test used for biomedical courses. The framework is used to consider validity in the BioMedical Admissions Test (BMAT). Each chapter focuses on a different aspect of validity and also presents research that has been conducted with the test. By addressing all of the validity aspects identified as important by language testers, this volume presents a comprehensive evaluation of BMAT's validity. The processes of evaluation used in the book also promote a cross-disciplinary approach to assessment research, by demonstrating how effectively language testing frameworks can be used in different educational contexts. The authors of the chapters include Cambridge Assessment staff and medical education experts, from a wide range of subject backgrounds. Psychologists, clinicians, linguists and assessment experts have all contributed to the volume, making it an example of multidisciplinary collaboration.

The volume’s chapters include:

  • The Cambridge approach to admissions testing (Nick Saville)
  • Considering the test taker in test development and research (Devine, Taylor and Cross)
  • Cognitive validity (Cheung and McElwee)
  • Building fairness and appropriacy into testing contexts: Tasks and administrations (Shannon, Crump and Wilson)
  • Scoring validity (Elliott and Gallacher)
  • Criterion-related validity (Fyfe, Devine and Emery)
  • Consequential validity (McElwee, Fyfe and Grant)
  • Conclusions and recommendations (Cheung)

Volume 48 — Second Language Assessment and Action Research (Burns and Khalifa 2017)

Front cover of Studies in Language Testing – Volume 48

Second Language Assessment and Action Research Edited by Anne Burns and Hanan Khalifa

Volume 47 — Examining Young Learners: Research and practice in assessing the English of school-age learners (Papp and Rixon 2018)

Front cover of Studies in Language Testing – Volume 47

Examining Young Learners: Research and practice in assessing the English of school-age learners Szilvia Papp and Shelagh Rixon (2018)

The unique areas of children’s and teenagers’ second language development and assessment are given a state-of-the-art account in this volume. Common issues in cognitive psychology, child second language (L2) acquisition studies, recent research on adolescents, and language assessment are explored by linking research carried out within the educational, academic and testing communities.

The volume reflects on how learners’ L2 development between the ages of 6 and 16 can be coherently described and their L2 assessment defined in terms of socio-cognitive validity. There is particular focus on the theoretical foundations, language competence model, development and validation framework, and evaluation and review processes to provide evidence for the validity of the Cambridge English family of assessments for children and teenagers.

Academics, assessment professionals and postgraduate researchers of L2 development in children and teenagers will find great value in the volume’s theoretical insight, while policy-makers and teachers will gain rigorous practical advice for the young language learner’s classroom and assessment.


Volume 46 — Advancing the Field of Language Assessment: Papers from TIRF doctoral dissertation grantees (Christison and Saville 2016)

Front cover of Studies in Language Testing – Volume 46

Advancing the Field of Language Assessment: Papers from TIRF doctoral dissertation grantees Edited by MaryAnn Christison and Nick Saville (2016)

Since 2002, the International Research Foundation for English Language Education (TIRF) has supported students in completing their doctoral research on topics related to the foundation’s priorities. Each year applicants who have been advanced to candidacy in legitimate PhD or EdD programmes are invited to submit proposals for Doctoral Dissertation Grants (DDGs).

This volume brings together a set of 11 TIRF-related research papers on English language assessment. As a member of the TIRF Board of Trustees, Cambridge English wishes to support the foundation in achieving its aims in disseminating research and influencing language testing policies. The volume:

  • focuses on the applied nature of research in language assessment
  • discusses the implications of such research, and
  • presents its findings from a global perspective.

This volume can serve as a core or supplemental text for graduate seminars in English language assessment in applied linguistics, education, TESOL, and TEFL, and it is useful for scholars of L2 methodology, curriculum design, and teacher development in ELT, as well as for courses on language assessment. As a reference volume, it is appropriate for individual scholars, test developers, graduate and undergraduate students, and researchers.

Volume 45 — Learning Oriented Assessment: A systemic approach (Jones and Saville 2016)

Front cover of Studies in Language Testing – Volume 45

Learning Oriented Assessment: A systemic approach Neil Jones and Nick Saville (2016)

The learning-oriented approach to assessment developed in this book seeks to exploit the commonality as well as the complementarity of formal assessment and classroom assessment. It proposes a Learning Oriented Assessment (LOA) model which presents a systemic, ecological approach in which all kinds of assessment contribute positively to their two major educational purposes: promoting better learning, and providing measurement that contributes to a meaningful interpretation of learning outcomes.

The volume poses three key questions central to LOA: ‘What is learning?’, ‘What is to be learned?’, and ‘What is to be assessed?’, and discusses how a focus on these fundamental aspects of learning and assessment can support learners, teachers and assessment professionals. The volume also focuses on the use of evidence and on how it can be collected and used to feed back into learning. It provides an overview of large-scale assessment as practised by Cambridge English and of learning-oriented classroom assessment practices, where learning interactions take place. The volume concludes with a look at implementing LOA in practice.

This volume is a rich source of information on key issues, principles and practices in the area of LOA. It provides fresh insights into current knowledge and understanding of the role of assessment in supporting learning, as well as useful guidance on good practice. As such, it will be of considerable interest to assessment practitioners, teachers and academics, educational policy-makers and examination board personnel.

Volume 44 — Language Assessment for Multilingualism: Proceedings of the ALTE Paris Conference, April 2014 (Docherty and Barker 2016)

Front cover of Studies in Language Testing – Volume 44

Language Assessment for Multilingualism: Proceedings of the ALTE Paris Conference, April 2014 Edited by Coreen Docherty and Fiona Barker (2016)

This volume explores the role of multilingualism in social, educational and practical contexts. It brings together a collection of edited papers based on presentations given at the 5th International Conference of the Association of Language Testers in Europe (ALTE) held in Paris, France, in April 2014.

The selected papers focus on several core strands addressed during the conference. 

Section 1 deals with frameworks in social contexts and focuses on their role in migration and multilingual policy and practice. It addresses how recent education reforms aim to increase both social mobility and intercultural communication. Section 2 focuses on the response of language assessment providers to the rise of linguistic diversity. Section 3 then discusses the role of intercultural professionalisation of language assessors. Finally, Section 4 reflects on the approach of various institutes to achieve fairness and quality in test provision.

Key features of the volume include:

  • insights on the effect of multilingualism on international mobility
  • discussion of how multilingualism can address the challenge of increasing linguistic diversity
  • reflection on the impact of intercultural communication on linguistic competence
  • advice on how to ensure fairness and quality in language assessment.

With its broad coverage of key issues and combination of theoretical insights and practical advice, this volume is a valuable reference work for academics, employers and policy-makers in Europe and beyond. It is also a useful resource for postgraduate students of language testing and for practitioners, and anyone else seeking to understand the policies, procedures and challenges encountered in the application of multilingualism.

Volume 43 — Second Language Assessment and Mixed Methods Research (Moeller, Creswell and Saville 2016)

Front cover of Studies in Language Testing – Volume 43

Second Language Assessment and Mixed Methods Research Edited by Aleidine J Moeller, John W Creswell and Nick Saville (2016)

Test developers have a responsibility to ensure that the assessments they develop meet the needs of test users and provide a fair assessment in educational and social contexts. Mixed methods research plays an important role in providing a set of different but complementary research tools which can be used to underpin the assessment validation process and add value to assessment research.

The purpose of this volume is to create a deeper understanding of the role of mixed methods in language assessment, and to provide essential information needed to conduct and publish mixed methods research within the context of language assessment. Mixed methods language assessment studies on topics such as community-based participatory test development, investigating test impact, and developing new test tasks and rating scales illustrate first-hand the benefits and added value of mixed methods to the language testing and assessment field.

The volume provides:

  • theoretical insights and practical guidance on the use of mixed methods research
  • advice on the essential components for conducting and publishing mixed methods research
  • case studies from language assessment to demonstrate how mixed methods research can be rigorously and systematically applied in a specific context. 

This is the first volume of its kind to comprehensively illustrate the application of the principles of mixed methods research in language assessment and to combine theoretical insights and practical illustrations of good practice. As such, it is a valuable reference work for academics, postgraduate students and practitioners, and anyone else seeking to understand the purpose, design and application of mixed methods research.

Volume 42 — Assessing Language Teachers’ Professional Skills and Knowledge (Wilson and Poulter 2015)

Front cover of Studies in Language Testing – Volume 42

Assessing Language Teachers’ Professional Skills and Knowledge Edited by Rosemary Wilson and Monica Poulter (2015)

The growth in English language teaching worldwide, and the related increase in teacher training programmes, have made greater accountability in the assessment of teachers more important than ever. Formal, summative assessment has taken on greater importance in many teacher training programmes and requires procedures which do not always sit easily with the development process. Meanwhile, transparency of assessment procedures is also increasingly demanded by the candidates themselves.

This edited volume discusses key issues in assessing language teachers’ professional skills and knowledge, and provides case study illustrations of how teacher knowledge and teaching skills are assessed at pre-service and in-service levels within the framework of the Cambridge English Teaching Qualifications.

The volume provides:

  • discussion of ways in which the changing nature of English language teaching has impacted on teacher education and assessment
  • useful illustrations of specific assessment procedures for both teaching knowledge and practical classroom skills
  • real-life examples of the ways in which the Cambridge English Teaching Qualifications have been integrated into and adapted to work in local contexts.

This is the first volume of its kind wholly dedicated to language teacher assessment. As such, it will be of interest not only to researchers and postgraduate students but also language teachers and teacher educators.

Volume 41 — Validating Second Language Reading Examinations (Wu 2014)

Front cover of Studies in Language Testing – Volume 41

Validating Second Language Reading Examinations: Establishing the validity of the GEPT through alignment with the Common European Framework of Reference Rachel Yi-fen Wu (2014)

Validating Second Language Reading Examinations describes the development of an empirical framework for test validation and comparison of reading tests at different proficiency levels through a critical evaluation of alignment with the Common European Framework of Reference (CEFR). It focuses on contextual parameters, cognitive processing operations and test results, and identifies parameters for the description of different levels of reading proficiency examinations. The volume explores procedures for linking tests to the CEFR and proposes both qualitative and quantitative methods that complement the procedures recommended in the Council of Europe’s Relating Language Examinations to the Common European Framework of Reference for Languages (CEFR): A Manual, piloted in 2003 and revised in 2009.

Key features of the book include:

  • a detailed review of the literature on CEFR alignment, vertical scaling, test specifications and test comparability
  • a comprehensive and coherent approach to the validation of reading tests
  • an accessible and systematic description of procedures for collecting validity evidence
  • a case study comparing different testing systems targeting the same CEFR level.

This volume will be a valuable resource for academic researchers and postgraduate students interested in using CEFR alignment procedures and methodology to demonstrate differentiation across different levels of a testing system and equivalence between different examinations that target a particular CEFR level. It will be of particular relevance to exam boards who wish to validate their reading tests in terms of differentiation across test levels and external criteria. It will also be a useful reference for teachers and curriculum designers who wish to reflect real-life reading activities when they prepare reading tasks for language learning.

Volume 40 — Multilingual Frameworks: The construction and use of multilingual proficiency frameworks (Jones 2014)

Front cover of Studies in Language Testing – Volume 40

Multilingual Frameworks: The construction and use of multilingual proficiency frameworks Neil Jones (2014)

This volume describes 20 years of work at Cambridge English to develop multilingual assessment frameworks and presents useful guidance on good practice. It covers the development of the ALTE Framework and ‘Can Do’ project, the Common European Framework of Reference (CEFR) and the linking of the Cambridge English exam levels to it, Asset Languages – a major educational initiative for UK schools – and the European Survey on Language Competences, co-ordinated by Cambridge English for the European Commission. It proposes a model for the validity of assessment within a multilingual framework and, while illustrating the constraints which determined the approach taken to each project, makes clear recommendations on methodological good practice. It also explores and looks forward to the further extension of assessment frameworks to encompass a model for multilingual education.

Key features of the volume include:

  • a clear and comprehensive explanation of several major multilingual projects
  • combination of theoretical insights and practical advice
  • discussion of the interpretation and use of the CEFR.

Multilingual Frameworks is a rich source of information on key issues in the development and use of multilingual proficiency frameworks. As such, it will be a valuable reference work for academics, education policy-makers and examination board personnel. It is also a useful resource for postgraduate students of language assessment and for practitioners, and any stakeholders seeking to gain a clearer picture of the issues involved with cross-language assessment frameworks.

Volume 39 — Testing Reading through Summary: Investigating summary completion tasks for assessing reading comprehension ability (Taylor 2013)

Front cover of Studies in Language Testing – Volume 39

Testing Reading through Summary: Investigating summary completion tasks for assessing reading comprehension ability Lynda Taylor (2013)

Testing Reading through Summary explores the use of summary tasks as an effective means of assessing reading comprehension ability. It focuses in particular on text-removed summary completion as a task type that offers a way of addressing more directly the reader’s mental representation of text for reading assessment purposes.

The volume describes a series of empirical studies that investigated the development of text-removed summary completion tasks, their trialling and validation with results from an independent measure of reading ability. Findings from the project suggested that it is possible to develop a satisfactory summary of a text which will be consistent with most readers’ mental representation if their reading of the text is adequately contextualised within some purposeful activity.

Key features of the book include:

  • an in-depth discussion of the nature of reading comprehension and approaches to assessing reading comprehension ability
  • a comprehensive empirical report and practical guidance on the development, trialling and validation of summary completion tasks
  • fresh insights into current knowledge and understanding of the assessment of reading ability.

This volume will be a valuable resource for those working professionally in the field of reading assessment such as key personnel in examination agencies and those with an academic interest in language testing/examining. It will also be a useful resource for postgraduate students of language testing and for practitioners, i.e. teachers, teacher educators, curriculum developers, materials writers, and anyone seeking to better understand the nature of reading comprehension ability and how it can be assessed most effectively.

Volume 38 — Cambridge English Exams – The First Hundred Years (Hawkey and Milanovic 2013)

Front cover of Studies in Language Testing – Volume 38

Cambridge English Exams – The First Hundred Years A history of English language assessment from the University of Cambridge 1913-2013 Roger Hawkey and Michael Milanovic (2013)

The first Cambridge English examination for non-native speakers was taken by three candidates in 1913. Today, the exams are taken by nearly four million people a year in 130 countries and cover a wide range of needs, from English for young learners to specific qualifications for university entrance and professional use. Throughout their history, the Cambridge English exams have been designed to meet the changing needs of learners, teachers, universities, employers and official bodies, and to deliver educational and social benefits. They have benefited from – and contributed to – research in education, language learning and assessment to ensure that they offer valid, reliable and fair qualifications. This book traces the history of the exams through their first hundred years, setting them in the context of wider educational and academic developments. The authors pay particular attention to the dedicated individuals in Cambridge and around the world who have contributed to the success of the exams and to their positive educational impact. It will appeal to anyone interested in language teaching and assessment, applied linguistics or educational history, and to the thousands of people who are part of the wider Cambridge English network.

Volume 37 — Measured Constructs: A history of Cambridge English language examinations 1913-2012 (Weir, Vidaković and Galaczi 2013)

Measured Constructs: A history of Cambridge English language examinations 1913-2012 Cyril J Weir, Ivana Vidaković and Evelina D Galaczi (2013)

This volume sheds light on how approaches to measuring English language ability evolved worldwide and at Cambridge over the last 100 years.  The volume takes the reader from the first form of the Certificate of Proficiency in English offered to three candidates in 1913, a serendipitous hybrid of legacies in language teaching from the previous century, up to the current Cambridge approach to language examinations, where the language construct to be measured is seen as the product of the interactions between a targeted cognitive ability based on an expert user model, a highly specified context of use and a performance level based on explicit and appropriate criteria of description.

This volume:

  • chronicles the evolution of constructs in English language teaching and assessment over the last century
  • provides an accessible and systematic analysis of changes in the way constructs were measured in Cambridge English exams from 1913 to 2012
  • includes copies of past Cambridge English exams, from the original exams to the current ones, as well as previously unpublished archive material.

Measured Constructs is a rich source of information on how changes in language pedagogy, together with wider socio-economic factors, have shaped the development of English language exams in Cambridge over the last century.  As such, it will be of considerable interest to researchers, practitioners and graduate students in the field of language assessment.  This volume complements previous historical volumes in the series on the development of Cambridge English exams, as well as titles which investigate language ability constructs underlying current Cambridge English exams.

Volume 36 — Exploring Language Frameworks: Proceedings of the ALTE Kraków Conference, July 2011 (Galaczi and Weir 2013)

Front cover of Studies in Language Testing – Volume 36

Exploring Language Frameworks: Proceedings of the ALTE Kraków Conference, July 2011 Edited by Evelina D Galaczi and Cyril J Weir (2013)

This volume explores the role of language frameworks in social, educational and practical contexts. It brings together a collection of 21 edited papers based on presentations given at the 4th International Conference of the Association of Language Testers in Europe (ALTE) held in Kraków in July 2011. The selected papers focus on several core strands addressed during the conference. Section one deals with frameworks in social contexts and focuses on their role in migration and multilingual policy and practice. Section two addresses the use of frameworks in educational contexts and considers issues such as defining an inclusive framework for languages, the use of frameworks in test and course development and their role in guiding test users. Section three focuses on practical issues associated with the application of frameworks and presents studies associated with rating scales, the use of frameworks in test development and validation, and the role of statistical procedures as part of quality assurance.

Key features of the volume include:

  • insights into the influence of language frameworks on social policy and practice
  • up-to-date information on the application of frameworks in a variety of learning and teaching contexts worldwide 
  • accounts of recent projects involving the practical role of frameworks in addressing assessment issues.

With its broad coverage of key issues and combination of theoretical insights and practical advice, this volume is a valuable reference work for academics, employers and policy-makers in Europe and beyond. It is also a useful resource for postgraduate students of language testing and for practitioners, and anyone else seeking to understand the policies, procedures and challenges encountered in the application of language frameworks.

Volume 35 — Examining Listening: Research and practice in assessing second language listening (Geranpayeh and Taylor 2013)

Front cover of Studies in Language Testing – Volume 35

Examining Listening: Research and practice in assessing second language listening Edited by Ardeshir Geranpayeh and Lynda Taylor (2013)

This volume develops a theoretical framework for validating tests of second language listening ability. The framework is then applied through an examination of the tasks in Cambridge English listening tests from a number of different validity perspectives that reflect the socio-cognitive nature of any assessment event. The authors show how an understanding and analysis of the framework and its components can assist test developers to operationalise their tests more effectively, especially in relation to the key criteria that differentiate one proficiency level from another.

The volume provides:

  • an up-to-date review of the relevant literature on assessing listening
  • an accessible and systematic description of the different proficiency levels in second language listening
  • a comprehensive and coherent basis for validating tests of listening.

This volume is a rich source of information on all aspects of examining listening ability. As such, it will be of considerable interest to examination boards who wish to validate their own listening tests in a systematic and coherent manner, as well as to academic researchers and graduate students in the field of language assessment more generally. This is a companion volume to the previously published Examining Writing (2007), Examining Reading (2009) and Examining Speaking (2011).

"Geranpayeh and Taylor have put together a collection that will undoubtedly become a significant addition to the literature on a still very under-represented skill." Luke Harding (2015), Language Testing 31, 121-124.

Volume 34 — IELTS Collected Papers 2: Research in reading and listening assessment (Taylor and Weir 2012)

IELTS Collected Papers 2: Research in reading and listening assessment Edited by Lynda Taylor and Cyril J Weir (2012)

IELTS (International English Language Testing System) serves as a high-stakes proficiency test to assess the English language skills of international students wishing to study, train or work in English-speaking environments. The test has been regularly revised in light of findings from ongoing research and validation studies to ensure that it remains a valid and reliable measure. This volume brings together a set of research studies conducted between 2005 and 2010, sponsored under the auspices of the British Council/IELTS Australia Joint-funded Research Program, which provides annual grant funding to encourage research activity among IELTS test stakeholders around the world. The eight studies – four on reading and four on listening assessment – provide valuable test validity evidence and directly inform the continuing development of the IELTS Reading and Listening tests. The volume chronicles the evolution of the Reading and Listening tests in ELTS and IELTS from 1980 to the present day. It explains the rationale for revising these tests at various points in their history and the role played in this by research findings. The editors comment on the specific contribution of each study in this volume to the ongoing process of IELTS Reading and Listening test design and development. This is a companion volume to the previously published IELTS Collected Papers on IELTS speaking and writing assessment. It will be of particular value to language testing researchers interested in IELTS as well as to institutions and professional bodies who use IELTS test scores. It will also be relevant to students, lecturers and researchers working more broadly in the field of English for Academic Purposes.

Volume 33 — Aligning Tests with the CEFR: Reflections on using the Council of Europe’s draft Manual (Martyniuk 2010)

Aligning Tests with the CEFR: Reflections on using the Council of Europe’s draft Manual Edited by Waldemar Martyniuk (2010)

This volume contains 12 case studies that piloted the Council of Europe’s preliminary Manual for Relating Language Examinations to the Common European Framework of Reference for Languages (CEFR), released in 2003. The case studies were presented at a 2-day colloquium held in Cambridge in December 2007, an event which helped to inform the Manual revision project during 2008/2009. As well as describing their studies and reporting on their findings, contributors to the volume reflect and comment on their experience of using the draft Manual. A clear and comprehensive introductory chapter explains the development of the CEFR and the draft Manual for linking tests, discussing its relevance for the future. The volume will be of particular interest to examination boards, language test developers and educational policy makers, as well as to academic lecturers, researchers and graduate students interested in the principles and practice of aligning tests with the CEFR.

‘This volume … is another excellent book in the Studies in Language Testing (SiLT) series … This volume of papers will serve as an excellent resource for professionals around the world who wish to learn how to go about the difficult task of aligning their assessments with the CEFR.’ Craig Deville (2012), Language Testing 29 (2), 312–314.

Volume 32 — Components of L2 Reading: Linguistic and processing factors in the reading test performances of Japanese EFL learners (Shiotsu 2010)

Components of L2 Reading: Linguistic and processing factors in the reading test performances of Japanese EFL learners Toshihiko Shiotsu (2010)

This volume investigates the linguistic and processing factors that underpin the reading comprehension performance of Japanese learners of English. It describes a comprehensive and rigorous empirical study to identify the main candidate variables that affect reading performance and to develop appropriate research instruments to investigate these. The study explores the contribution to successful reading comprehension of factors such as syntactic knowledge, vocabulary breadth and reading speed in the second language. Key features of the book include: an up-to-date review of the literature on the development and assessment of L1 and L2 reading ability; practical guidance on how to investigate the L2 reading construct using multiple methodologies; and fresh insights into interpreting test data and statistics, and into understanding the nature of L2 reading proficiency. This volume will be a valuable resource for academic researchers and postgraduate students interested in investigating reading comprehension performance, as well as for examination board staff concerned with the design and development of reading assessment tools. It will also be a useful reference for curriculum developers and textbook writers involved in preparing syllabuses and materials for the teaching and learning of reading.

Volume 31 — Language Testing Matters: Investigating the wider social and educational impact of assessment – Proceedings of the ALTE Cambridge Conference, April 2008 (Taylor and Weir 2009)

Language Testing Matters: Investigating the wider social and educational impact of assessment – Proceedings of the ALTE Cambridge Conference, April 2008 Edited by Lynda Taylor and Cyril J Weir (2009)

This volume explores the social and educational impact of language testing and assessment by bringing together a collection of 20 edited papers given at the 3rd international conference of the Association of Language Testers in Europe (ALTE). Section One considers new perspectives on testing for specific purposes, including the role played by language assessment in the aviation industry, the legal system, and migration and citizenship policy. Section Two contains insights on testing policy and practice in the context of language teaching and learning in different parts of the world, including Africa, Europe, North America and Asia. Section Three offers reflections on the impact of testing among differing stakeholder constituencies, such as the individual learner, educational authorities, and society in general. With its broad coverage of key issues, this volume is a valuable reference work for academics, employers and policy makers in Europe and beyond. It is also a useful resource for postgraduate students of language testing and for practitioners, i.e. teachers, teacher educators, curriculum developers and materials writers.

Volume 30 — Examining Speaking: Research and practice in assessing second language speaking (Taylor 2011)

Examining Speaking: Research and practice in assessing second language speaking Edited by Lynda Taylor (2011)

This edited volume develops a theoretical framework for validating tests of second language speaking ability. The framework is then applied through an examination of the tasks in Cambridge English Speaking tests from a number of different validity perspectives that reflect the socio-cognitive nature of any assessment event. The chapter authors show how an understanding and analysis of the framework and its components can assist test developers to operationalise their Speaking tests more effectively, especially in relation to the key criteria that differentiate one proficiency level from another. As well as providing an up-to-date review of relevant literature on assessing speaking, the volume also offers an accessible and systematic description of the different proficiency levels in second language speaking, and a comprehensive and coherent basis for validating tests of speaking. The volume will be of interest to examination boards who wish to validate their own Speaking tests in a systematic and coherent manner, as well as to academic researchers and students in the field of language assessment more generally.

“This edited volume provides useful information on how to apply a socio-cognitive theoretical framework of validity by illustrating research on Cambridge ESOL exams…[it] provides a broad picture with constructive examples for future researchers who want to apply this validity framework.” Youngshin Chi (2013), Language Assessment Quarterly 10, 476-479

Volume 29 — Examining Reading: Research and practice in assessing second language reading (Khalifa and Weir 2009)

Examining Reading: Research and practice in assessing second language reading Hanan Khalifa and Cyril J Weir (2009)

This volume develops a theoretical framework for validating tests of second language reading ability. The framework is then applied through an examination of tasks in Cambridge English Reading tests from a number of different validity perspectives that reflect the socio-cognitive nature of any assessment event. The authors show how an understanding and analysis of the framework and its components can assist test developers to operationalise their tests more effectively. As well as providing an up-to-date review of relevant literature on assessing reading, it also offers an accessible and systematic description of the key criteria that differentiate one proficiency level from another when assessing second language reading. The volume will be of interest to examination boards who wish to validate their own reading tests in a systematic and coherent manner, as well as to academic researchers and students in the field of language assessment more generally.

‘The book offers the field another splendid exposition on second language (L2) reading. This work is unique, however, in that it was written by two scholars who are quite familiar with the Cambridge suite of examinations, and they make extensive use of their knowledge of these tests to demonstrate how the Cambridge ESOL examinations implement theory and research in practice … This volume represents an important contribution to the field in terms of both theory and practice, its timeliness regarding several topics (e.g. alignment with the CEFR, computerized testing, among others), and its appeal to and relevance for multiple audiences.’ Craig Deville (2011), The Modern Language Journal 95, 334–335. SiLT 29 was nominated as a runner-up in the prestigious Sage/ILTA 2012 award for the best book on language testing.

Volume 28 — Examining FCE and CAE: Key issues and recurring themes in developing the First Certificate in English and Certificate in Advanced English exams (Hawkey 2009)

Examining FCE and CAE: Key issues and recurring themes in developing the First Certificate in English and Certificate in Advanced English exams Roger Hawkey (2009)

This volume examines two of the best-known Cambridge English examinations – Cambridge English: First, also known as First Certificate in English (FCE), and Cambridge English: Advanced, also known as Certificate in Advanced English (CAE). It starts with the introduction of FCE (then the Lower Certificate in English) in 1939 and traces subsequent developments, including the introduction of FCE in 1975 and of CAE in 1991, as well as the regular projects to modify and update both tests. Key issues addressed are: test constructs; proficiency levels; principles and practice in test development, validation and revision; organisation and management; and stakeholders and partnerships. The book includes a unique set of facsimile copies of FCE and CAE test versions, from the original tests in 1939 and 1991 through various revision projects to the updated formats of 2008. The volume will be of interest to language testing researchers, academic lecturers, postgraduate students and educational policy makers, as well as to teachers, directors of studies, school owners and other stakeholders involved in preparing students for the Cambridge exams. This title complements previous historical volumes on CPE, BEC, CELS and IELTS.

Volume 27 — Multilingualism and Assessment: Achieving transparency, assuring quality, sustaining diversity – Proceedings of the ALTE Berlin Conference, May 2005 (Taylor and Weir 2008)

Multilingualism and Assessment: Achieving transparency, assuring quality, sustaining diversity – Proceedings of the ALTE Berlin Conference, May 2005 Edited by Lynda Taylor and Cyril J Weir (2008)

This collection of edited papers, based on presentations given at the 2nd ALTE Conference, explores the impact of multilingualism on language testing and assessment. The 20 papers consider ways of describing and comparing language qualifications to establish common levels of proficiency, balancing the need to set shared standards and ensure quality, and at the same time sustain linguistic diversity. The contributions come from authors within and beyond Europe and address substantive issues in assessing language ability today. Key features of the volume include: advice on quality management processes in test development and administration; discussion of the role of language assessment in migration and citizenship; and guidance on linking examinations to the CEFR, including some case studies. This volume is a valuable reference for academics and policy makers both within Europe and beyond, as well as a useful resource for practitioners seeking to define language proficiency levels in relation to the CEFR and similar frameworks.

“Overall the book provides well-selected papers with wide-ranging subject matters from the European community, which allows a glance into the challenging tasks the member countries are facing as they are adjusting to the concept of shared standards in language proficiency. The book will serve as timeless reference for testing professionals as it chronicles the tasks that have to be undertaken when 46 countries are involved in a task of this magnitude … The papers are important not only for the European member organisations (and the five observing countries: Canada, the Holy See, Japan, Mexico and the United States) but also for the assessment community in general, because they illustrate that with a clear mission and with dedicated researchers guided globalization can be beneficial to all.” Zsuzsa Cziraky Londe (2010), Language Assessment Quarterly 7 (3), 280–283.

Volume 26 — Examining Writing: Research and practice in assessing second language writing (Shaw and Weir 2007)

Examining Writing: Research and practice in assessing second language writing Stuart D Shaw and Cyril J Weir (2007)

This volume describes the theory and practice of the Cambridge English approach to assessing second language writing ability. A comprehensive test validation framework is used to examine the tasks in Cambridge English Writing tests from a number of different validity perspectives that reflect the socio-cognitive nature of any assessment event. The authors show how an understanding and analysis of the framework and its components can assist test developers to operationalise their tests more effectively. As well as providing an up-to-date review of relevant literature on assessing writing, it also offers an accessible and systematic description of the different proficiency levels in second language writing. The volume will be of interest to examination boards who wish to validate their own Writing tests in a systematic and coherent manner, as well as to academic researchers and students in the field of language assessment more generally.

‘… it should be of interest to a wider audience as well for at least two reasons: (1) it provides a coherent, up-to-date summary of research on writing as a phenomenon in itself, as well as on the assessment of writing; and (2) it presents a great deal of practical information based on solid research that will be helpful in assisting others who are designing, evaluating, or wishing to improve upon their own assessment practices.’ Sara Cushing Weigle (2010), Language Testing 27 (1), 141–144.

Volume 25 — IELTS Washback in Context: Preparation for academic writing in higher education (Green 2007)

IELTS Washback in Context: Preparation for academic writing in higher education Anthony Green (2007)

Based upon a PhD dissertation completed in 2003, this volume reports an empirical study to investigate the washback of the IELTS Writing subtest on English for Academic Purposes (EAP) provision. The study examines dedicated IELTS preparation courses alongside broader programmes designed to develop the academic literacy skills required for university study. Using a variety of data collection methods and analytical techniques, the research explores the complex relationship that exists between teaching and learning processes and their outcomes. The role of IELTS in EAP provision is evaluated, particularly in relation to the length of time and amount of language support needed by learners to meet minimally acceptable standards for English-medium tertiary study. This volume will be of direct interest to providers and users of general proficiency and EAP tests, as well as academic researchers and graduate students interested in investigating test washback and impact. It will also be relevant to teachers, lecturers and researchers concerned with the development of EAP writing skills.

Volume 24 — Impact Theory and Practice: Studies of the IELTS test and Progetto Lingue 2000 (Hawkey 2006)

Impact Theory and Practice: Studies of the IELTS test and Progetto Lingue 2000 Roger Hawkey (2006)

This book describes two recent case studies to investigate test impact in specific educational contexts: one analyses the impact of IELTS (International English Language Testing System), while the second focuses on a major national language teaching reform programme introduced by the Ministry of Education in Italy. With its combination of theoretical overview and practical advice, this volume is a useful manual on how to conduct impact studies and will be of particular interest to language test researchers and students of language testing. It will also be relevant to those who are concerned with the process of curriculum and examination reform.

Volume 23 — Assessing Academic English: Testing English proficiency, 1950–1989 – the IELTS solution (Davies 2008)

Assessing Academic English: Testing English proficiency, 1950–1989 – the IELTS solution Alan Davies (2008)

This volume presents an authoritative account of academic language proficiency testing in the UK. It chronicles the early development and use of the English Proficiency Test Battery (EPTB) in the 1960s, followed by the creation and implementation of the revolutionary English Language Testing Service (ELTS) in the 1970s and 1980s, and the introduction of the International English Language Testing System (IELTS) in 1989. The book offers a coherent socio-cultural analysis of the changes in language testing and an explanation of why history matters as much in this field as elsewhere. It discusses the significant factors which impact on language test design, development, implementation and revision, and presents historical documents relating to the language tests discussed in the volume, including facsimile copies of original test versions. The volume will be of interest to language test developers and policy makers, as well as teachers, lecturers and researchers interested in assessing English for Academic Purposes (EAP) and in the role played by ELTS and IELTS.

Volume 22 — The Impact of High-stakes Testing on Classroom Teaching: A case study using insights from testing and innovation theory (Wall 2005)

The Impact of High-stakes Testing on Classroom Teaching: A case study using insights from testing and innovation theory Dianne Wall (2005)

This volume gives an account of one of the first data-based studies of examination ‘washback’. Through a detailed analysis of the impact of examination reform in one specific educational setting, it considers the effects of a test which was meant to serve as a lever for change, and describes how the intended outcome was shaped by factors in the test itself, as well as by features of the context, teachers and learners. The volume provides a helpful model for researching washback and impact as well as practical guidelines for the planning and management of change within an educational context. It is of particular relevance to all who are involved in the process of curriculum and examination reform, and to academic researchers, university lecturers, graduate students and practising teachers.

Volume 21 — Changing Language Teaching through Language Testing: A washback study (Cheng 2005)

Changing Language Teaching through Language Testing: A washback study Liying Cheng (2005)

This volume presents a study of how the introduction in 1996 of a high-stakes public examination impacted on classroom teaching and learning in Hong Kong secondary schools. The washback effect was observed among different stakeholder groups within the local educational context, and also in terms of teachers’ attitudes, teaching content and classroom interactions. The volume is of particular relevance to language test developers and researchers interested in the consequential validity of tests, as well as to teachers, curriculum designers, policy makers and others concerned with the interface between language testing and teaching practices.

Volume 20 — Testing the Spoken English of Young Norwegians: A study of test validity and the role of ‘smallwords’ in contributing to pupils’ fluency (Hasselgreen 2004)

Testing the Spoken English of Young Norwegians: A study of test validity and the role of ‘smallwords’ in contributing to pupils’ fluency Angela Hasselgreen (2004)

This volume reports on a study to validate a test of spoken English for secondary school pupils in Norway. The study included a corpus-based investigation of how conversational fillers or ‘smallwords’ contribute to spoken fluency. Findings from this work informed the development of rating scale descriptors for assessing fluency levels. The volume will be of particular interest to those concerned with the design and validation of spoken language tests, as well as those interested in features of spoken communication and in how classroom practice can help develop learners’ fluency.

Volume 19 — IELTS Collected Papers: Research in speaking and writing assessment (Taylor and Falvey 2007)

IELTS Collected Papers: Research in speaking and writing assessment Edited by Lynda Taylor and Peter Falvey (2007)

This book brings together 10 research studies conducted between 1995 and 2001 under the auspices of the British Council/IELTS Australia Joint-funded Research Program. The studies – four on speaking and six on writing assessment – provided valuable test validity evidence and directly informed the revised IELTS Speaking and Writing tests introduced in 2001 and 2005. Volume 19 chronicles the evolution of the Writing and Speaking tests in ELTS/IELTS from 1980 to the present day and discusses the role of research in their development. In addition, it evaluates a variety of research methods to provide helpful guidance for novice and less experienced researchers. This collection of studies will be of particular value to language testing researchers interested in IELTS as well as to institutions and professional bodies who make use of IELTS test scores; it will also be relevant to students, lecturers and researchers working more broadly in the field of English for Academic Purposes. “It is really a book which anyone concerned with performance testing should read and benefit from. At the very least, the literature reviews under each topic and the detailed explanations, then critique, of methods are excellent contributions to the field.” Wayne Rimmer (2010) Modern English Teacher 19 (1), 91–92.

Volume 18 — European Language Testing in a Global Context: Proceedings of the ALTE Barcelona Conference July 2001 (Milanovic and Weir 2004)

European Language Testing in a Global Context: Proceedings of the ALTE Barcelona Conference July 2001 Edited by Michael Milanovic and Cyril Weir (2004)

The ALTE Conference, European Language Testing in a Global Context, was held in Barcelona in 2001 in support of the European Year of Languages. The contents of this volume represent a small subset of the many presentations made at that event and papers were selected to provide a flavour of the issues that the conference addressed which included: technical dimensions of language testing; matters of fairness and ethics in assessment; aspects of education and language policy in the European context; and reports of recently completed research studies and work in progress.

Volume 17 — Issues in Testing Business English: The revision of the Cambridge Business English Certificates (O’Sullivan 2006)

Issues in Testing Business English: The revision of the Cambridge Business English Certificates Barry O’Sullivan (2006)

This book explores the testing of language for specific purposes (LSP) from a theoretical and practical perspective, with a particular focus on the testing of English for business purposes. A range of tests – both past and present – is reviewed, and the development of Business English testing at Cambridge English is discussed. The description of the revision of Cambridge English: Business Certificates, also known as Business English Certificates (BEC), in 2002 forms a major part of the book and offers a unique insight into an approach to large-scale ESP test development and revision. The volume will be of particular relevance to test developers and researchers interested in language testing for specific purposes and contexts of use; it will also be of interest to ESP teachers, especially those teaching English for business, as well as to lecturers and postgraduates working in the field of LSP.

Volume 16 — A Modular Approach to Testing English Language Skills: The development of the Certificates in English Language Skills (CELS) examinations (Hawkey 2004)

A Modular Approach to Testing English Language Skills: The development of the Certificates in English Language Skills (CELS) examinations Roger Hawkey (2004)

This volume documents in some detail the development of the Cambridge English Certificates in English Language Skills (CELS), a suite of modular examinations first offered in 2002. The book traces the history of various important English language exams offered by UCLES and other examination boards which significantly influenced the development of CELS including: the Communicative Use of English as a Foreign Language (CUEFL) exams; the Certificates in Communicative Skills in English (CCSE); the English language tests of reading and writing produced by the University of Oxford Delegacy of Local Examinations; and the Oral English exams offered by the Association of Recognised English Language Schools (ARELS) Examinations Trust.

Volume 15 — Continuity and Innovation: Revising the Cambridge Proficiency in English Examination 1913–2002 (Weir and Milanovic 2003)

Front cover of Studies in Language Testing – Volume 15

Continuity and Innovation: Revising the Cambridge Proficiency in English Examination 1913–2002 Edited by Cyril Weir and Michael Milanovic (2003)

This volume documents in some detail the most recent revision of Cambridge English: Proficiency, also known as Certificate of Proficiency in English (CPE), which took place from 1991 to 2002. CPE is the oldest of the Cambridge suite of English as a Foreign Language (EFL) examinations and was originally introduced in 1913. Since that time the test has been regularly revised and updated to bring it into line with current thinking in language teaching, applied linguistics and language testing theory and practice. The volume provides a full account of the revision process, the questions and problems faced by the revision teams, and the solutions they came up with. It is also an attempt to encourage in the public domain greater understanding of the complex thinking, processes and procedures which underpin the development and revision of all the Cambridge English tests, and as such it will be of interest and relevance to a wide variety of readers.

“An invaluable case book for training language testers and teachers … Makes explicit the developing philosophy of good testing practice … With its wealth of illustrative examples and detailed statistics, this study clearly presents an exceptional case study of a well-managed and professionally-serviced English language test … An important study, showing the possibilities of good language testing.” Bernard Spolsky (2004) ELT Journal, 58 (3), 305–309.

Volume 14 — A Qualitative Approach to the Validation of Oral Language Tests (Lazaraton 2002)

A Qualitative Approach to the Validation of Oral Language Tests Anne Lazaraton (2002)

Language testers have generally come to recognise the limitations of traditional statistical methods for validating oral language tests, and have begun to consider more innovative approaches to test validation which can illuminate the assessment process itself, rather than just assessment outcomes (i.e. test scores). One such approach is conversation analysis (or CA), a rigorous empirical methodology developed by sociologists, which employs inductive methods in order to discover and describe the recurrent, systematic properties of conversation. This book aims to provide language testers with a background in the conversation analytic framework, and a fuller understanding of what is entailed in using conversation analysis in the specific context of oral language test validation.

“… this book provides an excellent, and clearly written, introduction to the use of discourse analysis, especially CA, in examining the functioning of oral language tests … I would recommend this book to teachers or test developers who might be developing oral language tests as well as those who are intending to carry out research using discourse analytic techniques. Finally, also, it must be said, the book was enjoyable to read; in particular I found Lazaraton’s discussion of the literature on oral interview research to be well-organised and clear, and her discussion of CA theory to be extremely accessible.” Annie Brown (2005) Language Assessment Quarterly 2 (4), 309–313.

Volume 13 — The Equivalence of Direct and Semi-direct Speaking Tests (O’Loughlin 2001)

The Equivalence of Direct and Semi-direct Speaking Tests Kieran O’Loughlin (2001)

This book documents a comparability study of direct (face-to-face) and semi-direct (language laboratory) versions of the Speaking component of the access: test, an English language test designed in the 1990s by the Language Testing Research Centre (University of Melbourne) as part of the selection process for immigration to Australia. The study gathered a broad range of quantitative and qualitative evidence to investigate the issue of test equivalence, and this multi-layered approach yields a complex and richly textured perspective on the comparability of the two kinds of Speaking tests. The findings have important implications for the use of direct and semi-direct Speaking tests in various high-stakes contexts such as immigration and university entrance. As such, the book will be of interest to policy makers and administrators as well as language teachers and language testing researchers.

‘... this book makes an important contribution to the language testing literature … For its insights and multifaceted approach to examining test equivalence, it is a valuable resource to language test developers, researchers, graduate students, and even language programs considering using either of these test formats ... a very readable tale of two tests and the complexity needed to unravel what actually happens in them.’ Lindsay Brooks (2006) Language Assessment Quarterly 3 (4), 369–373.

Volume 12 — An Empirical Investigation of the Componentiality of L2 Reading in English for Academic Purposes (Weir, Huizhong and Yan 2000)

An Empirical Investigation of the Componentiality of L2 Reading in English for Academic Purposes Edited by Cyril J Weir, Yang Huizhong and Jin Yan (2000)

This volume describes the development and validation of an advanced level test for evaluating expeditious (skimming, search reading and scanning) and careful EAP reading abilities at tertiary level in China. It reports on the methodological procedures which led to the development of the test and discusses the results of empirical investigations carried out to establish its validity both a priori and a posteriori. It is of particular interest and value to teachers, researchers and test developers.

“... this book is a systematic presentation of the authors’ dual-purpose pioneering work in EFL reading. On the one hand, they focus on the research question of the componentiality of academic EFL reading ... On the other hand, the researchers’ experimental work has rewarded them with a unique academic EFL reading test, whose development process is a wonderful model for other test developers to follow.” Ning Chen (2006) Language Assessment Quarterly 3 (1), 81–86.

Volume 11 — Experimenting with Uncertainty: Essays in honour of Alan Davies (Elder, Brown, Grove, Hill, Iwashita, Lumley, McNamara, O’Loughlin 2001)

Experimenting with Uncertainty: Essays in honour of Alan Davies Edited by C Elder, A Brown, E Grove, K Hill, N Iwashita, T Lumley, T McNamara, K O’Loughlin (2001)

This festschrift brings together 28 invited papers surveying the state of the art in language testing from a perspective which combines technical and broader applied linguistics insights. The papers, written by key figures in the field of language testing, cover issues ranging from test construct definition to the design and application of language tests, including their importance as a means of exploring larger issues in language teaching, language learning and language policy. The volume locates work in language assessment in a context of social, political and ethical issues at a time when testing is increasingly expected to be publicly accountable.

"The breadth of perspectives of [Experimenting with Uncertainty: Essays in honour of Alan Davies, Studies in Language Testing 11, Elder et al (Eds) (2001), CUP/UCLES] is wide enough, providing critically informative commentaries on the issues that language testers should be aware of, particularly in these times when assessment and accountability are increasingly valued in overall circles of education as well as the field of language testing … Providing a readable introduction … this book will guide … readers in how to grapple with thorny issues that language testing researchers may encounter in their professional career.” Hyeong-Jong Lee (2005) Language Testing 22 (4), 533–545.

Volume 10 — Issues in Computer-Adaptive Testing of Reading Proficiency: Selected papers (Chalhoub-Deville 1999)

Issues in Computer-Adaptive Testing of Reading Proficiency: Selected papers Edited by Micheline Chalhoub-Deville (1999)

This volume is an important resource for those interested in research on and development of computer-adaptive (CAT) instruments for assessing the receptive skills, mainly reading. It includes selected papers from a conference on the computer-adaptive testing of reading held in Bloomington, Minnesota, in 1996, as well as a number of specially written papers.

"For those interested in developing and appreciating CAT for reading measurement, the volume [Issues in Computer-Adaptive Testing of Reading Proficiency, Studies in Language Testing 10, Chalhoub-Deville (Ed.) (1999), CUP/UCLES] has, to date, had no parallel in its value as an excellent resource book.” Jungok Bae (2005) Language Assessment Quarterly 2 (2), 169–173.

“[T]he chapters in this book represent state-of-the-art thinking in computer-adaptive language testing. The book will remain a key volume in the field for many years to come.” Glenn Fulcher (2000) Language Testing 17 (3), 361–367.

Volume 09 — Fairness and Validation in Language Assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida (Kunnan 2000)

Fairness and Validation in Language Assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida Edited by Antony John Kunnan (2000)

Fairness of language tests and testing practices has always been a concern among test developers and test users. In the past decade educational and language assessment researchers have begun to focus directly on fairness and related matters such as test standards, test bias and equity and ethics for testing professionals. The 19th annual Language Testing Research Colloquium held in 1997 in Orlando, Florida, brought this overall concern into sharp focus by having ‘Fairness in Language Testing’ as its theme. The conference presentations and discussions attempted to understand the concept of fairness, define the scope of the concept and connect it with the concept of validation of test score interpretation. The papers in this volume offer a first introduction to fairness and validation in the field of language assessment.

Volume 08 — Learner Strategy Use and Performance on Language Tests: A structural equation modeling approach (Purpura 1999)

Front cover of Studies in Language Testing – Volume 08

Learner Strategy Use and Performance on Language Tests: A structural equation modeling approach James E Purpura (1999)

This volume investigates the relationship between learner strategy use and performance on second language tests, by examining the construct validity of two questionnaires designed within a model of information processing that measures test takers’ self-reported cognitive and metacognitive strategy use. The book investigates how learner strategy use influences test performance, and how high performers use strategies differently from low performers.

Volume 07 — Dictionary of Language Testing (Davies, Brown, Elder, Hill, Lumley and McNamara 1999)

Dictionary of Language Testing Alan Davies, Annie Brown, Cathie Elder, Kathryn Hill, Tom Lumley and Tim McNamara (1999)

This volume constitutes a valuable resource for anyone seeking a better understanding of the terminology and concepts used in language testing. It contains some 600 entries, each listed under a headword with extensive cross-referencing and suggestions for further reading. The selection of headwords is based on advice from specialists in language testing around the world, combined with the scanning of current textbooks in this field and of dictionaries and encyclopaedias in adjacent fields (e.g. psychometrics, applied linguistics, statistics).

"Multilingual Glossary of Language Testing Terms (Studies in Language Testing 6, ALTE [1998], CUP and UCLES) and Dictionary of Language Testing (Studies in Language Testing 7, Davies et al [1999], CUP/UCLES) are monumental works in the field of language testing.” Yoshinori Watanabe (2005) Language Assessment Quarterly 2 (1), 69–75. “... the book can act as a specific point of reference for language testing terminology and concepts, and students will find it increasingly useful as their understanding within the field develops.” Roger Barnard (2000) Modern English Teacher 9 (3), 89–90.

Volume 06 — Multilingual Glossary of Language Testing Terms (ALTE Members 1998)

Multilingual Glossary of Language Testing Terms Prepared by ALTE Members (1998)

A multilingual glossary has a particularly significant role to play in encouraging the development of language testing in less widely taught languages by establishing terms which may be new alongside their well-known equivalents in the commonly used languages. The glossary contains entries in 10 languages: Catalan, Danish, Dutch, English, French, German, Irish, Italian, Portuguese and Spanish. This volume will be of use to many working in the context of European languages who are involved in testing and assessment.

“… exploration of the MG reveals it, in my opinion, to be of real value in its own right, both as a working glossary of language testing terms, and, perhaps more importantly, as an invaluable aid to speakers of the ten represented languages … represents an invaluable resource for the tester and student of testing alike.” Barry O’Sullivan (2002) Applied Linguistics 23 (2), 273–275.

Volume 05 — Verbal Protocol Analysis in Language Testing Research: A handbook (Green 1998)

Verbal Protocol Analysis in Language Testing Research: A handbook Alison Green (1998)

Verbal protocol analysis (VPA) is a methodology that is being used extensively by researchers. Recently, individuals working in the area of testing, and in language testing in particular, have begun to appreciate the roles VPA might play in the development and evaluation of assessment instruments. This book aims to provide potential practitioners of VPA with the background to the technique and a good understanding of what is entailed in using VPA in the context of language testing and assessment. Tutorial exercises are presented which enable the reader to try out each of the different steps involved in VPA.

"The book is successful in providing a practical guide for graduate students and researchers wishing a better understanding of VPA in language testing … it fulfils the need for a basic introduction to the application of VPA … a stimulating guide for researchers interested in language testing.” Abdoljavad Jafarpur (1999) Language Testing 16 (4), 483–486.

Volume 04 — The Development of IELTS: A study of the effect of background knowledge on reading comprehension (Clapham 1996)

The Development of IELTS: A study of the effect of background knowledge on reading comprehension Caroline Clapham (1996)

This book investigates the ESP claim that tertiary level ESL students should be given reading proficiency tests in their own academic subject areas, and studies the effect of background knowledge on reading comprehension. It is set against a background of recent research into reading in a first and second language, and emphasises the impact schema theory has had on this. The book is a useful resource for those involved with IELTS and others interested in the testing of English for academic purposes.

"Caroline Clapham has written a major, seminal book. She has examined a dangerous field of landmines, detected them, and disarmed them. This book will serve as a map of that minefield for years to come. Higher-education language departments … who are seriously considering special-fields testing should read this book carefully.” Fred Davidson (1998) Language Testing 15 (2), 289–301.

Volume 03 — Performance Testing, Cognition and Assessment: Selected papers from the 15th Language Testing Research Colloquium, Cambridge and Arnhem (Milanovic and Saville 1996)

Performance Testing, Cognition and Assessment: Selected papers from the 15th Language Testing Research Colloquium, Cambridge and Arnhem Edited by Michael Milanovic and Nick Saville (1996)

This book contains a selection of research papers presented at the 15th Annual Language Testing Research Colloquium (LTRC). The Colloquium was jointly hosted by the University of Cambridge Local Examinations Syndicate (UCLES) in Cambridge and CITO in Arnhem, the Netherlands. At the Cambridge venue, the papers addressed the theme of performance testing, and at Arnhem they covered aspects of communication in relation to cognition and assessment. A selection of papers has been made in order to achieve a balanced coverage of these themes.

“The book thus provides a valuable resource for readers interested in a variety of approaches to investigating and understanding L2 performance assessment … a useful collection of research summaries and a source for relevant ideas.” John Norris (1999) Language Testing 16 (1), 121–125.

Volume 02 — Test Taker Characteristics and Test Performance: A structural modeling approach (Kunnan 1995)

Test Taker Characteristics and Test Performance: A structural modeling approach Antony John Kunnan (1995)

This book investigates the influence of test taker characteristics on test performance in tests of English as a foreign language by exploring the relationships between these two groups of variables. Data from a test taker questionnaire and performance on several tests including Cambridge English: First, also known as First Certificate in English (FCE), and the TOEFL were used for the study.

Volume 01 — An Investigation into the Comparability of Two Tests of English as a Foreign Language: The Cambridge TOEFL Comparability Study (Bachman, Davidson, Ryan and Choi 1995)

An Investigation into the Comparability of Two Tests of English as a Foreign Language: The Cambridge TOEFL Comparability Study Lyle F Bachman, Fred Davidson, Katherine Ryan and Inn-Chull Choi (1995)

This book documents a major study, which compares Cambridge English: First, also known as First Certificate in English (FCE), with the Test of English as a Foreign Language (TOEFL) and investigates similarities in test content, candidature and use.

Editorial: Frontiers in Language Assessment and Testing

Vahid Aryadoust

1 National Institute of Education, Nanyang Technological University, Singapore, Singapore

Thomas Eckes

2 TestDaF Institute, Ruhr University Bochum, Bochum, Germany

Yo In'nami

3 Faculty of Science and Engineering, Chuo University, Hachioji, Japan

Although language assessment and testing can be viewed as having a much longer history (Spolsky, 2017; Farhady, 2018), its genesis as a research field is often attributed to Carroll's (1961) and Lado's (1961) publications. Over the past decades, the field has gradually grown in scope and sophistication as researchers have adopted various interdisciplinary approaches to problematize and address old and new issues in language assessment as well as learning. The assessment and validation of reading, listening, speaking, and writing, as well as language elements such as vocabulary and grammar, have formed the basis of extensive studies (e.g., Chapelle, 2008). Emergent research areas in the field include the assessment of sign languages (Kotowicz et al., 2021). In addition, researchers have employed a variety of psychometric and statistical methods to investigate research questions and hypotheses (see chapters in Aryadoust and Raquel, 2019, 2020). The present special issue entitled “Frontiers in Language Assessment and Testing” set out to shed light on these advances and approaches in the field of language assessment.

We received a number of proposals, 13 of which were ultimately accepted for publication in the special issue. Five major themes emerge from the accepted papers: (i) the quantitative perspectives of the history and evolution of language assessment as presented in the scientometric study by Aryadoust et al., (ii) the issues surrounding the assessment of listening and reading comprehension discussed in the five papers by Spoden et al.; Cai; Wallace and Lee; He and Jiang; and Hamada, (iii) the assessment of speaking and writing proficiency in the two papers by Fan and Yan and Li et al., (iv) the assessment of sign languages and interpreting competence in the three papers by Rosenburg et al.; Hall; and Wang et al., and (v) the use of advanced quantitative methods presented in the two papers by Koizumi and In'nami and Dunn and McCray.

Quantitative Perspectives of the History and Evolution of Language Assessment: A Scientometric Study

Aryadoust et al. presented an extensive scientometric review of 1,561 articles published in the “core” language assessment journals and 3,175 articles published in the general journals of applied linguistics. Using a document co-citation analysis (DCA) technique, they found that publications in the core journals primarily focused on the assessment of the four language skills (listening, speaking, reading, and writing), while there were fewer papers that examined washback, feedback, and corpus linguistics topics. Similarly, the assessment research in the general journals also focused on the assessment of oral proficiency, vocabulary, writing, reading, and grammar, while fewer publications investigated topics related to cognition and knowledge. These topics included memory, affective schemata, awareness, semantic complexity, and explicit vs. implicit language knowledge. Interestingly, the majority of the studies were not based on assessment instruments supported by complete validity arguments. This was consistent with findings from previous studies whose authors argued that “collecting such evidence to establish an all-encompassing validity argument is an arduous and logistically complex task” (p. 3). Aryadoust et al. suggested that minimum requirements for examining the validity of tests would include reliability and psychometric evidence to show that the tasks or items functioned properly while evaluating the construct that the test set out to measure.
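
As an illustration of the counting that underlies document co-citation analysis, the minimal Python sketch below tallies how often pairs of cited works appear together in the same bibliography. The reference IDs and the three toy bibliographies are invented for illustration and are not drawn from Aryadoust et al.'s dataset or pipeline.

```python
from itertools import combinations
from collections import Counter

# Toy reference lists: each article's bibliography, by cited-work ID.
# A real DCA would parse citation records from a database such as Scopus.
bibliographies = [
    {"Bachman1990", "Messick1989", "Weir2005"},
    {"Messick1989", "Weir2005", "Kane2013"},
    {"Bachman1990", "Messick1989", "Kane2013"},
]

# Two works are co-cited when they appear together in the same bibliography.
cocitations = Counter()
for refs in bibliographies:
    for pair in combinations(sorted(refs), 2):
        cocitations[pair] += 1

# Works that are co-cited most often tend to cluster into the research
# themes that a DCA then maps over time.
for pair, count in cocitations.most_common(3):
    print(pair, count)
```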

Assessment of Listening and Reading Comprehension

Spoden et al. investigated the effect of in- and out-of-school language learning opportunities and exposure to media on the correlation between listening and reading skills over time (i.e., the start and end of secondary schooling) in a bilingual pre-tertiary population in Germany. Pre-tertiary populations, as the authors rightly argue, have not drawn the attention of researchers in language assessment as much as adult second language learner populations have. Thus, the study addresses a wide gap in knowledge. Using latent regression Rasch models and correlation analysis, Spoden et al. found evidence for a converging pattern of growth common between listening and reading. They further reported that this finding was consistent across language learning groups with different backgrounds, such as learners with varying experiences in extracurricular English-learning programs. Some theoretical studies have postulated that the auditory modality of the language input in listening could disadvantage L2 learners in listening comprehension compared with visual input in reading, as auditory input is transitory (Aryadoust, 2019). In light of this, Spoden et al.'s study indicates that “modality specificity becomes a less important factor to affect comprehension test scores at the end of secondary education in Germany” (p. 4). The authors called for further research to consider how vocabulary and grammar, for example, affect listening and reading, a topic that Cai partially addressed.
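
For readers unfamiliar with the measurement model behind such analyses, the short sketch below shows the dichotomous Rasch item response function; a latent regression Rasch model additionally regresses ability on background covariates such as learning opportunities. The ability and difficulty values are illustrative only and do not come from the study.

```python
import numpy as np

def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model,
    given person ability theta and item difficulty b (both on a logit scale)."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# A listening item of difficulty 0.5 logits, answered by learners of
# increasing ability; the values are invented for illustration.
for theta in (-1.0, 0.0, 0.5, 2.0):
    print(theta, round(rasch_probability(theta, b=0.5), 2))
```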

Cai investigated the relationship of lexical and semantic knowledge with listening proficiency in academic tests of listening comprehension, using “auditory receptive tasks contextualized in natural discourse” (p. 1). The study used several tasks to operationalize and measure the relationship between listening and language elements, comprising partial dictation, an auditory receptive task, and a standardized listening test, which were administered to a sample of 258 college-level English learners in China. Hierarchical regression analyses revealed that the lexical and semantic knowledge of the participants explained a large proportion of variance (62%) in the listening test scores. The author calls for further studies of the relationship between listening and the language elements investigated to improve the generalizability of the results across different contexts.
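
The logic of a hierarchical regression, in which predictor blocks are entered stepwise and the change in explained variance is inspected, can be sketched as follows. The variable names and scores are hypothetical and are not Cai's data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical scores for illustration only.
df = pd.DataFrame({
    "listening": [52, 61, 47, 70, 58, 66, 49, 73],
    "lexical":   [30, 38, 25, 45, 33, 40, 27, 47],
    "semantic":  [22, 28, 20, 34, 25, 30, 21, 35],
})

# Step 1: lexical knowledge only; Step 2: add semantic knowledge.
step1 = smf.ols("listening ~ lexical", data=df).fit()
step2 = smf.ols("listening ~ lexical + semantic", data=df).fit()

# The change in R-squared is the variance explained by the added block,
# which is the quantity a hierarchical regression analysis interprets.
print(step1.rsquared, step2.rsquared, step2.rsquared - step1.rsquared)
```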

In another study of the assessment of listening comprehension, Wallace and Lee investigated the effect of vocabulary size alongside executive functions (EFs) on L2 listening comprehension. The study began from the assumption that language components such as vocabulary and grammar have a significant effect on listening comprehension, yet as language proficiency increases, other factors such as EFs of working memory start to play a crucial role in comprehension. In this study, EFs were operationalized as shifting (“switching attentional focus among mental representations”) and updating (“revising information held in temporary storage”) (p. 1). Using structural equation modeling (SEM), the authors found no main effects or moderation effects of EF, while vocabulary size remained a significant predictor of listening. These results show that, as the authors hypothesized, vocabulary knowledge remains the most important predictor of listening ability, whereas non-linguistic factors such as EF do not contribute to the listening ability of less capable L2 learners.
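
The distinction between a main effect and a moderation effect can be illustrated with a simplified regression analogue of the structural equation model reported in the study: the interaction term carries the moderation. All variable names and values below are hypothetical, and the sketch is not the authors' analysis.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented data: listening scores, vocabulary size, and one executive
# function (shifting) score per learner.
df = pd.DataFrame({
    "listening":  [48, 55, 62, 40, 70, 58, 66, 45],
    "vocab_size": [3000, 4200, 5000, 2500, 6000, 4500, 5200, 2800],
    "ef_shift":   [0.4, 0.6, 0.5, 0.3, 0.7, 0.5, 0.6, 0.4],
})

# A main effect of EF would appear in the ef_shift coefficient; a moderation
# effect would appear in the vocab_size:ef_shift interaction coefficient.
model = smf.ols("listening ~ vocab_size * ef_shift", data=df).fit()
print(model.params)
```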

He and Jiang conducted an extensive review of L2 listening research in 87 studies in peer-reviewed journals and research report series published between 2001 and 2020. The authors used a socio-cognitive validity framework, which consisted of cognitive validity, criterion-related validity, scoring validity, context validity, test-taker characteristics, and consequential validity (Weir, 2005). By examining the content of the studies based on their coding scheme, the authors identified 13 research themes in relation to the six components of validity in Weir's (2005) framework. For example, the authors reported that 94.25% of the examined studies focused on context validity, cognitive validity, test-taker characteristics, and scoring validity. In their focus on cognitive ability, however, they included eye tracking and brain activation research. The authors also found that task development, task output/input, and speaker characteristics received “considerable attention” in context validation, whereas there was a dearth of research focusing on consequential and criterion-related validity.

Hamada was interested in the effects of extensive reading instruction on reading comprehension. In Study 1, the author collected previous studies, calculated effect sizes, and grouped them according to their study features. Although instruction was effective overall (Cohen's d = 0.55 [95% confidence interval = 0.39, 0.70]), it was less so when only examining studies that had control and treatment groups of equal reading proficiency (d = 0.37 [0.24, 0.50]). This suggests the importance of ensuring group equivalency before interpreting instruction effects, which otherwise tend to be overestimated. Study 2 examined whether the estimated instruction effect size from the meta-analysis in Study 1 would be reproducible in an actual classroom study. After analyzing data from 109 learners using propensity score methods, the results suggest that the instruction was effective, with an effect size concurring with that estimated in Study 1. These results from Studies 1 and 2 highlight the importance of evidence-based teaching in the classroom.
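
The core meta-analytic computations in a study of this kind, a standardised mean difference per primary study and a weighted pooled estimate, can be sketched as follows. The scores, effect sizes, and weights are invented, and a full random-effects model would also estimate between-study variance; this is not Hamada's actual procedure.

```python
import numpy as np

def cohens_d(treatment: np.ndarray, control: np.ndarray) -> float:
    """Standardised mean difference using the pooled standard deviation."""
    nt, nc = len(treatment), len(control)
    pooled_var = ((nt - 1) * treatment.var(ddof=1)
                  + (nc - 1) * control.var(ddof=1)) / (nt + nc - 2)
    return float((treatment.mean() - control.mean()) / np.sqrt(pooled_var))

# One invented primary study: reading gain scores for treatment and control.
treatment = np.array([12.0, 15.0, 9.0, 14.0, 11.0, 16.0])
control = np.array([10.0, 11.0, 8.0, 12.0, 9.0, 10.0])
print(round(cohens_d(treatment, control), 2))

# Pooling invented per-study effect sizes with illustrative weights
# (e.g., inverse variances) gives a simple fixed-effect style estimate.
effects = np.array([0.62, 0.41, 0.55, 0.70])
weights = np.array([40.0, 55.0, 30.0, 25.0])
print(round(float(np.average(effects, weights=weights)), 2))
```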

Assessment of Speaking and Writing Proficiency

Fan and Yan conducted a narrative review of papers published in two journals in language assessment, Language Assessment Quarterly and Language Testing. A total of 104 papers on speaking assessment were classified under the six types of inferences in an argument-based validation framework (Chapelle, 2008). Nearly half of the papers (40.38–48.08%) concerned evaluation, generalization, and/or explanation inferences, with a few (3.85–6.73%) addressing domain description, extrapolation, and/or utilization inferences. The most frequently researched topics included (a) speaking constructs, (b) rater effects, and (c) factors that affect test performance. The studies often used quantitative methods (e.g., analysis of variance, Rasch measurement) to examine questions that would pertain to the evaluation and generalization inferences, and qualitative methods (e.g., discourse analysis, interview) to examine questions that would pertain to the explanation inference. The authors conclude that more research on domain description is necessary, particularly in relation to language assessment for specific purposes. They also place importance on taking not only a psycholinguistic but also a sociocultural approach to understand the construct of speaking ability more comprehensively.

Although score differences among subgroups have been examined using differential item functioning (DIF) analysis, it is not always easy to interpret such differences substantively. To address this issue, Li et al. focused on score differences between male and female learners in a standardized writing assessment. The writing prompt was found to favor females, although negligibly so. They investigated the source of this difference using 123 linguistic features. Two cohesion features and four syntactic features correlated significantly with writing scores. As the direction of these correlations was mixed (positive or negative) depending on the feature, their impacts on writing scores may have offset one another, producing a negligible gender difference in writing test scores. Other studies could also combine DIF analysis with linguistic analysis to gain a better understanding of the factors that affect test performance.
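
One standard way to operationalise DIF (not necessarily the procedure Li et al. used) is a logistic regression in which a group effect after conditioning on total score indicates uniform DIF, and an ability-by-group interaction indicates non-uniform DIF. The data below are purely illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented item-level data: 1/0 responses to one item, total test score as
# the ability proxy, and a group indicator (e.g., 0 = male, 1 = female).
df = pd.DataFrame({
    "item":  [1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1],
    "total": [28, 15, 30, 25, 14, 27, 16, 29, 26, 23, 31, 18],
    "group": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

# Uniform DIF would show up as a group effect after conditioning on total
# score; non-uniform DIF as a total:group interaction.
dif_model = smf.logit("item ~ total + group + total:group", data=df).fit(disp=0)
print(dif_model.params)
```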

Assessment of Sign Languages and Interpreting Competence

With the aim of measuring deaf children's literal and inferential understanding of passages, Rosenburg et al. developed an assessment tool called the American Sign Language Text Comprehension Task. They conducted a validation study administering the tool to deaf children of deaf parents and deaf children of hearing parents. Results showed that the internal consistency, discriminability, and difficulty of the instrument were acceptable. Scores correlated significantly with those of synonym and antonym tests. Deaf children of deaf parents scored better than deaf children of hearing parents, a pattern that was consistent with earlier findings. Taken together, these results provide positive evidence for the validity of the new assessment tool and suggest its utility as a measure of text comprehension skills in deaf children.
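
Two of the classical indices mentioned here, item difficulty and internal consistency, can be computed directly from a scored response matrix, as in this small sketch with invented responses; the study's own instrument and sample are not reproduced.

```python
import numpy as np

# Invented 1/0 response matrix: rows are children, columns are items.
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
])

# Classical item difficulty: proportion of correct responses per item.
difficulty = responses.mean(axis=0)

# Cronbach's alpha as a simple index of internal consistency.
k = responses.shape[1]
item_vars = responses.var(axis=0, ddof=1)
total_var = responses.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(difficulty, round(alpha, 2))
```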

Language assessment research has typically focused on language outcomes, providing information about examinees' vocabulary knowledge, grammar skills, or speaking proficiency. Much less attention has traditionally been devoted to language input. When targeting language knowledge of deaf and hard-of-hearing (DHH) children, Hall argues that the assessment scope needs to be significantly broadened. His detailed conceptual analysis draws our attention to the manner in which DHH children address language input, which is truly diverse, and calls for developing measures reflecting the language input that DHH children received during infancy and toddlerhood. Hall outlines several features required of such measures. These include examining an aggregated picture of how a DHH child has interacted with language input over a precisely defined period and representing the extent to which a DHH child has had limited access to language input, finally yielding more informative profiles of language access. Such profiles, in turn, can help inform language assessment at both the individual and population levels. At the individual level, suitable language input measures could distinguish between DHH children's language delay and language disorder. At the population level, such measures could be useful in understanding how language relates to child development.

In another study, Wang et al. reported on the development of the Chinese Standards of English-Interpreting Competence Scales. This is a standardized, national framework of Chinese-English interpretation competence that can be used to train and assess interpreters in China. The project consisted of (i) the definition of interpretation competence, (ii) the relationship between the definition and task, (iii) the collection and analysis of descriptors, (iv) quantitative validation, and (v) qualitative validation. Initially, the authors collected or created 9,208 descriptors. Quantitative and qualitative analyses of data from surveys and interviews reduced and refined the initial pool of descriptors to 369 descriptors. The authors argued that descriptors could be used to create tasks and teaching materials for classroom use, as well as self-assessment.

Use of Advanced Quantitative Methods

Studies of the strength of the relationship between vocabulary size and vocabulary depth have yielded mixed findings. It is therefore not clear whether vocabulary knowledge is a single construct incorporating both size and depth or two separate constructs of size and depth. To address this issue, Koizumi and In'nami analyzed vocabulary test data from 255 Japanese learners of English. Results of conventional and Bayesian structural equation modeling suggest that vocabulary size and depth are two closely correlated (r = 0.946 and 0.943 for conventional and Bayesian analyses, respectively) but separate abilities. This suggests that a comprehensive measurement of vocabulary knowledge requires an assessment of both size and depth. The results can be reported as a composite score of vocabulary knowledge, or as two separate scores of size and depth.
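
To make the modeling concrete, here is a minimal sketch of a two-factor confirmatory model of the kind compared in the study. It assumes the third-party semopy package and a hypothetical CSV of indicator scores (columns s1–s3 for size, d1–d3 for depth); neither the package choice nor the variable names come from the published analysis.

```python
import pandas as pd
import semopy  # third-party SEM library; an assumed tool, not the authors' software

# Hypothetical indicator scores for vocabulary size (s1-s3) and depth (d1-d3).
data = pd.read_csv("vocab_indicators.csv")

two_factor = """
size  =~ s1 + s2 + s3
depth =~ d1 + d2 + d3
size ~~ depth
"""

model = semopy.Model(two_factor)
model.fit(data)

# The size ~~ depth parameter, once standardised, is the latent correlation
# that the study reports as r = 0.946 for the conventional analysis.
print(model.inspect())
print(semopy.calc_stats(model))
```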

Within the framework of structural equation modeling, Dunn and McCray examined the role of the bifactor model, where a general factor and a specific factor together explain an observed variable. This is important in language assessment, as the structure of a test relates to how the scores are reported. To demonstrate this, they analyzed data from the grammar and vocabulary sections of the British Council's Aptis test using a bifactor model, a correlated-factor model, and a unidimensional model. The bifactor model explained the data best, suggesting the possible reporting of either a composite score or skill-specific scores. However, the average size of factor loadings was similar across models, suggesting the sufficiency of simply reporting a composite score. The authors conclude in favor of reporting a composite score, a practice consistent with the Aptis test.
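
For comparison, the three competing structures can be written out as measurement-model specifications in lavaan-style syntax, sketched below with hypothetical grammar (g1–g3) and vocabulary (v1–v3) indicators standing in for Aptis items. In practice each specification would be fitted to the same response data and the models compared on fit indices before choosing a score-reporting scheme.

```python
# Three competing measurement models, written in lavaan-style syntax.
# Indicator names g1-g3 and v1-v3 are hypothetical stand-ins for test items.

unidimensional = """
ability =~ g1 + g2 + g3 + v1 + v2 + v3
"""

correlated_factors = """
grammar    =~ g1 + g2 + g3
vocabulary =~ v1 + v2 + v3
grammar ~~ vocabulary
"""

# Bifactor: every item loads on a general factor plus its own specific factor,
# and all factors are constrained to be orthogonal.
bifactor = """
general    =~ g1 + g2 + g3 + v1 + v2 + v3
grammar    =~ g1 + g2 + g3
vocabulary =~ v1 + v2 + v3
general ~~ 0*grammar
general ~~ 0*vocabulary
grammar ~~ 0*vocabulary
"""

# Each specification would be fitted to the same data and compared on fit
# indices (e.g., CFI, RMSEA, AIC) before deciding how to report scores.
```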

Finally, and importantly, we would like to thank the reviewers for their valuable comments and suggestions. Without their help, the publication of this special issue would not have been possible. We hope that readers of the journal will find this collection of research papers useful.

Author Contributions

VA, TE, and YI contributed to conception and writing of the editorial. All authors contributed to the article and approved the submitted version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

  • Aryadoust V. (2019). An integrated cognitive theory of comprehension. International Journal of Listening 33, 71–100. doi: 10.1080/10904018.2017.1397519
  • Aryadoust V., Raquel M. (eds.) (2019). Quantitative Data Analysis for Language Assessment Volume I: Fundamental Techniques. New York, NY: Routledge. doi: 10.4324/9781315187815
  • Aryadoust V., Raquel M. (eds.) (2020). Quantitative Data Analysis for Language Assessment Volume II: Advanced Methods. New York, NY: Routledge. doi: 10.4324/9781315187808
  • Carroll J. B. (1961). Fundamental considerations in testing for English language proficiency of foreign students, in Teaching English as a Second Language, ed Allen H. B. (McGraw-Hill), 364–372.
  • Chapelle C. A. (2008). Utilizing technology in language assessment, in Encyclopedia of Language and Education, 2nd Edition, Volume 7: Language Testing and Assessment, eds Shohamy E., Hornberger N. H. (Boston, MA: Springer), 123–134.
  • Farhady H. (2018). History of language testing and assessment, in The TESOL Encyclopedia of English Language Teaching, ed Liontas J. I. (Hoboken, NJ: John Wiley & Sons).
  • Kotowicz J., Woll B., Herman R. (2021). Adaptation of the British Sign Language Receptive Skills Test into Polish Sign Language. Language Testing 38, 132–153. doi: 10.1177/0265532220924598
  • Lado R. (1961). Language Testing: The Construction and Use of Foreign Language Tests. A Teacher's Book. Longmans, Green and Company.
  • Spolsky B. (2017). History of language testing, in Language Testing and Assessment: Encyclopedia of Language and Education, eds Shohamy E., Or I. G., May S. (Springer), 375–384.
  • Weir C. (2005). Language Testing and Validation: An Evidence-Based Approach. Palgrave Macmillan.

Language Testing


  • Description
  • Aims and Scope
  • Editorial Board
  • Abstracting / Indexing
  • Submission Guidelines

Listen to the official Language Testing podcast called Language Testing Bytes for free at https://www.youtube.com/playlist?list=PL7cRo0Em6bxGd1Qt4BnzL6DmLLLk939PN

Language Testing is an international peer reviewed journal that publishes original research on foreign, second, additional, and bi-/multi-/trans-lingual (henceforth collectively called L2) language testing, assessment, and evaluation. Since 1984 it has featured high impact L2 testing papers covering theoretical issues, empirical studies, and reviews. The journal's scope encompasses the testing, assessment, and evaluation of spoken and signed languages being learned as L2s by children and adults, and the use of tests as research and evaluation tools that are used to provide information on the language knowledge and language performance abilities of L2 learners. Many articles also contribute to methodological innovation and the practical improvement of L2 testing internationally. In addition, the journal publishes submissions that deal with L2 testing policy issues, including the use of tests for making high-stakes decisions about L2 learners in fields as diverse as education, employment, and international mobility. The journal welcomes the submission of papers that deal with ethical and philosophical issues in L2 testing, as well as issues centering on L2 test design, validation, and technical matters. Also of concern is research into the washback and impact of L2 language test use, the consequences of testing on L2 learner groups, and ground-breaking uses of assessments for L2 learning. Additionally, the journal wishes to publish replication studies that help to embed and extend knowledge of generalisable findings in the field. Language Testing is committed to encouraging interdisciplinary research, and is keen to receive submissions which draw on current theory and methodology from different areas within second language acquisition, applied linguistics, educational measurement, psycholinguistics, general education, psychology, cognitive science, language policy, and other relevant subdisciplines that interface with language testing and assessment. Authors are encouraged to adhere to Open Science Initiatives.

Language Testing is an international peer reviewed journal that publishes original research on foreign, second, additional, and bi-/multi-/trans-lingual (henceforth collectively called L2) language testing, assessment, and evaluation. The journal's scope encompasses the testing of L2s being learned by children and adults, and the use of tests as research and evaluation tools that are used to provide information on the knowledge and performance abilities of L2 learners.

In addition, the journal publishes submissions that deal with L2 testing policy issues, including the use of tests for making high-stakes decisions about L2 learners in fields as diverse as education, employment, and international mobility. The journal welcomes the submission of papers that deal with ethical and philosophical issues in L2 testing, as well as issues centering on L2 test design, validation, and technical matters. Primary studies, replication studies, and secondary analyses of pre-existing data are welcome. Authors are encouraged to adhere to Open Science Initiatives.

  • Academic Search Premier
  • British Education Index
  • Contents Pages in Education
  • Current Index to Journals in Education
  • Educational Research Abstracts Online - e-Psyche
  • IBZ: International Bibliography of Periodical Literature
  • IBZ: International Bibliography of Periodical Literature in the Humanities and Social Sciences
  • Informationszentrum Für Fremdsprachenforschung (IFS)
  • International Bibliography of Book Reviews of Scholarly Literature in the Humanities and Social Sciences
  • International Bibliography of Book Reviews of Scholarly Literature on the Humanities and Social Sciences
  • Language Teaching
  • Linguistics Abstracts
  • Linguistics and Language Behavior Abstracts
  • MLA Abstracts of Articles in Scholarly Journals
  • MLA International Bibliography
  • Professional Development Collection
  • Psychology & Behavioral Sciences Collection
  • Social Science Abstracts
  • e-Psyche (Ceased)

Manuscript Submission Guidelines: Language Testing

This Journal is a member of the Committee on Publication Ethics

Please read the guidelines below then visit the Journal’s submission site http://mc.manuscriptcentral.com/LTJ to upload your manuscript. Please note that manuscripts not conforming to these guidelines may be returned.

Only manuscripts of sufficient quality that meet the aims and scope of Language Testing will be reviewed. Please note that this journal only publishes manuscripts in English.

There are no fees payable to submit or publish in this journal unless the author chooses the Sage Choice open access option (please see section 3.3 for further information).

As part of the submission process you will be required to warrant that you are submitting your original work, that you have the rights in the work, that you are submitting the work for first publication in the Journal and that it is not being considered for publication elsewhere and has not already been published elsewhere, and that you have obtained and can supply all necessary permissions for the reproduction of any copyright works not owned by you.

Please see our guidelines on prior publication and note that Language Testing may accept submissions of papers that have been posted on pre-print servers; please alert the Editorial Office when submitting (contact details are at the end of these guidelines) and include the DOI for the preprint in the designated field in the manuscript submission system. Authors should not post an updated version of their paper on the preprint server while it is being peer reviewed for possible publication in the journal. If the article is accepted for publication, the author may re-use their work according to the journal's author archiving policy.

If your paper is accepted, you must include a link on your preprint to the final version of your paper.

  • What do we publish? 1.1 Aims & Scope 1.2 Article types 1.3 Writing your paper
  • Editorial policies 2.1 Peer review policy 2.2 Authorship 2.3 Notes 2.4 Acknowledgements 2.5 Declaration of conflicting interests 2.6 Funding
  • Publishing policies 3.1 Publication ethics 3.2 Contributor's publishing agreement 3.3 Open access and author archiving 3.4 Guidance for authors with multiple institutional affiliations
  • Preparing your manuscript 4.1 Cover letter 4.2 Title page 4.3 Abstract 4.4 Formatting 4.5 Length 4.6 Statistical reporting 4.7 Anonymizing your manuscript 4.8 Artwork, figures and other graphics 4.9 Supplementary material 4.10 Video abstracts 4.11 Open Science Badges 4.12 English language editing services
  • Submitting your manuscript 5.1 ORCID 5.2 Information required for completing your submission 5.3 Permissions
  • On acceptance and publication 6.1 Sage Production 6.2 Online First publication 6.3 Access to your published article 6.4 Promoting your article
  • Further information

1. What do we publish?

1.1 Aims & Scope

Before submitting your manuscript to Language Testing , please ensure you have read the Aims & Scope .

1.2 Article Types

Language Testing accepts the following article types:

  • Original Manuscript [9,000 words]: Original articles focus on the testing and assessment of language for a range of purposes, whether educational or professional, in second or foreign language, bilingual, and/or multilingual situations. Equal preference is given to empirically based and theoretical articles.
  • Meta-analysis [12,000 words]: Articles that synthesize the results of multiple studies of a phenomenon into a single result. Reporting standards for meta-analyses can be found here .
  • Systematic Review [12,000 words]: Articles that generate evidence for clearly formulated questions using systematic methods for the identification, critical review, and analysis of data from primary research. Reporting guidance for systematic reviews can be found here .
  • Brief Report [4,000 words]: Brief reports provide a concise format for the reporting of technically significant research of interest to the language assessment community. More information about the content and format of brief reports is available at this link .
  • Registered Report [5,000 words]: Empirical articles in the form of research proposals  which are peer-reviewed prior to data collection. Authors may then carry out the study with in-principle acceptance for the final published manuscript. More information about the content, format, and workflow of registered reports is available at this link .
  • Book Review [1,500 words]: Reviews of books or edited volumes concerning language testing or other topics of interest to the language assessment community. Book reviews are commissioned by the book reviews editor.
  • Test Review [4,000 words]: Reviews of commercially or locally produced language tests. Test reviews are commissioned by the test reviews editor.
  • Viewpoint   [4,000 words]: Position papers on key topics from authors with invited rejoinders. Our aim is to choose timely topics and publish these through an expedited process, ensuring that the journal remains responsive to current issues and debates in the field. Viewpoint writers are encouraged to contact one of the Co-Editors to ensure content fit in advance of submission.
  • Letter to the Editor [1,500 words]: Commissioned rejoinders to viewpoint pieces to be published in dedicated sections of issues.
  • Obituary [750 words]

In addition, Language Testing periodically sends out calls to welcome ideas and suggestions from potential guest editors for special issue proposals on topical themes. Calls will be sent out on listservs and Twitter.

1.3 Writing your paper

The Sage Author Gateway has some general advice on how to get published, plus links to further resources. Sage Author Services also offers authors a variety of ways to improve and enhance their article including English language editing, plagiarism detection, and video abstract and infographic preparation.

1.3.1 Submitting a manuscript based on a dissertation or thesis

Language Testing encourages authors to submit papers based on their dissertations or theses. Authors should submit a cover letter stating that their paper is based on a dissertation or thesis and provide the APA citation for that dissertation or thesis; the paper itself should also cite the original dissertation or thesis. More tips and information are available at this link.

1.3.2 Make your article discoverable

When writing up your paper, think about how you can make it discoverable. The title, keywords and abstract are key to ensuring readers find your article through search engines such as Google. For information and guidance on how best to title your article, write your abstract and select your keywords, have a look at this page on the Gateway:  How to Help Readers Find Your Article Online .


2. Editorial policies

2.1 Peer review policy

Language Testing is a fully peer reviewed international journal that publishes original research and review articles on language testing and assessment. Peer review ensures the publication of only the highest quality articles through a fair and objective process. Together with the editors, the referees play a vitally important role in maintaining the exceptionally high standards of the journal.

Review Procedures

All manuscripts are reviewed initially by the editors and only those papers that meet the standards of the journal, and fit within its aims and scope, are sent out for peer review. Manuscripts approved for external review will normally be sent to three reviewers. All manuscripts are sent anonymously to ensure unbiased consideration by the referees. Submissions are normally reviewed within 2 months of submission, although due to the rigorous anonymized peer review system this sometimes takes longer. Authors should expect a decision on a submission within 3 months.

Please note that, due to a limited reviewer pool and with rare exceptions (e.g., special issue editors), the editorial team reserves the right not to process more than two unique manuscripts by the same author in any given year, regardless of authorship position.

Commissioned Papers

From time to time the editors may commission papers for Language Testing , normally for anniversary or special issues. Commissioned papers are sent for review by two or three external reviewers, and the reviews evaluated by the editors in the same way as for all other submissions. A commission therefore does not imply that the submission will be published.

Book and Test Reviews

Book and Test Reviews are commissioned by the book and test review editors respectively. Reviews represent the professional view of the expert in question, and publication is dependent upon review by the relevant editor. Book reviews are not normally subject to the multiple-anonymized peer review system that is operated for all other submissions. Test reviews are subject to the normal double-anonymized peer review process.

Selection of Reviewers and Timelines

The editors of Language Testing select reviewers from the Editorial Board and the Language Testing community on the grounds of their expertise to judge the suitability for publication of the submission concerned. All reviewers are qualified and experienced academics with the highest possible reputation in their field, including, in many cases, a history of publishing in Language Testing .

Reviewer Guidelines

Reviewers are asked to judge the suitability of submissions on the following criteria:

  • Published articles, empirical or theoretical, must be original and must make a significant contribution to knowledge in the field of language testing.
  • An article should relate reported findings or proposed theoretical contribution to existing knowledge. This is generally to be accomplished through a competent and critical review of the relevant literature.
  • Research articles, whether quantitative or qualitative in approach, should be based on new data collected and analysed in a rigorous and well-designed investigation. Secondary analyses may be used to support theoretical contributions.

Decision Making

Reviewers may recommend that a submission be (a) rejected, (b) revised and resubmitted, (c) accepted for publication with minor amendments, or (d) accepted for publication forthwith. In the case of (c), the editors may ask one or more of the reviewers to ‘sign off’ on amendments, or undertake this task themselves. When manuscripts are revised and resubmitted, the editors make every attempt to ask the original reviewers to consider the manuscript again and evaluate it against the specific recommendations made in the first review. If for any reason a reviewer declines to take part in a second review, the editors will attempt to find a replacement reviewer.

The final decision to publish or reject remains with the editors.

Conflict of Interest

If one of the editors, or a colleague or student of an editor, submits a manuscript to Language Testing, the co-editor steers the manuscript through the review process and withholds the names of the reviewers from the other. No editor takes any decisions or responsibility for the review process of their own work, or the work of a close colleague, student, or friend.

If a reviewer recognizes the author of a paper as a colleague, student, or friend, they should decline to take part in the review process.

Feedback to Reviewers

Under normal circumstances, anonymous copies of all reviews are circulated to the reviewers within one month of a decision being taken on a manuscript, together with an indication of the decision made. This maintains an open and transparent process, and helps newer reviewers to understand the review process.

Feedback to Authors

Authors are provided with a decision on their manuscript together with anonymous copies of the reviews, usually within two weeks of a decision being made. Where manuscripts are accepted for publication subject to amendments, a timeline for making the amendments is agreed.

2.2 Authorship

All parties who have made a substantive contribution to the article should be listed as authors. Principal authorship, authorship order, and other publication credits should be based on the relative scientific or professional contributions of the individuals involved, regardless of their status. A student is usually listed as principal author on any multiple-authored publication that substantially derives from the student’s dissertation or thesis.

Language Testing is trialling the application of CRediT author contribution statements to increase transparency in reporting the role that named authors on manuscripts have played in contributing to the study. The CRediT taxonomy is being applied in international journals in many fields to standardize reporting about each author's respective contribution to research and dissemination in disciplines where this practice is already commonplace. We are happy to be one of Sage's first social science/educational journals to participate in this scheme.

At submission stage for both single-authored and multi-authored works, the submitting author will need to select the roles that each named author on the manuscript played from a list of 14 standardised roles. Here is an example of what this could look like for an empirical study that generates new data. Note that the roles are alphabetized and do not necessarily follow the order that would be expected in the research process.

Author1: Formal analysis (i.e., data analysis), Investigation (includes data collection), Methodology, Project administration, Resources (e.g., instrument development), Writing—first draft

Author2: Conceptualization, Funding acquisition, Methodology, Supervision, Writing—review & editing

The CRediT statement should not replace the Acknowledgements section. People who are not named authors but contributed to research or dissemination in some way can still be acknowledged in the Acknowledgments section as before.

Please refer to the CRediT Gateway page for more information about the scheme. Please report any feedback or questions that you may have about CRediT to the Editorial Assistant in the first instance.

Please note that AI chatbots, for example ChatGPT, should not be listed as authors. For more information see the policy on Use of ChatGPT and generative AI tools .

2.3 Notes

Language Testing requires authors to enter notes manually at the end of their manuscript in a Notes section (do not use Word’s footnote or endnote features). Notes are optional, should be used sparingly, and should be as brief as possible. Where you want to reference a note in the text, add a superscript number. The Notes section should list the notes numerically and should appear at the end of the article, before the References.

2.4 Acknowledgements

All contributors who do not meet the criteria for authorship should be listed in an Acknowledgements section. Examples of those who might be acknowledged include a person who provided purely technical help, or a department chair who provided only general support.

During initial manuscript submission, any acknowledgements should appear at the bottom of the title page. After a manuscript is accepted and the editor asks the authors to un-anonymize it in preparation for publication, the acknowledgements should be moved from the title page to the main manuscript, where they should appear at the end of the article, after any notes and before the references.

2.4.1 Third party submissions

Where an individual who is not listed as an author submits a manuscript on behalf of the author(s), a statement must be included in the Acknowledgements section of the manuscript and in the accompanying cover letter. The statements must:

  • Disclose this type of editorial assistance – including the individual’s name, company and level of input
  • Identify any entities that paid for this assistance
  • Confirm that the listed authors have authorized the submission of their manuscript via third party and approved any statements or declarations, e.g. conflicting interests, funding, etc.

Where appropriate, Sage reserves the right to deny consideration to manuscripts submitted by a third party rather than by the authors themselves .

2.5 Declaration of conflicting interests

In the field of language testing, test providers and other organizations (e.g., government) often fund research on the development, validation, and use of assessments. This work may be done in-house or through colleagues external to the organization and may involve formal or informal agreements, knowledge exchange, and collaboration/partnership between different parties. In the interest of transparency in research conduct and dissemination, and in line with practices in other fields, Language Testing requires submitting authors to make an explicit declaration of Conflicts of Interest for all article types on the manuscript submission system. Note that declaring a Conflict of Interest does not imply a lack of integrity. Conflicts of Interest are considered inevitable given the connections among scholarship, language assessment research, and test construction and validation. Conflicts of Interest are a recognition that authors have researched or intend to disseminate something in which they have (or could be perceived to have) a stake. The purpose of a Conflicts of Interest statement is to allow readers to make up their own minds about any potential bias. Such transparency is to be encouraged.

All listed authors on a given manuscript must disclose competing interests that are ongoing or have occurred within the past five years. Revealing Conflicts of Interest outside of this timespan (e.g., consultancy) is at the authors' discretion. Competing interests include relationships, affiliations, funding sources, and any other financial or non-financial interests that are in any way (or could be perceived to be) relevant to the content of the manuscript, even if indirectly. For example, employees at a given organization conducting research on their own test should explicitly divulge this in their conflict-of-interest statement, even if this is implied through their listed institutional affiliation(s). An author reviewing or critiquing a test should disclose all paid and unpaid (including advisory) positions over the past five years that relate in some way to that test or test provider, its market competitors, or an assessment context or domain that could have a stake in the way that the test is portrayed. For example, framing a test in a certain way could potentially affect uptake of a test at the author's institution.  

Some examples of financial Conflicts of Interest include but are not limited to: paid employment, grants, receiving payment for consultancy or advisory activities, ownership of stocks/shares, and planned or awarded patents. Some examples of non-financial conflicts of interest include but are not limited to: relationships with organizations (e.g., corporations, charities, etc.), membership of a government, board, or lobby/advocacy group, personal relationships that may affect objectivity, and personal beliefs or experiences that may affect objectivity if relevant to the article.   

If in doubt about a potential Conflict of Interest, it is always better to declare than withhold it. In addition to writing a Conflict of Interest statement on the manuscript submission system, please flag any potential Conflicts of Interest to the Editors in your cover letter (see Section 4.1 for information about cover letters). Please review the good practice guidelines on the  SAGE Journal Author Gateway  for further information.  

Language Testing Editors’ Conflict of Interest declarations, 2022-23

2.6 Funding

Language Testing requires all authors to acknowledge their funding in a consistent fashion under a separate heading. In the online submission system, you will be asked, “Is there funding to report for this submission?” and you must click “Yes” or “No.” If you select “Yes,” you will be prompted to enter information on the funder (name of the funder; grant/award number). You may report multiple funders. Please visit the Funding Acknowledgements  page on the Sage Journal Author Gateway to confirm the format of the acknowledgment text in the event of funding, or state that: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors. 

3. Publishing Policies

3.1 Publication ethics

Sage is committed to upholding the integrity of the academic record. We encourage authors to refer to the Committee on Publication Ethics’ International Standards for Authors  and view the Publication Ethics page on the  Sage Author Gateway .

3.1.1 Plagiarism

Language Testing and Sage take issues of copyright infringement, plagiarism or other breaches of best practice in publication very seriously. We seek to protect the rights of our authors and we always investigate claims of plagiarism or misuse of published articles. Equally, we seek to protect the reputation of the journal against malpractice. Submitted articles may be checked with duplication-checking software. Where an article, for example, is found to have plagiarised other work or included third-party copyright material without permission or with insufficient acknowledgement, or where the authorship of the article is contested, we reserve the right to take action including, but not limited to: publishing an erratum or corrigendum (correction); retracting the article; taking up the matter with the head of department or dean of the author's institution and/or relevant academic bodies or societies; or taking appropriate legal action.

3.1.2 Prior publication

If material has been previously published it is not generally acceptable for publication in a Sage journal. However, there are certain circumstances where previously published material can be considered for publication. Please refer to the guidance on the Sage Author Gateway  or if in doubt, contact the Editor at the address given below.

3.2 Contributor's publishing agreement

Before publication, Sage requires the author as the rights holder to sign a Journal Contributor’s Publishing Agreement. Sage’s Journal Contributor’s Publishing Agreement is an exclusive licence agreement which means that the author retains copyright in the work but grants Sage the sole and exclusive right and licence to publish for the full legal term of copyright. Exceptions may exist where an assignment of copyright is required or preferred by a proprietor other than Sage. In this case copyright in the work will be assigned from the author to the society. For more information please visit the  Sage Author Gateway .

3.3 Open access and author archiving

Language Testing offers optional open access publishing via the Sage Choice programme and Open Access agreements, where authors can publish open access either discounted or free of charge depending on the agreement with Sage. Find out if your institution is participating by visiting Open Access Agreements at Sage . For more information on Open Access publishing options at Sage please visit Sage Open Access . For information on funding body compliance, and depositing your article in repositories, please visit Sage’s Author Archiving and Re-Use Guidelines and Publishing Policies .

3.4 Guidance for authors with multiple institutional affiliations

Some authors of submissions to Language Testing have multiple institutional affiliations. In cases where authors are salaried part-time or full-time employees of a non-academic institution, authors submitting to Language Testing should declare this as a primary or secondary institutional affiliation. This excludes contract-based work conducted as part of a consultancy agreement or advisory work.

A common scenario is that an author of an article to be submitted to Language Testing is employed by an assessment organization as an employee. The assessment organization may be a company or a not-for-profit organization or charity and may be in the public or private sector. The author also holds an affiliation with an academic institution as a regular or honorary member. Authors in such a position should list affiliation with the assessment organization in which they are employed as either a primary or secondary affiliation. If the author also chooses to list their affiliation with the academic institution, the order in which they would like institutional affiliations to appear is at their discretion, pending agreement by their affiliated institutions. In some cases, this choice may depend on requirements for accessing institutional agreements for open access funding. In summary, authors employed by an assessment organization should not withhold listing this as their primary or secondary affiliation. Authors employed at schools or other types of institutions or who are self-employed and have a second affiliation should follow these same principles. In the event of authors wanting to withhold institutional information for reasons of personal safety, to minimize reputational damage, or for any other serious reason, this should be explicitly stated in the covering letter to the Editors that accompanies the first (initial) submission. However, in the vast majority of scenarios, the expectation is that places of formal employment will be declared as detailed above.

Note that if a given author is affiliated with multiple academic institutions and is not formally employed by any non-academic institution, it is at the author's discretion as to which and how many academic affiliations they list. Any looser affiliations (e.g., contract-based consultancies) should not be listed as an institutional affiliation. Instead, they can be declared as a conflict-of-interest in the relevant section.

4. Preparing your manuscript for submission

4.1 Cover letter

Cover letters addressed to the Editors are now a requirement for all articles submitted to Language Testing. Please see the 7th edition of the Publication Manual of the American Psychological Association (APA), Section 12.11 (“Writing a Cover Letter”), to complement the points below:

  • For the initial submission of your manuscript, please list the full title and institutional affiliation(s) of all named authors (see Section 3.4 on guidance for authors with multiple affiliations).
  • If the manuscript has been published as a pre-print, please acknowledge this in the cover letter, providing the full reference, including the DOI.
  • In the case of primary and secondary research studies, please confirm that the study has received ethics approval from the relevant Ethics Committee or Institutional Review Board, specifying which organization granted the approval. If no ethics approval was obtained, this must be indicated and justified.
  • Please bring any other open science practices to the Editor’s attention if relevant (see Section 4.11, “Open Science Badges,” for examples of possible initiatives).
  • If the manuscript presents work that is part of a larger project (e.g., a research grant), please explicitly state this, clarifying which aspects of the project are and are not covered in the manuscript. Please provide a link to any project websites if available. Please see dedicated guidance about submitting manuscripts based on unpublished dissertations/theses in Section 1.3.1.
  • You must disclose if any manuscripts based on a related study are under consideration for publication concurrently or have been published previously. In the case of previous related publications, please provide full references and DOIs or URLs, together with a clear account of how the project discussed in the manuscript under consideration in Language Testing relates to those other outputs and how and why it is novel. If any previous or concurrent publications draw on the same dataset, please declare this, stating how the data have been analyzed and presented differently to address distinct research questions. If any previous or concurrent publications use very similar methods and apply them to a different dataset, please provide details of these outputs.
  • Please bring to the Editor's attention any Conflicts of Interest in your cover letter, which you will also need to disclose in the relevant section on the manuscript submission system (see Conflict of Interest subheading   under Section 2.1, “Peer Review Policy”).
  • Authors are welcome to suggest potential reviewers in their cover letter. However, it is at the Editor's discretion which reviewers they invite.
  • In cases of revised submissions of manuscripts, cover letters need not repeat all the information that was in the cover letter for the original submission. If there is nothing new to bring to the Editor’s attention in light of previously disclosed information, cover letters for revised submissions can simply direct the Editors to the Response letter to the reviewers’ comments.

4.2 Title Page

Manuscripts must include a title page containing all author details, ORCID ID, and affiliations, as well as any acknowledgements. Author affiliations should be indicated as the institution where the research was carried out, and that address must be retained as the main affiliation address.

4.3 Abstract

Each manuscript should include an abstract which summarizes the study in no more than 200 words. The abstract should be entered separately in the online submission system. Do not include the abstract in the main manuscript. The abstract does not count towards the manuscript length restrictions.

Translations of abstracts may be published if the authors desire. Translations should be written by the authors or commissioned by the authors themselves. Translated abstracts may be included on the title page. Please provide the translation and indicate the language(s) of translation.

4.4 Formatting

Manuscripts must be submitted as either Microsoft Word or LaTeX files. Please format your manuscript following the guidelines of the 7th edition of the Publication Manual of the American Psychological Association (APA). Please ensure that URLs for websites are not inserted into the body of the manuscript. Instead, please provide a full citation in the reference list and cite appropriately in text. Avoid unnecessary formatting and use a standard font such as Times New Roman or Arial. The text of the manuscript must be left-aligned, double-spaced, with the first line of each paragraph indented. The Notes and reference list should be double-spaced as well.

References should follow APA 7th edition reference style. Examples are available here, but authors should consult the full handbook for specific guidance on style. Please include Digital Object Identifiers (DOIs) in the format https://doi.org/xxxxxx. When there is no DOI, add a stable URL if available.

4.5 Length

Manuscripts must adhere to the length restrictions outlined in Section 1.2. Length includes notes, figures, tables, and references, but not the abstract.

4.6 Statistical Reporting

Where possible and appropriate, authors should supply sufficient information, including test texts and items, to enable replication. Lack of statistically significant results, or difficulty in drawing clear conclusions, will not necessarily rule out publication of interesting contributions. Empirical papers that use significance testing should as a matter of course provide effect sizes and confidence intervals.
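As an illustration only (this sketch is not part of the journal’s requirements, and the scores and variable names below are invented), an effect size such as Cohen’s d and a 95% confidence interval for a two-group comparison might be computed alongside the significance test as follows:

```python
# Illustrative sketch only: invented scores, not journal-mandated code.
import numpy as np
from scipy import stats

group_a = np.array([72, 68, 75, 80, 66, 74, 71, 69])  # hypothetical treatment-group scores
group_b = np.array([65, 63, 70, 72, 61, 68, 64, 66])  # hypothetical control-group scores

t, p = stats.ttest_ind(group_a, group_b)  # independent-samples t-test

# Cohen's d using the pooled standard deviation
n_a, n_b = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n_a - 1) * group_a.var(ddof=1) +
                     (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2))
d = (group_a.mean() - group_b.mean()) / pooled_sd

# 95% confidence interval for the mean difference
diff = group_a.mean() - group_b.mean()
se = pooled_sd * np.sqrt(1 / n_a + 1 / n_b)
low, high = stats.t.interval(0.95, df=n_a + n_b - 2, loc=diff, scale=se)

print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Reporting the effect size and interval, not just the p value, lets readers judge the magnitude and precision of the result.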

4.7 Anonymizing your manuscript

Language Testing uses a double-anonymize peer review system, and it is therefore important that manuscripts are free of any information that might identify any author. To anonymize your manuscript, please check the following:

  • Acknowledgements and (grant/award) funding information should not be included in the main document. The Acknowledgements should be added to the title page.
  • Self-citations and external reference to your own research should be minimized. If you wish to refer to methodological tools or instruments you developed which are published in another study, please upload these as supplementary material for peer-review only. At the end of the peer-review process you may remove these supplementary materials and add in citations to your own published work.
  • If it is necessary to explicitly refer to your previously published work, please ensure that it is anonymized; that is, made completely anonymous with no identifiable information included. For the reference section and parenthetical citations, refer to your work as "Author. (Year)." For multiple references, you may use "Year 1," "Year 2," etc. Please ensure that these anonymized references are in alphabetical order in the reference section; in other words, "Author" should be in the "A" section or alternatively at the top of the reference list. Do not place your reference in the order your name would usually appear in. Please ensure that your and your co-author's name(s), year of publication, title of publication, and all other accompanying information are removed or anonymized. For in-text citations, e.g., "Jones (2016) argued …," you may change these to "Author (Year) argued." References should similarly be "Author(s) (Year)" or "Author and Co-Author (Year)."
  • Any other information that could reveal the authors or their participants should be anonymized. For example, instead of writing, "The data were collected at Georgetown University," for the purposes of anonymized review the authors should write, "The data were collected at X University" or "The data were collected at a large/national university in [Country name]."

4.8 Artwork, figures, and other graphics

For guidance on the preparation of the manuscript, including the formatting of figures and tables, please use the guidelines in the Publication Manual of the American Psychological Association (APA). However, be aware that in the online submission form, figures and tables originally made outside of Microsoft (MS) Word should be uploaded to the system as separate files (as described below). Figures or tables made in MS Word should appear at the end of the manuscript, following the Publication Manual of the American Psychological Association (APA) guidelines:

  • Add a placeholder note in the running text (i.e., “[Insert Figure 1]”) indicating where, approximately, the figure or table should appear after typesetting.
  • Enter captions to be displayed with the Figure or Table (i.e., the title or a note) in the title of the figure or table.

Figures or Tables created outside of MS Word (i.e., TIFF, JPEG, JPG, EPS, PDF, Excel, PowerPoint) should be submitted separately under “File Upload” in the online submission system, one file for each Figure or Table:

  • Add a placeholder note in the running text (i.e., “[Insert Figure 1]”) indicating where, approximately, the Figure or Table should appear after typesetting.
  • For each figure or table that you separately upload, you should enter the caption or legend (text displayed with the image; usually a brief description) into the “Caption/Legend” textbox that appears in the online submission system.
  • Under “Link text” in the online submission system, type in the name of the file as you wrote it in the running text (i.e., “Figure 1”) so that when this text is found in your document, it will link to the selected file.

Please follow the guides (a) through (e) below on how to format your figure or table created outside of MS Word.

a. Format: TIFF or JPEG are preferred. These are the common formats for pictures (or figures) containing no text or graphs. EPS is the preferred format for graphs and line art (it retains quality when enlarging/zooming in).

b. Resolution: Raster-based files (i.e., files with a .tiff or .jpeg extension) require a resolution of at least 300 dpi (dots per inch). Line art should be supplied with a minimum resolution of 800 dpi.

c. Colour : Figures supplied in colour will appear in colour online regardless of whether these illustrations are reproduced in colour in the printed version. For specifically requested colour reproduction in print, you will receive information regarding the costs from Sage after receipt of your accepted article.

d. Dimension : Check that the artworks supplied match or exceed the dimensions of the journal. Images cannot be scaled up after origination.

e. Fonts : The lettering used in the artwork should not vary too much in size and type (usually Arial as a default).

4.9 Supplementary material

Language Testing is able to host additional materials online (e.g. datasets, videos, images, etc.) alongside the full-text of the article. We strongly encourage authors to archive their datasets (and analysis models where appropriate, e.g., R Code) in an open repository such as the Open Science Framework ( https://osf.io ). Any datafiles submitted for publication as supplemental material will be hosted as open data in Figshare by Sage automatically. For more information, please refer to our guidelines on submitting supplementary files .

4.10 Video abstracts

Language Testing allows authors to have a link to a Video Abstract in their manuscript.  Sage provides guidelines on how to make a Video Abstract . Within your Video Abstract, encourage viewers to download your article. Invite viewers to ask questions via your Twitter or Facebook page (maybe suggest a hashtag). For more ideas, search the internet for tips on Video Abstract creation, or visit these sites:

  • The Scientist Videographer
  • Video abstracts in journal articles (IOPscience)
  • How to Turn Your Research Findings into a Video that People Actually Want to Watch
  • Video Abstracts are a Low-Barrier Means for Publishers to Extend the Shelf Life of Research

Video abstracts will be hosted on the Language Testing YouTube page, and will be linked to a collection on the journal’s website.

4.11 Open Science Badges

Articles accepted to Language Testing are eligible to earn up to three badges that recognize open scientific practices: research preregistration, publicly available data, and publicly available materials. If you wish to apply for the Preregistered, Open Data, or Open Materials badges, please take the following steps: 1) mention this intention in your cover letter, 2) select the badges applicable to you in the submission panel under "Open Science Badges", 3) complete the " Language Testing Open Science Badges Disclosure Form ", and 4) include the disclosure form with your submission (upload your completed disclosure form within the online submission system under the "File Upload" section and select "OSF Disclosure Form" as the file type). To qualify for a preregistration, Open Data or Open Materials badge, you must provide a URL, DOI, or other permanent path for accessing the specified information in a public, open-access repository; it should be time-stamped and immutable. Qualifying public, open-access repositories are committed to preserving data, materials, and/or registered analysis plans and keeping them publicly accessible via the web in perpetuity. Examples include the Open Science Framework ( OSF ) and the various Dataverse networks. Hundreds of other qualifying data/materials repositories are listed at http://re3data.org/ . Personal websites and most departmental websites do not qualify as repositories. For more information about the badges and how to earn them, please see the OSF Wiki .

4.12 English language editing services

Authors seeking assistance with English language editing, translation, or figure and manuscript formatting to fit the journal’s specifications should consider using Sage Language Services. Visit Sage Language Services on our Journal Author Gateway for further information.

5. Submitting your manuscript

Language Testing is hosted on Sage Track, a web-based online submission and peer review system powered by ScholarOne™ Manuscripts. Visit http://mc.manuscriptcentral.com/LTJ to login and submit your article online.

IMPORTANT: Please check whether you already have an account in the system before trying to create a new one. If you have reviewed or authored for the journal in the past year it is likely that you will have had an account created.  For further guidance on submitting your manuscript online please visit ScholarOne Online Help .

5.1 ORCID

As part of our commitment to ensuring an ethical, transparent and fair peer review process, Sage is a supporting member of ORCID, the Open Researcher and Contributor ID. ORCID provides a persistent digital identifier that distinguishes researchers from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between researchers and their professional activities, ensuring that their work is recognised.

The collection of ORCID IDs from corresponding authors is now part of the submission process of this journal. If you already have an ORCID ID you will be asked to associate that to your submission during the online submission process. We also strongly encourage all co-authors to link their ORCID ID to their accounts in our online peer review platforms. It takes seconds to do: click the link when prompted, sign into your ORCID account and our systems are automatically updated. Your ORCID ID will become part of your accepted publication’s metadata, making your work attributable to you and only you. Your ORCID ID is published with your article so that fellow researchers reading your work can link to your ORCID profile and from there link to your other publications.

If you do not already have an ORCID ID please follow this link to create one or visit our ORCID homepage to learn more.

5.2 Information required for completing your submission

You will be asked to provide contact details and academic affiliations for all co-authors via the submission system and identify who is to be the corresponding author. These details must match what appears on your manuscript. At this stage please ensure you have included all the required statements and declarations and uploaded any additional supplementary files (including reporting guidelines where relevant).

5.3 Permissions

Please also ensure that you have obtained any necessary permission from copyright holders for reproducing any illustrations, tables, figures or lengthy quotations previously published elsewhere. For further information including guidance on fair dealing for criticism and review, please see the Copyright and Permissions page on the  Sage Author Gateway .

6. On acceptance and publication

6.1 Sage Production

Your Sage Production Editor will keep you informed as to your article’s progress throughout the production process. Proofs will be sent by PDF to the corresponding author and should be returned promptly.  Authors are reminded to check their proofs carefully to confirm that all author information, including names, affiliations, sequence and contact details are correct, and that Funding and Conflict of Interest statements, if any, are accurate.

6.2 Online First publication

Online First allows final articles (completed and approved articles awaiting assignment to a future issue) to be published online prior to their inclusion in a journal issue, which significantly reduces the lead time between submission and publication. Visit the Sage Journals help page  for more details, including how to cite Online First articles.

6.3 Access to your published article

Sage provides authors with online access to their final article.

6.4 Promoting your article

Publication is not the end of the process! You can help disseminate your paper and ensure it is as widely read and cited as possible. The Sage Author Gateway has numerous resources to help you promote your work. Visit the Promote Your Article  page on the Gateway for tips and advice. 

7. Further information

Any correspondence, queries, or additional requests for information on the manuscript submission process should be sent to the Language Testing editorial office as follows:

Sofya Styrina, Editorial Assistant, Department of Linguistics, University of Illinois at Urbana-Champaign, USA. E-mail: [email protected]

Talia Isaacs, Editor, IOE, UCL's Faculty of Education and Society, University College London, London, United Kingdom. E-mail: [email protected]

Xun Yan, Editor, Department of Linguistics, University of Illinois at Urbana-Champaign, USA. E-mail: [email protected]

Ruslan Suvorov, Associate Editor, Faculty of Education, Western University, Canada. E-mail: [email protected]

Benjamin Kremmel, Book Review Editor, Language Testing Research Group Innsbruck (LTRGI), Department for Subject-Specific Education, University of Innsbruck, Austria. E-mail: [email protected]

Ute Knoch, Test Review Editor, Language Testing Research Centre, School of Languages and Linguistics, University of Melbourne, Australia. E-mail: [email protected]


Open access | Published: 18 May 2022

Reliability of measuring constructs in applied linguistics research: a comparative study of domestic and international graduate theses

Kioumars Razavipour (ORCID: orcid.org/0000-0002-6533-2968) and Behnaz Raji

Language Testing in Asia, volume 12, article number 16 (2022)


Abstract

The credibility of conclusions arrived at in quantitative research depends, to a large extent, on the quality of the data collection instruments used to quantify language and non-language constructs. Despite this, research into data collection instruments used in Applied Linguistics, and particularly in the thesis genre, remains limited. This study examined the reported reliability of 211 quantitative instruments used in two samples of domestic and international theses in Applied Linguistics. The following qualities of measuring instruments were used to code the data: instrument origin, instrument reliability, reliability facets examined, reliability computation procedures utilized, and the source of reliability reported (i.e., primary or cited). It was found that information about instrument origin was provided in the majority of cases. However, for 93 instruments no reliability index was reported, and this held true for the measurement of both language and non-language constructs. Further, the most frequently examined facet of reliability was internal consistency estimated via Cronbach’s alpha. In most cases, primary reliability for the actual data was reported. Finally, reliability was more frequently reported in the domestic corpus than in the international corpus. Findings are discussed in light of discursive and sociomaterial considerations, and a few implications are suggested.

Introduction

In the educational measurement literature and in language testing, confidence in measurements depends on their consistency and validity. For an instrument to be valid, it has to be consistent (though the term consistency is more precise than reliability, in this paper we use the two interchangeably). That said, whereas in educational measurement and in language testing much attention has been paid to investigating the reliability and validity of tests used for selection and achievement purposes, the quality of measuring instruments used for research purposes in Applied Linguistics and language teaching remains underexplored. Such studies are warranted on the grounds that they carry immediate implications for practitioners, policy makers, and researchers. For practitioners who rely on research findings to improve their language teaching practices, it is imperative that such research is based on sound measurements of constructs. Additionally, in action research, the effectiveness of educational interventions can only be examined through sound measurements of key variables. Sound measurements are also crucial for education policy makers who rely on research findings to choose, adapt, and implement language education policies. If the research informing policies is founded on inconsistent measurements, it is likely to derail proper policy making, with grave consequences for language teachers, learners, and the wider society. Finally, proper measurements are of utmost importance for the progress of research and the production of knowledge in the field of Applied Linguistics and language teaching. Threats to the consistency and validity of measurements in research would potentially derail future research, which depends on the incremental accumulation of research evidence and findings. Given the mutual exchange of ideas and insights between Applied Linguistics and language testing (see Bachman & Cohen, 1998, and Winke & Brunfaut, 2021), the quality of research in different areas of AL influences research directions and decisions in language testing.

Despite the noted implications that reliable assessments hold for policy and practice, whether and the extent to which Applied Linguistics researchers examine or maximize the consistency of their measuring instruments remains underexplored. More specifically, the current literature on research instrument quality in AL is mostly focused on published research papers. Indeed, we are aware of no published work on the reliability of measuring instruments in theses or dissertations in AL. We believe that, as a distinct genre which operates under different sociomaterial circumstances and is written for a different audience, the thesis warrants closer scrutiny in terms of measurement quality because of the consequences and implications that the quality of this genre has for academia and the wider society. This study intends to narrow the noted gap by investigating the reliability with which variables are measured in a corpus of theses and dissertations in Applied Linguistics across several academic settings. In the remainder of this paper, we first examine research quality in quantitative research in Applied Linguistics. We then zero in on issues of instrument validity and reliability within current theories of validity, particularly those of Messick and Kane.

Research quality and measurement

The fact that a good deal of Applied Linguistics research depends on the production and collection of quantitative data makes the quality of measuring instruments of crucial importance (Loewen & Gass, 2009 ). Unreliable data generates misleading statistical analyses, which, in turn, weakens or defeats the entire argument of quantitative and mixed methods studies. Subsequently, the quality of measuring instruments affects the internal validity of research studies (Plonsky & Derrick, 2016 ), which in turn compromises the credibility of research findings.

In the social sciences and Applied Linguistics, concern with the reliability and validity of measuring instruments is a perennial problem that can “neither be avoided nor resolved” (Lather, 1993, p. 674) because, unlike metric systems in physics, which are of almost universal value and credibility, measuring instruments in AL do not satisfy the principle of measurement invariance (Markus & Borsboom, 2013). That is, the properties of measuring instruments are dependent upon the properties of the object of measurement (i.e., research participants, context of use, etc.). Hence, every time a test or a questionnaire is used in a research study, its reliability and validity should be examined.

Given the centrality of measurement invariance, Douglas (2014) uses the “rubber ruler” metaphor to refer to this property of measuring instruments in AL research. Just as a rubber ruler may stretch or shrink depending on temperature, so that the intervals between its units of measurement fluctuate, the quality of measuring instruments (MIs) in AL research is often subject to contextual fluctuations. For this reason, examining and maximizing the reliability of measuring instruments is crucial. The following quote from Kerlinger (1986, cited in Thompson, 1988) captures the significance of instrument reliability in quantitative research.

Since unreliable measurement is measurement overloaded with error, the determination of relations becomes a difficult and tenuous business. Is an obtained coefficient of determination between two variables low because one or both measures are unreliable? Is an analysis of variance F ratio not significant because the hypothesized relation does not exist or because the measure of the dependent variable is unreliable? ...High reliability is no guarantee of good scientific results but there can be no good scientific results without reliability. (p. 415)

The above quote goes back to almost half a century ago, yet problems with MIs continue to persist in Applied Linguistics and SLA (Purpura et al., 2015 ).

In language teaching research, concern with how researchers handle quantitative data has recently increased. As such, several studies have addressed the quality of quantitative analyses (Khany & Tazik, 2019 ; Lindstromberg, 2016 ; Plonsky et al., 2015 ), researchers’ statistical literacy (Gonulal, 2019 ; Gonulal et al., 2017 ), and quality of instrument reporting (Derrick, 2016 ; Douglas, 2001 ; Plonsky & Derrick, 2016 ). Douglas ( 2001 ) states that researchers in SLA often do not examine indexes of performance consistency for the MIs they use.

Recently, inquiry into the quality of research studies has spurred interest in the evaluation of MIs in published research articles, in particular their reliability and performance consistency (Derrick, 2016; Plonsky & Derrick, 2016). A common theme in both of the noted studies is that current practices in reporting the reliability of measuring instruments are less than satisfactory. That is, inadequate attention is often given to the reliability of MIs in Applied Linguistics research. The current slim literature on research instrument quality is largely about the research article (RA) genre and comes almost exclusively from the academic north of the globe (Ryen & Gobo, 2011). As such, we are aware of no published research on how the reliability of quantitative instruments is handled and reported in the thesis genre in Applied Linguistics research. Given the culture- and context-bound nature of research methodology, and hence of assessment methods (Chen, 2016; Ryen & Gobo, 2011; Stone & Zumbo, 2016), studying MIs in other contexts is warranted. In addition, theses are not subject to the same space limitations as research papers; thus, one would expect detailed accounts of data elicitation instruments in a thesis. For the noted reasons, this study examines the quality of data elicitation instruments in a sample of theses in Applied Linguistics. We hope that the findings will encourage graduate students and early career researchers to exercise more care and seek more rigor in their choice of MIs and the inferences they make from them, which would enhance the credibility of research findings. In the remainder of this paper, we first briefly discuss validity in Applied Linguistics and language testing. We do so to situate issues of reliability and consistency in the broader context of validity, which is the ultimate criterion of data and inference quality. We will then present our own study, along with a discussion of the findings and the implications they might carry for research in Applied Linguistics.

Quality of measurements: validity and reliability

In psychometrics and educational measurement, as well as in Applied Linguistics research, the quality of measuring instruments is often captured by the term validity. In more traditional yet still quite common definitions, validity refers to the extent to which a measuring instrument measures what it is purported to measure, and reliability is about how consistently it does so (Kruglanski, 2013). From this perspective, reliability is considered a necessary but insufficient precondition for validity; that is, an instrument can be reliable without being valid (Grabowski & Oh, 2018), which implies that an instrument may demonstrate consistency in the kind of data it yields without necessarily tapping what it is purported to tap. In recent conceptualizations of validity, however, reliability is integrated within the domain of validity (Kane, 2006; Newton & Shaw, 2014; Purpura et al., 2015; Weir, 2005). Largely thanks to Messick’s legacy, validity is defined as an overall evaluative judgment of the degree to which empirical evidence and theoretical rationale justify the inferences and actions that are made based on test scores (Messick, 1989). Viewed from this holistic approach to validity, reliability is considered one source of validity evidence that should be used to support the inferences that are to be made of test scores. Whereas this conceptualization of validity as argument is increasingly being embraced in educational measurement and language testing, it has yet to permeate the broad literature on Applied Linguistics research in general and TEFL in particular (Purpura et al., 2015). In fact, some scholars believe that lack of knowledge about how to effectively measure L2 proficiency is the main reason for the failure of the field of SLA to make real progress in explaining development and growth in an L2 (Ellis, 2005, cited in Chapelle, 2021).

While we are mindful of the importance of validity, in this paper we focus exclusively on reliability, for two reasons. First, we believe that despite the theoretical unification of aspects of validity evidence (Bachman & Palmer, 2010; Chapelle, 2021; Kane, 2013), reliability still serves as a good heuristic for examining measurement quality. This is evident even in Kane’s argument-based validity. In going from data to claims, the first argument that must be supported in argument-based validation is evaluation, which refers to how verbal or non-verbal data elicited via a quantitative measure are converted to a quantity; unless this argument is adequately supported, the rest of the validity chain cannot be sustained. Second, despite the noted theoretical shift, scholars continue to make the distinction between validity and reliability, perhaps because, for practitioners, both Messick’s unified approach and Kane’s argument-based validation are difficult to translate into the practice of evaluating their measuring instruments. For the noted reasons, we thought that imposing a theoretical framework of validity that is incompatible with current practices may not be helpful.

Reliability of data collected via quantitative data collection instruments

Concern with the quality of measurements in Applied Linguistics research is not new. More than two decades ago, Bachman and Cohen edited a volume on how insights from SLA and Language Testing can assist in improving measurement practices in the two fields. More recently, several studies have investigated the reliability and consistency of quantitative instruments across disciplines (Plonsky, 2013; Plonsky & Derrick, 2016; Vacha-Haase et al., 1999). Al-Hoorie and Vitta (2019) investigated the psychometric issues of validity and reliability, inferential testing, and assumption checking in 150 papers sampled from 30 Applied Linguistics journals. Concerning reliability, they found that “almost one in every four articles would have a reliability issue” (p. 8).

Taken together, the common theme in most studies is that the current treatment of quantitative measures and instruments is far from ideal (Larson-Hall & Plonsky, 2015 ). That said, the findings of past studies are mixed, ranging from six percent of studies reporting reliability to 64% (Plonsky & Derrick, 2016 ). This loose treatment of quantitative data collection tools seems to be common in other social science disciplines such as psychology (Meier & Davis, 1990 ; Vacha-Haase et al., 1999 ).

Compared to research articles, much less work has been done on how the quality of MIs is addressed in other research genres such as theses and dissertations. Evaluating the research methodology of dissertations and published papers, Thompson (1988) identified seven methodological errors, one of which was the use of instruments with inadequate psychometric integrity. Likewise, Wilder and Sudweeks (2003) examined 106 dissertations that had used the Behavioral Assessment System for Children and found that only nine studies reported reliability for the subpopulation they had studied; the majority of the studies only cited reliability from the test manual. Such practices in treating reliability likely arise from the misconception that reliability or consistency is an attribute of a measurement tool. However, given that reliability, in its basic definition, is the proportion of observed score variance in the data that is attributable to true score variance, it follows that observed variance depends on the data collection occasion, context, and participants; change the context of use, and both observed variance and true variance change. That said, perhaps because of discursive habits, reliability is often invoked as a property of the instrument rather than a property of the data gathered via the instrument.
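To make this definition concrete, the standard classical test theory formulation can be written as follows (a textbook expression added here for clarity; it is not taken from the article itself):

```latex
% Classical test theory: an observed score X decomposes into a true score T and error E
X = T + E, \qquad \sigma^2_X = \sigma^2_T + \sigma^2_E
% Reliability: the proportion of observed-score variance attributable to true-score variance
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = 1 - \frac{\sigma^2_E}{\sigma^2_X}
```

Because both variance components are estimated from the scores obtained on a particular occasion with particular participants, the ratio characterizes those scores rather than the instrument in the abstract, which is the point made above.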

In sum, the above brief review points to a gap in research into the reliability of MIs in Applied Linguistics research. The current study intends to narrow this gap in the literature in the hope that it will raise further awareness of the detriments of poor research instruments. Our review of the literature showed that writers of RAs sometimes fail to provide full details regarding their MIs (Derrick, 2016), a practice which has repercussions for future research. Given the differences between the RA and thesis genres noted above, it is important to see how the quality of measuring instruments is addressed in theses. The literature also suggests that reliability is underreported in RAs. In addition to addressing this in the thesis genre, in this study we also delve further into the facets of reliability that are given attention. Given that, in the discourse around reliability in Applied Linguistics, reliability is often attributed to the instrument rather than to the data, we further inquire into the extent to which this discourse affects the way researchers report the reliability of their data or choose to rely on reliability evidence reported in the literature. In addition, to our knowledge, the extant literature has not touched upon the possible relationship between reliability reporting behavior and the nature of the constructs measured, a further issue we address in this study. Finally, given the situated nature of knowledge and research, it is important to know how the quality of quantitative research instruments is treated across contexts. The above objectives are translated into the following research questions.

  1. How frequently are the origins of research instruments reported?

  2. How frequently is reliability reported? And when it is reported, what reliability facets are addressed and what estimation procedures are used for computing it?

  3. What is the source of reliability (i.e., primary, cited, or both) that is reported?

  4. Do reliability reporting practices differ across the construct types measured (language vs. non-language constructs) and across geographical regions?

We believe that these questions are important because the insights gained can contribute to our collective assessment literacy (Harding & Kremmel, 2021), which “has the capacity to reverse the deterioration of confidence in academic standards” (Medland, 2019, p. 565), for research that relies on instruments of suspect consistency adds noise to the body of scholarship and can mislead and misinform future research.

Method

To answer the research questions, a corpus of 100 theses and dissertations from 40 universities in 16 countries across the world was collected. Roughly half of the theses were chosen from Iran, and the other half were selected from 39 universities based mostly in American and European countries. The theses from universities in the USA had the highest frequency (15), followed by those in the Netherlands (6), Canada (5), and England (4). Given that at the time of data collection we knew of no comprehensive repository accommodating theses from all universities across the globe, a random sample of theses could not be secured. Therefore, we do not claim that the corpus of theses examined in this study is representative of the universe of theses across the globe; yet, they are diverse enough to provide us with relevant insights.

For international theses, the most popular database is ProQuest (https://pqdtopen.proquest.com). Yet its search mechanism does not allow the user to search theses by country, and once theses are searched using keywords, the results yielded are mostly those written at North American universities, especially in the USA. To diversify the corpus and make it more representative of theses completed at other universities around the world, we also searched http://www.dart-europe.eu, which gives the user the option of limiting the search to a given country. All the international theses collected were then saved as PDF files.

Our only inclusion criterion was whether a thesis had made use of quantitative measures such as language tests, surveys, questionnaires, rating scales, and the like. To make inclusion decisions, the abstract and the Methods section of each thesis were carefully examined. In order to determine whether and how reliability was treated in each thesis in the domestic corpus, the abstract, the Methods chapter, and, in some cases, the Results and Findings chapter were closely examined. As for the international theses, the entire Methods chapter was checked. In cases where we could not find information about reliability in the noted sections, we used the search option in Acrobat Reader with the following search terms: reliability, consistency, agreement, alpha, Cronbach, valid, and KR (i.e., KR-20 and KR-21).

Our unit of analysis was the measuring instrument, not the thesis. A total of 211 MIs had been used in the corpus of theses we examined, including 110 language tests, 82 questionnaires, 9 rating scales, 8 coding schemes, and 2 content-area tests (e.g., math). The most frequently tested aspects of language were overall language proficiency (22), vocabulary (13), writing (12), and reading comprehension (11). Regarding the questionnaires, the most frequently measured constructs were learning strategies (8), motivation (4), and teacher beliefs (4).

The coding process was mainly informed by the research questions, which concerned the origin, reliability facet, reliability source, and reliability estimation methods. In addition, coding schemes used in similar studies such as Plonsky and Derrick ( 2016 ) and Derrick ( 2016 ) were reviewed. Coding thus began with the major categories highlighted in the research questions. We coded the MIs used in the first 30 theses, and whenever a new category emerged after a thesis had been coded, the coding scheme was refined to accommodate it. Therefore, although we started with a set of a priori categories, the actual coding was rather emergent, cyclic, and iterative. Once we settled on the final coding scheme, the entire corpus was coded once again from scratch. To minimize the subjectivity that inheres in coding, a sample of the theses was also coded by the second author. The kappa agreement rate was 96%, and the few cases of disagreement were resolved through discussion between the authors. Table  1 shows the final coding system used.
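For readers unfamiliar with the agreement index mentioned here, the sketch below shows how a Cohen's kappa coefficient for two coders' categorical codes might be computed. The category labels and codes are invented for illustration, and scikit-learn's cohen_kappa_score is used simply as one readily available implementation; it is not necessarily the software the authors used.

```python
# Illustrative only: Cohen's kappa for two coders' categorical decisions.
# The codes below are invented examples, not the study's data.
from sklearn.metrics import cohen_kappa_score

coder_1 = ["adopted", "designed", "adapted", "designed", "adopted", "not_reported"]
coder_2 = ["adopted", "designed", "adapted", "designed", "adapted", "not_reported"]

kappa = cohen_kappa_score(coder_1, coder_2)
print(f"Cohen's kappa = {kappa:.2f}")  # 1.00 would indicate perfect agreement
```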

Finally, to analyze the data generated using our coding scheme, we mainly used descriptive statistics such as raw frequencies, percentages, and graphic representation of the data in bar graphs. In cases where we needed to compare the domestic with the international theses, we used the Pearson chi-square test of independence, a non-parametric analytic procedure (see Pallant, 2010, p. 113). These analytic procedures were deemed appropriate because of the nominal and discrete nature of the data we worked with in this study.
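To make the analytic procedure concrete, the sketch below runs a Pearson chi-square test of independence on a 2 × 2 frequency table crossing corpus (domestic vs. international) with whether reliability was reported. The counts are hypothetical, chosen only so that they sum to 211; they are not the frequencies reported in Table 2.

```python
# Illustrative only: Pearson chi-square test of independence on a 2x2 table.
# Rows: domestic vs. international corpus; columns: reliability reported vs. not reported.
# The counts are hypothetical; they are not the study's data.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[65, 40],    # domestic corpus
                     [53, 53]])   # international corpus

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2({dof}, N = {observed.sum()}) = {chi2:.2f}, p = {p:.3f}")
```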

In this section, we first report the findings on the origin of MIs. Next, we present the findings regarding the facets of reliability reported, followed by the results related to the reliability estimation procedures used in the corpus. The source of the reliability estimates, along with reliability reporting across construct types, comes next. Finally, findings pertaining to reliability across the domestic and international corpora of theses are reported.

Our first research question was about the origin of the measuring instruments used. That is, we looked for information about whether a measurement tool had been adopted or adapted from previous work, designed by the researcher, adapted and translated, compiled from various measures and then adapted to the study context, or whether the origin of the MI was not specified in the thesis. As Fig.  1 displays, roughly half of the MIs had been designed by the researchers themselves, and a third had been adopted from previous studies. The remainder had been adapted ( n  = 15), compiled and then adapted ( n  = 6), or adapted and then translated ( n  = 4), while in 12 cases the authors failed to give any information regarding the origin of their MIs.

Figure 1. Frequency of reporting instrument origin

The second research question concerned the facets of reliability (Grabowski & Oh, 2018) that were addressed and the estimation procedures used for computing reliability. According to Fig.  2 , for 93 MIs the authors did not provide any information about the reliability of the instruments they used. In cases where reliability was reported, internal consistency was the most commonly reported facet ( n  = 75), followed by inter-rater reliability ( n  = 8), inter-rater reliability combined with internal consistency ( n  = 7), and the test–retest method ( n  = 6). For a further 18 instruments, reliability was reported but the thesis writers did not specify which facet of reliability they had examined.

Figure 2. Frequency of reporting each reliability facet

As to the reliability estimation procedures used in the corpus, Fig.  3 shows that Cronbach's alpha stands out with a frequency of 65, followed by the Pearson correlation ( n  = 7). The two Kuder-Richardson formulas, with frequencies of six and five respectively, come next. Other, less frequently used estimation procedures are the Spearman correlation, Kappa, the Pearson chi-square, Cohen's K , and the paired-samples t -test. It bears noting that in 19 cases the estimation procedure was not specified; in other words, the thesis writers did not state how they had arrived at the reliability coefficient they reported.

Figure 3. Methods of estimating reliability
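As background (not part of the original article), the most common of these estimation procedures can be written compactly in standard psychometric notation, where k is the number of items, the item variances and the variance of the total score are denoted by sigma squared terms, p_i is the proportion of test takers answering dichotomous item i correctly, and M is the mean total score:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right), \qquad
\mathrm{KR\text{-}20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}p_i(1-p_i)}{\sigma_X^{2}}\right), \qquad
\mathrm{KR\text{-}21} = \frac{k}{k-1}\left(1 - \frac{M(k-M)}{k\,\sigma_X^{2}}\right)$$

KR-20 is simply alpha applied to dichotomously scored items, and KR-21 additionally assumes that all items are of equal difficulty.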

The third research question was about the source of the reported reliability estimate. We sought to know whether and the extent to which researchers report the reliability of their own data (i.e., primary reliability), report a reliability index from a previous study (i.e., cited reliability), or report both primary and cited reliability. The results showed that in the majority of cases ( n  = 96), primary reliability was reported. In four cases, both primary and cited reliabilities were reported and for 10 MIs, a reliability estimate from another study was reported (i.e., cited reliability).

Our fourth research question concerned whether the type of construct measured by MIs (i.e., language vs. non-language constructs) moderates the frequency with which reliability is reported (see Figs.  3 and 4 ).

Figure 4. Frequency of reporting reliability for language and non-language measures

To determine whether there is any association between the type of construct measured and the extent to which reliability is reported, a Pearson chi-square test of independence (see Pallant, 2010, p. 113) was run. Reliability reporting did not vary significantly across construct types, χ²(1, N  = 211) = 0.23, p  = 0.62.

Finally, we sought to know whether reliability reporting practices vary across the domestic and international corpus. Table  2 gives the frequency of reporting reliability in the domestic and international theses.

As Table  2 displays, reliability seemed to be more frequently reported in the domestic corpus of theses. To see whether this apparent difference in frequency is significant, another Pearson chi-square test of independence was conducted; it showed that the difference is significant, χ²(1, N  = 211) = 4.59, p  = 0.02.

Conclusions and discussion

The credibility of knowledge and of research findings continues to spark debate, confusion, and controversy. Hence, across research paradigms, the question of whether and how truth is to be established has been addressed differently. In Applied Linguistics, the question of truth and credibility is often addressed using the notion of research validity, which can be threatened or compromised by various factors, including inconsistencies in evidence arising from temporal, spatial, and social sources. The issue of consistency is treated by examining reliability: it is assumed that when consistency is not established, claims of truth or validity cannot be made (Chapelle, 2020 ). In this study, we examined whether and to what extent the reliability of the instruments used to measure variables in research is addressed. More specifically, we probed reliability reporting practices in a corpus of domestic and international theses in Applied Linguistics.

Overall, our findings indicate that in a considerable number of cases the researchers failed to examine the reliability of their research instruments, and this held constant across language and non-language measuring instruments, which echoes the findings of similar studies of published papers such as Plonsky and Gass ( 2011 ), Plonsky and Derrick ( 2016 ), and Purpura et al. ( 2015 ). It was also found that reliability was often treated in a ritualistic manner whereby, by default, researchers opt for examining the internal consistency of their instruments without providing a rationale for choosing this facet to the exclusion of other reliability facets. This finding accords with those of several studies across a number of fields (Douglas, 2001 ; Dunn et al., 2014 ; Hogan et al., 2000 ; Plonsky & Derrick, 2016 ). Finally, it was observed that reliability is more frequently reported in the domestic corpus of theses than in the international corpus. In the remainder of this section, we try to explain the observed findings by drawing on a socio-material frame of thought (see Canagarajah, 2018 ; Coole & Frost, 2010 ) and the sociology of knowledge (Dant, 2013 ).

More specifically, our finding that, compared to research articles, reliability is more frequently reported in theses might have to do with space constraints as a dimension of material considerations, or with disciplinary conventions (Harding & Kremmel, 2021 ). Likewise, the dominant tendency to choose Cronbach's alpha as an index of reliability is likely due to logistical and practicality concerns, as alpha is the default reliability index in most statistical packages. Socio-material considerations are also at play when researchers treat reliability in a post hoc manner, after they have already conducted their main study. In such cases, if the reliability of the data turns out to be low, researchers would prefer to skip reporting reliability (Grabowski & Oh, 2018 ) rather than starting over, modifying instruments, and collecting new data.

Other aspects of the findings can be accounted for by drawing on the sociology of knowledge, particularly by invoking issues of genre and convention within Applied Linguistics discourse communities. For instance, contrary to our expectations, we found more frequent reporting of reliability in the domestic corpus. We tend to think that this might have to do with a certain discourse around reliability that is dominant in the Iranian Applied Linguistics community, where the common-sense meaning of reliability and its psychometric meaning are possibly conflated. As Ennis ( 1999 ) notes, reliable data does not mean good data, nor does it mean data we can rely on; these are common-sense meanings of the term. In the educational measurement and psychometric discourse community, by contrast, reliable data means only data that are consistent across some test method facets. When researchers take reliable data to mean good data, they give it more value and report it more frequently as a perceived index of research rigor.

Another observation that can be made sense of by invoking discursive realities has to do with the origin of MIs, which in many cases were designed by the researchers themselves. Measurement in the social sciences continues to be a source of controversy (Lather, 1993 ). Some believe that all measurement in psychometrics and education is flawed because it conflates statistical analysis with measurement, for the very objects of measurement fail to satisfy the ontological conditions of quantification (see Michell, 1999 , 2008 ). Lather even goes so far as to say that validity as a mechanism “to discipline the disciplines” is in fact the problem, not the solution. Yet, despite all the complexities around measurement, it is not uncommon in Applied Linguistics to observe simplistic approaches to measuring instruments, where any set of assembled items is taken to serve as a measuring instrument. It is for this reason that language testing scholars maintain that designing a measuring instrument demands expertise and assessment literacy (Harding & Kremmel, 2021 ; Phakiti, 2021 ; Purpura et al., 2015 ), which is often in short supply in the academic South (Oakland, 2009 ).

A further discursive myth regarding reliability that is somewhat common in the Applied Linguistics community is that reliability is a characteristic of the measuring instrument (Grabowski & Oh, 2018 ; Larson-Hall & Plonsky, 2015 ; Vacha-Haase, 1998 ). This myth explains our finding that, in a number of cases, thesis writers relied on a reliability estimate reported in the literature rather than examining the reliability of their own data. As Rowley ( 1976 ) states, “It needs to be established that an instrument itself is neither reliable nor unreliable…A single instrument can produce scores which are reliable, and other scores which are unreliable” (p. 53).

Relatedly, some measurement conventions and reliability practices seem to have become dogmatized, at least in some communities of social science and Applied Linguistics. One such dogma is the status that Cronbach's alpha has come to enjoy. Some methodologists maintain that the repeated use of alpha has become routinized and ingrained in the culture of research in the social sciences and humanities (Dunn et al., 2014 ), and despite the heavy scrutiny that alpha has recently come under, recommendations from statistics experts have yet to penetrate research in the social sciences, psychology, and Applied Linguistics (McNeish, 2018 ). Alpha, like many other statistics, makes certain assumptions about the data, which are often ignored by researchers (Dunn et al., 2014 ; McNeish, 2018 ), and these assumptions have been shown to be unrealistic and difficult to meet (Dunn et al., 2014 ). Because of these flaws, scholars have called for more robust ways of assessing reliability, such as estimates based on exploratory and confirmatory factor analysis. Yet there seems to be a prevailing reluctance on the part of most researchers to go beyond Cronbach's alpha, perhaps because of the technical knowledge necessary for the proper use, implementation, and interpretation of factor-analytic methods. A further limitation to bear in mind is that alpha is essentially a parametric statistic assuming continuous data and non-skewed distributions (Grabowski & Oh, 2018 ). However, in much Applied Linguistics research, the score interpretations made of quantitative data are of a criterion-referenced nature, with positively or negatively skewed distributions, which require reliability estimation procedures different from those commonly used for norm-referenced interpretations (Bachman, 2004 ; Brown, 2005 ; Brown & Hudson, 2002 ).
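To make the index under discussion concrete, the sketch below computes Cronbach's alpha directly from its standard formula for an invented persons-by-items score matrix. The data, random seed, and helper function are ours and purely illustrative; robust alternatives such as McDonald's omega would instead require fitting a factor model.

```python
# Illustrative only: Cronbach's alpha from its standard formula,
#   alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)),
# computed on an invented persons-by-items score matrix.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: 2-D array with rows = persons and columns = items."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)        # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)    # variance of the total score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

rng = np.random.default_rng(0)
ability = rng.normal(0, 1, size=(200, 1))              # invented person "true scores"
items = ability + rng.normal(0, 0.8, size=(200, 10))   # 10 noisy items per person
print(f"alpha = {cronbach_alpha(items):.2f}")
```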

Implications

In this study, we have claimed that sociomaterial and discursive considerations account for current practices and approaches to measuring instruments and their reliability in theses written in Applied Linguistics. As noted above, some of the pitfalls in measuring language and non-language constructs stem from the rigid disciplinarity that characterizes current higher education structures. This insulation of disciplines leaves us unaware of insights and progress made in neighboring disciplines. As Long and Richards ( 1998 , p. 27) maintain, “advances in language testing” remain “a closed book” for some, if not many, Applied Linguistics researchers (Chapelle, 2021 ). Perhaps this is partly due to the further compartmentalization that has occurred within Applied Linguistics, as a result of which the sub-disciplines of the field are hardly aware of each other's advances (Cook, 2015 ).

Therefore, more inter- and cross-disciplinary dialogue and research holds the potential to deepen our understanding of the sound measurement of constructs in Applied Linguistics. Some scholars go even further and suggest that Applied Linguistics must be seen as an epistemic assemblage, which would strip the established sub-disciplines of Applied Linguistics of their ontological status as disciplines (Pennycook, 2018 ). Accordingly, to increase research rigor, we would like to call for further cross-fertilization among SLA, language teaching, language testing, and even the broader field of measurement in the social and physical sciences.

One curious observation we made in this study was that, in some cases, high alpha coefficients were reported for proficiency tests that had been used to ensure the homogeneity of a sample of participants, often with the conclusion that the sample had turned out to be homogeneous. Given that the assumptions underlying alpha are violated with a homogeneous sample of participants, high alpha values are almost impossible to obtain in such conditions, and how such high coefficients were produced remains an open question. The implication is that Cronbach's alpha and other reliability estimation procedures make assumptions about the data; unless there is evidence that those assumptions have been met, one is not justified in using the chosen estimation method (Grabowski & Oh, 2018 ). Therefore, to foster research rigor, the ritualistic reporting of a high alpha coefficient is not adequate. Rather, both common sense and expertise in language assessment must be drawn upon to judge MI quality.
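The point about homogeneous samples can be illustrated with a small, hypothetical simulation: when between-person variance is restricted, as in a sample screened to be homogeneous in proficiency, observed-score variance is dominated by measurement error and alpha should drop sharply, which is what makes the high coefficients described above so surprising. None of the numbers below come from the study.

```python
# Illustrative only: Cronbach's alpha for a heterogeneous vs. a homogeneous
# (range-restricted) sample, holding item error constant. All values are invented.
import numpy as np

rng = np.random.default_rng(1)

def cronbach_alpha(scores: np.ndarray) -> float:
    k = scores.shape[1]
    return (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                            / scores.sum(axis=1).var(ddof=1))

def simulate(ability_sd: float, n_persons: int = 200,
             n_items: int = 10, error_sd: float = 0.8) -> np.ndarray:
    ability = rng.normal(0, ability_sd, size=(n_persons, 1))   # person true scores
    return ability + rng.normal(0, error_sd, size=(n_persons, n_items))

print(f"heterogeneous sample: alpha = {cronbach_alpha(simulate(1.0)):.2f}")
print(f"homogeneous sample:   alpha = {cronbach_alpha(simulate(0.2)):.2f}")
```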

The other implication is that investigating and maximizing reliability must not be guided solely by practical considerations and statistical analysis. Instead, theoretical and substantive considerations should inform the process. As every research context is likely to be different, it falls on the researcher to predict and explain all the possible internal and external factors bearing on the consistency of the data collected via quantitative instruments (Grabowski & Oh, 2018 ). It is this context-bound nature of reliability that makes it difficult to prescribe any rule that would work across contexts for all instruments.

We would like to support the call for more rigor and conservatism in designing, adopting, and adapting measurement instruments in Applied Linguistics research. Graduate students and early career professors should not shy away from deep reflection on and involvement in the foundations of research design and data collection methods. The critique of educational research made decades ago (Pedhazur, 1992 , p. 368) still holds true:

There is a curious mythology about understanding and mastery of the technical aspects of research. Statistics is often called “mere statistics,” and many behavioral researchers say they will use a statistician and a computer expert to analyze their data.

An artificial dichotomy between problem conception and data analysis is set up.

To think that a separate group of experts is responsible for the design and development of proper measurements, and that the job of the research practitioner is merely to use those instruments, is to perpetuate this artificial dichotomy between research practice and theoretical conception.

In sum, measurement is a tricky business even in physics. In the social sciences, where we work with humans, language, and discourse within complex socio-political structures, isolating, defining, and measuring constructs is far more complicated. If this statement sounds radical, it is only because we in Applied Linguistics are insulated from serious debates about the ontology and epistemology of measurement (see Michell, 1999 ; Markus & Borsboom, 2013 ; Chapelle, 2020 ). Furthermore, the massification of higher education and the publish-or-perish regime in academia have generated a mindset that takes a superficial and simplistic approach to testing complex social constructs. To improve this situation, the fast-food approach to research production (Pourmozafari, 2020 ) should be discouraged and countered.

Availability of data and materials

Data can be supplied upon request.

Abbreviations

MI: Measurement instrument

EFL: English as a Foreign Language

RA: Research article

KR: Kuder-Richardson

SLA: Second language acquisition

Al-Hoorie, A. H., & Vitta, J. P. (2019). The seven sins of L2 research: A review of 30 journals' statistical quality and their CiteScore, SJR, SNIP, JCR impact factors. Language Teaching Research, 23 (6), 727–744.


Bachman, L. F. (2004). Statistical analyses for language assessment . Cambridge: Cambridge University Press.


Bachman, L. F., & Palmer, A. (2010). Language assessment in practice: developing language assessments and justifying their use in the real world . Oxford: Oxford University Press.


Bachman, L. F., & Cohen, A. D. (Eds.). (1998). Interfaces between second language acquisition and language testing research . Cambridge: Cambridge University Press.

Brown, J. D. (2005). Testing in language programs: a comprehensive guide to English language assessement . New York: McGraw-Hill.

Brown, J. D., & Hudson, T. (2002). Criterion-referenced language testing . Cambridge: Cambridge University Press.

Canagarajah, S. (2018). Materializing ‘competence’: Perspectives from international STEM scholars. The Modern Language Journal , 102 (2), 268–291.

Chapelle, C. A. (2020). Argument-based validation in testing and assessment . Los Angeles: Sage.

Chapelle, C. A. (2021). Validity in language assessment. In P. Winke & T. Brunfaut (Eds.), The Routledge handbook of second language acquisition and language testing (pp. 11–20). New York: Routledge.

Chen, X. (2016). Challenges and strategies of teaching qualitative research in China. Qualitative Inquiry, 22 (2), 72–86.

Cook, G. (2015). Birds out of dinosaurs: the death and life of applied linguistics. Applied linguistics, 36 (4), 425–433.

Coole, D., & Frost, S. (2010). Introducing the new materialisms. In D. Coole & S. Frost (Eds.), New materialisms: Ontology, agency, and politics (pp. 1–43). Durham, NC: Duke University Press.

Dant, T. (2013). Knowledge, ideology & discourse: a sociological perspective . London: Routledge.

Derrick, D. J. (2016). Instrument reporting practices in second language research. TESOL Quarterly, 50 (1), 132–153.

Douglas, D. (2001). Performance consistency in second language acquisition and language testing research: a conceptual gap. Second Language Research, 17 (4), 442–456.

Douglas, D. (2014). Understanding language testing . London: Routledge.

Dunn, T. J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: a practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105 (3), 399–412.

Ennis, R. H. (1999). Test reliability: A practical exemplification of ordinary language philosophy. Philosophy of Education Yearbook.

Gonulal, T. (2019). Statistical knowledge and training in second language acquisition: the case of doctoral students. ITL-International Journal of Applied Linguistics, 17 (1), 62–89.

Gonulal, T., Loewen, S., & Plonsky, L. (2017). The development of statistical literacy in applied linguistics graduate students. ITL-International Journal of Applied Linguistics, 168 (1), 4–32.

Grabowski, K. C., & Oh, S. (2018). Reliability analysis of instruments and data coding. In A. Phakit, P. De Costa, L. Plonsky, & S. Starfield (Eds.), The Palgrave handbook of applied linguistics research methodology (pp. 541–565). London: Springer.


Harding, L., & Kremmel, B. (2021). SLA researcher assessment literacy. In P. Winke & T. Brunfaut (Eds.), The Routledge handbook of second language acquisition and language testing . New York: Routledge.

Hogan, T. P., Benjamin, A., & Brezinski, K. L. (2000). Reliability methods: a note on the frequency of use of various types. Educational and Psychological Measurement, 60 (4), 523–531.

Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement. Westport, Conn: Praeger.

Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50 (1), 1–73.

Khany, R., & Tazik, K. (2019). Levels of statistical use in applied linguistics research articles: from 1986 to 2015. Journal of Quantitative Linguistics, 26 (1), 48–65. https://doi.org/10.1080/09296174.2017.1421498 .

Kruglanski, A. W. (2013). Lay epistemics and human knowledge: cognitive and motivational bases . New York: Plenum Press.

Larson-Hall, J., & Plonsky, L. (2015). Reporting and interpreting quantitative research findings: what gets reported and recommendations for the field. Language Learning, 65 (S1), 127–159.

Lather, P. (1993). Fertile obsession: validity after poststructuralism. The Sociological Quarterly, 34 (4), 673–693.

Lindstromberg, S. (2016). Inferential statistics in language teaching research: a review and ways forward. Language Teaching Research, 20 (6), 741–768.

Loewen, S., & Gass, S. (2009). The use of statistics in L2 acquisition research. Language Teaching, 42 (2), 181–196.

Long, M. H., & Richards, J. C. (1998). Series editors' preface. In L. F. Bachman & A. D. Cohen (Eds.), Interfaces between second language acquisition and language testing research (pp. 27–28). Cambridge: Cambridge University Press.

Markus, K. A., & Borsboom, D. (2013). Frontiers of test validity theory: measurement, causation, and meaning . New York: Routledge.

McNeish, D. (2018). Thanks coefficient alpha, we’ll take it from here. Psychological Methods, 23 (3), 412.

Medland, E. (2019). ‘I’m an assessment illiterate’: towards a shared discourse of assessment literacy for external examiners. Assessment and Evaluation in Higher Education, 44 (4), 565–580.

Meier, S. T., & Davis, S. R. (1990). Trends in reporting psychometric properties of scales used in counseling psychology research. Journal of Counseling Psychology, 37 (1), 113.

Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher , 18 (2), 5–11.

Michell, J. (1999). Measurement in psychology: a critical history of a methodological concept . Cambridge: Cambridge University Press.

Michell, J. (2008). Is psychometrics pathological science? Measurement, 6, 7–24.

Newton, P., & Shaw, S. (2014). Validity in educational and psychological assessment . California: Sage.

Oakland, T. (2009). How universal are test development and use. In E. Grigorenko (Ed.), Multicultural psychoeducational assessment (pp. 1–40). New York: Springer.

Pallant, J. (2010). SPSS survival manual (4th ed.). Maidenhead: Open University Press.

Pedhazur, E. J. (1992). In Memoriam—Fred N. Kerlinger (1910–1991). Educational Researcher , 21 (4), 45–45.

Pennycook, A. (2018). Applied linguistics as epistemic assemblage. AILA Review, 31 (1), 113–134.

Phakiti, A. (2021). Likert-type scale construction. In P. Winke & T. Brunfaut (Eds.), The Routledge handbook of second language acquisition and language testing . New York: Routledge.

Plonsky, L. (2013). Study quality in SLA: an assessment of designs, analyses, and reporting practices in quantitative L2 research. Studies in Second Language Acquisition, 35 (4), 655–687.

Plonsky, L., & Derrick, D. J. (2016). A meta-analysis of reliability coefficients in second language research. The Modern Language Journal, 100 (2), 538–553.

Plonsky, L., & Gass, S. (2011). Quantitative research methods, study quality, and outcomes: the case of interaction research. Language Learning, 61 (2), 325–366. https://doi.org/10.1111/j.1467-9922.2011.00640.x .

Plonsky, L., Egbert, J., & Laflair, G. T. (2015). Bootstrapping in applied linguistics: assessing its potential using shared data. Applied Linguistics, 36 (5), 591–610.

Pourmozafari, D. (2020). Personal communication .

Purpura, J. E., Brown, J. D., & Schoonen, R. (2015). Improving the validity of quantitative measures in applied linguistics research. Language Learning, 65 (S1), 37–75.

Rowley, G. L. (1976). Notes and comments: the reliability of observational measures. American Educational Research Journal, 13 (1), 51–59.

Ryen, A., & Gobo, G. (2011). Editorial: Managing the decline of globalized methodology. International Journal of Social Research Methodology, 14, 411–415.

Stone, J., & Zumbo, B. D. (2016). Validity as a pragmatist project: A global concern with local application. In V. Aryadoust & J. Fox (Eds.), Trends in language assessment research and practice (pp. 555–573). Newcastle: Cambridge Scholars Publishing.

Thompson, B. (1988). Common methodology mistakes in dissertations: improving dissertation quality . Louisville, KY: Paper presented at the annual meeting of the Mid-South Educational Research Association.

Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58 (1), 6–20.

Vacha-Haase, T., Ness, C., Nilsson, J., & Reetz, D. (1999). Practices regarding reporting of reliability coefficients: a review of three journals. The Journal of Experimental Education, 67 (4), 335–341.

Weir, C. J. (2005). Language testing and validation . Hampshire: Palgrave Macmillan.

Wilder, L. K., & Sudweeks, R. R. (2003). Reliability of ratings across studies of the BASC. Education and Treatment of Children, 26 (4), 382–399.

Winke, P., & Brunfaut, T. (Eds.). (2021). The Routledge handbook of second language acquisition and language testing . New York: Routledge.


Acknowledgements

We appreciate the Editors and Reviewers of Language Testing in Asia for their timely review and feedback.

We received no funding for conducting this study.

Author information

Authors and affiliations

Department of English Language and Literature, College of Letters and Humanities, Shahid Chamran University of Ahvaz, Ahvaz, Iran

Kioumars Razavipour & Behnaz Raji


Contributions

This paper is partly based on Ms. Raji's M.A. thesis, which was supervised by Kioumars Razavipour. The paper, however, was written solely by Kioumars Razavipour. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Kioumars Razavipour .

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Razavipour, K., Raji, B. Reliability of measuring constructs in applied linguistics research: a comparative study of domestic and international graduate theses. Lang Test Asia 12 , 16 (2022). https://doi.org/10.1186/s40468-022-00166-5


Received : 06 October 2021

Accepted : 26 April 2022

Published : 18 May 2022

DOI : https://doi.org/10.1186/s40468-022-00166-5


  • Reliability
  • Consistency
  • Data elicitation instrument


Dissertations / Theses on the topic 'Language testing and assessment'


Consult the top 50 dissertations / theses for your research on the topic 'Language testing and assessment.'


Alobaid, Adnan Othman. "Testing, Assessment, and Evaluation in Language Programs." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/613422.

Kuhn, Amanda J. "A Study in Computerized Translation Testing (CTT) for the Arabic Language." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3108.

Perea-Hernandez, Jose Luis. "Teacher Evaluation of Item Formats for an English Language Proficiency Assessment." PDXScholar, 2010. https://pdxscholar.library.pdx.edu/open_access_etds/436.

鄭敏芝 and Man-chi Sammi Cheng. "Self assessment in the school-based assessment speaking component in a Hong Kong secondary four classroom: a case study." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B4324080X.

Kim, Youn-Hee 1979. "An investigation into variability of tasks and teacher-judges in second language oral performance assessment /." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=111931.

Potrus, Dani. "Swedish Sign Language Skills Training and Assessment." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209129.

Saville, N. D. "Developing a model for investigating the impact of language assessment within educational contexts by a public examination provider." Thesis, University of Bedfordshire, 2009. http://hdl.handle.net/10547/134953.

Tanner, Lindsay Elizabeth. "Testing the Test: Expanding the Dialogue on Workplace Writing Assessment." BYU ScholarsArchive, 2017. https://scholarsarchive.byu.edu/etd/6616.

Yanagawa, Kozo. "A partial validation of the contextual validity of the Centre Listening Test in Japan." Thesis, University of Bedfordshire, 2012. http://hdl.handle.net/10547/267493.

Calder, Maryna. "Self-assessment of lexical knowledge in second language vocabulary acquisition." Thesis, Swansea University, 2013. https://cronfa.swan.ac.uk/Record/cronfa43186.

Lombaard, Malinda. "Task-based assessment for specific purpose Sesotho for personnel in the small business corporation." Thesis, Link to the online version, 2006. http://hdl.handle.net/10019/235.

Pryde, Susanne Mona Graham. "Low frequency vocabulary and ESL writing assessment." Thesis, Hong Kong : University of Hong Kong, 1998. http://sunzi.lib.hku.hk/hkuto/record.jsp?B2012496X.

Evans, Jeremy S. "Exploring the Language of Assessment on Reading Proficiency Exams of Advanced Learners of Russian." BYU ScholarsArchive, 2015. https://scholarsarchive.byu.edu/etd/5651.

Kung, Wai-yin, and 龔惠妍. "An analysis of the language proficiency assessment for teachers in Hong Kong." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2005. http://hub.hku.hk/bib/B45012453.

Yik, Michelle Siu Mui. "A circumplex model of affect and its relation to personality : a five-language study." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape10/PQDD_0003/NQ39007.pdf.

Yuen, Hon-ming Jacky, and 袁漢明. "Implementing peer assessment and self-assessment in a Hong Kong classroom." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1998. http://hub.hku.hk/bib/B31944966.

Thompson, Meri Dawn. "Authentic reading assessment: The reading portfolio." CSUSB ScholarWorks, 1995. https://scholarworks.lib.csusb.edu/etd-project/1134.

Theron, Janina. "Pragmatic assessment of schizophrenic bilinguals' L1 and L2 use : a comparison of three assessment tools." Thesis, Stellenbosch : University of Stellenbosch, 2009. http://hdl.handle.net/10019.1/1783.

Lee, Siu-fan, and 李少芬. "An investigation of teacher's interpretations of target oriented assessment in English language." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B31945090.

Marais, Fiona C. "An investigation into the significance of listening proficiency in the assessment of academic literacy levels at Stellenbosch University." Thesis, Stellenbosch : University of Stellenbosch, 2009. http://hdl.handle.net/10019.1/1636.

Sokolova, Natalia. "Investigation of final language assessment for pre-service teachers of English in the Russian educational context : a case study." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/25486.

Lam, Ming Kei. "Assessing interactional competence : the case of school-based speaking assessment in Hong Kong." Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/25707.

Balizet, S. "Sha" G. "A dynamic simulation assessment of english as a second language students' academic readiness." Scholar Commons, 2005. http://scholarcommons.usf.edu/etd/2970.

Khabbazbashi, Nahal. "An investigation into the effects of topic and background knowledge of topic on second language speaking performance assessment in language proficiency interviews." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:359c8956-4561-43a8-a7ae-eba1e0dab51c.

Chinda, Bordin. "Professional development in language testing and assessment : a case study of supporting change in assessment practice in in-service EFL teachers in Thailand." Thesis, University of Nottingham, 2009. http://eprints.nottingham.ac.uk/10963/.

Ducher, Jeannie. "Experiences of Foreign Language Teachers and Students Using a Technology-Mediated Oral Assessment." Scholar Commons, 2010. http://scholarcommons.usf.edu/etd/3542.

Adolfsson, Helen. "Teaching, Testing and Assessment; an Interrelation of English as a Foreign Language - The Swedish National Test of English." Thesis, Högskolan i Halmstad, Sektionen för humaniora (HUM), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-19353.

Payton, Lisa. "Teaching Practices That May Improve Student Achievement on the High School Assessment Program (HSAP) for English Language Arts." ScholarWorks, 2016. https://scholarworks.waldenu.edu/dissertations/2683.

Alabdelwahab, Sharif Q. "PORTFOLIO ASSESSMENT: A QUALITATIVE INVESTIGATION OF PORTFOLIO SELF-ASSESSMENT PRACTICES IN AN INTERMEDIATE EFL CLASSROOM, SAUDI ARABIA." Columbus, Ohio : Ohio State University, 2002. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1037375641.

Cox, Troy L. "Investigating Prompt Difficulty in an Automatically Scored Speaking Performance Assessment." BYU ScholarsArchive, 2013. https://scholarsarchive.byu.edu/etd/3929.

Riska, Kristal M., Owen Murnane, Faith W. Akin, and Courtney D. Hall. "Video Head Impulse Testing (vHIT) and the Assessment of Horizontal Semicircular Canal Function." Digital Commons @ East Tennessee State University, 2015. https://dc.etsu.edu/etsu-works/547.

Waller, Daniel. "Investigation into the features of written discourse at levels B2 and C1 of the CEFR." Thesis, University of Bedfordshire, 2015. http://hdl.handle.net/10547/606056.

Fisher, Janis Linch Banks. "English writing placement assessment: Implications for at-risk learners." CSUSB ScholarWorks, 2001. https://scholarworks.lib.csusb.edu/etd-project/3022.

Thompson, Carrie A. "The Development and Validation of a Spanish Elicited imitation Test of Oral Language Proficiency for the Missionary Training Center." BYU ScholarsArchive, 2013. https://scholarsarchive.byu.edu/etd/3602.

Yu, Eunjyu. "A comparative study of the effects of a computerized English oral proficiency test format and a conventional SPEAK test format." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1164601340.

Stevenson, Lisa M. "A comparison of English and Spanish assessment measures of reading and math development for Hispanic dual language students." Diss., University of Iowa, 2014. https://ir.uiowa.edu/etd/4764.

Tucci, Alexander, and Alexander Tucci. "Item Analysis for the Development of the Shirts and Shoes Test for 6-Year-Olds." Thesis, The University of Arizona, 2017. http://hdl.handle.net/10150/625273.

Chapman, Mark Derek. "The effect of the prompt on writing product and process : a mixed methods approach." Thesis, University of Bedfordshire, 2016. http://hdl.handle.net/10547/621846.

Dunlea, Jamie. "Validating a set of Japanese EFL proficiency tests : demonstrating locally designed tests meet international standards." Thesis, University of Bedfordshire, 2015. http://hdl.handle.net/10547/618581.

DeArmond, Kathryn. "The use of phonological process assessment for differentiating developmental apraxia of speech from functional articulation disorders." PDXScholar, 1990. https://pdxscholar.library.pdx.edu/open_access_etds/3980.

Guichard, Jonathan. "Quality Assessment of Conversational Agents : Assessing the Robustness of Conversational Agents to Errors and Lexical Variability." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-226552.

Marks, Erelyne Lewis, Barbara Mabey Oliver, and Maureen Sugar Wolter. "A writing improvement and authentic assessment plan." CSUSB ScholarWorks, 1998. https://scholarworks.lib.csusb.edu/etd-project/1442.

Barrows, Jacob Garlin. "The Effect of Prompt Accent on Elicited Imitation Assessments in English as a Second Language." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/5654.

Alsagoafi, Ahmad Abdulrahman. "An investigation into the construct validity of an academic writing test in English with special reference to the Academic Writing Module of the IELTS Test." Thesis, University of Exeter, 2013. http://hdl.handle.net/10871/10121.

Griffith, Lori Jean. "Normative study of phonological process patterns of preschool children as measured by the Assessment of phonological processes, revised." PDXScholar, 1987. https://pdxscholar.library.pdx.edu/open_access_etds/3740.

Pendergrass, Carmen Cristy. "The Effects of Grade Configuration on Sixth, Seventh, and Eighth Grade Students’ TNReady English Language Arts and Math Achievement." Digital Commons @ East Tennessee State University, 2019. https://dc.etsu.edu/etd/3675.

Bartlett, Brian Michael. "Computerized reading assessment using the star reading software." CSUSB ScholarWorks, 2004. https://scholarworks.lib.csusb.edu/etd-project/2527.

Beckham, Semra. "Effects of Linguistic Modification Accommodation on High School English Language Learners’ Academic Performance." Thesis, NSUWorks, 2015. https://nsuworks.nova.edu/fse_etd/3.

Kendall, Constance Lynn. "The Worlds We Deliver: Confronting the Consequences of Believing in Literacy." Oxford, Ohio : Miami University, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=miami1117215518.

Torrie, Heather Colleen. "A Web-based Tool for Oral Practice and Assessment of Grammatical Structures." Diss., CLICK HERE for online access, 2007. http://contentdm.lib.byu.edu/ETD/image/etd1972.pdf.


Computer Science > Computation and Language

Title: Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Abstract: When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting -- ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, using learnable tokens indicating a thought's start and end, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9%$\rightarrow$10.9%) and CommonsenseQA (36.3%$\rightarrow$47.2%) and observe a perplexity improvement of difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way.



1,200 Indiana schools prep for ILEARN pilot program launch

INDIANAPOLIS — The way in which thousands of Hoosier kids are tested in math and language arts will get a major overhaul starting this fall.

Historically, schools administer the ILEARN assessment at the end of each school year. However, thanks to an upcoming pilot program, participating schools will test a new strategy: helping students familiarize themselves with the test well ahead of time.

”We have chosen three of our elementary schools and two of our, our two junior highs, actually, to participate,” Dr. Jeff Butts, Superintendent of MSD Wayne Township, said.

Throughout the 2024-2025 school year, school corporations like MSD Wayne Township will administer parts of ILEARN to 3rd-8th graders periodically.

”We’re going to have the opportunity to use three testing cycles as qualitative cycles, before the final test, the final ILEARN test,” Dr. Butts said.

According to the superintendent, the pilot will help teachers personalize remediation and tutoring efforts for students who are falling behind.

”It does give us checkpoints and benchmarks which we can utilize that data then to make adjustments in curriculum and adjustments in interventions that we’re providing to students,” Dr. Butts said.

Dr. Harold Olin, Superintendent of the Greenfield-Central Community School Corporation, also expressed support for such efforts. A statement he released Thursday reads in part: “…the practice of measuring student progress at checkpoints during the school year is a more appropriate way to gauge student learning…”

Overall, the pilot will affect students in 1,200 schools.

”They’re interim exams, short tests,” State Sen. Jeff Raatz (R-Richmond), said. “They’re not high stakes by any means.”

State Sen. Raatz said the program is expected to last for one year before launching statewide in 2025.

”The concept that 1,200 schools opted into this thing before the trigger was even pulled on it tells a story,” State Sen. Raatz said.

”I do believe that a year’s preparation certainly is enough time to shift our mind and shift our thinking,” Dr. Butts said.


There's a reason Tennessee offers the driver's test in more than English. Don't change that.

It's time for Tennessee to catch up with other states and drive us forward by expanding language access for driver's tests, not limiting it.

  • Sabina Mohyuddin is 'Our State, Our Languages' Coalition founding member. Diana Sanchez-Vega is a coalition supporter.

Imagine having a sick child with no way to get to the doctor’s office or having to pass on a job offer because you lack the transportation needed to get to work.

Imagine always relying on friends to drive you to the grocery store or on public transportation, which is desperately lacking across Tennessee just because you couldn’t pass the driver’s test.

Refugees and newly arrived immigrants who eagerly work to establish their new life in America face many barriers they must overcome. But the inability to pass the driver’s test simply because translation or interpretation services are not in their native language is an unnecessary burden and cruel obstacle they face in their journey to economic self-sufficiency.

For over 10 years, community groups have pressed for language access in driver’s license tests. In 2022, a group of grassroots organizations came together to form the "Our State, Our Languages" Coalition .


Language access is a civil right

Language access, as recognized by the federal government through the Title VI program, entails providing individuals with limited English proficiency (LEP) meaningful access to the same services as English-speaking individuals, in their own languages. These guidelines help agencies receiving federal funding meet the needs of the communities they serve. A non-compliant agency risks losing federal funding.

Tennessee currently has the driver's test in only five languages: English, Spanish, Japanese, German, and Korean.

The last three of those languages were added as car companies opened manufacturing sites in Tennessee.

According to the U.S. Census Bureau's 2022 American Community Survey data, 8.1% of working-age Tennesseans speak a language other than English; of those, half speak Spanish. But language diversity in our state does not stop at Spanish: over 30,000 Tennesseans speak Arabic. That represents 10 times more LEP speakers than Japanese, and two to three times more LEP speakers than German and Korean, respectively.

Other common languages include Chinese, Vietnamese, Kurdish, Thai, Somali, and Amharic.


Better language access is good for the economy

The hospitality and manufacturing industries inject tens of billions of dollars annually into our state's economy.

Immigrants who are legally able to work make up an integral part of both workforces, but many have difficulty getting to work.

As housing costs escalate, pushing people farther away from their workplaces, the strain on businesses intensifies. Better access to drivers’ licenses will provide much needed relief to those industries in dire need to fill staffing shortages.

Some legislators want an English-only driver's license bill

At a time when Tennessee needs to expand language access to driver’s licenses, our state legislators want to take us backwards and only offer the test in English with Senate Bill 1717 by Sen. Joey Hensley, R-Hohenwald. Critics often push back against language access in driver’s testing saying that people should just learn English. However, mastering a language is a time-intensive process, and families often need to achieve self-sufficiency within a mere three months of arriving in our state.

Critics also say that our roads are safer when drivers know English. Yet, there’s a stark contrast between the language proficiency necessary for passing a written test and comprehending road signs.

Fifteen years ago, Nashvillians decisively rejected an English-only referendum because they understood that the bill would not only make our city unwelcoming but have a detrimental economic impact. Regrettably, the lessons learned 15 years ago have not been considered by the sponsors of the current Senate bill.

Kentucky, Alabama, Texas, and Florida recognize the value of making the driver's license test accessible in the languages truly reflective of the communities in their state. It's time for Tennessee to catch up and drive us forward by expanding language access for driver's tests, not limiting it.

Sabina Mohyuddin is 'Our State, Our Languages' Coalition founding member. Diana Sanchez-Vega is 'Our State, Our Languages' Coalition supporter.

The 'Our State, Our Languages' Coalition includes: American Muslim Advisory Council, Asian & Pacific Islander of Middle Tennessee, Ethiopian Community Association in Nashville, Never Again Action Tennessee, and Somali Community of Middle Tennessee.

House Committee on Appropriations - Republicans


Appropriations Committees release second FY24 package

WASHINGTON - Today, the House and Senate Appropriations Committees released the second package of final Fiscal Year 2024 appropriations bills. The Further Consolidated Appropriations Act, 2024 will be considered in the House in the coming days. House Appropriations Chairwoman Kay Granger released the following statement on the package:

“House Republicans made a commitment to strategically increase defense spending, make targeted cuts to overfunded non-defense programs, and pull back wasteful spending from previous years. I am proud to say that we have delivered on that promise, and this bill is proof.

“Given that the world is becoming more dangerous, we wanted to send a strong message that we will do everything in our power to protect the American people and defend our interests. This bill funds our highest national security priorities - it invests in a more modern, innovative, and ready fighting force, continues our strong support for our great ally Israel, and provides key border enforcement resources. At the same time, we made cuts to programs that have nothing to do with our national security and pulled back billions from the Administration.

“With the odds stacked against us, House Republicans have refocused spending on America's interests, at home and abroad, and I urge support of this bill.”

A summary of the package is available here. Bill text is available here. Joint explanatory statements for each division of the package are available below:

  • Front Matter
  • Division A - Defense Appropriations Act, 2024
  • Division B - Financial Services and General Government Appropriations Act, 2024
  • Division C - Homeland Security Appropriations Act, 2024
  • Division D - Labor, Health and Human Services, Education, and Related Agencies Appropriations Act, 2024
  • Division E - Legislative Branch Appropriations Act, 2024
  • Division F - State, Foreign Operations, and Related Programs Appropriations Act, 2024


TSA testing language translation-interpretation devices at Philadelphia International Airport checkpoints

PHILADELPHIA—Transportation Security Administration (TSA) officers are testing the use of new hand-held language translation-interpretation devices in an effort to support a more positive security checkpoint experience for individuals who are limited English proficient, international travelers and individuals who are deaf or hard of hearing or who are blind or have low vision.

The goal of this pilot is to allow TSA to evaluate the viability of these devices by assessing their ease and effectiveness of use and their impact on checkpoint operations.

“We hope that this will turn out to be a valuable tool for our officers to provide guidance to passengers who might not speak English,” explained Gerardo Spero, TSA’s Federal Security Director for Philadelphia International Airport. “For example, it will help us to explain in the language that the traveler understands, that we may need to open a carry-on bag for a search.”

The device is smaller than a mobile phone and contains a library of 83 languages. A TSA officer or a traveler can speak into the device and it will translate the message into the selected language. The device audibly repeats the message in the chosen language, and the translation also appears on the screen so that travelers who are deaf or hard of hearing can read it.

TSA has deployed five units to be used at checkpoints in Philadelphia’s international terminals A-East and A-West, and also at its busiest checkpoints in terminals B and DE. Since the rechargeable units work via Wi-Fi or data connection, they can easily be moved to any checkpoint lane where they are needed because they do not need to be tethered to an electrical outlet. “That agility is extremely valuable to us,” Spero explained.

In the short time that the units have been in use, TSA has seen the benefits of the units as well as a few challenges, such as the use of colloquialisms. For example, the term “pat-down” does not translate accurately into all languages and instead TSA officers may need to use different words to explain that a pat-down needs to take place.

TSA can pre-program common advisements that TSA officers use in typical checkpoint conversations. The device can store up to 10,000 commonly used translations as "favorites". Software updates can be downloaded to add languages and words to the vocabulary library. Some foreign languages have specific dialects and other nuances; for example, the units distinguish four varieties of Spanish: those spoken in Spain, Argentina, Colombia and the United States.

Checkpoints can be noisy places and as a result, TSA officials have learned that enunciating words into the device is important. For example, the unit may mistakenly translate the words “your coat” into “you’re a coat.” It is this type of information that TSA is gathering to help determine how to work around the ambient noise issues of a checkpoint.

“The field testing of these units is one step that we are taking to improve our communication with a broader traveling population and further enhance the customer experience,” explained Jose Bonilla, TSA’s Executive Director of the agency’s Traveler Engagement Division. “The results of this field test will allow us to evaluate the viability of a small, stand-alone communication device at our checkpoints by assessing the ease and effectiveness of use and its impact on checkpoint operations.”

“This has potential to be a game-changer for travelers who are not fluent in English who come to our checkpoints,” Spero said. “It will ease their travel experience. Already we are seeing a positive impact.”

IMAGES

  1. (PDF) Language Testing and Assessment: A Comprehensive Guide

  2. (PDF) Pragmatics and language testing. In B. Spolsky (Ed.), Advances in

  3. (PDF) The Critical Perspective of English Language Testing and

  4. (PDF) Research in Language Testing

  5. (PDF) Communicative Language Testing

  6. Studies in Language Testing

VIDEO

  1. Academic reading and writing in English Part 1: Introduction

  2. How to Defend Your MS/MPhil/PhD Research Thesis

  3. How to Write an Abstract for Your Thesis Explained in Somali Language

  4. DETERMINING THE FEATURES OF ACADEMIC WRITING

  5. Academic reading and writing in English Part 14: Tentative and objective language

  6. Academic reading and writing in English Part 3: The role of sources

COMMENTS

  1. Language testing and assessment (Part I)

    Assessment and reporting in language learning programs: Purposes, problems and pitfalls. Plenary presentation at the International Conference on Testing and Evaluation in Second Language Education, Hong Kong University of Science and Technology, 21 - 24 June 1995. Google Scholar.

  2. (PDF) Emergent Trends and Research Topics in Language Testing and

    Abstract and Figures. This study, which is of descriptive nature, aims to explore the emergent trends and research topics in language testing and assessment that have attracted increasing ...

  3. Language Testing: Sage Journals

    Language Testing is a fully peer reviewed international journal that publishes original research and review articles on language testing and assessment. It provides a forum for the exchange of ideas and information between people working in the fields of ... This journal is a member of the ...

  4. Interpreting testing and assessment: A state-of-the-art review

    In the field of language testing, the assessment of oral communication (e.g., L2 speaking) represents a vibrant line of research, spawning much scholarly discussion and debate over the past three decades (for a historical review, see Fulcher, 2015). However, one area of oral communication—spoken-language interpreting—seems to have drawn far less attention from language testers than it ...

  5. (PDF) Investigating Scoring Procedures in Language Testing

    In language testing bibliography, it has been suggested that L2 grammar and vocabulary scores strongly and positively correlate with L2 reading comprehension. Jeon and Yamashita (2014), conducting ...

  6. PDF The Routledge Handbook of Language Testing; Second Edition

    This second edition of The Routledge Handbook of Language Testing provides an updated and comprehensive account of the area of language testing and assessment. ... Subjects: LCSH: Language and languages—Ability testing. | LCGFT: Essays. Classification: LCC P53.4 .R68 2021 | DDC 401/.93—dc23 LC record available at https://lccn.loc.gov ...

  7. PDF Literature Review of Language Testing Theories and Approaches

    For language teachers, tests perform both pedagogical and research functions. This essay is a brief literature review of the developments of language testing theories and corresponding testing ...

  8. Future challenges and opportunities in language testing and assessment

    A first priority identified by Bachman (2000) was professionalization of the field, which he defined as "(1) the training of language testing professionals; and (2) the development of standards of practice and mechanisms for their implementation and enforcement" (p. 19). It is reassuring that the training offer has indeed exponentially increased since 2000, in terms of number of training ...

  9. Language Testing

    The prelims comprise: Introduction: The Place of Language Testing within Applied Linguistics. Validation Research in Language Testing. Language Testing as Institutional Practice. Language Tests and Identity. Language Testing Research and Language Learning. Current and Future Developments in Language Testing Research.

  10. Studies in Language Testing (SiLT)

    Studies in Language Testing (SiLT) is a series of academic volumes edited by Professor Lynda Taylor and Dr Nick Saville. It is published jointly by Cambridge English and Cambridge University Press (CUP). The series addresses a wide range of important issues and new developments in language testing and assessment, and is an indispensable ...

  11. PDF Current Issues in Language Evaluation, Assessment and Testing

    Current Issues in Language Evaluation, Assessment and Testing: Research and Practice xiii Figure 13-4: Students' perceptions of their language skills before the course (N = 17). Figure 13-5: Students' perceptions of their language skills after the course (N = 17). Figure 13-6: Student improvement in Task 1 and Task 2 (N = 17).

  12. Editorial: Frontiers in Language Assessment and Testing

    Although language assessment and testing can be viewed as having a much longer history (Spolsky, 2017; Farhady, 2018), its genesis as a research field is often attributed to Carroll's and Lado's publications. Over the past decades, the field has gradually grown in scope and sophistication as researchers have adopted various interdisciplinary approaches to problematize and address old and new ...

  13. English Language Learners in K-12 Classrooms: Problems, Recommendations

    Henderson, Trisha, "ENGLISH LANGUAGE LEARNERS IN K-12 CLASSROOMS: PROBLEMS, RECOMMENDATIONS AND POSSIBILITIES" (2019). Electronic Theses, Projects, and Dissertations. 797. https://scholarworks.lib.csusb.edu/etd/797. This Thesis is brought to you for free and open access by the Office of Graduate Studies at CSUSB ScholarWorks.

  14. A meta-analysis on educational technology in English language teaching

    Constructing and validating a multimedia techniques (MTS) scale and examining the impact of using technology in teaching English in Iranian high schools on students' attitudes, anxiety, and language proficiency. Thesis (Unpublished). Imam Reza International University. Sadeqi, M. (2015).

  15. Language Testing

    Language Testing is an international peer reviewed journal that publishes original research on foreign, second, additional, and bi-/multi-/trans-lingual (henceforth collectively called L2) language testing, assessment, and evaluation. The journal's scope encompasses the testing of L2s being learned by children and adults, and the use of tests as research and evaluation tools that are used to ...

  16. Recent Developments in Language Testing and Assessment

    Language Testing, 30(3), 309-327. Tsagari, D. (2020). Language Assessment Literacy: Concepts, Challenges and Prospects. In S. Hidri (Ed.), ... 200 Greek Cypriot EFL learners' essays (pre- and post-tests) were evaluated taking into consideration four aspects of writing quality after using either PA or teacher assessment (TA) ...

  17. Reliability of measuring constructs in applied linguistics research: a

    The credibility of conclusions arrived at in quantitative research depends, to a large extent, on the quality of data collection instruments used to quantify language and non-language constructs. Despite this, research into data collection instruments used in Applied Linguistics and particularly in the thesis genre remains limited. This study examined the reported reliability of 211 ...

  18. Linguistics Theses and Dissertations

    Theses/Dissertations from 2021. PDF. Trademarks and Genericide: A Corpus and Experimental Approach to Understanding the Semantic Status of Trademarks, Richard B. Bevan. PDF. First and Second Language Use of Case, Aspect, and Tense in Finnish and English, Torin Kelley. PDF. Lexical Aspect in-sha Verb Chains in Pastaza Kichwa, Azya Dawn Ladd.

  19. The Effect of Language Learning Experience on Motivation and Anxiety of

    participant has a language requirement as part of their education), language class level, and language learning environment (those with traditional classroom versus significant in-country experience). The current study surveyed and analyzed the responses of 124 students currently enrolled in a language class at Brigham Young University.

  20. Interpreting testing and © The Author(s) 2021

    In the field of language testing, the assessment of oral communication (e.g., L2 speaking) represents a vibrant line of research, spawning much scholarly discussion and debate over the past three decades (for a historical review, see Fulcher, 2015). However, one area of oral communication—spoken-language interpreting—seems to have drawn

  21. Dissertations / Theses: 'Language testing and assessment ...

    This thesis addresses the development of such a model from the perspective of Cambridge ESOL, a provider of English language tests and examinations in over 100 countries. The starting point for the thesis is a discussion of examinations within educational processes generally and the role that examination boards, such as Cambridge ESOL, play ...

  22. 1 Software Testing with Large Language Models: Survey, Landscape, and

    Index Terms: Pre-trained Large Language Model, Software Testing, LLM, GPT. Software testing is a crucial undertaking that serves as a cornerstone for ensuring the quality and reliability of software products. Without the rigorous process of software testing, software enterprises would be reluctant to release

  23. [2403.09629] Quiet-STaR: Language Models Can Teach Themselves to Think

    When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner ...

  24. 1,200 Indiana schools prep for ILEARN pilot program launch

    INDIANAPOLIS — The way in which thousands of Hoosier kids are tested in math and language arts will get a major overhaul starting this fall. Historically, schools administer the ILEARN ...

  25. Tennessee driver's test should be offered in more than just English

    For over 10 years, community groups have pressed for language access in driver's license tests. In 2022, a group of grassroots organizations came together to form the "Our State, Our Languages ...

  26. Embattled Eastern Gateway Community College set to fold

    Eastern Gateway Community College, which has struggled financially over the last year, is set to begin dissolving in June unless it receives enough funding. EGCC trustees decided at a Wednesday meeting that unless the community college is able to obtain "sufficient" funding by May 31, it would begin the process of folding on June 30.

  27. Appropriations Committees Release Second FY24 Package

    WASHINGTON - Today, the House and Senate Appropriations Committees released the second package of final Fiscal Year 2024 appropriations bills. The Further Consolidated Appropriations Act, 2024 will be considered in the House in the coming days. House Appropriations Chairwoman Kay Granger released the following statement on the package: "House Republicans made a commitment to strategically ...

  28. TSA testing language translation-interpretation devices at Philadelphia

    PHILADELPHIA—Transportation Security Administration (TSA) officers are testing the use of new hand-held language translation-interpretation devices in an effort to support a more positive security checkpoint experience for individuals who are limited English proficient, international travelers and individuals who are deaf or hard of hearing or who are blind or have low vision.

  29. New Presidents and Provosts: Jackson State, Lower Columbia, Nevada

    Sarah Frey, vice provost and dean of undergraduate education at the University of California, Merced, has been appointed as provost and vice president of academic affairs at Nevada State University. Matt Seimears, provost and senior vice president for academic affairs at Eastern Oregon University, has been selected as president of Lower Columbia College, in Washington.

  30. Submission Guidelines: Language Testing: Sage Journals

    1.3.1 Submitting a manuscript based on a dissertation or thesis. Language Testing encourages authors to submit papers based on their dissertations or theses. Authors should submit a cover letter stating that their paper is based on a dissertation or thesis and provide the APA citation to the dissertation or thesis, and the paper should cite the ...