Now showing results 1-10 of 139.
1. How Important Is Content in the Ratings of Essay Assessments? (EJ785796)
Author(s):
Shermis, Mark D.; Shneyderman, Aleksandr; Attali, Yigal
Source:
Assessment in Education: Principles, Policy & Practice, v15 n1 p91-105 Mar 2008
Pub Date:
2008-03-00
Pub Type(s):
Journal Articles; Reports - Evaluative
Peer-Reviewed:
Yes
Descriptors: Predictor Variables; Test Scoring Machines; Essays; Grade 8; Grade 6; Content Analysis; Literary Genres; Prompting; Word Processing; Scoring; Achievement Rating; Value Judgment
Abstract: This study was designed to examine the extent to which "content" accounts for variance in scores assigned in automated essay scoring protocols. Specifically, it was hypothesised that certain writing genres would emphasise content more than others. Data were drawn from 1668 essays calibrated at two grade levels (6 and 8) using "e-rater[TM]", an automated essay scoring engine with established validity and reliability. "E-rater" v 2.0's scoring algorithm divides 12 variables into "content" (scores assigned to essays with similar vocabulary; similarity of vocabulary to essays with the highest scores) and "non-content" (grammar, usage, mechanics, style, and discourse structure) related components. The essays were classified by genre: persuasive, expository, and descriptive. The analysis showed significant main effects of grade, F(1, 1653) = 58.71, p < 0.001, and genre, F(2, 1653) = 20.57, p < 0.001. The interaction of grade and genre was not significant. Eighth grade students had significantly higher mean scores than sixth grade students, and descriptive essays were rated significantly higher than those classified as persuasive or expository. Prompts elicited "content" according to expectations, with the lowest proportion of content variance in persuasive essays, followed by expository and then descriptive. Content accounted for approximately 0-6% of the overall variance when all predictor variables were used. It accounted for approximately 35-58% of the overall variance when "content" variables alone were used in the prediction equation. (Contains 9 tables, 2 figures and 2 notes.)
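The "content" measures described in this abstract reduce to vocabulary-overlap comparisons against previously scored essays. The bag-of-words sketch below illustrates the general idea under simple assumptions (aggregate vocabulary profiles per human score point, cosine similarity); it is not ETS's e-rater implementation, and all names are invented for illustration.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def content_features(essay, scored_training_essays):
    """scored_training_essays maps a human score point (e.g. 1-6) to a
    list of essay texts. Returns the two content-style features the
    abstract describes: the score point whose aggregate vocabulary the
    essay most resembles, and the similarity to the top score point."""
    vec = Counter(essay.lower().split())
    profiles = {score: Counter(" ".join(texts).lower().split())
                for score, texts in scored_training_essays.items()}
    sims = {score: cosine(vec, profile) for score, profile in profiles.items()}
    best_match = max(sims, key=sims.get)    # vocabulary most like this score point
    top_similarity = sims[max(profiles)]    # similarity to the highest score point
    return best_match, top_similarity
```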
2. From #2 Pencils to the World Wide Web: A History of Test Scoring (EJ811614)
Zytowski, Donald G.
Journal of Career Assessment, v16 n4 p502-511 2008
2008-00-00
Journal Articles; Reports - Descriptive
Peer-Reviewed: No
Descriptors: Educational Testing; Achievement Tests; Computers; Scoring; Academic Aptitude; Internet; Computer Assisted Testing; Psychological Evaluation; Evaluation Methods; Standardized Tests; Student Interests; Test Scoring Machines
Abstract: The present highly developed status of psychological and educational testing in the United States is in part the result of many efforts over the past 100 years to develop economical and reliable methods of scoring. The present article traces a number of methods, ranging from hand scoring to present-day computer applications, stimulated by the need to economically score large-scale scholastic aptitude and achievement tests and complex interest assessments.
3. An Evaluation of Computerised Essay Marking for National Curriculum Assessment in the UK for 11-Year-Olds (EJ776010)
Hutchison, Dougal
British Journal of Educational Technology, v38 n6 p977-989 Nov 2007
2007-11-00
Descriptors: Essays; Computer Uses in Education; Scoring; Comparative Analysis; Foreign Countries; Scores; Test Scoring Machines; Writing (Composition); Elementary School Students
Abstract: This paper reports a comparison of human and computer marking of approximately 600 essays produced by 11-year-olds in the UK. Each essay script was scored by three human markers. Scripts were also scored by the "e-rater" program. There was good agreement between human and machine marking. Scripts with highly discrepant scores were flagged and assessed blind by expert markers for characteristics considered likely to produce human-machine discrepancies. As hypothesised, essays marked higher by humans exhibited more abstract qualities such as interest and relevance, while there was little, if any, difference on more mechanical factors such as paragraph demarcation.
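The flagging step this abstract describes (routing scripts with large human-machine discrepancies to blind expert review) is straightforward to express in code. A minimal sketch, assuming three human marks per script and an illustrative discrepancy threshold:

```python
import statistics

def flag_discrepant(scripts, threshold=2.0):
    """scripts: iterable of (script_id, human_scores, machine_score).
    Flags scripts whose machine score departs from the human consensus
    by at least `threshold` points (the threshold is illustrative)."""
    flagged = []
    for script_id, human_scores, machine_score in scripts:
        consensus = statistics.mean(human_scores)  # e.g. three human markers
        if abs(machine_score - consensus) >= threshold:
            flagged.append((script_id, consensus, machine_score))
    return flagged

# The machine mark is 3 points below the human consensus of 29, so it is flagged.
print(flag_discrepant([("script-42", [28, 30, 29], 26.0)]))
```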
4. Improving Content Validation Studies Using an Asymmetric Confidence Interval for the Mean of Expert Ratings (EJ682735)
Penfield, Randall D.; Miller, Jeffrey M.
Applied Measurement in Education, v17 n4 p359-370 Oct 2004
2004-10-01
Journal Articles; Reports - General
Descriptors: Student Evaluation; Evaluation Methods; Content Validity; Scoring; Scores; Automation; Test Scoring Machines
5. Automated Tools for Subject Matter Expert Evaluation of Automated Scoring (EJ682734)
Williamson, David M.; Bejar, Isaac I.; Sax, Anne
Applied Measurement in Education, v17 n4 p323-357 Oct 2004
2004-10-00
Reports - Evaluative; Journal Articles
Descriptors: Validity; Scoring; Scores; Evaluation Methods; Quality Control; Test Scoring Machines; Automation
Abstract: As automated scoring of complex constructed-response examinations reaches operational status, the process of evaluating the quality of resultant scores, particularly in contrast to scores of expert human graders, becomes as complex as the data itself. Using a vignette from the Architectural Registration Examination (ARE), this article explores the potential utility of Classification and Regression Trees (CART) and Kohonen Self-Organizing Maps (SOM) as tools to facilitate subject matter expert (SME) examination of the fine-grained (feature-level) quality of automated scores for complex data, with implications for the validity of resultant scores. The article explores both supervised and unsupervised learning techniques, with the former represented by CART (Breiman, Friedman, Olshen, & Stone, 1984) and the latter by SOM (Kohonen, 1989). Three applications comprise this investigation, the first of which suggests that CART can facilitate efficient and economical identification of specific elements of complex responses that contribute to automated and human score discrepancies. The second application builds on the first by exploring CART for efficiently and accurately automating case selection for human intervention to ensure score validity. The final application explores the potential for SOM to reduce the need for SMEs in evaluating automated scoring. Although both the supervised and unsupervised methodologies examined were found to be promising tools for facilitating SME roles in maintaining and improving the quality of automated scoring, such applications remain unproven, and further studies are necessary to establish the reliability of these techniques.
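The first CART application in this abstract, identifying which elements of a response predict human-machine score discrepancies, can be sketched with a standard classification tree. The feature names and data below are invented for illustration; this is not the authors' ARE scoring code.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: feature-level measurements extracted from one candidate response.
X = [[3, 0, 1], [5, 2, 0], [2, 1, 1], [6, 3, 0], [1, 0, 1], [5, 2, 1]]
feature_names = ["n_design_elements", "n_code_violations", "novel_layout"]
# 1 = automated and human scores disagreed on this response, 0 = they agreed.
y = [1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# The printed rules show SMEs which response features drive disagreement.
print(export_text(tree, feature_names=feature_names))
```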
6. Automated Scoring Technologies and the Rising Influence of Error (EJ716796)
Cheville, Julie
English Journal, v93 n4 p47 Mar 2004
2004-03-01
Descriptors: Scoring; Test Scoring Machines; Writing Exercises; Educational Policy; Private Sector; Error Patterns
Abstract: This article urges professional development organizations to educate local decision-makers about the risks that automated scoring technologies pose to language and writing practices. It argues that automated assessment drives changes that benefit private industry but conflict with research on writing and language.
7. Beyond Essay Length: Evaluating e-rater[R]'s Performance on TOEFL[R] Essays. Research Reports. Report 73. RR-04-04 (ED492918)
Chodorow, Martin; Burstein, Jill
Educational Testing Service
2004-02-00
Numerical/Quantitative Data; Reports - Research; Tests/Questionnaires
Peer-Reviewed: N/A
Descriptors: Essays; Test Scoring Machines; English (Second Language); Student Evaluation; Scores; Spanish; Semitic Languages; Japanese; Writing Evaluation
Abstract: This study examines the relation between essay length and holistic scores assigned to Test of English as a Foreign Language[TM] (TOEFL[R]) essays by e-rater[R], the automated essay scoring system developed by ETS. Results show that an early version of the system, e-rater99, accounted for little variance in human reader scores beyond that which could be predicted by essay length. A later version of the system, e-rater01, performs significantly better than its predecessor and is less dependent on length due to its greater reliance on measures of topical content and of complexity and diversity of vocabulary. Essay length was also examined as a possible explanation for differences in scores among examinees with native languages of Spanish, Arabic, and Japanese. Human readers and e-rater01 show the same pattern of differences for these groups, even when effects of length are controlled. Appended are: (1) TOEFL Writing Scoring Guide; and (2) Confusion Matrices for Essay Scores Combined across Mixed Cross-validation Sets for Seven Prompts. (Contains 18 tables, 3 figures, and 5 endnotes.)
ERIC Full Text (354K)
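The report's central question, how much variance in human scores e-rater explains beyond essay length alone, amounts to an R-squared increment between two nested regressions. A rough sketch with invented toy data (the variable names and values are illustrative, not from the report):

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept term."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

length = np.array([150, 320, 280, 90, 410, 260], dtype=float)  # words per essay
erater = np.array([2.0, 4.0, 3.5, 1.5, 5.0, 3.0])              # machine scores
human = np.array([2.0, 4.5, 3.0, 2.0, 5.0, 3.5])               # reader scores

r2_length = r_squared(length.reshape(-1, 1), human)
r2_both = r_squared(np.column_stack([length, erater]), human)
print(f"variance explained beyond length alone: {r2_both - r2_length:.3f}")
```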
8. The Effect of Specific Language Features on the Complexity of Systems for Automated Essay Scoring. (ED482933)
Cohen, Yoav; Ben-Simon, Anat; Hovav, Myra
2003-10-00
Information Analyses; Speeches/Meeting Papers
Descriptors: Essays; Language Patterns; Language Variation; Scoring; Test Scoring Machines
Abstract: This paper focuses on the relationship between different aspects of the linguistic structure of a given language and the complexity of the computer program, whether existing or prospective, that is to be used for the scoring of essays in that language. The first part of the paper discusses common scales used to assess writing products, then briefly describes various methods of Automated Essay Scoring (AES) and reviews several AES programs currently in use. It also presents empirical results attesting to the reliability and validity of these programs, principally with regard to essays written in English. The second part of the paper presents various linguistic features that may vary extensively across languages and examines the ramifications of these features for the complexity of the AES operational system. This analysis is presented chiefly with regard to Hebrew and English, which are used to illustrate the differences that may exist between languages. (Contains 5 tables and 30 references.) (SLD)
ERIC Full Text (420K)
9. Essay Assessment with Latent Semantic Analysis (EJ773582)
Miller, Tristan
Journal of Educational Computing Research, v29 n4 p495-512 2003
2003-00-00
Descriptors: Semantics; Test Scoring Machines; Essays; Semantic Differential; Comparative Analysis; Methods Research; Evaluation Methods; Writing Evaluation; Writing Research; Computer Assisted Testing; Program Descriptions; Program Implementation
Abstract: Latent semantic analysis (LSA) is an automated, statistical technique for comparing the semantic similarity of words or documents. In this article, I examine the application of LSA to automated essay scoring. I compare LSA methods to earlier statistical methods for assessing essay quality, and critically review contemporary essay-scoring systems built on LSA, including the "Intelligent Essay Assessor," "Summary Street," "State the Essence," "Apex," and "Select-a-Kibitzer." Finally, I discuss current avenues of research, including LSA's application to computer-measured readability assessment and to automatic summarization of student essays. (Contains 2 figures and 6 footnotes.)
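The LSA pipeline this abstract surveys (project texts into a low-rank semantic space, then compare them by cosine similarity) can be sketched with off-the-shelf tools. A minimal, generic illustration, not a reconstruction of any of the named systems; the toy corpus and grades are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

reference_essays = [
    "photosynthesis converts light energy into chemical energy",
    "plants use sunlight water and carbon dioxide to make glucose",
    "the mitochondria is the powerhouse of the cell",
]
reference_grades = [5, 5, 2]  # illustrative human grades

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(reference_essays)
svd = TruncatedSVD(n_components=2, random_state=0)  # tiny rank for a toy corpus
semantic_space = svd.fit_transform(tfidf)

# Score a new essay as the grade of its nearest neighbour in semantic space.
new = svd.transform(vectorizer.transform(["sunlight lets plants make glucose"]))
similarities = cosine_similarity(new, semantic_space)[0]
print(reference_grades[similarities.argmax()])
```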
10. Assessing Writing through the Curriculum with Automated Essay Scoring. (ED477929)
Shermis, Mark D.; Raymat, Marylou Vallina; Barrera, Felicia
2003-04-00
Reports - Descriptive; Speeches/Meeting Papers
Descriptors: College Students; Essays; Higher Education; Portfolio Assessment; Portfolios (Background Materials); Scoring; Test Scoring Machines; Writing Evaluation; Writing Improvement
Abstract: This paper provides an overview of some recent work in automated essay scoring that focuses on writing improvement at the postsecondary level. The paper illustrates the Vantage IntelliMetric[TM] automated essay scorer that is being used as part of a Fund for the Improvement of Postsecondary Education (FIPSE) project that uses technology to grade electronic portfolios. The purpose of the electronic portfolio is to demonstrate a mechanism for translating the general learning goal on writing into operational terms that permit the developmental tracking of students throughout their undergraduate curriculum. Moreover, the technology can be readily incorporated into any course in which writing is a significant component. (Contains 22 references.) (Author/SLD)
ERIC Full Text (468K)