These constraints were proposed by Hair et al. The problem occurs when two traits are so similar that they cannot be statistically separated. Therefore, another model was postulated to address and remove this limitation. Vocabulary and grammar proved to be elements of one measuring criterion, yet the statistical separability of AIE and CQ was not established. I therefore investigated the validity of a two-factor model in a limited study.
This would indicate that AIE and CQ are not theoretically and statistically distinguishable and that the measured variables address different elements of the trait. Next, 60 texts were randomly selected for rescoring. Due to time and budget limitations, I managed to recruit only one of the assistants (Teacher 1) to help rescore the texts. In the end, there were eight measured variables in the new model.
A between-subjects test, however, did not reveal any significant difference between the means.
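As an illustration of this kind of between-subjects comparison, and assuming the two sets of ratings are treated as independent groups, the sketch below contrasts two hypothetical sets of scores with an independent-samples t-test; the values are invented for the example and are not the study's data.

```python
# Minimal sketch of a between-subjects comparison of mean writing scores.
# The two score lists are hypothetical band-style ratings, not data from the study.
from scipy import stats

original_ratings = [6.0, 5.5, 7.0, 6.5, 5.0, 6.0, 7.5, 6.0, 5.5, 6.5]  # assumed first-round scores
rescored_ratings = [6.5, 5.5, 6.5, 6.0, 5.5, 6.0, 7.0, 6.5, 5.0, 6.0]  # assumed rescored values

# Independent-samples (between-subjects) t-test; Welch's correction is used
# in case the two sets of ratings have unequal variances.
t_stat, p_value = stats.ttest_ind(original_ratings, rescored_ratings, equal_var=False)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value >= 0.05:
    print("No significant difference between the mean scores.")
```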
A model was then generated based on the rescored scripts; it is displayed in Figure 2, which also shows a moderate correlation between the latent traits.

This study set out to investigate the validity of a writing assessment model. To investigate the underlying structure of the writing scripts and to answer the research questions, structural equation modeling (SEM) was performed. It was found that the three-factor model and its observed indicators, shown in rectangles in Figure 1, could not fit the data because of the difficulty of separating AIE and CQ. This is in part due to the low discriminant validity of the model.
A discriminant validity criterion in SEM is that correlation coefficients between latent traits should not be so high that the traits must be considered inseparable (Hair et al.). Excessive correlation coefficients jeopardize discriminant validity (Brown), and therefore the model does not capture any discriminability (Kane). M1 and M2 failed to show good discriminability between their traits.
This may be due to the structure of AIE, which can subsume coherence and cohesion as a subcategory under its heading. For example, to arrange ideas, information, and examples, it is necessary to use cohesive devices that make the movement within and across the sentences of a text smooth. Therefore, the border between AIE and CQ may not be as clear-cut to raters as the designers of the assessment assumed. Isolating CC and AIE may appear conceptually sound, but this study yielded no statistical evidence for such an assessment strategy; a sketch of the kind of correlation check involved is given below.
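As an illustration only, the following sketch shows the kind of correlation check implied by the discriminant validity criterion discussed above. The rating values and the .85 cutoff are assumptions made for the example, not values reported in this study; Hair et al., Brown, and Kline discuss appropriate thresholds.

```python
# Minimal sketch of a discriminant validity check between two traits.
# The trait score arrays and the .85 cutoff are illustrative assumptions.
import numpy as np

aie_scores = np.array([5, 6, 7, 5, 8, 6, 7, 9, 6, 5], dtype=float)  # hypothetical AIE ratings
cq_scores = np.array([5, 6, 8, 5, 8, 6, 7, 9, 7, 5], dtype=float)   # hypothetical CQ ratings

r = np.corrcoef(aie_scores, cq_scores)[0, 1]
CUTOFF = 0.85  # assumed cutoff; an excessively high r suggests the traits are inseparable

print(f"Correlation between AIE and CQ: r = {r:.2f}")
if abs(r) > CUTOFF:
    print("Discriminant validity is questionable: the traits may not be separable.")
else:
    print("Correlation is below the cutoff; the traits appear distinguishable.")
```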
A statistical solution offered was to construct theory-grounded parcels by aggregating the scores of AIE and CC, whose correlation coefficient exceeded unity (Widaman). Building parcels is an acceptable practice if we rely on the pragmatic philosophy of science, which holds that representing every cause of variance in scores, especially minor causes, is impossible. This is difficult in the social sciences and troublesome in language assessment, where the range of skills involved in performance is very extensive but their separability and measurability may be neither desirable nor possible.
Considering this, I constructed the parcel score; however, since it would have a wider range than the other variables in the study, the arithmetic average of the parcel scores was calculated so that the parcel had a range similar to that of the other variables (a sketch of this step is given below).
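The parceling step can be sketched as follows, assuming hypothetical rating columns: summing the two criteria would widen the range, whereas averaging keeps the parcel on the same scale as the other measured variables, as described above.

```python
# Minimal sketch of building a theory-grounded parcel from two highly
# correlated criteria (AIE and CC) and keeping it on the original rating
# scale by averaging rather than summing.
# The DataFrame and column names are illustrative assumptions.
import pandas as pd

ratings = pd.DataFrame({
    "aie": [6, 7, 5, 8, 6],      # hypothetical AIE ratings
    "cc": [6, 8, 5, 7, 6],       # hypothetical coherence/cohesion ratings
    "grammar": [5, 7, 6, 8, 5],  # another measured variable, for comparison
})

# Averaging keeps the parcel within the same range as the other variables.
ratings["aie_cc_parcel"] = ratings[["aie", "cc"]].mean(axis=1)

print(ratings)
```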
Research shows that using grammar or vocabulary as a criterion in writing assessment can produce consistent results (Banerjee et al.). Therefore, in analytic rating of L2 writing, it seems theoretically and statistically plausible to rate two major areas of the scripts: sentence structure and vocabulary, and the arrangement of ideas and examples, including the cohesion and coherence of the text.
As the present study showed, this strategy can explain a significant amount of variance in scores and ease the process of scoring and decision making.
It is also imperative to note that the two-factor model still had a significant chi-square index. There are different viewpoints on how to interpret this index. Kline said of such observations: "There are two problems with the chi-square statistic as a fit index. First, although its lower bound is always zero, it theoretically has no upper bound; thus, its values are not interpretable in a standardized way.
Second, it is very sensitive to sample size. That is, if the sample size is large, which is required in order that the index may be interpreted as a significance test, then the chi-square statistic may be significant even though the differences between the observed and model-implied covariances are slight" (see also Schumacker & Lomax).
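Kline's second point can be illustrated numerically: under maximum likelihood estimation the model chi-square is approximately (N - 1) times the minimized discrepancy between the observed and model-implied covariances, so the same slight misfit that is non-significant in a modest sample becomes significant in a large one. The discrepancy value and degrees of freedom in the sketch below are illustrative assumptions, not values from this study.

```python
# Minimal numerical illustration of the chi-square statistic's sensitivity
# to sample size: chi2 ≈ (N - 1) * F_ML, where F_ML is the minimized
# discrepancy. F_ML = 0.05 and df = 8 are illustrative assumptions.
from scipy.stats import chi2

F_ML = 0.05  # assumed (small) discrepancy between observed and implied covariances
DF = 8       # assumed model degrees of freedom

for n in (100, 500, 2000):
    chi_sq = (n - 1) * F_ML
    p = chi2.sf(chi_sq, DF)
    print(f"N = {n:>4}: chi2 = {chi_sq:6.2f}, p = {p:.4f}")

# With the same slight misfit, the model "fails" the chi-square test
# only when the sample is large, as Kline describes.
```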
Nevertheless, more recently, McIntosh and Barrett argued that if the chi-square value shows the failure of the model, the approximate fit indexes should be banned. Therefore, for a more in-depth analysis of the findings of this study, the use of a larger sample size, together with integrated writing criteria that divide the underlying construct into two major parts, is deemed useful.
This researcher proposes the postulated two-factor model provisionally and only apropos of the findings of the current study. Last but not least, analytic scoring has long proved helpful, well established, and precise (Banerjee et al.).
The issue of the statistical and psychometric separability of all proposed criteria is of paramount importance in investigations into the construct validity of the proposed models.
This implies that very complicated models of writing assessment may not serve the purpose of assessment well. Investigating the effect of raters within a similar model and within other proposed models can provide further evidence for the findings of the present study.
References

Ahour, T. Analytic assessment of writing: Diagnosing areas of strength and weakness in the writing of TESL undergraduate students. Iranian Journal of Language Studies, 3(2).
Aryadoust, S. Validity arguments in the context of high-stakes tests of second language listening: A quantitative and qualitative study. Unpublished confirmation report. Tehran: Jungle Publication.
Archibald, A. International Journal of English Studies, 1(2).
Astika, G. RELC Journal, 24(1).
Bachman, L. Fundamental considerations in language testing. Oxford: Oxford University Press.
Ballard, B. Assessment by misconception: Cultural influences and intellectual traditions. In Hamp-Lyons (Ed.).
Banerjee, J.
Barkaoui, K. Rating scale impact on EFL essay marking: A mixed-method study. Assessing Writing, 12.
Barrett, P. Structural equation modeling: Adjudging model fit. Personality and Individual Differences, 42(5).
Brown, J. A categorical instrument for scoring second language writing skills. Language Learning, 34(4).
University of Cambridge Local Examinations Syndicate. Cambridge: Cambridge University Press.
Connor, U.
Coxhead, A. Preparing writing teachers to teach the vocabulary and grammar of academic prose. Journal of Second Language Writing, 16.
Daiker, D. Sentence combining and syntactic maturity in freshman English. College Composition and Communication, 19(1).
Evola, J. Discrete point versus global scoring for cohesive devices. In Perkins (Eds.). Rowley, MA: Newbury House.
Ferris, D. Treatment of error in second language student writing. Ann Arbor: University of Michigan Press.
Fornell, C., & Larcker, D. F. Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39–50.
Gamaroff, R. Rater reliability in language assessment: The bug of all bears. System, 28(1).
Geldhof, G.
Hair, J. F., Jr. Multivariate data analysis.
Hamp-Lyons, L. Basic concepts.
Homburg, T. Holistic evaluation of ESL composition: Can it be validated objectively?
Jacobs, H. Testing ESL composition: A practical approach.
Jakeman, V.
Kane, M. In Brennan (Ed.).
Kline, R. Principles and practice of structural equation modeling. New York, NY: Guilford.