Joe Cirio's Writing Assessment Reflection and Responses

In the world of writing assessment, the man with the most valid methods is king

Red Herrings

I want to focus this blog post on Automated Essay Scoring (AES), or any sort of computer-assisted scoring—specifically, I want to point out some interesting ideas from our readings.

The one chapter I read from Ericsson and Haswell’s book was Ken McAllister and Ed White’s “Interested Complicities: The Dialectic of Computer-Assisted Writing Assessment.” The chapter discusses the agents complicit in using computer-assisted assessment. The authors identify these stakeholders: the researchers, the entrepreneurs, the adopters, and the users. For each stakeholder, the authors point to the relationships and complicities that stakeholder has with the others, as well as the consequences of those complicities.

What I find particularly interesting is that the authors outline how AES is the logical response by the complicit agents (entrepreneurs and researchers specifically) when the narrative from teachers, writing instruction, and students is that we need a less labor-intensive workload, cheaper labor, less expensive tuition, and a focus on “more important” issues than grading. To say that AES is the logical response does not necessarily mean that this is the technology we’ve all been waiting for—rather, it may mean that the exigencies in composition, as perceived by entrepreneurs and researchers (people at ETS and AES researchers), are not being interpreted correctly. Or that we—as composition folk—may be projecting the wrong exigencies, or not focusing on the right exigencies to begin with. What I mean to say is: when entrepreneurs and researchers find that the appropriate and logical response to the concerns of comp teachers is AES, what does that say about the concerns we’re projecting or circulating across the discipline?

I think about Bill Condon’s article “Large-scale assessment, locally-developed measures, and automated scoring of essays: Fishing for red herrings?” He discusses how AES is a red herring: it is a sign of a larger issue, and the more productive discussions should be redirected toward the original questions about the construct of writing. Condon uses AES as a way to talk about large-scale assessment and holistic scoring; he claims that AES is not necessarily the problem itself but a representation of a long-standing issue in how assessment practices construct what writing is. In other words, when AES is being offered as a logical way to assess writing, then maybe something’s wrong—as I’ve pointed out with McAllister and White.
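
To make the construct worry concrete, here is a toy sketch (entirely hypothetical; it is not how e-rater or any real AES engine works) of the kind of surface-feature scoring that critics like Condon point to. Everything it “reads” is countable; nothing it reads is meaning:

```python
# Toy sketch only: invented features and weights, not any real AES engine.
def toy_essay_score(text: str) -> float:
    """Score an essay from crude surface proxies: length, sentence length,
    and vocabulary variety. Nothing here reads for meaning."""
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    if not words or not sentences:
        return 0.0
    length_term = min(len(words) / 500, 1.0)                      # rewards sheer length
    sent_len_term = min(len(words) / len(sentences) / 25, 1.0)    # rewards long sentences
    variety_term = len({w.lower() for w in words}) / len(words)   # type-token ratio
    return 0.5 * length_term + 0.3 * sent_len_term + 0.2 * variety_term

coherent = ("Writing assessment should begin with questions about the construct "
            "of writing itself. A score is meaningful only if it reflects that construct.")
scrambled = " ".join(sorted(coherent.split()))  # same words, meaning destroyed
print(toy_essay_score(coherent), toy_essay_score(scrambled))  # similar scores
```

The sketch is the red herring in miniature: a measure like this can be perfectly consistent while saying almost nothing about the construct of writing.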

What is “negotiation”? (evaluating [digital] writing and programs blog)

This week’s readings (“evaluating [digital] writing and programs”) cover much of where my research interests lie–particularly my thesis. These readings focus on classroom assessment practices and how instructors construct what kind of writing will be assessed: do we use rubrics? grading contracts? portfolios?

I’d like to take this blog post to share (and even think through for myself) some of the ideas behind my thesis. My thesis focuses on rubrics in the classroom; specifically, I’m situating myself in a conversation about best practices for rubrics. Authors such as Turley and Gallagher, Bob Broad, Valerie Balester, and Inoue (not the piece we read from him, but other work) advocate for assessment practices to be negotiated and discussed with students so that students will have a stake in how they will be assessed. However, I’m raising the question of whether our students are capable of negotiating assessment practices; I’m particularly interested in the negotiation of rubrics.

In Ketai’s chapter that we read for class, she discusses how directed self-placement is meant to allow students to have a say about where they will be placed, but she warns that in many cases, the questionnaire (or placement guide) that students fill out guides them to be placed into remedial English courses. The tools used in these situations are having students segregate themselves—or rather, the questionnaires are segregating students by race under the guise of student self-placement. The “directed” of “directed self-placement” is exerting more influence than the “self.” Having students think that they are placing themselves focuses the problem on the students instead of the placement tool.
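
Ketai’s warning is easy to see in miniature. Below is a hypothetical sketch (invented questions, weights, and cut score, not drawn from any real directed self-placement instrument) of how the “directed” part can quietly decide the outcome:

```python
# Hypothetical sketch: invented questions, weights, and cut score; no real
# directed self-placement (DSP) instrument is being reproduced here.
QUESTIONS = [
    ("I read books for pleasure in English.", 2),
    ("I wrote long essays in high school.", 1),
    ("English is the main language spoken at home.", 3),  # a proxy that can track race and class
]

def placement(yes_answers: list) -> str:
    """Students only ever see the questions; the weights and cut score
    are the designer's values, invisible to them."""
    score = sum(weight for (_, weight), yes in zip(QUESTIONS, yes_answers) if yes)
    return "first-year writing" if score >= 4 else "remedial English"

# A student who answers yes to the first two questions but not to the
# heavily weighted "home language" question is directed to remedial
# English regardless of writing ability: 2 + 1 = 3, below the cut of 4.
print(placement([True, True, False]))  # -> remedial English
```

The student appears to have placed herself; in fact, the weighting did.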

Similarly, my thesis focuses on how rubrics are actually not negotiated when instructors attempt to discuss criteria for a rubric; rather, students are so standardized through schooling and standardized testing that they are incapable of creating rubrics that are meant to help them. In an instructor’s attempt to create a negotiated rubric, are students actually offering criteria of their own? Or are they parroting back values that have been drilled into them? If the latter is true, how can we re-define what negotiation means? Is what we’re doing in class negotiation at all?

These are the central questions of my thesis—I’m taking a look at the ecology of rubric-making practices and widening the lens to account for schooling, testing, and outside experiences students have with writing.

Being silenced in the sciences–issues of race “white-washed” from sciences

Assessment Podcast

For a stirring conversation about this week’s topics in writing assessment, go visit Jacob Craig’s blog.

Responding to Assessment as Technology: My thought process

So, I’m working off of posts from David and Jacob. Both agree that technology is not ideologically neutral, but my question is whether technology is inherently ideological or whether our use of that technology is ideologically driven.

For me, answering this question is a way to understand whether assessment is a technology. Assessment, I feel, is an ideology–in Dr. Yancey’s piece, she traces these waves, but I feel like those waves are ideologies themselves, with the practices and concepts of one affecting the next wave. An assessment theory contains practices and systems that follow that assessment’s ideology.

So, now let’s talk about technology. Neal seems to answer the question I raised earlier:

If bridges and guns cannot be neutral but are rather ideological regardless of the ways people use them, writing assessments are not so different a technology as we might imagine. (22)

The key here is that he states “regardless of the ways people use them”–answering my question that it’s not just the use; technologies are inherently ideological. David and Jacob both mentioned Dr. Yancey’s class, so I might as well continue the reference. Like in Neal’s book, we talked about the use of guns as a technology that affects the way people act. I’m willing to agree, but before I do, I want to complicate this a bit.

In Yancey’s class, we talked about the phrase “guns don’t kill people, people kill people,” which supports the argument that gun use makes guns ideological–they are not inherently ideological. We also talked about Twitter and texting language and how the use of these technologies will begin to invade other aspects of writing and begin to evolve our language–it is through these technologies that language evolves a certain way. That argument demonstrates that tech is inherently ideological.

So, I reference again a story from my time as a psychology major in undergrad. Whenever we talked about correlation vs. causation (which we frequently did), we used two examples (one ridiculous and one less ridiculous). First, the number of domesticated cats increased as the number of pirates decreased–they’re correlated, but obviously the one didn’t cause the other. The second example was guns: the use of violent video games was directly correlated with violent gun crime. So we raised the question of whether the violent video games caused people to do violent acts (supporting the argument that technology is inherently ideological) or whether the people were already violent and were thus attracted to violent video games (supporting the argument of “guns don’t kill people”).
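
To put rough numbers on the pirates-and-cats example (all figures invented for the demo), two series that merely trend in opposite directions over time will show a near-perfect correlation with no causal link between them:

```python
# Invented, back-of-the-envelope numbers for the pirates-and-cats example.
def pearson(xs, ys):
    """Pearson's r, computed by hand to keep the sketch self-contained."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

pirates = [5000, 4500, 3800, 3000, 2200, 1500]  # invented counts, by decade
cats = [20, 28, 37, 45, 55, 64]                 # invented counts (millions)
print(pearson(pirates, cats))  # near -1.0, yet neither causes the other
```

The statistic is real; the causal story is entirely ours to argue for.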

In psychology, we typically say that this is an often-debated topic, but the statistic itself does not imply causation. I’m curious about the second possibility; I reflect on my own situation. I am not a gun person—I will never own a gun. There is no possibility that I will be attracted to a gun. But there are certain people who are attracted to guns. So, is it possible that guns (and other kinds of technology) attract a certain kind of clientele? Clientele that are already inclined to be affected a certain way by a gun?

Let’s think about writing assessment. Obviously, students will be forced into the kinds of assessments they’ll be participating in. But an institution is implementing that assessment, and that institution has an agenda/ideology. A set of assessment practices is a representation of the overarching ideology of the institution. But then I’m thinking–an assessment practice is a technology, but when I use the word “assessment,” I’m including not just practices but also ideology, concepts, waves, etc. When I say technology, I think I’m exclusively talking about practices of a material and medium (intended or otherwise). What do you get when you strip a technology of its practices? Probably just a medium, or even down to a material without a message. Are mediums and materials, alone, technologies? I’m not prepared to say that.

So, what did I just say….

I’m conflicted: in one sense, I think my thinking demonstrates that technology is inherently ideological, but then I talk out of the other side of my mouth and say that technology inherently contains uses, which assessment does not necessarily need to contain. So I want it both ways: assessment is a technology particularly when assessment includes practices (either a physical practice or a method), but assessment may not be technology if we say technology does not include practices–which I say it does.

Well, in that sense, maybe I don’t want it both ways. I’m saying assessment is technology.

Questions

This week’s readings remind me of a question my professor asked in a class I took in undergrad called The Psychology of Personality. Before I get to the question, I’ll give some background:

We were discussing what a personality test actually does—what is it testing? What can that test tell us about a person? How can the information from a personality test be used to predict behaviors in other contexts? (Psychology is all about predicting future behaviors.) There are several personality tests with different degrees of success, and to determine a test’s success, we asked whether it was reliable and valid. (Now, my understanding of reliability and validity has been grounded in psychometrics since I was a senior in high school—I have a degree in psychology, so this is no surprise. But I’ll talk about this another time.) Some of the tests that we talked about (from memory) were the Myers-Briggs, the True Colors test, and the Personality and Preference Inventory, to name a few. From what I remember, we had determined that the Myers-Briggs was not a successful test; the scores would change as the participants were in different situations, and the test’s constructs were too reductive to create any useful information. Personality tests evolved after each use and eventually led to some pretty successful tests that were reliable and valid (in terms of psychometrics).
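
Since reliability and validity are doing a lot of work in that story, here is a minimal sketch of the psychometric distinction, with invented scores: reliability is consistency across administrations; validity asks whether those scores track anything external we actually care about.

```python
# Invented scores, purely to illustrate the psychometric vocabulary.
from statistics import correlation  # Pearson's r; available in Python 3.10+

# Reliability: the same five people take a personality test twice,
# months apart, and the scores stay consistent.
first_sitting = [12, 18, 25, 31, 40]
second_sitting = [13, 17, 26, 30, 41]
print(correlation(first_sitting, second_sitting))  # ~0.99: reliable

# Validity is a separate claim: to argue the test predicts college
# success, its scores would have to track an external criterion such as
# GPA, and with these invented numbers they barely do.
college_gpas = [2.1, 3.6, 2.4, 3.9, 2.8]
print(correlation(first_sitting, college_gpas))  # ~0.3: consistent, but not valid for this use
```

A test can score high on the first correlation and low on the second, which is why “reliable and valid” has to be argued as two separate claims.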

At other points in the semester, my professor would explain what kinds of personalities correlated with what kinds of behaviors—people with certain personality traits will most likely behave in certain ways. We talked about the SAT as an unsuccessful predictor of college success, but research had supported the claim that certain personalities, as determined by a personality test, were a better predictor of who will succeed in college. So, here was my professor’s question (not verbatim, but I’ll give the essence):

Based on the research, should we replace the SAT with a personality test to help college admissions decide who is accepted?

This was a difficult question for me to answer: instinct told me absolutely not, but the logic of testing and assessment told me that this was the reasonable next step. So, what’s the problem here? Or rather, what other kinds of questions does this raise about validity and reliability? About the logic of testing? About how information from tests should be used? Is this a reasonable way to use this information—if the methods are valid and reliable, is it ethical? But then again, why do we have a personality test to begin with? Is it solely to understand what type of personality we have, or is the information to be used in other contexts?

Let’s look at a writing test. I ask myself this: why do we give writing tests? Or more specifically, do we ever give a writing test only to find out whether the test-taker is a successful writer? Maybe, but most often the score from a writing test is used for other purposes. For example, the writing portion of an AP test may be used to determine whether the test-taker should receive credit for AP Psychology—it’s not testing for good writing. Or is it? Is the test, in fact, testing for good writing, but the resulting information correlates to literacy in psychology?

I don’t really have definitive answers to any of this, but these are the kinds of questions I’m asking.  What kind of information is a writing test really telling us?

Framing my lenses

In Dr. Neal and Huot’s techno-history, I particularly took an interest in the section on technology as ideological–I’m sure that comes as no surprise. Much of my research is focused on the malleability of writing assessment technologies, but specifically on how deep this influence goes. I ask myself a few questions: are our writing tests supposed to record a snapshot of the test-taker’s writing ability at that moment? Or is the test, itself, influencing the way students are writing? As Huot and others have said, the test does indeed have an influence on the make-up of curriculum in the classroom. How do we, as teachers and scholars, come to terms with this inevitability? Is it inevitable that the test affects the curriculum? I think this question and others I’ve brought up have been some of the central questions of writing assessment: how much control does the test have over the education process? Dr. Neal and Huot reference Elana Shohamy’s The Power of Tests: A Critical Perspective on the Uses of Tests (a book that I think is next on my list of things to read) and explain that she “is quick to point out that while scholarship in testing looks at tests as objects that need to be technically sound, especially in terms of validity and reliability, the real power of testing resides in its use to control the educational process” (422). This echoes my own concerns over the power of the test, but the control of the education process is complex–I have further questions: who has the power? Who is in control of the validity and reliability of writing tests? And what scholarship are they drawing upon to make these tests–what influences are privileged?

With that, I’m drawn toward Behizadeh and Engelhard’s history of writing assessment through three lenses: measurement theories, writing theories, and writing assessment practices in the 20th century. The authors demonstrate the need for more communication between theorists and practitioners, between writing theorists and measurement theorists, and between teachers and test-makers–an ideal of communication not much seen in the history they trace. Oftentimes, the authors explain, there is a lean toward measurement theory when creating writing assessment practices, which produces practices that may be reliable and consistent but invalid for the sociocultural contexts in which students are tested.

This is the lens through which I plan to see the content of this course: looking at the history, theory, and practices of writing assessment through the ideological motives that the power structure attempts to preserve and emphasize. How can we create writing assessment practices that are equitable? When an assessment practice is used or a theory is privileged, who gets left behind? And also, how can writing theorists have an influence on the ways we test student writing? These are big questions, but hopefully I’ll be able to chip away at them throughout the semester.