Show Me Your Work

This post is mostly working through the challenges that Chat-GPT poses for a general-education TEFL writing instructor. As Stephen Marche noted, “nobody is prepared for how AI will transform academia.” Well, almost nobody. I kind of am, since I’ve been dealing with analogous issues for decades now. So anyway, these are my thoughts, for anyone interested in them.

So, I had hoped that students in South Korea wouldn’t discover Chat-GPT for at least another semester. (Most took a year or two after the Westerners here to discover Facebook, after all.) However, apparently South Korea’s President did what countless terrible students are about to do: he had an AI do his homework. (That is, he had Chat-GPT draft a copy of an English speech.) This was reported in the news here, and it’s trickled down enough that Korean folks who are paying attention to the news now casually bring it up in conversation: my wife mentioned it in passing the other day, in fact.

This is bad news. South Korea’s got a pretty serious problem with academic cheating—one that it only belatedly started taking seriously, and even that’s unevenly distributed—and this is one form of cheating that nobody’s actually had to grapple with yet. The reasons for the cheating issue are various, but I’d chalk it up to decades of laxity and the attitudes of older folks.

I should be clear: when I say “cheating” I don’t just mean “cheat sheets”: every system in the world has people gaming it, and to some degree, that gaming is invisible to some people. White collar criminals in the West (especially in Canada) tend to serve shorter sentences than people busted for selling pot, if they serve a sentence at all, and nobody seems to perceive the disparities between how working class individuals are taxed or affected by bankruptcy versus how rich people or big corporations are.

In South Korea (and, I gather, the rest of Northeast Asia to some degree), there’s a similar pattern of taking for granted people gaming official, standardized testing systems. This goes back centuries, but I’ll spare you the lecture and cut to the chase: a lot of the test-preparation industry is built around this kind of “invisible cheating,” to the point where gaming tests isn’t just the preferred way to take tests, it’s also for many people the only imaginable way to do it. 

An example is the TOEIC exam. This is an exam for which test-prep offered by Koreans is vastly preferred over test-prep offered by native English speakers, for the reason that native English speakers see it as a language test, and conceptualize the process of preparing for a TOEIC exam as one of developing competency with familiar speaking and writing patterns and the acquisition of vocabulary. In contrast, a lot of Korean TOEIC instruction hinges on “tricks” that students can use to game the exam. A Westerner might tell students that the best way to acquire vocabulary is to encounter it in reading and listening, and to use it in speaking and writing, with the belief that understanding the word will help one to recognize when it is the missing word in a sentence. A Korean TOEIC instructor will instead train students to diagram or analyze sentences to see what kind of word is missing, and to recognize the kinds of suffixes used to build specific forms of words (noun, verb, adjective, adverb), so that they can guess which of the words on a multiple choice list is likeliest to be the missing word in a sentence.

There’s also the fact that Korean universities were, until recently, incredibly lax about things like plagiarism. As recently as 2012, a student told me that I was unusual in giving students an automatic F in cases of cheating. Most of her professors, she told me, just gave students a D (or even in some cases a C) and let it slide. This is hardly surprising: plagiarism was even more broadly tolerated, at least in middle-tier schools and lower, a generation or two ago, and some of the people who are now profs did it themselves. (We can see this in the number of people who’ve been outed for plagiarism in their long-ago dissertations over the years: the “cheated on their PhD dissertation” scandal is really a standard rise-and-fall trope in South Korea now.)

There’s also the fact that a lot of universities are private, and thus are for-profit, and that a vast majority of South Koreans pursue undergraduate degrees. This means that there’s a built-in incentive not to be too harsh. Universities have that incentive, now more than ever, because the size of the student population is shrinking, meaning tuition revenues are falling; profs have the same incentive, because most of the oversight they receive in terms of their work is through student evaluation feedback. Being “tough” on cheating has an inherent cost, specifically on job security for the prof, and on enrollment rates for the university, so there has been a longstanding tendency not to go too tough on it, even as the official rhetoric about plagiarism has gotten tougher. It’s not that profs necessarily see cases of cheating and ignore them, mind you; it’s more often that many will put a limited amount of time into investigating them, and will refrain from confronting students about their suspicions when they think cheating has occurred. That’s why you might get a D for cheating, but not an F, in some profs’ classes. It’s easier and less trouble to do that than to track down instances of cheating, document them, inform the student, and deal with the blowback… because, yes, students kicked out of a class for cheating are still given the opportunity to file teaching evaluations of their profs. (I’ve actually started noting cheating and not telling the student till after grades are posted, because of this, since after getting caught out, cheaters have a propensity to skew one’s instructor feedback pretty badly, which results in more paperwork in the end.)

All of which brings me to Chat-GPT. If you’ve been living under a rock, you may not have heard, but AI can now passably generate text as prompted by a user. The results (so far) are a bit dry and boring, lacking in flair and color, though some clever prompting can help with that. I’ve been experimenting with Chat-GPT, and discovered that it can be useful for brainstorming stuff like writing topics, and whatever. This makes sense: the web is full of TEFL materials including lists of writing topics for students, and as Ted Chiang recently explained, “ChatGPT is a Blurry JPEG of the Web.” Just like image generators reproduce the biases of artists online, Chat-GPT reproduces the standard assumptions and experience of EFL instructors. You get a blurry image of whatever’s out there.

The problem is that the TEFL teacher in Korea, at least when teaching a mandatory language course to undergrads in a university setting, has a large number of students who simultaneously are being forced against their will to take their course, don’t have the motivation required to improve their English, don’t have the time in class to actually do so, and who are accustomed to laxer treatment when they cheat. They also, at least at some universities, have a pretty good chance of being able to skew a professor’s teaching evaluations if they are caught and disciplined for cheating. It’s rare that someone who cheats long persists in insisting otherwise: when I show them a copy of the webpage containing their submitted work, they usually will admit quickly that they cheated. However, they still often object to a “too harsh” consequence for cheating, and there’s not much one can do about that except to withhold the grade until after teaching evaluations go through, I suppose. 

In this context, the prospect of detecting AI-generated text is… questionable. I like the idea, but it’s especially questionable whether it’ll work with writing by EFL students. Someone writing in a foreign language in which they’re not completely fluent, but in which they are relatively proficient, can produce something a lot like Chat-GPT output, especially if they use the basic spelling and grammar checker in their word processing software. (In fact, when I tested a few of the final exam texts submitted by my students in the class I taught over winter break, for which I have video evidence that they were written manually, not by Chat-GPT, I got a mix of results, ranging from “probably written by a human” to “probably generated by Chat-GPT.” To be clear, I have video of students developing these paragraphs over a period of two hours, from brainstorming and building template sentences to organizing, writing, and editing them. These are false positives.) Apparently the internal limitations of at least some developing EFL writers (vocabulary, awkwardness, stilted prose) trigger false positives on some of those AI-writing detector sites, and those detectors are therefore even more unreliable than they would be at assessing the work of a native speaker. (And that’s assuming a student is lazy enough to submit unedited Chat-GPT content, by the way. If I can produce a short page of newsprint in 30 minutes, I’m pretty sure a native English speaker can produce a full academic essay in a few hours, with time to spare for editing the text for a more human-like style.)

I am lucky, because I spent a lot of time thinking about this many years ago. I landed on the conclusion that students needed to be taught a repeatable, reliable process for writing. The shape of the process that I teach has changed over the years, but my integration of process into assignments hasn’t: a student might be able to plagiarize a completed paragraph or essay from the web, but they won’t be able to plagiarize every step of the writing process for that text. They will, at the very least, have to reverse engineer stuff like brainstorming materials and structure-building tools out of the plagiarized materials, and since having students reverse engineer texts is one way I have them develop familiarity with them, this is kind of something I want them to learn how to do anyway. Besides, I’ve always had a pretty good nose for apparent plagiarism, and my familiarity with Chat-GPT output may have sensitized me to how it reads.

Maybe. The problem is that the bland, colorless text it tends to output is kind of the thing a lot of my students would love to be able to produce, and indeed some of them do produce writing like that on a good day. Writing in a foreign language is tough and this is a bigger achievement than most people can realize unless they have experience studying foreign languages themselves. 

Outside of writing classes, the solution is simple: move away from written homework. Oral examinations, Q&As, and the like will become more important. Students will be expected to demonstrate their skills. That kind of thing!

For writing classes, on the other hand, I think in the long run, Chat-GPT is going to proliferate, and written homework is going to become one of those things that instructors rely upon less in the future. When work needs to be written, it’ll become necessary to have students show their steps… and as AI-generated writing becomes more refined, and starts offering the ability to develop a text according to a specific protocol (fake brainstorming and outlining and editing), I think process-documentation will become more and more invasive. For all of my classes, right now, I have students do their exams in a Zoom recording, with Screen Sharing turned on and sharing the entire desktop, with their webcam and microphone switched on, after rotating the camera to show that they only have one screen within reach, and after saying a few words. It’s not completely fool-proof, I suppose, but it’s a pretty solid barrier to easy cheating. The audio lets me confirm the keystrokes, the webcam allows me to confirm the identity of the student doing the writing, and the screen sharing lets me be sure they’re not using machine translation software or an AI to cheat.

This is a massive pain in the ass, though. It makes sense for courses where writing is the focus and where the students are uniformly motivated to improve their writing. That is to say, not your typical general education TEFL course. 

Oh, by the way: all of this is an unfortunate consequence of one thing: numerical grades. If we got rid of grades, we would remove the impetus to cheat for many students—not all, but many. Chat-GPT would actually be an excellent tool for students looking to improve their writing, as I discovered the other day:

If I’m honest, that’s not too different from the feedback I provide students on their writing, except that it’s explained in greater detail than I have time to do, and it’s written in slightly simpler terms. It also avoids the kind of “challenging the ideas” stuff that I have to fight myself not to get too deep into. Well, and one of the bits of feedback directly contradicts the advice I’d give (#2). The other difference is that it was produced with much less effort, and more quickly.

So… this is something I can see happening on the professors’ side of things: profs who are overscheduled, and faced with students who aren’t all that interested in getting feedback, leaning on Chat-GPT to generate some basic grammar and style pointers. Of course, that’s only possible (in Chat-GPT, right now) for paragraph-length texts, but soon we’ll be able to drop entire essays into an interface for remedial writing feedback. And if we can, so can students. This, in other words, is a way I can see Chat-GPT being useful to EFL learners.  

But it’s going to be disruptive in classrooms, to be sure, especially if we hang onto the idea of hierarchically sorting students by grades. Oh, how I wish that Chat-GPT would help us kill that idea. But it won’t, I don’t think. Instead, it’s just going to kill the essay. And… maybe that’s okay, but I suspect not. Writing helps people learn to think systematically and to express their ideas in ways that are easier to assess, critique, and debunk. Eliminating that is pretty problematic. 

But maybe Chat-GPT will instead just kill off the low-hanging fruit of academic training, just as it will kill off other low-hanging fruit? Er, but what I mean by “kill” and “low-hanging fruit” is, I guess, a whole ‘nother can of worms, for another day. 

Comments

  1. Chris Azure says:

    Hey Gord, love this post. It’s something we’ve been discussing at our school a lot right now, and I’ll be sharing this with my manager (who’s the one I’ve been having conversations with about it). We’re leaning into more handwritten stuff during class, but that’s only possible if you have the schedule to do it, and your other suggestions are great too (for now, as you said).

    Hope all is excellent with you and the family btw!

    1. gordsellar says:

      Hi Chris! It’s great to hear from you!

      Yeah, Chat-GPT (and whatever other forms of AI writing tools are coming) are going to be a real mess in the classroom, and especially seem to be a great way to kill traditional forms of homework. Handwritten is certainly one solution, as long as it’s handwritten in the classroom.

      I rely heavily on process-oriented writing instruction anyway, so for me it’s easy to ask them to show their work. I don’t know if that approach will work as well in the future: apparently we’re shifting to half the instruction hours a couple of years from now, and it’ll be a lot more of a struggle to cover the entire writing process. (Then again, I have a huge bank of prerecorded lecture materials from the pandemic that I can use to supplement, even if I only use the stuff that walks students through all the grammar and structure stuff in the textbook, flipped classroom style.) I do think that’s the other component of a solution: flipped classrooms, with students working on their writing in class. That said, the need to edit and revise is a cornerstone of writing instruction for me, and I have no idea how to get students to do that in the classroom. It takes time and, for many, quiet reflection. Peer feedback is nice, but one needs to chip away at the work alone, too… and having students submit videos for everything is pretty unwieldy. (I skim the videos I get with their exams, but even just twice a semester, it’s not like I have the time to watch them all the way through. Maybe if there were automated detection methods, but… that’s a bit pie-in-the-sky right now.)

      We’re all doing well! Our son got his yellow belt in Tae Kwon Do and is a certified LEGO-maniac these days, so it’s good we found him an affordable place to go build stuff and leave it there. Jihyun’s busy with work and a couple of classes (botanical art and photography), and I’m in an unusually productive state. I hope you and your family are all doing well too!

  2. Heath Graham says:

    Cheers for this Gord. We are grappling with this at my workplace too. It’s always interesting to see another perspective!

    1. gordsellar says:

      No worries! Yeah, I think it’s going to take some time before preferred solutions start to emerge, but it’s gonna happen sooner or later.
