Rankings and Ratings for Beer: Lessons from the Classroom, and Lessons On the Classroom

I posted earlier today on Facebook about the design of ratings-collection systems for beer ratings websites like BeerAdvocate (below, referred to as “BA”), based on my experience designing grading rubrics for teaching and student evaluation. I was posting in the context of others arguing that the ratings on various enthusiast websites are not very meaningful in their fine gradations; that a beer with a 97% quality rating may not be “better” than a beer with a 95% rating. I responded as follows:

Yeah, also because people’s rankings are affected by others’ rankings, and by reputation. I can’t remember whether the review/ranking page sits with other rankings available for viewing, but that’s your first sign of an influence leakage. (IIRC BA is that way.) It’d be better if ratings were invisible on the review/ranking page, until after you’ve posted your own. But people probably wouldn’t like it.

Also, the percentage is a ridiculously fine-grained scale for this kind of evaluation. You learn that, teaching. You learn that a grade out of 100 is often too distracting and fine-grained for a student to handle, but also allows too much subjectivity in the evaluation on your side, as a teacher. A+ to F is handy because it outlines very specific value levels, and problem ranges, and you can look at an essay or assignment and clearly slot it into one of those given the quality of the work, as well as it being directly communicative to the student about how well the work met expectations.

Or, you know, there’s pass-fail, or like we had for our public speaking grad requirement: High Pass/Low Pass/Fail. That’s not fine-grained enough for beer evaluations. But it’s better than a 100-point ranking system.

For beer, I think a ten point scale would be more sensible. You’d find people reserving 10 for the life-changing beers, using 9 and 8 for solid, good beers, 7-6 for okay ones, and 5 and below for the crap. It’d be communicative, and less susceptible to the problems.

But of course, we’re talking about the usefulness of the rankings in terms of gauging feedback. Meanwhile, those sites thrive on more social uses of ranking, especially the identity of those doing the ranking. The unnecessarily fine gradations of rankings aren’t useful for beer rating, but they are for people who want to feel like experts and show off what they’ve gotten to tasting.

I would submit this is also precisely why the percentage system is so deeply embedded in formalized education, though it’s a real irony: 

  • percentage grading systems are flawed by an overabundance of fine gradations: in practice most instructors use just a system with between 13 and 25 points, often only the 13 (unless they’re splitting hairs finely enough to get into fractional letter grades like A+/A, or B-/C+, something they usually only do when it’s a question of competitive scoring, or when a grade is unresolvably borderline)
  • percentage grades are profoundly uncommunicative: they are rife with enigma, and students, instead of converting the grade back to the 13-point (or 25-point) system for clarity, tend to get hung up on where this or that 1% or 2% went missing.
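To make the conversion in those two bullet points concrete, here is a rough sketch in Python of collapsing a percentage onto the 13-point letter scale. The cutoffs and the name `to_letter` are my own illustrative assumptions, not any institution's standard; the only point is that several different percentages land on the same letter.

```python
# Hypothetical cutoffs: each pair is (minimum percentage, letter grade).
# These thresholds are illustrative assumptions, not an official standard.
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
    (0, "F"),
]


def to_letter(percent):
    """Collapse a 0-100 percentage onto the 13-point letter scale."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Cutoffs are sorted from highest to lowest, so the first match wins.
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter


# Under these assumed cutoffs, 95 and 96 both land on the same letter,
# so the "missing" 1% carries no information on the 13-point scale.
print(to_letter(95), to_letter(96))
```

Under cutoffs like these, the 1% or 2% a student agonizes over usually makes no difference to the letter at all.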

The competent teacher makes a habit of using a grading rubric for which he or she can outline the specific qualities and expectations of each level in some detail, or at least with a compellingly descriptive impression of them. But how many of your teachers stood up on the first day of class and said (or provided a handout that explained), “This is what an A+ paper looks like. This is what an A paper looks like… the difference is…” for each gradation level of the standard grading rubric in the class?

The second bullet point above–the enigma–I would argue is the direct result of the first bullet point, the overabundance of fine gradations. It is also the purpose of the system, for multiple reasons: more enigma means a better-covered ass in practice; more enigma means students are more distracted from learning (and focused on the arbitrary, unnecessary comparative scoring system); more enigma means a greater likelihood of buying into the idea that performing well intellectually is something mysterious and unreachable for most people.

And more enigma has a profound psychological effect on those who do well within the system: the inexplicable 3% one got that one’s peer didn’t? It must just be inborn intelligence or something. 

If you tell someone the pot roast is a 97%, they wonder: why not 98%, 99%, or 100%? The answer is: mood, inclination, the mood lighting. A+ is better. Even better is the natural gradation we use in language:

  1. Absolutely Amazing/Heartstopping/Life-changing/Incredible
  2. Great/Wonderful/Excellent
  3. Good/Nice
  4. Okay/Pretty good/Alright/Fine/Not bad
  5. Pretty bad/Not good/Bad
  6. Horrible/Terrible/Nasty/Awful
  7. Unimaginably Awful

And in actual practice, we often leave out the last two points of that scale for the purposes of tact and sparing someone’s feelings. So: a 5-point scale, only one of which is negative, and the negative is ambiguous and covers several levels of bad without being specific. This is especially true

Looks a lot like the A/B/C/D/F grade ranking system, if you ask me.

And now that I look at BeerAdvocate’s beer ratings system, I notice that they actually only use percentages to present the averaged result of all ratings entered into the system: the actual reviewers are entering only a 5-point rating on a series of aspects of the beer, and their overall rating (derived from each) is presented on a five-point scale. I would guess the site admins could argue that they’re only presenting the averaged-out ratings as percentages to reflect the fine-tuned differences in ratings. And I feel pretty secure that moving from that system to an aggregate system with a much smaller set of points would cost absolutely no clarity in rankings (since, anyway, people reading the site are informally translating the percentage rankings into “zones” of quality) but would eliminate problems of bias based on the expense or rarity of a given brew.
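For the curious, here is a rough sketch in Python of the "zones" idea. I don't know how BeerAdvocate actually computes its figures, so everything here is an assumption: the linear conversion from a five-point average to a percentage, the rounding to the nearest whole point, the zone labels (borrowed from the language scale above), and the name `summarize`.

```python
from statistics import mean

# Zone labels borrowed from the language scale above; pairing them with
# the numbers 1-5 is an assumption made for illustration.
ZONES = {5: "Amazing", 4: "Great", 3: "Good", 2: "Okay", 1: "Bad"}


def summarize(ratings):
    """Return (percentage, zone) for a list of 1-5 overall ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    average = mean(ratings)
    # Assumed linear conversion of the average to a percentage.
    percent = round(average / 5 * 100)
    # Round the average half up to the nearest whole point to pick a zone.
    zone = ZONES[int(average + 0.5)]
    return percent, zone


# Two hypothetical beers: one averages out to 97%, the other to 95%,
# yet both fall in the same zone.
beer_a = [5] * 17 + [4] * 3
beer_b = [5, 5, 5, 4]
print(summarize(beer_a), summarize(beer_b))
```

With these made-up ratings, the 97% beer and the 95% beer come out in the same zone, which is roughly what readers are doing in their heads anyway.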

