To rate a paper, you must know its intended audience.
I read scholarly papers from several fields. Let’s consider just two: computer graphics and education.
In computer graphics, a good paper is one that (a) clearly describes a (b) new algorithm that (c) makes better pictures than previous algorithms and (d) does so efficiently. That’s what everyone in the field wants: fellow researchers, practitioners, students, outside observers, etc.
In education, we can describe a parallel set of criteria: a good paper (a) clearly describes an (b) educational practice that (c) improves student learning and (d) does so without negative side effects. But, perhaps counter-intuitively, that is not what everyone wants. It’s a good set of properties for a paper read by an educator, but an education researcher has different goals.
In education research, a good paper is one that (a) clearly describes a (b) specific implementation of an educational practice, (c) how it was measured, (d) how that measurement was analyzed, and (e) what remains unanswered. Note that improvement of student learning is not a criterion: to be trustworthy, education research must include publications that show a practice had no measurable impact.
But why this difference? Why would a paper in computer graphics showing that an algorithm doesn’t work be unpublishable but a paper in education showing that a teaching strategy doesn’t work be valuable? Because of the differing controllability of the fields.
Computer graphics is a field where almost all of what it studies is entirely under the researcher’s control (there are exceptions to this “almost all” generalization, but they are fairly rare and unrelated to the point I am making here). If a particular algorithm successfully makes pictures of dripping wax once, it will continue to do so. Barring research dishonesty or gross error, all results are replicable. There’s nothing to control for, no hidden variables. Results speak for themselves.
Education is a field where almost nothing it studies is under the researcher’s control. Every classroom, every student, every day is different. There are hidden variables everywhere. Noise shows up in every measure, and almost always contains undiscovered internal structure that violates the independence assumptions that underlie many of the most common analysis techniques. One study, no matter how dramatic its results, is only a piece of the story. It takes many replications and variations by different researchers and in different situations with a consistent outcome before we can gain confidence that something actually works.
Education is also a field where “didn’t work” results must be published. The importance of this can be shown with a simple thought experiment. Suppose I were to start a rumor that clapping twice before each class helped students focus and improved learning, and that this rumor gained enough attention that 100 educators decided to test it by randomly picking half their classes to clap twice in and half not to do so. The vast majority of those educators would find no impact, but roughly five of them would, by chance, have picked a clapping class that happened to have more students who were already doing well, and would see that class land in the 95th percentile of all classes they’d ever had. If those strong positive results were the only kind we published, as they are in more controlled fields like computer graphics, we’d end up with five independent papers all showing a strong positive result from this clapping exercise.
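To make the arithmetic concrete, here is a minimal simulation sketch of that thought experiment. Everything in it is an illustrative assumption of mine (the normally-distributed class scores, the class size, the number of past classes each educator compares against); the only point is that, with no real effect at all, about 5% of educators still see a 95th-percentile result.

```python
import random

random.seed(0)

N_EDUCATORS = 100          # educators who test the clapping rumor
PAST_CLASSES = 40          # prior classes each educator can compare against (assumed)
STUDENTS_PER_CLASS = 30    # assumed class size

def class_score():
    """Mean outcome of one class; clapping has NO effect in this model."""
    return sum(random.gauss(0.5, 0.15) for _ in range(STUDENTS_PER_CLASS)) / STUDENTS_PER_CLASS

strong_positives = 0
for _ in range(N_EDUCATORS):
    history = sorted(class_score() for _ in range(PAST_CLASSES))
    clapping_class = class_score()
    cutoff = history[int(0.95 * len(history))]   # the "95th percentile of all classes they'd ever had"
    if clapping_class >= cutoff:
        strong_positives += 1                    # a result dramatic enough to write up

print(f"{strong_positives} of {N_EDUCATORS} educators saw a 95th-percentile result by pure chance")
```

Run it a few times with different seeds and the count clusters around five: roughly one educator in twenty sees a publishable-looking result even though the intervention does nothing.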
To summarize:
In some fields, like computer graphics, every paper should have a result a practitioner can use.
In some fields, like education, most papers should not have a result a practitioner can use.
The key difference between these fields is uncertainty: if there is noise or chance anywhere in the process, researchers and practitioners need different things.
These points lead to three practical observations.
First, when analyzing and reviewing papers in a field with uncertainty, it is important to know and keep in mind your audience. Simple phrases like “it’s a good paper with exciting results” mean very different things to researchers, who hear roughly “it’s helping us understand the field,” and to practitioners, who hear roughly “you should definitely do the thing it talks about if you can.”
Second, I worry about current trends that introduce noisy, uncertain data into fields that previously lacked it. Many areas in computing and engineering are doing this by starting to adopt methods based on big data (“big data” is unlike other data in that it has low quality, since we take the data we have, and high quantity; intuition learned in statistics classes about p-values, sample sizes, and controls is the wrong tool for big data) and machine learning (common “machine learning” approaches are unlike other regression in that they do not produce a model or answer but instead an incomprehensibly complicated function; intuition learned in science classes about hypotheses, testing, truth, and understanding is the wrong tool for machine learning). Are these fields adequately distinguishing between (a) good papers based on deterministic processes that definitively conclude “this works” and (b) good papers based on noisy data that cooperatively conclude “the data we have suggests this might work”?
While I am uncertain how well we are doing on the previous point, I am confident we are doing poorly on a third point: the broader world does not understand the difference between these two kinds of fields (nor does it understand the difference between small and big data, or between regression and machine learning). Hearing this distinction even mentioned, let alone properly discussed, is so rare that it catches me by happy surprise every time. I think about the difference often, but even I rarely mention it, because I don’t know of good vocabulary to describe it easily.