Contextual note: this post is one of several on the memorability debate.
The process of reviewing papers is relevant in this memorability discussion, since the Few15 critique specifically called into question whether the peer review process at InfoVis yields appropriate quality.
Papers as a Mix of Strengths and Weaknesses
No paper is perfect; every paper has a mix of strengths and weaknesses. The job of the reviewers is to decide whether the strengths outweigh the weaknesses, and it is valid for two reasonable scientists to disagree on that question, given that it is an individual judgement call. That is, all papers have flaws; the reviewer's job is to judge whether those flaws are fatal. Few argues that the Mem13 paper and the Mem15 paper have fatal flaws. I disagree with this assessment, and I explain why at length below [Update: in a subsequent post].
Peer Review and the Conversation of Science
I’ll echo and expand on the words of two other InfoVis steering committee members that a conference is a conversation (Fekete), and that science is a conversation (Heer). The review process is an intrinsic part of that conversation, even though much of it is not visible to the readers of the final draft of the paper.
Papers are the major units of speech in the scientific conversation. Papers cite and discuss past work, and frame their new contributions with respect to the limitations of that past work. Typically the way somebody argues against the conclusions drawn in a paper is to write another paper that carefully shows why the original one didn’t get the story right. The strength and validity of that argument is judged in the peer review process, where frequently the reviewers are the authors of the very papers that the new paper is characterizing as having limitations. It’s not usually quite so simplistic as just saying the old work is flat-out wrong (although that sometimes does occur). It’s often a matter of noting situations where it falls short, or extending it to new situations not previously considered, or proposing the existence of new confounding factors that serve to illuminate a previously murky assumption or explanation.
Like most practitioners, Few doesn’t take part in that academic conversation as an author. That’s not surprising – if he did we’d normally call him an academic, since that choice to engage in publishing research is exactly the dividing line between those categories.
Unlike many practitioners, Few has engaged with scientific papers at a sufficiently detailed level that he has been asked to take part in that conversation as a reviewer. He has chosen to decline the most recent invitation because of his belief that anonymous peer review is implicitly unethical.
While of course Few is free to make his own choices in this situation, since they affect only himself, I strongly disagree with the assertion that anonymous reviews are fatally flawed. Anonymous reviews provide the opportunity for honesty of assessment without fear of future retribution or retaliation. Anonymity is a structural check against the problem of papers being rejected out of grudges rather than judged on their merits. It also protects junior people, who can honestly assess the work of senior people without fearing retaliation such as unenthusiastic letters when tenure time rolls around. Neither of these situations is a problem for Few personally, since he doesn’t submit papers or want tenure, but they are very real concerns for academics.
In the comment thread, Few expresses concern that anonymity supports irresponsible or incompetent behavior “in the shadows”. What he isn’t acknowledging is that there is indeed considerable and significant oversight in the review process, at multiple levels. Reviewer identity is only anonymous *to the authors*. It is not at all anonymous to the other members of the program committee or the papers chairs!
First, there’s a two-tier reviewing system, where the (primary and secondary) reviewers who are on the program committee have positions of higher responsibility than the external reviewers they invite. These program committee members are carefully chosen based on the quality of the reviews they have written in the past.
The primary reviewer exercises judgement about the competence and thoughtfulness of the other reviewers when writing up the meta-review. As Jeff Heer alluded to in his first and second comments, all four reviewers read what the others wrote, and then discuss – sometimes at length. I consider it a sign of strength, not a process problem, that reviewers can and do regularly disagree on the merits of a particular paper. Usually these discussions end with some level of agreement, where either an initially positive person gets convinced by arguments about flaws from a more negative reviewer that there is a problem, or vice versa – that a reviewer who champions the worth of a paper (despite inevitable imperfections) convinces the others that it should see the light of day. As a PC member, I most certainly notice if an external on that team does a poor or incoherent job of reviewing, and I make it a point to not invite them again (and would sound an alarm if I saw that another PC member tried to do that in the future for a paper where I was on board).
Second, there’s oversight from the three papers chairs, who read every single review. They explicitly note cases where there is a review quality problem. Program committee members whose review quality is too low — or who consistently invite unqualified externals who write low-quality reviews — are not invited to participate in subsequent years. At this point the pool has been sufficiently carefully vetted that only a few people per year are disinvited, and in some years there’s no need to eliminate anybody. Moreover, if the papers chairs are concerned that they don’t have enough information to judge a particular paper, they may call in a “crash reviewer” to do an additional review with just a few days of turnaround time. I asked for these a few times when I was papers chair, and I’ve done a few of them myself in later years.
It’s true that the strengths and weaknesses of anonymous review are an active issue of debate across many scientific communities, and visualization is no exception. While I think that it’s reasonable to discuss whether InfoVis should change the process, I believe that the stance that anonymity necessarily begets irresponsibility is overly simplistic. The strength of a single-blind reviewing system very much depends on process questions of how it is run, and I think InfoVis has an extremely robust and careful process. It yields higher quality results than those of most other communities that I’m aware of.
I may well write further about this question in some later blog post, but that’s enough for now.
Quality of Evaluation Papers at InfoVis
The bar for ‘publishable’ and ‘strong’ typically moves over time at most venues. I’m confident that it’s gone in the right direction at InfoVis for evaluation papers: quality has increased. In the early years of InfoVis, there were no controlled experiments at all. Then there were a few, and they were fairly weak. As they became more and more common, the bar was gradually raised: papers needed to be stronger and stronger to get in. I believe we’re now in a place where most are strong, and a few are great. I don’t believe we’ll ever be in a place where everybody thinks every single paper that gets in is great, because there is so much variation in judgements about what it means to be great. That’s true for any venue at all.
I’ll discuss the specific merits of the Mem13 and the Mem15 papers in an upcoming post.