The reproducibility crisis has hit psychology hard. In writing Engaged: Designing for Behavior Change, I found myself double-checking whether some of the studies I'd learned about previously were still considered valid to cite. Part of the book writing process was a technical review, in which I asked five experts to read the manuscript and offer feedback about the accuracy and completeness of the information therein. From that feedback I went through another round of re-review of the information I'd included.
I removed references if I found conclusive evidence that the original study was not valid, or if the evidence that it might be valid was thin. If it was a borderline case (for example, a replication had failed, but there was other evidence that the original phenomenon existed), I'd rework the wording to more accurately reflect the current state. In a few cases, I determined that it was okay to leave the reference in. And of course I noted in the book and will restate here: Any errors in including research that is questionable are mine, and not my reviewers'.
In my estimation, there are four basic reasons why psych studies fail to replicate and get scuttled from the evidence base. I’m most interested in the fourth as a problem to wrangle. The reasons are:
Malfeasance on the part of the researchers. Unfortunately there have been a few high-profile cases of researchers who have deliberately misrepresented their results or otherwise been dishonest about their work. These studies essentially disappear from the literature once detected. The solution to the problem is to detect and banish bad actors.
Statistical game playing. Given the preference of journals to publish positive (as contrasted with null) results, and the emphasis on publication in job success, some skilled researchers play statistical games that misleadingly make their results seem more impressive than they are. The solution to the problem includes study pre-registration in which the planned analyses are detailed, as well as peer review by people who understand stats well. Personally I’d also love to see more willingness to publish null results, as they can also provide valuable knowledge.
Ambiguous or incomplete information about methodologies. If the original published paper doesn’t contain enough information for subsequent researchers to replicate the process, then a failure to replicate could be due to methodological differences. The solution to the problem again includes pre-registration of detailed study methodologies. Some journals have also experimented with digital badges next to the names of authors who have committed to transparency about their methods, so there’s some social pressure.
The study is no longer possible to replicate. Some studies examine fleeting or highly context-bound phenomena, so reproducing them at a later date is impossible. One example is looking at behaviors and reactions following a news event, such as the 9/11 terrorist attacks in the United States (see this special issue of the Journal of Management Inquiry).
Another is studies that look at populations that no longer exist in the way they did at the time of the first study. A lot of the work I did early in my career, and a huge area of interest for me, was how interacting online might affect behavior and experience. I researched how email addresses that include gender or race signifiers might affect others' perceptions of someone, a hot topic because universities were newly assigning students email addresses that usually blended first and last names. I read books and papers by Sherry Turkle and Sara Kiesler about how being online might affect people's relationships, behaviors, and mood. I took a very dark and not very fruitful detour into postmodern literature about what virtual worlds mean for human identity.
One thing much of this research had in common is that we sought to recruit technologically naive participants. We didn’t want digital natives for our studies; digital natives didn’t even really exist in the late 1990s and early 2000s. We were studying a specific moment in time when people who’d spent their formative years without internet were transitioning online.
I don’t know of any particular replication issues from the online behavior world that once fascinated me, but it’s occurred to me that we will probably never be able to capture those findings again. Of course there are still people today who aren’t online, but they aren’t the same people who were part of the research studies twenty years ago. They almost certainly differ in substantial ways due to socioeconomic status or culture, and they’d be going online for the first time into a world that’s extremely different from what once existed.
It's occurred to me that user experience research is a little bit about a unique moment in time as well, by necessity. The studies I run today look at a particular type of person who either has interacted, or might in the future interact, with a specific product, service, or problem area. Often context is explicitly wrapped into the research: What do Medicare members in California want when it comes to using tools for wellness activities? What do construction workers need on the job site to help avoid serious injury? Of course this type of research exists to inform product and service development, not to contribute to the body of academic knowledge. It makes me wonder if perhaps there needs to be another designation within academic research for highly context-bound studies, so it's known they are rigorous snapshots of a moment in time and may not endure.
(I forgot I had written about replication in 2015. I was a bit more defensive then!)