The gift that keeps on giving
Sequencing small RNAs involves ligating adapters to the 5’ and 3’ ends of the RNA. This ligation is achieved using phage-derived ligases. When we started sequencing small RNAs, we made a fateful decision to save money by placing barcodes at the ligating end of the 5’ adapter. Upon sequencing the same sample with different barcodes, we noticed that the relative populations of miRNAs we detected depended strongly on the barcode. This pointed to a big problem with the sample preparation protocol for deep sequencing. The protocol is used by all prominent labs in the field, and as novices to sequencing (we had just set up the sequencing core in our institute), we were doubly nervous about claiming that it had problems.
After an exhaustive series of experiments, we identified ligase activity as the source of the bias, which led us to think of ways to overcome it. The solution we hit upon was elegant in its simplicity: we replaced the ends of the adapters with random nucleotides, essentially creating a complex pool of all possible combinations of nucleotides at the ligating termini of the adapters. Our resulting paper explained the mystery of why deep sequencing, which was supposedly so accurate and sensitive, did not seem to agree with qPCR or microarray-based assays [1]. In fact, some researchers had suggested that the sample type determined the appropriate measurement tool for miRNA profiling, and we were glad to have cleared up the confusion.
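To make the idea of a randomized adapter pool concrete, here is a minimal Python sketch that enumerates every variant of an adapter whose ligating terminus has been replaced by random nucleotides. The core sequence, the number of randomized positions (four here), and the function name are all hypothetical choices for illustration; they are not taken from our protocol.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def randomized_adapter_pool(core, n_random, ligating_end):
    """Return every adapter variant with n_random degenerate bases.

    core         -- fixed part of the adapter (hypothetical sequence)
    n_random     -- number of randomized positions (an assumption)
    ligating_end -- "5p" if the adapter's 5' end is ligated to the RNA
                    (a 3' adapter), "3p" if its 3' end is ligated
                    (a 5' adapter)
    """
    if ligating_end not in ("5p", "3p"):
        raise ValueError("ligating_end must be '5p' or '3p'")
    pool = []
    for combo in product(NUCLEOTIDES, repeat=n_random):
        tail = "".join(combo)
        # Put the random bases on the end that meets the RNA.
        if ligating_end == "5p":
            pool.append(tail + core)
        else:
            pool.append(core + tail)
    return pool


# Placeholder core sequence, not a real adapter.
three_prime_pool = randomized_adapter_pool("GATTACAGATTACA", 4, "5p")
five_prime_pool = randomized_adapter_pool("GATTACAGATTACA", 4, "3p")
```

With four randomized positions the pool contains 4^4 = 256 distinct adapters per end, so each small RNA meets many different terminal sequences rather than a single fixed one that the ligase may happen to favor or disfavor.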
Little did we realize the travails that were going to be visited upon us. The editors of almost every journal that we approached deemed the material to be too specialized or esoteric, while continuing to publish groundbreaking work that involved small RNA profiling through deep sequencing using a flawed approach, as we had clearly demonstrated.
Amazingly, the same journals that had rejected our paper in 2011 as too esoteric and specialized started publishing papers that re-identified the problem, repeated our solution with some modifications, or came up with solutions that were not as good. One was published in Genome Research, whose editor had rejected our initial report, judging it to be too specialized. There are several other examples of this, but the one that takes the cake is a recent paper in Genome Biology that claims a new method while not advancing the field in any significant way. Perhaps I should be happy that the impact factor of the journals republishing the bias we identified and solved has been growing, suggesting that people are now taking this seriously.
The submissions in this area seem to receive very little editorial or reviewer scrutiny, and prior art is conveniently ignored. There is a serious breakdown of the whole system of reviews and the gatekeeper role of editors who seem ill-equipped to handle the nature of science in the age of big data. The capriciousness in the editorial control of what gets sent out to review is not limited by the impact factor of the journal concerned.
Handling such biases correctly is important, because they have major implications for the biology inferred from the data. The bias raises serious doubts about the accuracy and validity of the large volumes of small RNA sequencing data that have been generated and placed in the public domain through projects such as modENCODE. Perhaps repeatedly re-discovering the problem, re-solving it, and re-publishing in higher impact factor journals might eventually lead to its acceptance by the high priests of the community.
1. M. Baker, “MicroRNA profiling: separating signal from noise,” Nat Meth, vol. 7, no. 9, pp. 687–692, 2010.