NJU rloopbase
Q4. How many R-loop mapping datasets were processed in R-loopBase and how were they processed?

In the current release of R-loopBase, we integrated 152 datasets generated in human cells by 11 different technologies (Table 1; for meta information, please refer to Download). We then developed a comprehensive workflow for quality control and data analysis. Briefly, technical replicates, where present, were merged first, and the raw sequencing data were then mapped to the human genome (hg38) using Bowtie2 in local alignment mode. Uniquely mapped, non-redundant reads were kept as useful reads, and samples with >7M useful reads were considered to have sufficient read depth. To make maximal use of the sequencing data, biological replicates with <7M useful reads were merged to reach the minimum read count cutoff, provided they were highly correlated (Spearman correlation coefficient >0.5).
Peak calling was then done with MACS2, either on all useful reads (DRIP-seq, DRIVE-seq, MapR and R-loop CUT&Tag) or on useful reads from the Watson and Crick strands separately (DRIPc-seq, RDIP-seq, ssDRIP-seq, qDRIP-seq, R-ChIP and RR-ChIP), using a q-value cutoff of 0.01 for narrow peaks (R-ChIP and R-loop CUT&Tag) and 0.05 for broad peaks (DRIP-seq, DRIPc-seq, RDIP-seq, ssDRIP-seq, qDRIP-seq, DRIVE-seq, MapR and RR-ChIP). When multiple biological replicates existed, peaks with ≥50 bp overlap among ≥2 replicates were merged and taken as reproducible peaks. Samples with <100 called peaks were discarded.
Following the ENCODE guidelines for ChIP-seq data analysis, we further calculated the signal portion of tags (SPOT) and reads in blacklisted regions (RiBL) as part of the quality control metrics for users' reference. Only peaks outside ChIP-seq blacklisted regions were used for downstream analysis. As a special case, bisDRIP-seq data are not readily amenable to peak calling; we instead uploaded their processed signal tracks to our genome browser for visualization and comparison with other R-loop mapping data.
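The read-depth rule and the per-technology peak-calling settings described above can be summarized as a small lookup, sketched here in Python. The names `PeakSettings`, `SETTINGS`, `spearman` and `replicate_action` are hypothetical and are not taken from the R-loopBase code; only the thresholds (7M useful reads, Spearman >0.5, q-value 0.01 or 0.05) and the technology groupings come from the text. The text does not say over which signal the correlation was computed, so the binned-coverage vectors used here are an assumption.

```python
from dataclasses import dataclass

MIN_USEFUL_READS = 7_000_000   # ">7M useful reads" from the text
MIN_SPEARMAN = 0.5             # replicate-merging threshold from the text

@dataclass(frozen=True)
class PeakSettings:
    stranded: bool    # True: call peaks on Watson and Crick reads separately
    broad: bool       # True: broad-peak mode, False: narrow-peak mode
    q_cutoff: float   # MACS2 q-value cutoff

# bisDRIP-seq is absent on purpose: no peaks were called for it.
SETTINGS = {
    "DRIP-seq":       PeakSettings(False, True, 0.05),
    "DRIVE-seq":      PeakSettings(False, True, 0.05),
    "MapR":           PeakSettings(False, True, 0.05),
    "R-loop CUT&Tag": PeakSettings(False, False, 0.01),
    "DRIPc-seq":      PeakSettings(True, True, 0.05),
    "RDIP-seq":       PeakSettings(True, True, 0.05),
    "ssDRIP-seq":     PeakSettings(True, True, 0.05),
    "qDRIP-seq":      PeakSettings(True, True, 0.05),
    "R-ChIP":         PeakSettings(True, False, 0.01),
    "RR-ChIP":        PeakSettings(True, True, 0.05),
}

def _ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no rank information
    return cov / (sx * sy)

def replicate_action(reads_a, reads_b, coverage_a, coverage_b):
    """Decide how to treat two biological replicates.

    Assumption: if either replicate is below the cutoff, the pair is a
    merge candidate; the text does not spell out the mixed case.
    """
    if reads_a > MIN_USEFUL_READS and reads_b > MIN_USEFUL_READS:
        return "keep separate"
    correlated = spearman(coverage_a, coverage_b) > MIN_SPEARMAN
    if correlated and reads_a + reads_b > MIN_USEFUL_READS:
        return "merge"
    return "insufficient"
```

For example, `replicate_action(4_000_000, 5_000_000, [1, 2, 3, 4], [2, 3, 5, 9])` returns `"merge"`: neither replicate passes the cutoff alone, but the two are perfectly rank-correlated and together exceed 7M useful reads.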
In total, 132 high-quality datasets from 26 human cell types, generated by 11 different technologies, are included in the current release of R-loopBase (Table 1).

Table 1. Meta information for R-loop mapping data
Technology | Treatment | Biological Samples | Datasets | PMID
DRIP-seq | Control | B-cell (1/1*), CHLA10 (1/1), EWS502 (1/1), HeLa (4/4), HEK293 (2/2), SHSY5Y (2/2), TC32 (1/1), Stromal (4/4), Basal-epithelial (4/4), Luminal-progenitor (4/4), Mature-luminal-epithelial (4/4), MCF-7 (1/1), NT2 (6/6), K562 (2/1), Primary-fibroblast (2/2), U2OS (8/7), U87 (2/2), Jurkat (2/0), T-cells (2/0), IMR-90 (1/0), HEK293T (1/0) | 55/47 | 30108179, 32669707, 32769985, 28802045, 30060749, 28270613, 28649985, 27552054, 27373332, 23868195, 22387027, 26182405, 32747416, 30591567, 29416069, 29416038, 32439635, 32398827, 32686621, 28341774, 32615088
DRIP-seq | Knock down | U2OS (8/6), U87 (2/2), HeLa (4/4), HEK293 (2/2), SHSY5Y (2/2) | 18/16 | 32747416, 32669707, 32686621, 32769985, 30060749, 28270613
RDIP-seq | Control | HeLa (2/2), IMR-90 (1/1), HEK293T (1/0) | 4/3 | 30449723, 26579211
RDIP-seq | Knock down | HeLa (2/2) | 2/2 | 30449723
DRIPc-seq | Control | K562 (2/2), HEK293 (2/2), NT2 (2/2) | 6/6 | 32439635, 30060749, 27373332
DRIPc-seq | Knock down | K562 (2/2), HEK293 (2/2) | 4/4 | 32439635, 30060749
ssDRIP-seq | Control | HeLa (3/3), hVECs (2/2), hESCs (2/2), hiPSCs (2/2), hMSCs (2/2), hNSCs (2/2), hVSMCs (2/2) | 15/15 | 31606733, 32640435
ssDRIP-seq | Knock down | HeLa (3/3) | 3/3 | 31606733
bisDRIP-seq | Control | MCF-7 (13/13) | 13/13 | 29072160
qDRIP-seq | Control | HeLa (3/2) | 3/2 | 32544226
DRIVE-seq | Control | NT2 (1/1) | 1/1 | 22387027
R-ChIP | Control | HEK293T (5/5), K562 (2/2), HeLa (1/0) | 8/7 | 29104020, 32966794
RR-ChIP | Control | HeLa (2/2) | 2/2 | 31679819
MapR | Control | HEK293 (3/3), U87T (2/2) | 5/5 | 31665646
R-loop CUT&Tag | Control | HEK293T (6/6) | 6/6 | 33597247
SUM | - | - | 145/132 | -
*number of datasets analyzed / high-quality datasets.
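The `analyzed/high-quality` notation in Table 1 can be tallied mechanically. The following Python sketch, using a hypothetical helper named `tally`, sums the pairs in one Biological Samples cell; the input string is the DRIP-seq Control row from the table.

```python
import re

def tally(cell):
    """Sum the '(analyzed/high-quality)' pairs in a Biological Samples cell."""
    # The optional '*' covers the footnote marker on the first entry.
    pairs = re.findall(r"\((\d+)/(\d+)\*?\)", cell)
    analyzed = sum(int(a) for a, _ in pairs)
    high_quality = sum(int(h) for _, h in pairs)
    return analyzed, high_quality

drip_control = (
    "B-cell (1/1*), CHLA10 (1/1), EWS502 (1/1), HeLa (4/4), HEK293 (2/2), "
    "SHSY5Y (2/2), TC32 (1/1), Stromal (4/4), Basal-epithelial (4/4), "
    "Luminal-progenitor (4/4), Mature-luminal-epithelial (4/4), MCF-7 (1/1), "
    "NT2 (6/6), K562 (2/1), Primary-fibroblast (2/2), U2OS (8/7), U87 (2/2), "
    "Jurkat (2/0), T-cells (2/0), IMR-90 (1/0), HEK293T (1/0)"
)
print(tally(drip_control))  # (55, 47)
```

The result matches the 55/47 entry in the Datasets column for that row; applying the same tally to every row gives the 145/132 total in the SUM line.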

In the current version, we did not include R-loop mapping data for species other than human, mainly because only a small number of datasets are available and they were generated by a very limited number of technologies, which prevents us from defining R-loop zones of different confidence levels as we did for human cells (please refer to Q5).