In the current release of R-loopBase, we integrated 152 datasets generated in human cells by 11 different technologies (Table 1; for meta information, please refer to Download). We then developed a comprehensive workflow for quality control and data analysis. Briefly, technical replicates, where available, were merged first, and raw sequencing data were then mapped to the human genome (hg38) using Bowtie2 in local alignment mode. Uniquely mapped, non-redundant reads were kept as useful reads, and samples with >7M useful reads were considered to have sufficient read counts. To make maximal use of the sequencing data, biological replicates with <7M useful reads were merged to meet the minimum read count cutoff, provided that they were highly correlated (Spearman correlation coefficient >0.5). Finally, peaks were called with MACS2 on all useful reads (DRIP-seq, DRIVE-seq, MapR and R-loop CUT&Tag) or on useful reads from the Watson and Crick strands separately (DRIPc-seq, RDIP-seq, ssDRIP-seq, qDRIP-seq, R-ChIP and RR-ChIP), using a q-value cutoff of 0.01 for narrow peaks (R-ChIP and R-loop CUT&Tag) and 0.05 for broad peaks (DRIP-seq, DRIPc-seq, RDIP-seq, ssDRIP-seq, qDRIP-seq, DRIVE-seq, MapR and RR-ChIP). When multiple biological replicates were available, peaks with ≥50 bp overlap among ≥2 replicates were merged and taken as reproducible peaks. Samples with <100 called peaks were discarded. Following ENCODE guidelines for ChIP-seq data analysis, we further calculated the signal portion of tags (SPOT) and reads in blacklisted regions (RiBL) as part of the quality control matrix for users' reference. Only peaks outside ChIP-seq blacklisted regions were used for downstream analysis. As a special case, bisDRIP-seq data are not readily amenable to peak calling; we instead uploaded their processed signal tracks to our genome browser for visualization and comparison with other R-loop mapping data.
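The decision rules in the workflow above can be summarized in a short Python sketch. This is an illustration only, not the pipeline code used for R-loopBase: all function and constant names are hypothetical, peaks are assumed to be `(chrom, start, end)` tuples with half-open coordinates, the replicate merge is shown pairwise for simplicity, and the requirement that merged low-depth replicates together exceed 7M useful reads is our reading of "merged to meet the cutoff".

```python
# Thresholds quoted in the text above.
MIN_USEFUL_READS = 7_000_000   # samples need >7M useful reads
MIN_SPEARMAN = 0.5             # replicates must correlate above this to be merged
MIN_OVERLAP_BP = 50            # peaks must overlap by at least this many bp
MIN_PEAKS = 100                # samples with fewer called peaks are discarded

# Technology groups as listed in the text.
UNSTRANDED = {"DRIP-seq", "DRIVE-seq", "MapR", "R-loop CUT&Tag"}
STRANDED = {"DRIPc-seq", "RDIP-seq", "ssDRIP-seq", "qDRIP-seq", "R-ChIP", "RR-ChIP"}
NARROW = {"R-ChIP", "R-loop CUT&Tag"}


def peak_calling_settings(technology):
    """Return peak-calling settings, or None if no peaks are called
    (e.g. bisDRIP-seq, which is only shown as signal tracks)."""
    if technology not in UNSTRANDED | STRANDED:
        return None
    narrow = technology in NARROW
    return {
        "peak_type": "narrow" if narrow else "broad",
        "q_cutoff": 0.01 if narrow else 0.05,
        "strand_specific": technology in STRANDED,
    }


def has_sufficient_depth(useful_reads):
    """A sample has sufficient depth with more than 7M useful reads."""
    return useful_reads > MIN_USEFUL_READS


def can_rescue_by_merging(read_counts, spearman):
    """Low-depth replicates may be merged if they are highly correlated
    and (assumption) the merged read count passes the depth cutoff."""
    return spearman > MIN_SPEARMAN and has_sufficient_depth(sum(read_counts))


def overlap_bp(a, b):
    """Number of overlapping base pairs between two (chrom, start, end) peaks."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def merge_if_reproducible(a, b):
    """Return the union of two replicate peaks if they overlap by >=50 bp,
    otherwise None."""
    if overlap_bp(a, b) < MIN_OVERLAP_BP:
        return None
    return (a[0], min(a[1], b[1]), max(a[2], b[2]))


def keep_sample(n_peaks):
    """Samples with fewer than 100 called peaks are discarded."""
    return n_peaks >= MIN_PEAKS


if __name__ == "__main__":
    print(peak_calling_settings("R-ChIP"))
    print(can_rescue_by_merging([4_000_000, 5_000_000], spearman=0.8))
    print(merge_if_reproducible(("chr1", 100, 400), ("chr1", 350, 600)))
```

For example, under these assumptions two replicates with 4M and 5M useful reads and a Spearman coefficient of 0.8 would be merged, whereas the same pair with a coefficient of 0.4 would not.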
In total, 132 datasets from 26 human cell types, generated by 11 different technologies, are included in the current release of R-loopBase (Table 1).
Table 1. Meta information for R-loop mapping data

Technology | Treatment | Biological Samples | Datasets | PMID |
---|---|---|---|---|
DRIP-seq | Control | B-cell (1/1*), CHLA10 (1/1), EWS502 (1/1), HeLa (4/4), HEK293 (2/2), SHSY5Y (2/2), TC32 (1/1), Stromal (4/4), Basal-epithelial (4/4), Luminal-progenitor (4/4), Mature-luminal-epithelial (4/4), MCF-7 (1/1), NT2 (6/6), K562 (2/1), Primary-fibroblast (2/2), U2OS (8/7), U87 (2/2), Jurkat (2/0), T-cells (2/0), IMR-90 (1/0), HEK293T (1/0) | 55/47 | 30108179, 32669707, 32769985, 28802045, 30060749, 28270613, 28649985, 27552054, 27373332, 23868195, 22387027, 26182405, 32747416, 30591567, 29416069, 29416038, 32439635, 32398827, 32686621, 28341774, 32615088 |
DRIP-seq | Knock down | U2OS (8/6), U87 (2/2), HeLa (4/4), HEK293 (2/2), SHSY5Y (2/2) | 18/16 | 32747416, 32669707, 32686621, 32769985, 30060749, 28270613 |
RDIP-seq | Control | HeLa (2/2), IMR-90 (1/1), HEK293T (1/0) | 4/3 | 30449723, 26579211 |
RDIP-seq | Knock down | HeLa (2/2) | 2/2 | 30449723 |
DRIPc-seq | Control | K562 (2/2), HEK293 (2/2), NT2 (2/2) | 6/6 | 32439635, 30060749, 27373332 |
DRIPc-seq | Knock down | K562 (2/2), HEK293 (2/2) | 4/4 | 32439635, 30060749 |
ssDRIP-seq | Control | HeLa (3/3), hVECs (2/2), hESCs (2/2), hiPSCs (2/2), hMSCs (2/2), hNSCs (2/2), hVSMCs (2/2) | 15/15 | 31606733, 32640435 |
ssDRIP-seq | Knock down | HeLa (3/3) | 3/3 | 31606733 |
bisDRIP-seq | Control | MCF-7 (13/13) | 13/13 | 29072160 |
qDRIP-seq | Control | HeLa (3/2) | 3/2 | 32544226 |
DRIVE-seq | Control | NT2 (1/1) | 1/1 | 22387027 |
R-ChIP | Control | HEK293T (5/5), K562 (2/2), HeLa (1/0) | 8/7 | 29104020, 32966794 |
RR-ChIP | Control | HeLa (2/2) | 2/2 | 31679819 |
MapR | Control | HEK293 (3/3), U87T (2/2) | 5/5 | 31665646 |
R-loop CUT&Tag | Control | HEK293T (6/6) | 6/6 | 33597247 |
SUM | - | - | 145/132 | - |
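
The SUM row can be checked by adding up the two numbers in each Datasets entry. The following is a minimal sketch with hypothetical names; the entries are copied from the Datasets column of Table 1, and the meaning of the two numbers (presumably datasets collected versus datasets retained, given the 132 datasets reported as included) is an assumption rather than something stated in the table.

```python
# Datasets column of Table 1, one "x/y" entry per row (SUM row excluded).
DATASETS = [
    "55/47", "18/16",   # DRIP-seq: control, knock down
    "4/3", "2/2",       # RDIP-seq: control, knock down
    "6/6", "4/4",       # DRIPc-seq: control, knock down
    "15/15", "3/3",     # ssDRIP-seq: control, knock down
    "13/13",            # bisDRIP-seq
    "3/2",              # qDRIP-seq
    "1/1",              # DRIVE-seq
    "8/7",              # R-ChIP
    "2/2",              # RR-ChIP
    "5/5",              # MapR
    "6/6",              # R-loop CUT&Tag
]


def column_totals(entries):
    """Sum the first and second numbers of every 'x/y' entry."""
    pairs = [tuple(int(n) for n in entry.split("/")) for entry in entries]
    return sum(p[0] for p in pairs), sum(p[1] for p in pairs)


if __name__ == "__main__":
    first, second = column_totals(DATASETS)
    print(f"{first}/{second}")  # 145/132, matching the SUM row
```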
In the current version, we did not include R-loop mapping data for species other than human, mainly because only a small number of datasets are available and they were generated by a very limited number of technologies, which prevents us from defining R-loop zones of different confidence levels as we did for human cells (please refer to Q5).