RACE-Seq

Requirements:

  • Pre-processing of data
    • STAR
    • cutadapt
    • ConDeTri
    • BBMap
    • umi_tools
    • umicollapse
    • samtools
  • Tails identification
  • Visualization
    • R libraries
      • sqldf
      • ggplot2
      • dplyr
      • stringr
      • ggpubr
      • RColorBrewer
      • ggridges

Procedure

I. Pre-processing

Raw reads are processed and mapped to the specific gene sequences. The pre-processing section includes the following steps: adapter trimming, removal of low-quality nucleotides from the 3' end of reads, amplicon filtering, filtering for reads that contain the full-length UMI+delimiter, UMI extraction, mapping, and deduplication.
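The UMI+delimiter filtering and UMI extraction steps can be illustrated with a minimal Python sketch. It assumes a hypothetical read layout in which each read starts with a random UMI of fixed length (UMI_LEN) followed by a fixed delimiter (DELIMITER); both values are placeholders, not the actual design used by 01_data_preprocessing, which performs this step with the tools listed under Requirements. Reads without the full UMI+delimiter are discarded, and the UMI is moved from the sequence into the read name after an underscore, the separator umi_tools uses by default.

```python
import re
from typing import Optional, Tuple

# Hypothetical read layout (assumption, not taken from the pipeline):
#   [UMI: UMI_LEN random bases][fixed DELIMITER][insert ...]
UMI_LEN = 8
DELIMITER = "TCAG"
UMI_RE = re.compile(r"^([ACGTN]{%d})%s" % (UMI_LEN, DELIMITER))


def extract_umi(name: str, seq: str, qual: str) -> Optional[Tuple[str, str, str]]:
    """Return (name_with_umi, trimmed_seq, trimmed_qual), or None when the
    read does not start with a full-length UMI followed by the delimiter."""
    m = UMI_RE.match(seq)
    if m is None:
        return None  # filtered out: UMI+delimiter missing or truncated
    cut = m.end()  # UMI and delimiter are both removed from the read
    umi = m.group(1)
    # Append the UMI to the read name so deduplication can find it later
    return f"{name}_{umi}", seq[cut:], qual[cut:]
```

For example, extract_umi("read1", "ACGTACGTTCAGTTTTGGGCCC", "I" * 22) keeps the read, renames it read1_ACGTACGT and trims it to TTTTGGGCCC, while a read whose bases after the first eight are not the delimiter returns None.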

Script: 01_data_preprocessing

Files: fastq files with raw reads, ref_sequence.fa, annotation.gtf
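Deduplication collapses reads that share both a mapping position and a UMI, since these most likely are PCR copies of a single original molecule. The sketch below is a simplified, hypothetical stand-in for umi_tools/umicollapse: it groups UMIs only on exact match (comparable to the unique method of umi_tools) instead of merging UMIs that differ by sequencing errors, and it assumes each alignment's 5' mapping coordinate has already been computed. The Alignment record and the function names are illustrative only.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    name: str      # read name ending in "_<UMI>" (as after UMI extraction)
    ref: str       # reference (gene) name
    pos: int       # 5' mapping coordinate of the read (assumed precomputed)
    reverse: bool  # True if the read maps to the reverse strand
    mapq: int      # mapping quality


def umi_of(name: str) -> str:
    # The UMI is whatever follows the last underscore in the read name
    return name.rsplit("_", 1)[1]


def dedup_exact(alns: Iterable[Alignment]) -> List[Alignment]:
    """Keep one alignment per (ref, strand, pos, UMI), preferring the highest
    MAPQ. UMIs are compared by exact match only (no error correction)."""
    best: Dict[Tuple[str, bool, int, str], Alignment] = {}
    for a in alns:
        key = (a.ref, a.reverse, a.pos, umi_of(a.name))
        if key not in best or a.mapq > best[key].mapq:
            best[key] = a
    return list(best.values())
```

Exact matching is the simplest choice and is easy to reason about, but it overcounts molecules when UMIs carry sequencing errors, which is why dedicated tools offer error-tolerant grouping.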

II. Tails identification

Tails are identified from soft-clips in the BAM/SAM files. Soft-clips are 5' or 3' portions of a read that do not align to the reference sequence; STAR keeps these bases in the read sequence of the BAM/SAM record and marks them with S operations in the CIGAR string. The analysis uses the soft-clip present at the 3' end of the read.
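A minimal Python sketch of pulling the 3' soft-clip out of a single SAM record is shown below. It assumes the reference sequences are in the sense (mRNA) orientation, so that the tail appears as soft-clipped bases at the right end of the alignment, i.e. a trailing S operation in the CIGAR string; because SAM stores SEQ in reference orientation, no reverse complement is taken. It is an illustration under that assumption, not the logic of Tail_analysis.sh or poli_clip_all.R.

```python
import re
from typing import Optional

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def three_prime_softclip(sam_line: str) -> Optional[str]:
    """Return the bases soft-clipped at the right end of the alignment, or
    None if there are none. Assumes sense-oriented reference sequences."""
    if sam_line.startswith("@"):
        return None  # header line
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar, seq = int(fields[1]), fields[5], fields[9]
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read
    ops = CIGAR_OP.findall(cigar)
    # Hard-clipped bases are not stored in SEQ; skip a trailing H if present
    if ops and ops[-1][1] == "H":
        ops = ops[:-1]
    if not ops or ops[-1][1] != "S":
        return None  # no 3' soft-clip, so no tail detected
    clip_len = int(ops[-1][0])
    return seq[-clip_len:]
```

For a record with CIGAR 20M6S, the last six bases of SEQ are returned as the candidate tail; a record with CIGAR 5S21M has only a 5' soft-clip and returns None.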

Scripts: Tail_analysis.sh, seq.R, poli_clip_all.R

Files: SAM files, annotation.gtf

III. Visualization

In this section, you can plot the length distribution of tails by class, the U content of mixed AU tails and of U-only tails, and the U content of tails as a function of tail length. Scripts with X1KO in their names are prepared for the dataset that includes WT and XRN1-KO cell lines. Scripts with D2X1KO in their names are prepared for the dataset that includes WT, XRN1-KO, DCP2-KO and XRN1-DCP2 double-KO cell lines.
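The per-tail quantities behind these plots, tail length and U content, are simple to compute. The Python sketch below assumes the tails are already available as sequence strings and uses hypothetical class labels (A, U, AU, other); the class definitions in the 03_visualization_* R scripts may differ. T is counted as U so that DNA-encoded reads can be passed in directly.

```python
from collections import Counter
from typing import Iterable


def normalize(tail: str) -> str:
    # Read sequences are DNA; count T as U
    return tail.upper().replace("T", "U")


def classify_tail(tail: str) -> str:
    """Hypothetical class labels, for illustration only."""
    if not tail:
        return "none"
    bases = set(normalize(tail))
    if bases == {"A"}:
        return "A"
    if bases == {"U"}:
        return "U"
    if bases == {"A", "U"}:
        return "AU"
    return "other"


def u_fraction(tail: str) -> float:
    """Fraction of U (or T) nucleotides in the tail; 0.0 for an empty tail."""
    t = normalize(tail)
    return t.count("U") / len(t) if t else 0.0


def length_distribution(tails: Iterable[str]) -> Counter:
    """Count tails per (class, length), i.e. the data for a per-class
    length histogram."""
    return Counter((classify_tail(t), len(t)) for t in tails)
```

Keeping classification and counting separate makes it easy to swap in a different class definition without touching the summary step.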

Scripts: 03_visualization_L1_X1KO.Rmd, 03_visualization_GAPDH_X1KO.Rmd, 03_visualization_PABPC4_X1KO.Rmd, 03_visualization_functions_X1KO.R, 03_visualization_L1_D2X1KO.Rmd, 03_visualization_GAPDH_D2X1KO.Rmd, 03_visualization_functions_D2X1KO.R

Files: *.softclips.results.csv files, RACESeq_infos.csv