Raw reads are processed and mapped to the specific gene sequences. Pre-processing includes the following steps: adapter trimming, removal of low-quality nucleotides from the 3' end of reads, amplicon filtering, filtering of reads by the presence of the full-length UMI+delimiter, UMI extraction, mapping, and deduplication.
Script: 01_data_preprocessing
Files: fastq files with raw reads, ref_sequence.fa, annotation.gtf
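The UMI filtering and extraction steps listed above can be illustrated with a minimal Python sketch. It is not the code in 01_data_preprocessing. It assumes that the UMI sits at the 5' end of the read and is immediately followed by a fixed delimiter sequence; the function name, the UMI length, and the delimiter used in the example are hypothetical.

```python
from typing import Optional, Tuple


def extract_umi(
    seq: str, qual: str, umi_len: int, delimiter: str
) -> Optional[Tuple[str, str, str]]:
    """Split a read into (umi, remaining_seq, remaining_qual).

    Returns None when the read does not start with a full-length UMI
    followed by the delimiter, so the caller can filter such reads out.
    Assumption (not from the pipeline): UMI at the 5' end, then the delimiter.
    """
    prefix_len = umi_len + len(delimiter)
    if len(seq) < prefix_len:
        return None  # read too short to hold UMI + delimiter
    if seq[umi_len:prefix_len] != delimiter:
        return None  # delimiter missing or shifted: UMI is not full length
    # Trim sequence and quality string by the same amount to keep them in sync.
    return seq[:umi_len], seq[prefix_len:], qual[prefix_len:]


# Hypothetical layout: 8-nt UMI followed by the delimiter "GG".
result = extract_umi("ACGTACGTGGTTTTCC", "I" * 16, umi_len=8, delimiter="GG")
```

Reads for which the function returns None would be discarded; for the others, the UMI is typically appended to the read name so that it survives mapping and can be used for deduplication.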
Tails are identified from the soft-clips recorded in BAM/SAM files. Soft-clips are the 5' or 3' parts of reads that do not map to the reference sequence; STAR retains information about them in the BAM/SAM files. This step uses the soft-clip present at the 3' end of the read.
Scripts: Tail_analysis.sh, seq.R, poli_clip_all.R
Files: SAM files, annotation.gtf
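How a 3' soft-clip can be read from a SAM record is sketched below in Python. This is an illustration under stated assumptions, not the logic of Tail_analysis.sh or the R scripts. It relies only on the SAM format itself: the CIGAR operation S marks soft-clipped bases, and for reverse-strand alignments (FLAG bit 0x10) SEQ is stored reverse-complemented, so the 3' end of the original read is the leading clip. The function name is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def three_prime_softclip(flag: int, cigar: str, seq: str) -> str:
    """Return the soft-clipped bases at the 3' end of the read, in read orientation.

    Returns an empty string when the read has no 3' soft-clip.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases (H) are absent from SEQ, so ignore them at the ends.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return ""  # unmapped read or CIGAR "*"
    if flag & 0x10:
        # Reverse strand: the read's 3' end is the leftmost part of SEQ.
        n, op = ops[0]
        if op != "S" or n == 0:
            return ""
        return seq[:n].translate(COMPLEMENT)[::-1]
    # Forward strand: the read's 3' end is the rightmost part of SEQ.
    n, op = ops[-1]
    if op != "S" or n == 0:
        return ""
    return seq[-n:]


# Forward-strand read with 20 matched bases and a 5-nt 3' soft-clip.
tail = three_prime_softclip(0, "20M5S", "A" * 20 + "TTTTT")
```

Whether the returned sequence is in transcript sense depends on the library orientation, which this sketch does not know; that has to be taken from the protocol.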
In this section, you can plot the length distribution of tails grouped by class, the U content of AU tails or U-only tails, and the U content of tails as a function of their length. Scripts with X1KO in their names are prepared for the dataset that includes WT and XRN1-KO cell lines. Scripts with D2X1KO in their names are prepared for the dataset that includes WT, XRN1-KO, DCP2-KO, and XRN1-DCP2-doubleKO cell lines.
Scripts: 03_visualization_L1_X1KO.Rmd, 03_visualization_GAPDH_X1KO.Rmd, 03_visualization_PABPC4_X1KO.Rmd, 03_visualization_functions_X1KO.R, 03_visualization_L1_D2X1KO.Rmd, 03_visualization_GAPDH_D2X1KO.Rmd, 03_visualization_functions_D2X1KO.R
Files: *.softclips.results.csv files, RACESeq_infos.csv
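The quantities plotted in this section can be illustrated with a short Python sketch. The visualization scripts themselves are written in R, and the class names and rules below are hypothetical, chosen only to show the idea; the definitions actually used are those in the 03_visualization_functions_*.R files. The sketch assumes tails are given as DNA strings, so U appears as T.

```python
def u_fraction(tail: str) -> float:
    """Fraction of U (sequenced as T) in a tail; 0.0 for an empty tail."""
    tail = tail.upper()
    if not tail:
        return 0.0
    return tail.count("T") / len(tail)


def classify_tail(tail: str) -> str:
    """Assign a tail to a class by its base composition (hypothetical rules)."""
    bases = set(tail.upper())
    if not bases:
        return "no_tail"
    if bases == {"A"}:
        return "A"
    if bases == {"T"}:
        return "U"
    if bases == {"A", "T"}:
        return "AU"
    return "other"


# Example tails, including an empty one and one with a non-A/U base.
tails = ["AAAAAA", "TTT", "AAAATT", "", "AAGAA"]
classes = [classify_tail(t) for t in tails]
```

Grouping tail lengths by the returned class gives the per-class length distribution, and averaging u_fraction over tails of equal length gives the U content by tail length.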