RACE-Seq

Requirements:

Pre-processing of data
- STAR
- cutadapt
- ConDeTri
- BBMap
- umi_tools
- umicollapse
- samtools
Tails identification
- get_softclipped_reads_from_sam.pl (from https://github.com/smaegol/LINE_1_RACE_seq_analysis)
- seqtk
- R libraries:
  - biomaRt
  - tidyr
  - stringr
  - dplyr
  - sqldf
  - stringi
  - optparse
Visualization
- R libraries:
  - sqldf
  - ggplot2
  - dplyr
  - stringr
  - ggpubr
  - RColorBrewer
  - ggridges

Procedure

I. Pre-processing

Raw reads are processed and mapped to the specific gene sequences. The pre-processing section includes the following steps: adapter trimming, removal of low-quality nucleotides from the 3' end of reads, amplicon filtering, filtering for reads that contain the full-length UMI+delimiter, UMI extraction, mapping, and deduplication.
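As an illustration of the UMI+delimiter filtering and UMI extraction steps, here is a minimal Python sketch. It is not the code used in 01_data_preprocessing (which relies on the tools listed above). The function name `split_umi`, the UMI length, the delimiter sequence, and the assumption that the UMI sits at the start of the read are all hypothetical and should be replaced with the values from your library design.

```python
import re

def split_umi(read_seq, umi_len, delimiter):
    """Split a read into (umi, remainder) if it starts with a full-length
    UMI followed by the delimiter; return None otherwise.

    Assumption (not taken from the pipeline): the UMI is at the start of
    the read and consists of unambiguous bases only.
    """
    pattern = re.compile("^([ACGT]{%d})%s" % (umi_len, re.escape(delimiter)))
    match = pattern.match(read_seq)
    if match is None:
        # Truncated UMI, ambiguous base, or missing delimiter: discard the read
        return None
    return match.group(1), read_seq[match.end():]

# Hypothetical 8-nt UMI and "GGG" delimiter, for illustration only
example = split_umi("ACGTACGTGGGTTTTCA", umi_len=8, delimiter="GGG")
```

Reads for which the function returns None would be filtered out; for the others, the UMI can be moved to the read name before mapping so that it is available at the deduplication step.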

Script: 01_data_preprocessing

Files: fastq files with raw reads, ref_sequence.fa, annotation.gtf

II. Tails identification

Tails are identified from soft-clips in BAM/SAM files. Soft-clips are 5' or 3' parts of reads that do not map to the reference sequence; STAR retains information about them in the BAM/SAM files. The analysis uses the soft-clip present at the 3' end of the read.
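To show how a soft-clip can be recovered from an alignment record, here is a minimal Python sketch that reads the CIGAR string (column 6) and the sequence (column 10) of a SAM line. It is not a reimplementation of get_softclipped_reads_from_sam.pl. The function name `three_prime_softclip` is hypothetical, and the sketch assumes the read is aligned to the forward strand, so that the 3' soft-clip is the trailing `S` operation of the CIGAR string.

```python
import re

# One CIGAR operation: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def three_prime_softclip(sam_line):
    """Return the soft-clipped bases at the right-hand end of the alignment,
    or an empty string if there are none.

    Assumption: forward-strand alignment, so the right-hand end of the
    stored sequence is the 3' end of the read.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    if cigar == "*":
        return ""  # unmapped read, no CIGAR available
    ops = CIGAR_OP.findall(cigar)
    # Hard-clipped bases (H) are absent from SEQ, so ignore a trailing H
    while ops and ops[-1][1] == "H":
        ops.pop()
    if not ops or ops[-1][1] != "S":
        return ""
    clip_len = int(ops[-1][0])
    return seq[len(seq) - clip_len:]

# Made-up SAM record: 6 matched bases followed by 4 soft-clipped bases
record = "read1\t0\tGAPDH\t100\t255\t6M4S\t*\t0\t0\tACGTACTTTT\tIIIIIIIIII"
tail = three_prime_softclip(record)
```

For the made-up record above, the last four bases of the sequence are returned as the candidate tail. Reads aligned to the reverse strand would need the leading soft-clip and reverse-complementing, which this sketch does not handle.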

Scripts: Tail_analysis.sh, seq.R, poli_clip_all.R

Files: SAM files, annotation.gtf

III. Visualization

In this section, you can plot the length distribution of tails by class, the U content of AU tails or U-only tails, and the U content of tails as a function of their length. Scripts with X1KO in their names are prepared for the data set that includes WT and XRN1-KO cell lines. Scripts with D2X1KO in their names are prepared for the data set that includes WT, XRN1-KO, DCP2-KO, and XRN1-DCP2 double-KO cell lines.
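The quantities plotted here can be illustrated with a short Python sketch that computes the U fraction of a tail and the mean U fraction per tail length. This is only an illustration of the arithmetic, not the code in the visualization scripts, which are written in R. The function names `u_fraction` and `u_content_by_length` are hypothetical, and T is counted as U on the assumption that tails are stored as DNA sequence.

```python
from collections import defaultdict

def u_fraction(tail):
    """Fraction of U in a tail sequence; T is treated as U.
    Returns None for an empty tail."""
    tail = tail.upper().replace("T", "U")
    if not tail:
        return None
    return tail.count("U") / len(tail)

def u_content_by_length(tails):
    """Map each tail length to the mean U fraction of tails of that length."""
    grouped = defaultdict(list)
    for tail in tails:
        fraction = u_fraction(tail)
        if fraction is not None:
            grouped[len(tail)].append(fraction)
    return {length: sum(values) / len(values)
            for length, values in grouped.items()}

# Made-up tails, for illustration only
summary = u_content_by_length(["UU", "AU", "AAAT"])
```

Here the two tails of length 2 have U fractions of 1.0 and 0.5, and the single tail of length 4 has a U fraction of 0.25.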

Scripts: 03_visualization_L1_X1KO.Rmd, 03_visualization_GAPDH_X1KO.Rmd, 03_visualization_PABPC4_X1KO.Rmd, 03_visualization_functions_X1KO.R, 03_visualization_L1_D2X1KO.Rmd, 03_visualization_GAPDH_D2X1KO.Rmd, 03_visualization_functions_D2X1KO.R

Files: *.softclips.results.csv files, RACESeq_infos.csv