# RACE-Seq

## Requirements:

- Pre-processing of data
  - STAR
  - cutadapt
  - ConDeTri
  - BBMap
  - umi_tools
  - umicollapse
  - samtools
- Tails identification
  - get_softclipped_reads_from_sam.pl (from https://github.com/smaegol/LINE_1_RACE_seq_analysis)
  - seqtk
  - *R* libraries:
    - biomaRt
    - tidyr
    - stringr
    - dplyr
    - sqldf
    - stringi
    - optparse
- Visualization
  - *R* libraries:
    - sqldf
    - ggplot2
    - dplyr
    - stringr
    - ggpubr
    - RColorBrewer
    - ggridges

## Procedure

### I. Pre-processing

Raw reads are processed and mapped to the specific gene sequences. Pre-processing includes the following steps: adapter trimming, removal of low-quality nucleotides from the 3' end of reads, amplicon filtering, filtering for reads that contain the full-length UMI + delimiter, UMI extraction, mapping, and deduplication.

**Script:** 01_data_preprocessing

**Files:** fastq files with raw reads, ref_sequence.fa, annotation.gtf

### II. Tails identification

Tails are identified from the soft-clips recorded in BAM/SAM files. Soft-clips are 5' or 3' parts of reads that do not map to the reference sequence; STAR retains information about them in its BAM/SAM output. This step uses the soft-clip present at the 3' end of the read.

**Scripts:** Tail_analysis.sh, seq.R, poli_clip_all.R

**Files:** SAM files, annotation.gtf

### III. Visualization

In this section, you can plot graphs showing the length distribution of tails categorized by class, the content of U nucleotides in AU tails or U-only tails, and the U content in tails as a function of their length. Scripts with X1KO in their names are prepared for the data set that includes WT and XRN1-KO cell lines. Scripts with D2X1KO in their names are prepared for the data set that includes WT, XRN1-KO, DCP2-KO and XRN1-DCP2-doubleKO cell lines.

**Scripts:** 03_visualization_L1_X1KO.Rmd, 03_visualization_GAPDH_X1KO.Rmd, 03_visualization_PABPC4_X1KO.Rmd, 03_visualization_functions_X1KO.R, 03_visualization_L1_D2X1KO.Rmd, 03_visualization_GAPDH_D2X1KO.Rmd, 03_visualization_functions_D2X1KO.R

**Files:** *.softclips.results.csv files, RACESeq_infos.csv
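
To illustrate the soft-clip idea behind section II (Tails identification), here is a minimal Python sketch of how a 3' soft-clip can be read from a single SAM record. It is not the code used by Tail_analysis.sh or get_softclipped_reads_from_sam.pl. The function names (`three_prime_softclip`, `tail_from_sam_line`) and the example records are hypothetical, and the sketch assumes single-end-style records in which the tail sits at the 3' end of the sequenced read. It relies only on the SAM conventions that `S` in the CIGAR string marks soft-clipped bases and that flag bit 0x10 marks a read stored as its reverse complement.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_softclip(cigar, seq, reverse_strand=False):
    """Return the soft-clipped bases at the 3' end of the original read.

    Returns an empty string when the 3' end is not soft-clipped.
    """
    # Hard clips (H) are not present in SEQ, so they are ignored here.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return ""
    if reverse_strand:
        # SEQ is stored reverse-complemented, so the read's 3' end
        # is at the start of SEQ.
        n, op = ops[0]
        if op != "S" or n == 0:
            return ""
        return reverse_complement(seq[:n])
    n, op = ops[-1]
    if op != "S" or n == 0:
        return ""
    return seq[-n:]


def tail_from_sam_line(line):
    """Extract the 3' soft-clip from one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag, cigar, seq = int(fields[1]), fields[5], fields[9]
    return three_prime_softclip(cigar, seq, reverse_strand=bool(flag & 0x10))


if __name__ == "__main__":
    # Hypothetical record: 10 mapped bases followed by a 4-nt soft-clip.
    record = "r1\t0\tGAPDH\t100\t255\t10M4S\t*\t0\t0\tACGTACGTACTTTT\t*"
    print(tail_from_sam_line(record))  # TTTT
```

In the actual workflow the extracted soft-clips are further classified and summarized by the R scripts listed in section II; this sketch only covers the extraction step.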