Rci_Variants

Pipeline for analysing libraries of expression + reporter pairs of Rci homologs

Basecalling

  1. Connect to Imperial's HPC over SSH.

The following steps assume you are working on a Linux machine.

On the Z640 PC (Niki’s big PC), open Files, then click ‘Other Locations’ in the left sidebar. In the server address bar at the bottom, type: ssh://jk2615@login.hpc.ic.ac.uk

A pop-up window will prompt you for your Imperial password.

You should now be able to see the jk2615 rds folder on the HPC.

  2. Download fast5 files and transfer them to the rds folder.

Download the fast5 files from OneDrive in batches smaller than 4 GB; larger downloads produce a corrupted .zip file that is reported as ‘empty’. Once all batches are downloaded and extracted locally, transfer the fast5 files to the Input folder on the rds server. The basecalling job (step 3) should then produce a ‘pass’ folder containing the fastq reads split by barcode.
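Splitting a large run into sub-4 GB downloads by hand is error-prone. A minimal sketch, assuming you have copied the file names and sizes from the OneDrive listing: `plan_batches` is a hypothetical helper, nothing here talks to OneDrive, files are packed greedily in listing order, and "4 GB" is taken as 4 × 10^9 bytes (check whether the limit you hit is decimal or binary).

```python
from typing import List, Tuple

# 4 GB limit from the README, assumed decimal (4 * 10**9 bytes).
LIMIT_BYTES = 4 * 10**9


def plan_batches(files: List[Tuple[str, int]], limit: int = LIMIT_BYTES) -> List[List[str]]:
    """Pack (name, size_bytes) pairs, in listing order, into batches whose total stays below limit."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        if size >= limit:
            raise ValueError(f"{name} is {size} bytes, too large for any batch")
        # Close the current batch if adding this file would reach the limit.
        if current and current_size + size >= limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    gb = 10**9
    # Hypothetical listing: names and sizes are placeholders.
    listing = [("reads_0.fast5", int(1.5 * gb)), ("reads_1.fast5", int(1.5 * gb)),
               ("reads_2.fast5", int(1.5 * gb)), ("reads_3.fast5", int(0.4 * gb))]
    for i, batch in enumerate(plan_batches(listing), start=1):
        print(f"batch {i}: {batch}")
```

With the placeholder listing this prints two batches, `reads_0`/`reads_1` (3 GB) and `reads_2`/`reads_3` (1.9 GB). Greedy packing in listing order is not optimal, but it keeps each batch a contiguous run of files, which is easier to select in the OneDrive web view.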

  3. Basecall

Use the guppy_RciVar.sh script for this. Edit the Input and Output paths in the script. If you sequenced on a Flongle flow cell, keep the .cfg file set to the r9.4.1 super-accurate (sup) model; if you sequenced on an R10.4 MinION flow cell, use dna_r10.4_e8.1_sup.cfg instead. Submit the job with 'qsub guppy_RciVar.sh' and check its status with 'qstat'.
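Forgetting to switch the model is an easy mistake to make before submitting. The sketch below checks which .cfg file the job script mentions against the flow cell used. The R10.4 config name comes from this README; the r9.4.1 sup file name (`dna_r9.4.1_450bps_sup.cfg`) and the flow-cell keys are assumptions, so verify the name against the .cfg files in your Guppy installation.

```python
import re
from typing import List

# Flow cell -> Guppy config file. The R10.4 name is from this README; the
# r9.4.1 sup name is an ASSUMPTION (verify against your Guppy data folder).
GUPPY_CONFIGS = {
    "flongle": "dna_r9.4.1_450bps_sup.cfg",
    "minion_r10.4": "dna_r10.4_e8.1_sup.cfg",
}


def guppy_config(flow_cell: str) -> str:
    """Return the expected .cfg name for a flow cell key (case-insensitive)."""
    key = flow_cell.strip().lower()
    if key not in GUPPY_CONFIGS:
        raise ValueError(f"unknown flow cell {flow_cell!r}; use one of {sorted(GUPPY_CONFIGS)}")
    return GUPPY_CONFIGS[key]


def cfg_in_script(script_text: str) -> List[str]:
    """List every *.cfg file name mentioned in the job script (leading directories dropped)."""
    return re.findall(r"[\w.\-]+\.cfg", script_text)


def script_matches_flow_cell(script_text: str, flow_cell: str) -> bool:
    """True if the script references the config expected for this flow cell."""
    return guppy_config(flow_cell) in cfg_in_script(script_text)
```

Passing `Path("guppy_RciVar.sh").read_text()` to `script_matches_flow_cell(..., "minion_r10.4")` before running qsub would flag a script still pointing at the Flongle model.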

MacroBam analysis

  1. Create a folder containing the MacroBam.sh script and a separate subfolder for every Rci. Each Rci subfolder should contain that Rci's dedicated set of 384 references.
  2. Create another folder for analysis output. Inside, create a subdirectory for every Rci to be analysed.
  3. Into each of those subdirectories, copy the fastq files from the corresponding barcode folder on the rds server, along with the appropriate .fasta Rci references. For example, output subdirectory Rci1 should contain the fastq files from the barcode01 folder as well as the 384 Rci1 references.
  4. Open MacroBam.sh and update the directory paths.
  5. When you execute the script, a prompt will ask for a directory name. Paste in the path of the subdirectory you are working in, e.g. output subdirectory Rci1.
  • On the Z640, paste the path only from Documents onwards; home/jankatalinic appears to be prepended already.
  6. Once you finish a variant's analysis, delete its BAM files to free up space for further analyses; these files take up a lot of space!
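Steps 1 to 3 and the Z640 path note can be sketched as a small staging script. This is a minimal sketch under assumptions: fastq files end in `.fastq` or `.fastq.gz`, each Rci reference folder holds exactly 384 `.fasta` files, and `stage_rci`, `prompt_path` and all paths are hypothetical placeholders, not part of MacroBam.sh.

```python
import shutil
from pathlib import Path
from typing import Optional

N_REFERENCES = 384  # references per Rci, from step 1


def stage_rci(rci: str, barcode_dir: Path, ref_dir: Path, out_root: Path) -> Path:
    """Copy one barcode's fastq files and one Rci's .fasta references into out_root/rci."""
    refs = sorted(Path(ref_dir).glob("*.fasta"))
    if len(refs) != N_REFERENCES:
        raise ValueError(f"{ref_dir}: found {len(refs)} .fasta files, expected {N_REFERENCES}")
    # Assumed fastq naming: plain or gzipped basecaller output.
    fastqs = sorted(p for p in Path(barcode_dir).iterdir()
                    if p.is_file() and p.name.endswith((".fastq", ".fastq.gz")))
    if not fastqs:
        raise ValueError(f"{barcode_dir}: no fastq files found")
    target = Path(out_root) / rci
    target.mkdir(parents=True, exist_ok=True)
    for src in fastqs + refs:
        shutil.copy2(src, target / src.name)
    return target


def prompt_path(target: Path, home: Optional[Path] = None) -> str:
    """Path to paste at the MacroBam.sh prompt, relative to the home directory.

    Mirrors the Z640 note that the home folder seems to be prepended already.
    Raises ValueError if target is not under home.
    """
    home = Path.home() if home is None else Path(home)
    return str(Path(target).resolve().relative_to(home.resolve()))
```

Following the Rci1/barcode01 example, a loop calling `stage_rci(f"Rci{n}", pass_dir / f"barcode{n:02d}", refs_root / f"Rci{n}", out_root)` would stage several variants at once; `pass_dir`, `refs_root` and `out_root` are placeholders, and the RciN to barcode0N pairing is an assumption to check against your run sheet.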

Txt file wrangling

  1. Compress the txt file folder, upload it to OneDrive, then download it to a local machine with R and RStudio installed.
  2. Extract the files and delete all .bam files.
  3. Run easywrangler.R. Remember to inspect the ‘dt’ data table. SKIP the ‘ ############ only to execute if missing a specific mod! ################## ’ section if all modules are present! This section should not be skipped for variants which have little to no shuffling.
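The compress and BAM clean-up steps (here and in MacroBam step 6) can be scripted. A minimal sketch using only the Python standard library: `compress_folder` and `delete_bams` are hypothetical helpers, `delete_bams` removes every file matching `*.bam` below the given folder, and it defaults to a dry run so you can check the list and the bytes that would be freed before anything is deleted.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def compress_folder(folder: Path) -> Path:
    """Zip `folder` into <folder>.zip beside it, keeping the folder name inside the archive."""
    folder = Path(folder).resolve()
    archive = shutil.make_archive(str(folder), "zip",
                                  root_dir=folder.parent, base_dir=folder.name)
    return Path(archive)


def delete_bams(root: Path, dry_run: bool = True) -> Tuple[List[Path], int]:
    """Return (.bam paths under root, their total size in bytes); delete them unless dry_run."""
    bams = sorted(p for p in Path(root).rglob("*.bam") if p.is_file())
    total = sum(p.stat().st_size for p in bams)
    if not dry_run:
        for p in bams:
            p.unlink()
    return bams, total
```

Point `delete_bams` only at a finished variant's folder or at the extracted txt folder, never at a directory that holds data you still need; running it first with the default `dry_run=True` shows exactly what would go.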