This repository contains scripts for analysis and visualization of specific edits observed using Nanopore and Illumina (NGS) sequencing technologies. They were used in the following paper:
Targeted DNA ADP-ribosylation drives distinct editing outcomes in bacteria and eukaryotes (2024)
NGS and Nanopore raw sequencing data used in the paper are available at SRA.
Examples of processed data for running each script in this repository are available in the data folder.
data: Directory containing example data.
analysis: Directory containing the R scripts.
outputs: Directory created when the scripts are run. It holds the processed data and the resulting tables and plots.
Option 1: Manually download the repository as a ZIP archive and extract it on your computer
Option 2: Clone the repository
git clone https://github.com/saliba-lab/MBE_analysis.git
cd MBE_analysis/analysis
See the Dependencies section.
Make sure to set the analysis directory as the working directory when running the scripts.
Scripts 1 to 3 process data obtained by Illumina sequencing. They take as input the Allele_frequency_table_around_sgRNA_ files (in .txt format) generated by CRISPResso2. They also require sample-specific metadata, provided in sample sheets that are located in the analysis directory. The metadata specify the minimum read count required to include a sample in the analysis (Threshold_read_counts), the position of interest in the sequencing read at which to look for mutations (Mutation_Position), and replicate information (Replicate_group).
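As a rough illustration of how the Threshold_read_counts field could be applied, the sketch below drops samples whose total read count falls under the threshold. The `Sample` column, the `total_reads` vector, and all values are hypothetical stand-ins, and the actual scripts may organize this step differently.

```r
# Toy sample sheet. Threshold_read_counts, Mutation_Position and
# Replicate_group are the fields named above; the Sample column and all
# values are made up for illustration.
sample_sheet <- data.frame(
  Sample                = c("S1", "S2", "S3"),
  Threshold_read_counts = c(1000, 1000, 1000),
  Mutation_Position     = c(4, 4, 6),
  Replicate_group       = c("A", "A", "B"),
  stringsAsFactors      = FALSE
)

# Hypothetical total read counts per sample (e.g. summed from an allele table).
total_reads <- c(S1 = 5230, S2 = 412, S3 = 1800)

# Keep only samples that reach their own read count threshold.
keep <- unname(total_reads[sample_sheet$Sample]) >= sample_sheet$Threshold_read_counts
kept_samples <- sample_sheet[keep, ]
```

In this toy example, S2 is excluded because its 412 reads fall below the threshold of 1000.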
Script1: This script sums up %Reads containing nucleotides different from the reference, at a specified position.
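The calculation described for Script1 can be sketched as follows. This is a minimal illustration on a toy table, not the script itself: `pct_reads` stands in for the %Reads column, `percent_edited_at` is a hypothetical helper name, and the sequences and percentages are invented.

```r
# Toy allele table loosely mimicking a CRISPResso2 allele frequency table.
# pct_reads stands in for the %Reads column; all values are made up.
alleles <- data.frame(
  Aligned_Sequence   = c("ACGTACGT", "ACGAACGT", "ACGCACGT", "ACGTACGA"),
  Reference_Sequence = c("ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTACGT"),
  pct_reads          = c(70, 15, 10, 5),
  stringsAsFactors   = FALSE
)

# Hypothetical helper: total percent of reads whose base at `position`
# (1-based) differs from the reference base at the same position.
percent_edited_at <- function(tab, position) {
  aligned_base   <- substr(tab$Aligned_Sequence, position, position)
  reference_base <- substr(tab$Reference_Sequence, position, position)
  sum(tab$pct_reads[aligned_base != reference_base])
}

edited_pct <- percent_edited_at(alleles, position = 4)
```

Here two alleles differ from the reference at position 4 (A and C instead of T), so `edited_pct` is 15 + 10 = 25.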
Script2: This script sums up %Reads containing a specific nucleotide for all positions along the length of the read.
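A per-position version of the same idea, as described for Script2, might look like the sketch below. It assumes all aligned sequences have the same length and uses an invented toy table; `pct_reads` again stands in for %Reads, and `percent_base_by_position` is a hypothetical name.

```r
# Toy allele table; pct_reads stands in for the %Reads column (values made up).
alleles <- data.frame(
  Aligned_Sequence = c("ACGTACGT", "ACGAACGT", "ACGCACGT", "ACGTACGA"),
  pct_reads        = c(70, 15, 10, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every position along the read, sum the percent of
# reads carrying `base` at that position. Assumes equal-length sequences.
percent_base_by_position <- function(tab, base) {
  read_length <- nchar(tab$Aligned_Sequence[1])
  vapply(seq_len(read_length), function(pos) {
    sum(tab$pct_reads[substr(tab$Aligned_Sequence, pos, pos) == base])
  }, numeric(1))
}

pct_A <- percent_base_by_position(alleles, base = "A")
```

For this toy table, `pct_A` is 100 at positions 1 and 5, 15 at position 4, 5 at position 8, and 0 elsewhere.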
Script3: This script sums up %Reads containing nucleotides different from the reference at two positions specified in the sample sheet, and produces graphical representations for visualization.
Scripts 4 to 6 are related to data obtained by Nanopore sequencing.
Script4: Related to figure 1h and S3. This script reads BAM files, then takes two actions. First, it calculates and plots the fraction of unedited, edited, and ambiguous reads in a region. Second, it calculates and plots SNVs at a specific position in an otherwise unedited region, as a percent of all reads.
Script5: Related to figure 1i. This script reads a CSV file, then calculates and plots the frequency of SNVs that exceed filtering criteria.
Script6: Related to figure S5. This script reads a CSV file, then takes two actions. First, it plots individual growth curves. Second, it plots final values of absorbance at 600 nm.
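The second step, extracting final absorbance values, can be sketched on a toy long-format table as below. The column names (`time_h`, `well`, `od600`) and all values are assumptions made for illustration; the layout of the actual CSV file may differ.

```r
# Toy growth data in long format; column names and values are hypothetical.
growth <- data.frame(
  time_h           = rep(c(0, 1, 2), times = 2),
  well             = rep(c("A1", "A2"), each = 3),
  od600            = c(0.05, 0.20, 0.80, 0.05, 0.10, 0.30),
  stringsAsFactors = FALSE
)

# Final absorbance per well: the OD600 reading at the last time point.
final_rows <- growth[growth$time_h == max(growth$time_h), ]
final_od   <- setNames(final_rows$od600, final_rows$well)
```

The individual growth curves could then be drawn from `growth` (one line per well), and `final_od` used for the end-point plot.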
List of R packages necessary to run the scripts.