6mA_footprint

The figure below compares the ipdSummary and ipdTrimming pipelines.

[Figure: ipdSummary vs. ipdTrimming pipeline comparison]

ipdTrimming introduces local trimming of IPD outliers at the subread level into the IPD conversion: for each site in a DNA molecule, the top 10% of subread IPD values are removed, and the remaining subread IPD values are averaged to generate the CCS IPD value. This is implemented by a modified CCS module:

usc@hpc:~$ ccs -j 20 --hifi-kinetics  subread.bam hifi.bam
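
The trimming rule described above can be illustrated with a minimal Python sketch. This is not the code of the modified CCS module. The function name `trimmed_mean_ipd`, the `trim_fraction` parameter, and the choice to round the number of trimmed values down are all assumptions made for illustration.

```python
import math


def trimmed_mean_ipd(subread_ipds, trim_fraction=0.10):
    """Average the subread IPD values of one site after dropping the largest ones.

    Hypothetical helper: the real trimming happens inside the modified CCS module.
    """
    if not subread_ipds:
        raise ValueError("need at least one subread IPD value")
    values = sorted(subread_ipds)
    # Assumption: the number of trimmed values is rounded down, so a site
    # covered by fewer than 10 subreads keeps all of its values.
    n_trim = math.floor(len(values) * trim_fraction)
    kept = values[:len(values) - n_trim]
    return sum(kept) / len(kept)


# Ten subread IPD values for one site; the last one is an obvious outlier.
site_ipds = [12, 9, 11, 10, 14, 8, 13, 10, 11, 95]
print("plain mean:  ", sum(site_ipds) / len(site_ipds))
print("trimmed mean:", trimmed_mean_ipd(site_ipds))
```

With ten subreads, exactly one value (the largest) is discarded, so a single long pause no longer dominates the CCS IPD value of that site.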
    

The modified CCS module outputs a CCS BAM file with trimmed IPD values. Next, a movie-time normalization is applied to compensate for variations in the polymerase elongation rate on individual DNA molecules: each CCS IPD value is normalized against the CCS IPD value averaged across all sites in that DNA molecule. The normalized CCS IPD value is then compared with that of an unmodified base in the same sequence context, as predicted by a k-mer model pretrained on Sequel data. These last two steps are implemented by the script ipdRatiocalculator_FromCCS.py:

usc@hpc:~$ python ipdRatiocalculator_FromCCS.py hifi.bam hifi.withIPDr.bam
# The number of CPUs can be set inside the script.
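
The two steps performed by the script (movie-time normalization, then comparison with the model prediction) can be sketched roughly as follows. This is an illustration, not the script itself. The function name `ipd_ratios` is hypothetical, and `expected_ipds` stands in for the per-site predictions of the pretrained k-mer model, which are passed in as a plain list here rather than computed.

```python
def ipd_ratios(ccs_ipds, expected_ipds):
    """Return one IPD ratio per site of a single molecule.

    ccs_ipds:      trimmed CCS IPD values, one per site.
    expected_ipds: model-predicted IPD of an unmodified base at each site
                   (assumed to be on the same normalized scale).
    """
    if not ccs_ipds:
        raise ValueError("molecule has no IPD values")
    if len(ccs_ipds) != len(expected_ipds):
        raise ValueError("need one expected IPD per site")
    # Movie-time normalization: divide by the molecule-wide mean IPD.
    molecule_mean = sum(ccs_ipds) / len(ccs_ipds)
    normalized = [ipd / molecule_mean for ipd in ccs_ipds]
    # IPD ratio: normalized observation over the unmodified-base expectation.
    return [obs / exp for obs, exp in zip(normalized, expected_ipds)]


print(ipd_ratios([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

A ratio well above 1 at a site means the polymerase paused longer there than the model expects for an unmodified base in that sequence context.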

Extract the IPD ratio (IPDr) for all A sites in each individual molecule:

usc@hpc:~$ python bamextractallAx_IPDvalue.py hifi.withIPDr.bam hifi.withIPDr_allA.xls
# The effective coverage (ec) threshold can be set inside the script; the default is 20.
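
As a rough sketch of what this extraction step does conceptually, the function below keeps the IPDr of every A in a molecule that passes the effective-coverage threshold. It is not taken from bamextractallAx_IPDvalue.py. The function name `extract_a_sites`, the 0-based positions, the per-molecule filter, and the `>=` comparison are assumptions, and the real script reads these values from the BAM file rather than from Python lists.

```python
def extract_a_sites(sequence, ipd_ratios, effective_coverage, min_ec=20):
    """Return (position, IPDr) pairs for every A in one molecule.

    Hypothetical helper: molecules below the coverage threshold yield nothing.
    """
    if len(sequence) != len(ipd_ratios):
        raise ValueError("need one IPD ratio per base")
    # Assumption: a molecule is kept when its effective coverage is >= min_ec.
    if effective_coverage < min_ec:
        return []
    return [
        (pos, ratio)
        for pos, (base, ratio) in enumerate(zip(sequence, ipd_ratios))
        if base == "A"
    ]


print(extract_a_sites("ACAT", [1.0, 2.0, 3.0, 4.0], effective_coverage=25))
```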

Tutorial for ipdRatiocalculator_fromCCS.py

ipdRatiocalculator_fromCCS.py is a Python script that processes HiFi kinetics BAM files to compute IPD (inter-pulse duration) ratios using a pre-trained model.

Prerequisites

Before running the script, ensure that the following dependencies are installed in a Python 3 environment:

1.1 Install Dependencies via Conda

It is recommended to use Conda to manage dependencies:

conda create -n ipdcalc_env python=3.7 -y
conda activate ipdcalc_env
conda install -c bioconda numpy pandas tqdm pysam pbcore

1.2 Install kineticsTools

git clone https://github.com/PacificBiosciences/kineticsTools.git
cd kineticsTools
pip install .

1.3 Required External Software

PacBio SMRT Link suite (provides the kinetics tools)

Preparing Input Files

The script requires:

A BAM file: an unaligned HiFi kinetics BAM file processed with ccs-kinetics-bystrandify.
A lookup table (SP3-C3.npz.gz): the path to this file is hardcoded in the script, so make sure it points to the correct location. If you cannot find the file, you can download it from https://github.com/PacificBiosciences/kineticsTools/blob/master/kineticsTools/resources/SP3-C3.npz.gz.

Troubleshooting

Issue: Missing Dependencies

If you encounter an error like:

ModuleNotFoundError: No module named 'pbcore'

Install the missing package with either Conda or pip (one of the two is enough):

conda install -c bioconda pbcore
pip install pbcore
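
To find all missing dependencies at once instead of hitting them one error at a time, a small check like the following may help. It is a sketch using only the Python standard library. The helper name `missing_modules` is hypothetical, and the module list assumes the import names match the packages installed in the Prerequisites section.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names of the packages listed under Prerequisites.
required = ["numpy", "pandas", "tqdm", "pysam", "pbcore", "kineticsTools"]
print("Missing modules:", missing_modules(required))
```

Run it inside the activated Conda environment; an empty list means every module can be located.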