For ipdTrimming, local trimming of IPD outliers at the subread level was introduced into the IPD conversion: for each site in a DNA molecule, the top 10% of subread IPD values were removed, and the remaining subread IPD values were averaged to generate the CCS IPD value. This was implemented by a modified CCS module:
usc@hpc:~$ ccs -j 20 --hifi-kinetics subread.bam hifi.bam
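The trimming rule described above can be illustrated with a minimal Python sketch. This is not the code of the modified CCS module. The function name `trimmed_mean_ipd`, the `trim_fraction` parameter, and the choice to round the number of dropped values down are all assumptions made for illustration.

```python
import numpy as np


def trimmed_mean_ipd(subread_ipds, trim_fraction=0.10):
    """Average the subread IPDs at one site after dropping the largest values.

    Assumption: the number of dropped values is rounded down, so a site
    with fewer than 10 subreads loses nothing at trim_fraction=0.10.
    """
    values = np.sort(np.asarray(subread_ipds, dtype=float))
    if values.size == 0:
        raise ValueError("at least one subread IPD value is required")
    n_drop = int(np.floor(values.size * trim_fraction))
    kept = values[: values.size - n_drop]  # discard the top fraction
    return float(kept.mean())


# Ten subread IPDs at one site; the single outlier (240) is removed
# before averaging.
site_ipds = [12, 15, 11, 14, 13, 16, 10, 12, 13, 240]
print(trimmed_mean_ipd(site_ipds))
```

Trimming only the upper tail targets long polymerase pauses, which inflate a plain mean far more than short IPDs deflate it.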
The modified CCS module outputs a CCS BAM file with trimmed IPD values. Next, a movie-time normalization was applied to compensate for variations in the polymerase elongation rate on individual DNA molecules: each CCS IPD value was normalized against the CCS IPD value averaged across all sites in the DNA molecule. The normalized CCS IPD value was then compared to that of an unmodified base with the same sequence context, provided by a k-mer model pretrained on Sequel data. The last two steps were implemented by the script ipdRatiocalculator_FromCCS.py:
usc@hpc:~$ python ipdRatiocalculator_FromCCS.py hifi.bam hifi.withIPDr.bam
# The number of CPUs can be set in this script.
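The two steps performed by this script (per-molecule normalization, then comparison with the model prediction) can be sketched as follows. This is an illustration, not the script's actual code. The function name `ipd_ratios` is hypothetical, and the sketch assumes that the k-mer model returns one expected IPD per site on the same normalized scale.

```python
import numpy as np


def ipd_ratios(ccs_ipds, expected_ipds):
    """Return per-site IPD ratios for a single DNA molecule.

    ccs_ipds:      trimmed CCS IPD value at every site of the molecule
    expected_ipds: model-predicted IPD of an unmodified base in the same
                   sequence context (assumed to be already normalized)
    """
    ccs_ipds = np.asarray(ccs_ipds, dtype=float)
    expected = np.asarray(expected_ipds, dtype=float)
    if ccs_ipds.shape != expected.shape:
        raise ValueError("one expected IPD is needed per site")
    # Step 1: divide by the molecule-wide mean to remove differences in
    # polymerase elongation rate between molecules.
    normalized = ccs_ipds / ccs_ipds.mean()
    # Step 2: compare with the unmodified-base expectation.
    return normalized / expected


# Toy molecule with four sites; the third site is three times slower
# than the model expects.
print(ipd_ratios([10, 20, 30, 20], [0.5, 1.0, 0.5, 1.0]))
```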
Extract the IPD ratio (IPDr) for all A sites in each individual molecule:
usc@hpc:~$ python bamextractallAx_IPDvalue.py hifi.withIPDr.bam hifi.withIPDr_allA.xls
# The effective coverage (ec) can be set in this script; the default value is 20.
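The extraction step can be pictured with the sketch below. It does not reproduce bamextractallAx_IPDvalue.py. The function name `extract_a_sites` is hypothetical, and the sketch assumes that the effective coverage setting acts as a minimum threshold per molecule and that positions are reported 0-based.

```python
def extract_a_sites(sequence, ipd_ratios, effective_coverage, min_ec=20):
    """Return (position, IPDr) pairs for every A in one molecule.

    Assumption: molecules whose effective coverage is below min_ec are
    skipped entirely, so an empty list is returned for them.
    """
    if len(sequence) != len(ipd_ratios):
        raise ValueError("one IPD ratio is needed per base")
    if effective_coverage < min_ec:
        return []
    return [
        (position, ratio)
        for position, (base, ratio) in enumerate(zip(sequence, ipd_ratios))
        if base == "A"
    ]


# Toy molecule: A sites are at positions 1, 4 and 6.
ratios = [0.9, 3.1, 1.0, 1.1, 0.8, 1.2, 2.7]
print(extract_a_sites("GATTACA", ratios, effective_coverage=25))
```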
ipdRatiocalculator_fromCCS.py is a Python script for processing HiFi kinetic BAM files to compute IPD (inter-pulse duration) ratios using a pre-trained model.

### Prerequisites

Before running the script, ensure that the following dependencies are installed in a Python 3 environment:
It is recommended to use Conda to manage dependencies:
conda create -n ipdcalc_env python=3.7 -y
conda activate ipdcalc_env
conda install -c bioconda numpy pandas tqdm pysam pbcore
git clone https://github.com/PacificBiosciences/kineticsTools.git
cd kineticsTools
pip install .
PacBio SMRT Link suite (provides the kinetics tools)
The script requires:
A BAM file: Unaligned HiFi kinetic BAM file processed with ccs-kinetics-bystrandify
A lookup table (SP3-C3.npz.gz): The path to this file is hardcoded in the script, so ensure the correct path is used. If you cannot find this file, you can download it from: https://github.com/PacificBiosciences/kineticsTools/blob/master/kineticsTools/resources/SP3-C3.npz.gz
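Before launching a long run, it can help to confirm that the lookup table path is valid and that the file can be read. The sketch below assumes the file is a gzip-compressed NumPy `.npz` archive, as its extension suggests. The helper name `load_npz_gz` is hypothetical, and the script itself may load the model differently.

```python
import gzip
import io
from pathlib import Path

import numpy as np


def load_npz_gz(path):
    """Decompress a .npz.gz file and return its arrays as a dict.

    Assumption: the archive holds plain arrays. If it stores pickled
    objects, np.load would need allow_pickle=True instead.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Lookup table not found: {path}")
    # Decompress fully into memory so np.load gets a seekable buffer.
    with gzip.open(path, "rb") as handle:
        buffer = io.BytesIO(handle.read())
    with np.load(buffer) as archive:
        return {name: archive[name] for name in archive.files}


# Example call (adjust the path to your local copy):
# arrays = load_npz_gz("kineticsTools/kineticsTools/resources/SP3-C3.npz.gz")
# print(sorted(arrays))
```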
Issue: Missing Dependencies
If you encounter an error like:
ModuleNotFoundError: No module named 'pbcore'
Install the missing dependency using Conda or pip:
conda install -c bioconda pbcore
pip install pbcore
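To find every missing module in one pass instead of one `ModuleNotFoundError` at a time, a small check such as the following can be run in the activated environment. The module list is taken from the installation commands in this README. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names of the packages installed in the Prerequisites section.
REQUIRED_MODULES = ["numpy", "pandas", "tqdm", "pysam", "pbcore", "kineticsTools"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```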