Test_Bundle/Instructions.txt · SherlockLung-EvolutionaryTrajectory-Analysis

### RUNNING THE TEST BUNDLE ###

This test bundle is a reasonably lightweight subset of the samples used in "Uncovering the lung adenocarcinomas evolving on a smoking-like trajectory in people who have never smoked".

The subset is made up of 55 samples, all of which were assigned to the SD trajectory. These instructions therefore reproduce an approximation of the SD trajectory.

Note that the precise outputs may differ slightly from those included, for two reasons:
1) One stage of the model runs 1000 iterations to handle stochasticity, so results can vary slightly between runs.
2) The generation of phylogenetic trees has a primary mode and a memory-saving mode. The memory-saving mode is used automatically for samples that cause out-of-memory errors, so which samples use it will vary from machine to machine.

In order to run this code you will first need to follow the instructions for setting up the ordering model. These can be found in this repository at Ordering_Model/README.txt

You will then need to extract the input files from the archive at Test_Bundle/inputs.tar.gz
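
A minimal sketch of the extraction step follows. extract_bundle is a hypothetical helper name, not part of this repository, and the destination path is a placeholder. It assumes the archive unpacks to an inputs/ directory, which is inferred from the paths used in the commands below rather than confirmed.

```shell
# Hypothetical helper (not part of the repository): unpack a .tar.gz archive
# into a destination directory, creating the directory if needed.
extract_bundle() {
  archive="$1"
  dest="$2"
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example call (paths are placeholders):
# extract_bundle Test_Bundle/inputs.tar.gz path/to/
```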

Then, you can run the code in the following three steps:

# Note: paths will need to be updated according to where you have stored files
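
Before updating the paths, it may help to confirm that the extracted inputs contain every file and folder that the step 1 and step 3 commands refer to. The sketch below is a hypothetical pre-flight check, not part of this repository. The item names are taken from those commands, and the layout under a single inputs directory is an assumption.

```shell
# Hypothetical pre-flight check (not part of the repository).
# Takes the extracted inputs directory and reports any missing item
# that the pipeline commands in steps 1 and 3 refer to.
check_inputs() {
  inputs="$1"
  missing=0
  for d in subclones_files dp_data WCC_bestClusterInfo; do
    [ -d "$inputs/$d" ] || { echo "missing directory: $inputs/$d" >&2; missing=1; }
  done
  for f in sherlock_driver_symbols.txt sherlock_nonsyn_muts.txt SherlockLung_WGDStatus.txt; do
    [ -f "$inputs/$f" ] || { echo "missing file: $inputs/$f" >&2; missing=1; }
  done
  # Returns 0 when everything is present, 1 otherwise.
  return "$missing"
}

# Example call (path is a placeholder):
# check_inputs path/to/inputs || echo "fix the paths before running step 1"
```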

### STEP 1: DE NOVO DISCOVERY ###

OM_dir=path/to/Ordering_Model/

name=SherlockLung_test_bundle55SD

${OM_dir}/BASH/pipeline.sh -n ${name} -i path/to/inputs/subclones_files/ -o path/to/new/output_folder/${name}/ -p PLMIX -g 1,2,3,4,5 -u notIncluded -r path/to/inputs/sherlock_driver_symbols.txt -m path/to/inputs/sherlock_nonsyn_muts.txt -d path/to/inputs/dp_data/ -w path/to/inputs/SherlockLung_WGDStatus.txt -c path/to/inputs/WCC_bestClusterInfo/


### STEP 2: IDENTIFY THE OPTIMAL NUMBER OF SUBSETS (do not run until step 1 is complete) ###

# this line will need to be adjusted if you are running on a non-SGE HPC
qsub ${OM_dir}/BASH/run_postFDR_5b.sh ${OM_dir}/R/TM_postFDR_5b_select_best_G.R ${name} path/to/new/output_folder/${name}/

# For this test bundle, the best G is identified as 1
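
For the submission line in this step, one possible way to adapt to a different cluster is to detect which submission command is available. The sketch below is a hypothetical helper, not part of this repository. It assumes, without having been checked, that run_postFDR_5b.sh runs unchanged under another scheduler or under plain bash. Note that qsub is also provided by non-SGE schedulers such as PBS, so finding it does not guarantee SGE.

```shell
# Hypothetical helper (not part of the repository): print the first job
# submission command found on PATH, falling back to plain bash.
pick_submitter() {
  for cmd in qsub sbatch; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      return 0
    fi
  done
  echo bash
}

# Example call (untested assumption: the script needs no scheduler-specific changes):
# $(pick_submitter) ${OM_dir}/BASH/run_postFDR_5b.sh ${OM_dir}/R/TM_postFDR_5b_select_best_G.R ${name} path/to/new/output_folder/${name}/
```

Scheduler-specific options (queue, memory, run time) would still need to be supplied in whatever form your cluster expects.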


### STEP 3: RUN THE MODEL FOR THE DISCOVERED SET(S) (do not run until step 2 is complete) ###

dn=1
name=SherlockLung_test_bundle55SD_DN${dn}

# As best G = 1, no sample group files are generated. They are not needed, because only one trajectory was identified and all samples follow it, so we simply use all the barcodes again.

${OM_dir}/BASH/pipeline.sh -n ${name} -i path/to/inputs/subclones_files/ -o path/to/new/output_folder/${name}/ -p PlackettLuce -g 1 -u notIncluded -r path/to/inputs/sherlock_driver_symbols.txt -m path/to/inputs/sherlock_nonsyn_muts.txt -d path/to/inputs/dp_data/ -w path/to/inputs/SherlockLung_WGDStatus.txt -c path/to/inputs/WCC_bestClusterInfo/



### DOWNSTREAM ANALYSES ###

Downstream analyses and figure generation can be carried out using the code in the Analyses folder of this repository.