Baker-Lab-Umol

From HPC Docs
Revision as of 20:48, 29 November 2023 by Admin

Run the Umol Test Case

Make a directory for the SLURM submission script and the job output, then change into it.

  mkdir umol-test-case
  cd umol-test-case

Using a text editor such as nano, paste the following script and save it as submit.sh. Replace the --mail-user address with your own.

#!/bin/bash

#SBATCH --chdir=./                  # Working directory
#SBATCH --mail-user=nobody@tcnj.edu # Who to send emails to
#SBATCH --mail-type=ALL             # Send emails on start, end and failure
#SBATCH --job-name=umol             # Job name
#SBATCH --output=job.%j.out         # Name of stdout output file (%j expands to jobId)
#SBATCH --nodes=1                   # Total number of nodes (a.k.a. servers) requested
#SBATCH --ntasks=1                  # Total number of MPI tasks requested
#SBATCH --mem=6G                    # Memory requested per node
#SBATCH --partition=gpu             # Partition (a.k.a. queue) to use
#SBATCH --gres=gpu:1                # Request one GPU
#SBATCH --time=01:00:00             # Run time limit (hh:mm:ss here; days-hh:mm:ss is also accepted)

# Set up the environment
module add umol
hostname        # Record which node ran the job
module list     # Record the loaded modules in the job output

# Run the test case
echo "#### Copying 'data' folder from distribution (could take 15-20 minutes if 'data' doesn't already exist) ####"
cp -ru $UMOL/data .

ID=7NB4
FASTA=./data/test_case/7NB4/7NB4.fasta
POCKET_INDICES=./data/test_case/7NB4/7NB4_pocket_indices.npy # Zero-indexed NumPy array of the pocket residues (all CBs within 10 Å of the ligand)
LIGAND_SMILES='CCc1sc2ncnc(N[C@H](Cc3ccccc3)C(=O)O)c2c1-c1cccc(Cl)c1C' # Make sure the SMILES string is canonical, as produced by RDKit
UNICLUST=./data/uniclust30_2018_08/uniclust30_2018_08
OUTDIR=./data/test_case/7NB4/

echo '#### Search Uniclust30 with HHblits to generate an MSA (a few minutes) ####'
HHBLITS=$UMOL/hh-suite/build/bin/hhblits
$HHBLITS -i $FASTA -d $UNICLUST -E 0.001 -all -oa3m $OUTDIR/$ID'.a3m'

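# The 'umol' environment is expected to be active for the steps below
# (presumably provided by 'module add umol'; no activation command is given here)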

echo '#### Generate input feats (seconds) ####'
python3 $UMOL/src/make_msa_seq_feats.py --input_fasta_path $FASTA \
--input_msas $OUTDIR/$ID'.a3m' \
--outdir $OUTDIR

python3 $UMOL/src/make_ligand_feats.py --input_smiles $LIGAND_SMILES \
--outdir $OUTDIR

echo "#### Predict (a few minutes) ####"
MSA_FEATS=$OUTDIR/msa_features.pkl
LIGAND_FEATS=$OUTDIR/ligand_inp_features.pkl
PARAMS=data/params/params40000.npy
NUM_RECYCLES=3

python3 $UMOL/src/predict.py --msa_features  $MSA_FEATS \
--ligand_features $LIGAND_FEATS \
--id $ID \
--ckpt_params $PARAMS \
--target_pos $POCKET_INDICES \
--num_recycles $NUM_RECYCLES \
--outdir $OUTDIR

wait
RAW_PDB=$OUTDIR/$ID'_pred_raw.pdb'
python3 $UMOL/src/relax/align_ligand_conformer.py --pred_pdb $RAW_PDB \
--ligand_smiles $LIGAND_SMILES --outdir $OUTDIR

grep ATOM $OUTDIR/$ID'_pred_raw.pdb' > $OUTDIR/$ID'_pred_protein.pdb'
echo "The unrelaxed predicted protein can be found at $OUTDIR/${ID}_pred_protein.pdb and the ligand at $OUTDIR/${ID}_pred_ligand.sdf"

echo '#### Relax the protein (a few minutes) ####'
conda activate openmm # Assumes conda is in your PATH and initialized for this shell
PRED_PROTEIN=$OUTDIR/$ID'_pred_protein.pdb'
PRED_LIGAND=$OUTDIR/$ID'_pred_ligand.sdf'
RESTRAINTS="CA+ligand" # or "protein"

python3 $UMOL/src/relax/openmm_relax.py --input_pdb $PRED_PROTEIN \
                        --ligand_sdf $PRED_LIGAND \
                        --file_name $ID \
                        --restraint_type $RESTRAINTS \
                        --outdir $OUTDIR

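# Deactivate the openmm environment; it is only needed for the relaxation
conda deactivate

# Back in the previously loaded umol environment (no explicit reactivation command is given):
# add the pLDDT scores to the relaxed complex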
RAW_COMPLEX=$OUTDIR/$ID'_pred_raw.pdb'
RELAXED_COMPLEX=$OUTDIR/$ID'_relaxed_complex.pdb'
python3 $UMOL/src/relax/add_plddt_to_relaxed.py  --raw_complex $RAW_COMPLEX \
--relaxed_complex $RELAXED_COMPLEX  \
--outdir $OUTDIR
echo "The final relaxed structure can be found at $OUTDIR/${ID}_relaxed_plddt.pdb"

Submit the job to the SLURM scheduler.

  sbatch submit.sh
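
sbatch normally replies with a line of the form "Submitted batch job <jobid>". As a minimal sketch, the job number can be pulled out of that reply and turned into the output file name set by --output=job.%j.out in submit.sh. The helper names job_id_from_sbatch and job_output_file are hypothetical; they are not part of SLURM or Umol.

```shell
# Hypothetical helper: read sbatch's reply on stdin and print the job number.
# Assumes the usual reply format "Submitted batch job <jobid>".
job_id_from_sbatch() {
    awk '/^Submitted batch job/ {print $4}'
}

# Hypothetical helper: build the output file name that matches
# "#SBATCH --output=job.%j.out" in submit.sh.
job_output_file() {
    printf 'job.%s.out\n' "$1"
}

# Possible use on the cluster (commented out so the sketch is safe to source anywhere):
#   jobid=$(sbatch submit.sh | job_id_from_sbatch)
#   squeue -j "$jobid"                      # Is the job pending or running?
#   tail -f "$(job_output_file "$jobid")"   # Follow the output as it is written
```

If only the job number is wanted, sbatch --parsable submit.sh prints it without the surrounding text.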

View the job.<jobid>.out file for the job status and any errors, replacing <jobid> with the job number printed by the sbatch command.
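
Besides reading the job output, it can help to confirm that each stage of submit.sh actually wrote its file. The sketch below checks for the file names that appear in the script above (the MSA, the feature pickles, the raw prediction, the split protein and ligand, and the relaxed structures). The function name check_umol_outputs is hypothetical, and the list assumes the script ran unmodified with ID=7NB4.

```shell
# Hypothetical helper: report which of the files named in submit.sh are
# missing or empty. Arguments: output directory, target ID.
# Returns 0 only if every expected file exists and is non-empty.
check_umol_outputs() {
    cuo_outdir=$1
    cuo_id=$2
    cuo_missing=0
    for cuo_file in \
        "${cuo_id}.a3m" \
        msa_features.pkl \
        ligand_inp_features.pkl \
        "${cuo_id}_pred_raw.pdb" \
        "${cuo_id}_pred_protein.pdb" \
        "${cuo_id}_pred_ligand.sdf" \
        "${cuo_id}_relaxed_complex.pdb" \
        "${cuo_id}_relaxed_plddt.pdb"
    do
        if [ ! -s "$cuo_outdir/$cuo_file" ]; then
            echo "missing or empty: $cuo_outdir/$cuo_file"
            cuo_missing=$((cuo_missing + 1))
        fi
    done
    [ "$cuo_missing" -eq 0 ]
}

# Possible use after the job has finished, from the umol-test-case directory:
#   check_umol_outputs ./data/test_case/7NB4 7NB4 && echo "all outputs present"
```

The first file reported as missing usually points to the stage that failed, since each step in the script consumes the output of the one before it.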

References

Umol GitHub site