Skip to content

Instantly share code, notes, and snippets.

@colbyford
Last active May 31, 2025 18:50
Show Gist options
  • Save colbyford/a6be8ee20ff1f43efd2666cce98e58ef to your computer and use it in GitHub Desktop.
Save colbyford/a6be8ee20ff1f43efd2666cce98e58ef to your computer and use it in GitHub Desktop.
Score Folded Protein Complexes with HADDOCK3

Score Folded Protein Complexes with HADDOCK3

Colby T. Ford, Ph.D.

Motivation

HADDOCK3 provides a powerful workflow to docke two or more protein structures and it provides multiple metrics as an output for quantifying molecular binding affinity. However, a full docking pipeline may take some time to complete.

To get binding affinity metrics for PDB files where the protein complex already exists, we don't necessarily need to perform a full docking workflow. Also, there is a need to understand binding affinity in protein complex files generated from multimer folding prediction tools like AlphaFold3, Boltz-1, or ESM3.

Thus, we present the following functions for getting the binding affinity of a given input PDB file of a protein complex.

Install HADDOCK3

pip install haddock3 pandas

score_complex()

This simple function performs the [mdscoring] step from HADDOCK3 to return binding affinity metrics in a couple minutes. It makes a .cfg file to define the topology generation and MD analysis, which can be customized for your specific needs.

For example, if we have a crystal structure of a protein complex (e.g., PDB 5JXE), we can quickly get the predicted affinity metrics.

  • score_complex(pdb_path="5jxe_chainsAEF.pdb") returns:
{
  'score': -390.214,
  'total': -655.391,
  'vdw': -209.387,
  'elec': -446.005,
  'desolv': -91.626,
  'bsa': 5384.33
}

Also, if we have a multimer folded structure (e.g., from AlphaFold3) of the same protein chains, we can quickly get similar binding metrics of the predicted protein complex.

  • score_complex(pdb_path="5jxe_af3_pred.pdb") returns:
{
  'score': -411.69,
  'total': -655.298,
  'vdw': -231.729,
  'elec': -425.053,
  'desolv': -94.95,
  'bsa': 5857.95
}

As you can see, the metrics differ slighty between the crystal structure from Protein Data Bank and the predicted multimer structure from AlphaFold3.

This provides a much faster way of getting a docked structure (from a folding algorithm) while still getting binding metrics (from HADDOCK3) in a few minutes.

haddock3_score()

Alternatively, this is a basic wrapper for the CLI command, haddock3-score, which runs HADDOCK3's [emscoring] module. This provides less customization than the previous function, but provides results in ~20 seconds.

  • haddock3_score(pdb_path = "5jxe_chainsAEF.pdb") returns:
{
  'score': -376.2368,
  'vdw': -220.158,
  'elec': -322.188,
  'desolv': -91.6412,
  'bsa': 5168.68,
  'total': -542.346
}
  • haddock3_score(pdb_path = "5jxe_af3_pred.pdb") returns:
{
  'score': -398.389,
  'vdw': -226.213,
  'elec': -403.859,
  'desolv': -91.4042,
  'bsa': 5702.45,
  'total': -630.072
 }
import re
import subprocess
def haddock3_score(pdb_path:str) -> dict:
try:
## Run haddock3-score CLI
command = ["haddock3-score", "--full", pdb_path]
sp_result = subprocess.run(command, capture_output=True, text=True, check=True)
## Parse result
metrics = {}
## Extract HADDOCK score
match = re.search(r"HADDOCK-score \(emscoring\) = ([\-\d\.]+)", sp_result.stdout)
if match:
metrics["score"] = float(match.group(1))
## Extract individual energy terms
matches = re.findall(r"(\w+)=([\-\d\.]+)", sp_result.stdout)
for key, value in matches:
metrics[key] = float(value)
## Calculate total score
metrics["total"] = metrics["vdw"] + metrics["elec"]
## Remove air
del metrics["air"]
return metrics
except subprocess.CalledProcessError as e:
print("HADDOCK3 Error occurred:", e.stderr)
return {}
import shutil
import subprocess
import pandas as pd
from pathlib import Path
import tempfile
import os
def score_complex(pdb_path: str, n_cores: int = os.cpu_count()) -> dict:
pdb_file = Path(pdb_path)
with tempfile.TemporaryDirectory() as temp_dir:
temp_path = Path(temp_dir)
shutil.copy(pdb_file, temp_path)
complex_pdb = pdb_file.name
cfg_body = f'''
## Docked protein complex scoring
run_dir = "run0"
mode = "local"
ncores = {n_cores}
molecules = [
"{complex_pdb}"
]
[topoaa]
autohis = true
[mdscoring]
nemsteps = 1
per_interface_scoring = false
'''
cfg_file = temp_path / "score.cfg"
cfg_file.write_text(cfg_body.strip())
try:
command = ["haddock3", str(cfg_file)]
subprocess.run(command, capture_output=True, text=True, check=True, cwd=temp_path)
capri_file = temp_path / "run0/analysis/1_mdscoring_analysis/capri_ss.tsv"
capri_ss = pd.read_csv(capri_file, sep="\t")
metrics = capri_ss[['score', 'total', 'vdw', 'elec', 'desolv', 'bsa']].iloc[0].to_dict()
return metrics
except subprocess.CalledProcessError as e:
print("HADDOCK3 Error occurred:", e.stderr)
return {}
except Exception as e:
print("Unexpected error:", e)
return {}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment