Methodology

The full computational pipeline behind every ToxScreen report. We publish every threshold, every reference compound, every benchmark, and every known limitation. If anything below is unclear or appears wrong, tell us.

Computational pipeline

Every screening runs the same five-step pipeline:

  1. SMILES validation. Input SMILES is parsed and canonicalized with RDKit. Invalid structures are rejected with a row-level error message before any GPU compute is consumed.
  2. ADMET pre-flight. Molecular-property descriptors are computed locally: MW, LogP, TPSA, HBD/HBA, QED, rotatable bonds, PAINS substructure match, Lipinski violations. These appear in the report and feed the Composite Tox Index ADMET penalty.
  3. Boltz-2 structure prediction. For each (compound, target) pair, Boltz-2 (Passaro et al., 2025) generates a protein–ligand complex structure with confidence scores (pLDDT, PAE, ipTM). Scoring runs N replicates (default 3) with independent random seeds; replicate variance is reported.
  4. Affinity scoring. Boltz-2 returns a binary probability (bp) of binding above a defined threshold. Per-target thresholds are calibrated against known inhibitors and negative controls (see calibration data).
  5. Composite Tox Index. Per-target normalized scores are weighted (hERG 35%, CYP3A4 25%, CYP2D6 20%, CYP2C9 20%) and aggregated with an ADMET penalty term. See CTI.
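
As a rough illustration of the pre-flight step (step 2), the sketch below counts Lipinski rule-of-five violations from descriptors that are assumed to be already computed, for example by RDKit. The `Descriptors` container and the `lipinski_violations` function are hypothetical names, and the cutoffs are the standard rule-of-five values rather than thresholds confirmed for ToxScreen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mw: float    # molecular weight in Da
    logp: float  # octanol-water partition coefficient
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of the standard rule of five."""
    checks = (d.mw > 500, d.logp > 5, d.hbd > 5, d.hba > 10)
    return sum(checks)


# Approximate aspirin descriptors, used only as an illustration.
aspirin = Descriptors(mw=180.16, logp=1.2, hbd=1, hba=4)
print(lipinski_violations(aspirin))
```

A violation count like this could serve as one of the flags that feed the ADMET penalty, although the exact mapping from flags to penalties is not specified on this page.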

Target panel

The current ToxScreen panel covers four critical safety endpoints:

Target | PDB | Why it matters
hERG (KCNH2) | 7CN1 | QT prolongation, sudden cardiac death risk. Drug withdrawals: terfenadine, cisapride, astemizole.
CYP3A4 | 6MA7 | Metabolizes ~50% of marketed drugs. Inhibition causes major DDIs.
CYP2D6 | 4WNT | Polymorphic metabolizer of ~25% of drugs. Inhibition affects PMs disproportionately.
CYP2C9 | 1OG5 | Metabolizes NSAIDs, sulfonylureas, and anticoagulants (warfarin is a substrate, so inhibition raises warfarin exposure). Key inhibitors: sulfaphenazole, fluconazole.

Coming next quarter: CYP1A2, CYP2B6, MRP2/4, BCRP, BSEP, PXR, AhR, TopBP1. We add only targets we have calibrated benchmarks for.

Calibration data

We validate the Boltz-2 pipeline on each target against published known inhibitors and negative controls (PubChem-verified SMILES). All raw data, including AUC-ROC, sensitivity, specificity, MCC, and the exact compound list, are public.
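
To make the reported metrics concrete, here is a minimal sketch of how sensitivity, specificity, and MCC can be computed once a bp threshold is chosen. The function names are hypothetical and the scores below are invented for illustration; they are not ToxScreen calibration results.

```python
import math


def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, tn, fn), calling a compound a binder when score >= threshold."""
    tp = fp = tn = fn = 0
    for score, is_positive in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_positive:
            tp += 1
        elif predicted and not is_positive:
            fp += 1
        elif not predicted and not is_positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(scores, labels, threshold):
    """Sensitivity, specificity, and Matthews correlation coefficient."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sens, "specificity": spec, "mcc": mcc}


# Invented bp values: four known inhibitors followed by three negative controls.
scores = [0.91, 0.80, 0.62, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0]
print(threshold_metrics(scores, labels, threshold=0.5))
```

With these invented values, one inhibitor falls below the threshold, so sensitivity is 0.75 while specificity is 1.0.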

Calibration is in progress. Per-target AUC, threshold, sensitivity, and specificity will appear here when the run completes.
Target | PDB | Threshold (bp) | AUC-ROC | MCC | Sens. | Spec. | n (pos+neg)

Calibration compound set (per target):

hERG (7CN1), positives: Dofetilide (IC50 = 12 nM), Terfenadine (~50 nM), Cisapride (~45 nM), Astemizole (~1 nM)

hERG (7CN1), negatives: Aspirin, Metformin, Caffeine

CYP3A4 (6MA7), positives: Ketoconazole (Ki = 15 nM), Ritonavir, Itraconazole

CYP3A4 (6MA7), negatives: Aspirin, Caffeine, Metformin

CYP2D6 (4WNT), positives: Quinidine (Ki = 60 nM), Paroxetine (~150 nM), Fluoxetine

CYP2D6 (4WNT), negatives: Aspirin, Metformin, Ibuprofen

CYP2C9 (1OG5), positives: Sulfaphenazole (Ki = 300 nM), Fluconazole (Ki = 7 µM). Note: warfarin is a CYP2C9 substrate, not an inhibitor, so it has been removed from the positive control set.

CYP2C9 (1OG5), negatives: Aspirin, Caffeine, Metformin

Validation gate: AUC ≥ 0.70 to enter the public panel. Targets that fail this gate are flagged as provisional in every report and excluded from the headline CTI; users see the per-target prediction but are warned.
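
The gate can be sketched as follows. AUC-ROC is computed here as the fraction of (positive, negative) pairs in which the positive scores higher, with ties counted as half; this is a standard equivalent of the area under the ROC curve. The names `auc_roc` and `target_status` are hypothetical, and the scores are invented.

```python
AUC_GATE = 0.70  # gate value stated above


def auc_roc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def target_status(auc):
    """Targets below the gate are reported as provisional."""
    return "public" if auc >= AUC_GATE else "provisional"


# Invented bp values: one inhibitor scores below one negative control.
scores = [0.90, 0.80, 0.60, 0.30, 0.35, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0]
auc = auc_roc(scores, labels)
print(round(auc, 3), target_status(auc))
```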

Expanding the benchmark: v0 calibration uses 5-7 compounds per target (see the compound set above). The 95% confidence interval on AUC at this n is roughly ±0.25, which is too wide to distinguish a good classifier from a mediocre one. We are running a v1 calibration with 100 compounds per target sourced from Tox21 and curated PubChem bioassay sets (planned completion: 4-6 weeks). v1 thresholds will replace v0 thresholds once the AUC confidence interval tightens to ±0.05.
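
For a sense of scale, the sketch below estimates the confidence-interval half-width with the Hanley-McNeil (1982) standard-error approximation. This is one common approximation, chosen here as an assumption; the page does not state how its own interval was derived, and the function name is hypothetical.

```python
import math


def auc_ci_halfwidth(auc, n_pos, n_neg, z=1.96):
    """Approximate CI half-width for AUC (Hanley-McNeil standard error)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return z * math.sqrt(variance)


# An hERG-sized v0 set (4 positives, 3 negatives) versus a 100-compound set,
# both evaluated at an assumed AUC of 0.9.
print(round(auc_ci_halfwidth(0.9, 4, 3), 3))
print(round(auc_ci_halfwidth(0.9, 50, 50), 3))
```

Under these assumptions the v0-sized set gives a half-width of about 0.25, and the 100-compound set brings it down to roughly 0.06. The normal approximation is crude at very small n, so the v0 figure should be read as an order-of-magnitude estimate.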

Composite Tox Index (CTI)

The CTI is a single weighted score in [0, 1] aggregating per-target normalized binding scores plus an ADMET penalty.

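CTI = Σᵢ (weight_i × normalized_score_i) + ADMET_penalty
    weight_hERG = 0.35
    weight_CYP3A4 = 0.25
    weight_CYP2D6 = 0.20
    weight_CYP2C9 = 0.20
    ADMET_penalty = Σⱼ (penalty_j × flag_j)  for j in {logp, mw, hbd, hba, pains, rot_bonds, sa_score}
normalized_score_i = clamp((raw_i − random_baseline) / (reference_i − random_baseline), 0, 1)
    raw_i = predicted score (kcal/mol) for target i; more negative means stronger predicted binding
    reference_i = published-inhibitor reference score (kcal/mol) for target i
    random_baseline = -4.5 kcal/mol (typical weak-binder noise)
    (normalized_score_i is 0 at the random baseline and 1 at or beyond the reference, so higher means higher predicted risk)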
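
A minimal sketch of the calculation, under stated assumptions: the normalization is oriented so that a compound scoring at the published-inhibitor reference maps to 1 and one at the random baseline maps to 0; the final value is clamped to [0, 1]; and the reference scores and penalty sizes are hypothetical placeholders, since the page does not list them.

```python
WEIGHTS = {"hERG": 0.35, "CYP3A4": 0.25, "CYP2D6": 0.20, "CYP2C9": 0.20}
RANDOM_BASELINE = -4.5  # kcal/mol, from the definition above


def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


def normalized_score(raw, reference, baseline=RANDOM_BASELINE):
    """0 at the baseline, 1 at or beyond the reference (assumed orientation)."""
    if reference == baseline:
        raise ValueError("reference must differ from the baseline")
    return clamp((raw - baseline) / (reference - baseline))


def composite_tox_index(raw, reference, admet_flags, penalties):
    """Weighted sum of normalized scores plus penalties for raised ADMET flags."""
    weighted = sum(
        WEIGHTS[t] * normalized_score(raw[t], reference[t]) for t in WEIGHTS
    )
    penalty = sum(penalties[k] for k, flagged in admet_flags.items() if flagged)
    return clamp(weighted + penalty)  # clamping to [0, 1] is an assumption


# Hypothetical inputs: every reference set to -9.0 kcal/mol, one PAINS flag.
reference = {t: -9.0 for t in WEIGHTS}
raw = {"hERG": -9.0, "CYP3A4": -4.5, "CYP2D6": -6.75, "CYP2C9": -4.0}
cti = composite_tox_index(raw, reference, {"pains": True, "mw": False},
                          {"pains": 0.05, "mw": 0.05})
print(round(cti, 3))
```

With these placeholder inputs the hERG term contributes its full 0.35, CYP2D6 contributes half of its 0.20, and the PAINS flag adds 0.05.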

Risk classification bands and action thresholds:

Level | CTI | Interpretation
LOW | 0.00 – 0.25 | No significant predicted binding to safety targets.
MODERATE | 0.25 – 0.45 | Weak binding detected; monitor in follow-up assays.
HIGH | 0.45 – 0.65 | Significant binding to one or more targets; in-vitro validation recommended.
CRITICAL | 0.65 – 1.00 | Strong binding to multiple targets; high-priority safety concern.

Special rule: If hERG normalized score ≥ 0.7, the report automatically elevates risk to at least HIGH regardless of the composite — cardiac liability dominates safety triage.
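
The bands and the hERG override can be sketched as below. Because the published ranges share their endpoints, this sketch assumes each band includes its lower bound (so a CTI of exactly 0.25 is MODERATE); that tie-breaking choice and the function name are assumptions.

```python
# Lower bound of each band, checked from the most severe level down.
BANDS = [(0.65, "CRITICAL"), (0.45, "HIGH"), (0.25, "MODERATE"), (0.0, "LOW")]
SEVERITY = ["LOW", "MODERATE", "HIGH", "CRITICAL"]
HERG_OVERRIDE = 0.7


def classify(cti, herg_normalized):
    """Map a CTI to a risk level, raising it to at least HIGH on a strong hERG signal."""
    if not 0.0 <= cti <= 1.0:
        raise ValueError("CTI must lie in [0, 1]")
    level = next(name for floor, name in BANDS if cti >= floor)
    if herg_normalized >= HERG_OVERRIDE and SEVERITY.index(level) < SEVERITY.index("HIGH"):
        level = "HIGH"
    return level


print(classify(0.10, 0.20))  # low composite, weak hERG signal
print(classify(0.10, 0.75))  # low composite, but the hERG override applies
```

The override only ever raises the level: a compound already classified CRITICAL stays CRITICAL.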

Confidence & replicates

Every prediction reports both where the model thinks the compound binds and how confident that prediction is. Two independent signals:

Known limitations

This list is exhaustive to our knowledge. If you find a limitation we don't list, we want to know.

Reproducibility

Every report ships a methodology stamp footer with the exact configuration in use at scoring time:

Boltz-2 NIM · 3 replicates · GX10 GB10 GPU
Calibration md5: (8-char hash of the active threshold config)
Frozen: (UTC timestamp at scoring time)
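
One way such a stamp could be produced is sketched below. The serialization choice (JSON with sorted keys), the config contents, the timestamp format, and the function names are all assumptions; the page only states that the hash is an 8-character md5 of the active threshold config.

```python
import hashlib
import json
from datetime import datetime, timezone


def calibration_hash(config: dict) -> str:
    """First 8 hex characters of the md5 of a canonical JSON dump of the config."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()[:8]


def methodology_stamp(config: dict, replicates: int = 3, now=None) -> str:
    """Render the three footer lines for a report."""
    now = now or datetime.now(timezone.utc)
    return "\n".join([
        f"Boltz-2 NIM · {replicates} replicates · GX10 GB10 GPU",
        f"Calibration md5: {calibration_hash(config)}",
        f"Frozen: {now.strftime('%Y-%m-%dT%H:%M:%SZ')}",
    ])


# Placeholder thresholds, not real calibration values.
config = {"hERG": 0.5, "CYP3A4": 0.4}
print(methodology_stamp(config))
```

Sorting keys before hashing means two configs with the same thresholds produce the same hash regardless of key order, which is what makes the stamp usable for comparing reports.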

Re-running the same SMILES against the same calibration config will produce identical replicate-mean scores (deterministic up to floating-point ε in the affinity head). Replicate variance comes from the diffusion model's stochastic sampling.
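
A small sketch of how the replicate mean and spread might be summarized for one (compound, target) pair. The function name and the choice of sample standard deviation are assumptions, and the replicate values are invented.

```python
from statistics import mean, stdev


def summarize_replicates(scores):
    """Mean and sample standard deviation across replicate scores."""
    if len(scores) < 2:
        raise ValueError("need at least two replicates to estimate spread")
    return {"n": len(scores), "mean": mean(scores), "stdev": stdev(scores)}


# Three invented replicate bp values for one compound-target pair.
print(summarize_replicates([0.6, 0.7, 0.8]))
```

With the default of three replicates the spread estimate is itself noisy, so it is better read as a warning signal than as a precise error bar.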

References

  1. Passaro S, et al. Boltz-2: Towards accurate and efficient binding affinity prediction. bioRxiv 2025.
  2. Pires DEV, Blundell TL, Ascher DB. pkCSM: predicting small-molecule pharmacokinetic and toxicity properties using graph-based signatures. J. Med. Chem. 2015.
  3. Daina A, Michielin O, Zoete V. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep. 2017.
  4. Vargas HM, et al. Time for a fully integrated nonclinical-clinical risk assessment to streamline QT prolongation liability determinations. Clin. Pharmacol. Ther. 2018.
  5. Shekhar S, et al. Limitations of structure-based affinity prediction for novel chemical space. 2025.
  6. Ji et al. The OECD QSAR Toolbox. Chemosphere 2023.

Privacy & data handling

Legal disclaimers

RESEARCH USE ONLY. ToxScreen output is for research and informational purposes only. It is not approved or cleared by any regulatory authority for any clinical, diagnostic, prophylactic, or therapeutic use.

NOT MEDICAL ADVICE. ToxScreen does not provide medical advice, diagnosis, or treatment recommendations. Compounds flagged "low risk" may be unsafe. Compounds flagged "high risk" may be safe in a given indication or dose.

NOT A REGULATORY SUBMISSION. ToxScreen output cannot replace IND-enabling toxicology studies, GLP packages, ICH M7 mutagenicity assessments, or any FDA/EMA/PMDA-mandated assay. Do not submit ToxScreen reports as primary safety evidence to any regulator.

NO WARRANTY. ToxScreen is provided "as is" without warranty of any kind, express or implied, including without limitation warranties of merchantability, fitness for a particular purpose, accuracy, completeness, or non-infringement. Computational predictions carry inherent uncertainty and may not generalize to compounds outside the calibration benchmark.

LIMITATION OF LIABILITY. To the maximum extent permitted by law, the operators of ToxScreen disclaim any liability for direct, indirect, incidental, consequential, punitive, or special damages arising out of or in connection with the use of ToxScreen output, including but not limited to safety, clinical, regulatory, or commercial decisions made on the basis of any prediction.

USER OBLIGATIONS. Users represent and warrant that they will not use ToxScreen output as the sole or primary basis for any decision affecting human or animal exposure, that they will independently validate any prediction with appropriate experimental assays before acting on it, and that they will not rely on ToxScreen output to satisfy any regulatory, ethical, or contractual obligation.

INDEMNIFICATION. Users agree to indemnify and hold harmless the operators of ToxScreen from any claims, damages, losses, or expenses (including reasonable attorneys' fees) arising out of or related to the user's use of ToxScreen output, including any reliance on predictions in safety, clinical, or commercial decisions.

JURISDICTION. This service is operated from the United States. Use is governed by US law. Users outside the United States agree that any disputes will be resolved in the courts of the United States.

For full Terms of Service and Privacy Policy, see Terms and Privacy.

Last updated: . The methodology evolves as we add targets, expand calibration, and incorporate user feedback. Material changes are versioned and logged in the git history.