Methodology
The full computational pipeline behind every ToxScreen report. We publish every threshold, every reference compound, every benchmark, and every known limitation. If anything below is unclear or appears wrong, tell us.
Computational pipeline
Every screening runs the same five-step pipeline:
- SMILES validation. Input SMILES is parsed and canonicalized with RDKit. Invalid structures are rejected with a row-level error message before any GPU compute is consumed.
- ADMET pre-flight. Molecular-property descriptors are computed locally: MW, LogP, TPSA, HBD/HBA, QED, rotatable bonds, PAINS substructure match, Lipinski violations. These appear in the report and feed the Composite Tox Index ADMET penalty.
- Boltz-2 structure prediction. For each (compound, target) pair, Boltz-2 (Passaro et al., 2025) generates a protein–ligand complex structure with confidence scores (pLDDT, PAE, ipTM). Scoring runs N replicates (default 3) with independent random seeds; replicate variance is reported.
- Affinity scoring. Boltz-2 returns a binary probability (bp) of binding above a defined threshold. Per-target thresholds are calibrated against known inhibitors and negative controls (see calibration data).
- Composite Tox Index. Per-target normalized scores are weighted (hERG 35%, CYP3A4 25%, CYP2D6 20%, CYP2C9 20%) and aggregated with an ADMET penalty term. See CTI.
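The Lipinski-violation count from the ADMET pre-flight step can be illustrated with a minimal sketch. It assumes the descriptors have already been computed upstream (for example with RDKit) and uses the classic rule-of-five cutoffs; the `Descriptors` container and the example values are hypothetical and are not taken from the ToxScreen codebase.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed molecular descriptors (hypothetical container)."""
    mw: float    # molecular weight, Da
    logp: float  # octanol-water partition coefficient
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count rule-of-five violations using the classic cutoffs."""
    checks = [d.mw > 500, d.logp > 5, d.hbd > 5, d.hba > 10]
    return sum(checks)


# Illustrative values only; these are not measurements of any real compound.
small = Descriptors(mw=180.2, logp=1.2, hbd=1, hba=4)
large = Descriptors(mw=720.9, logp=6.1, hbd=2, hba=9)

print(lipinski_violations(small), lipinski_violations(large))
```

How each violation translates into the ADMET penalty is not specified here, so the sketch stops at the count.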
Target panel
The current ToxScreen panel covers four critical safety endpoints:
| Target | PDB | Why it matters |
|---|---|---|
| hERG (KCNH2) | 7CN1 | QT prolongation, sudden cardiac death risk. Drug withdrawals: terfenadine, cisapride, astemizole. |
| CYP3A4 | 6MA7 | Metabolizes ~50% of marketed drugs. Inhibition causes major drug-drug interactions (DDIs). |
| CYP2D6 | 4WNT | Polymorphic metabolizer of ~25% of drugs. Inhibition affects poor metabolizers (PMs) disproportionately. |
| CYP2C9 | 1OG5 | NSAIDs, sulfonylureas, anticoagulants (warfarin is a substrate — inhibition raises warfarin exposure). Key inhibitors: sulfaphenazole, fluconazole. |
Coming next quarter: CYP1A2, CYP2B6, MRP2/4, BCRP, BSEP, PXR, AhR, TopBP1. We add only targets we have calibrated benchmarks for.
Calibration data
We validate the Boltz-2 pipeline on each target against published known inhibitors and negative controls (PubChem-verified SMILES). All raw data, including AUC-ROC, sensitivity, specificity, MCC, and the exact compound list, are public.
| Target | PDB | Threshold (bp) | AUC-ROC | MCC | Sens. | Spec. | n (pos+neg) |
|---|---|---|---|---|---|---|---|
Calibration compound set (per target):
hERG (7CN1) — positives: Dofetilide (IC50=12nM), Terfenadine (~50nM), Cisapride (~45nM), Astemizole (~1nM)
hERG (7CN1) — negatives: Aspirin, Metformin, Caffeine
CYP3A4 (6MA7) — positives: Ketoconazole (Ki=15nM), Ritonavir, Itraconazole
CYP3A4 (6MA7) — negatives: Aspirin, Caffeine, Metformin
CYP2D6 (4WNT) — positives: Quinidine (Ki=60nM), Paroxetine (~150nM), Fluoxetine
CYP2D6 (4WNT) — negatives: Aspirin, Metformin, Ibuprofen
CYP2C9 (1OG5) — positives: Sulfaphenazole (Ki=300nM), Fluconazole (Ki=7µM) (Note: Warfarin is a CYP2C9 substrate, not an inhibitor — it has been removed from the positive control set)
CYP2C9 (1OG5) — negatives: Aspirin, Caffeine, Metformin
Validation gate: AUC ≥ 0.70 to enter the public panel. Targets that fail this gate are flagged as provisional in every report and excluded from the headline CTI; users see the per-target prediction but are warned.
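As a rough illustration of the validation gate, AUC-ROC can be computed directly as the fraction of (positive, negative) pairs in which the positive compound scores higher, with ties counted as half. The function names and the scores below are made up for the example; they are not ToxScreen calibration results.

```python
AUC_GATE = 0.70  # gate quoted in the text


def auc_roc(pos_scores, neg_scores):
    """AUC as P(random positive outscores random negative); ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def passes_gate(auc, gate=AUC_GATE):
    """True if the target would enter the public panel under the gate."""
    return auc >= gate


# Hypothetical binding probabilities for 4 known inhibitors and 3 negatives.
pos = [0.91, 0.78, 0.66, 0.40]
neg = [0.35, 0.52, 0.20]

auc = auc_roc(pos, neg)
print(f"AUC = {auc:.3f}, passes gate: {passes_gate(auc)}")
```

With only a handful of compounds, a single misranked pair moves the AUC by a large step (1/12 in this example), which is why the gate is only directional at v0 sample sizes.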
Expanding the benchmark: v0 calibration uses 5-7 compounds per target (the CYP2C9 set dropped to 5 after warfarin was removed). At this n, the 95% confidence interval on AUC is roughly ±0.25, which is wide. We are running a v1 calibration with 100 compounds per target sourced from Tox21 and curated PubChem bioassay sets (planned completion: 4-6 weeks). v1 will replace v0 thresholds when the AUC CI tightens to ±0.05.
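To see why the interval is so wide at this sample size, one option is the Hanley-McNeil (1982) approximation for the standard error of an AUC. This is an assumption made for illustration: the text does not say how the quoted interval was derived, the AUC of 0.85 is a placeholder, and the normal approximation is itself rough at very small n.

```python
import math


def auc_ci_half_width(auc, n_pos, n_neg, z=1.96):
    """Approximate 95% CI half-width for an AUC (Hanley-McNeil standard error)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return z * math.sqrt(variance)


# Placeholder AUC; 4 positives + 3 negatives mirrors the v0 hERG set size.
hw_small = auc_ci_half_width(0.85, n_pos=4, n_neg=3)
# 50 + 50 is one possible split of a 100-compound v1 set (assumed, not stated).
hw_large = auc_ci_half_width(0.85, n_pos=50, n_neg=50)

print(f"n=7:   +/-{hw_small:.2f}")
print(f"n=100: +/-{hw_large:.2f}")
```

The half-width shrinks roughly with the square root of the number of positive-negative pairs, so the move from single digits to around 100 compounds per target is what makes the thresholds usable as more than a direction.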
Composite Tox Index (CTI)
The CTI is a single weighted score in [0, 1] aggregating per-target normalized binding scores plus an ADMET penalty.
CTI = Σᵢ (weight_i × normalized_score_i) + ADMET_penalty
weight_hERG = 0.35
weight_CYP3A4 = 0.25
weight_CYP2D6 = 0.20
weight_CYP2C9 = 0.20
ADMET_penalty = Σ (penalty_j × flag_j) for {logp, mw, hbd, hba, pains, rot_bonds, sa_score}
normalized_score_i = clamp((random_baseline − raw_i) / (random_baseline − reference_i), 0, 1)
  (raw_i = reference_i maps to 1; raw_i = random_baseline maps to 0)
reference_i = published-inhibitor reference score (kcal/mol) for target i
random_baseline = -4.5 kcal/mol (typical weak-binder noise)
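A minimal sketch of the CTI arithmetic follows. It assumes more negative raw scores mean stronger binding, orients the normalization so that a compound scoring at the published-inhibitor reference maps to 1 and one at the random baseline maps to 0, takes the ADMET penalty as a precomputed number, and clamps the final value to [0, 1] to match the stated range. The reference and raw scores are hypothetical.

```python
WEIGHTS = {"hERG": 0.35, "CYP3A4": 0.25, "CYP2D6": 0.20, "CYP2C9": 0.20}
RANDOM_BASELINE = -4.5  # kcal/mol, from the definitions above


def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


def normalized_score(raw, reference, baseline=RANDOM_BASELINE):
    """Map a raw score onto [0, 1]: baseline -> 0, reference -> 1."""
    if reference >= baseline:
        raise ValueError("reference must be more negative than the baseline")
    return clamp((baseline - raw) / (baseline - reference))


def composite_tox_index(raw, reference, admet_penalty=0.0):
    """Weighted sum of normalized scores plus a precomputed ADMET penalty."""
    weighted = sum(
        WEIGHTS[t] * normalized_score(raw[t], reference[t]) for t in WEIGHTS
    )
    return clamp(weighted + admet_penalty)  # final clamp is an assumption


# Hypothetical reference and raw scores (kcal/mol), for illustration only.
reference = {t: -10.0 for t in WEIGHTS}
raw = {"hERG": -10.0, "CYP3A4": -4.5, "CYP2D6": -7.25, "CYP2C9": -3.0}

cti = composite_tox_index(raw, reference, admet_penalty=0.05)
print(f"CTI = {cti:.2f}")
```

In this example hERG contributes its full weight, CYP2D6 half of its weight, and the two remaining targets contribute nothing because their raw scores sit at or above the baseline.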
Risk classification bands and action thresholds:
| Level | CTI | Interpretation |
|---|---|---|
| LOW | 0.00 – 0.25 | No significant predicted binding to safety targets. |
| MODERATE | 0.25 – 0.45 | Weak binding detected; monitor in follow-up assays. |
| HIGH | 0.45 – 0.65 | Significant binding to one or more targets; in-vitro validation recommended. |
| CRITICAL | 0.65 – 1.00 | Strong binding to multiple targets; high-priority safety concern. |
Special rule: If hERG normalized score ≥ 0.7, the report automatically elevates risk to at least HIGH regardless of the composite — cardiac liability dominates safety triage.
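The bands and the hERG override can be sketched as a small classifier. The table's bands share their boundary values (0.25 appears in both LOW and MODERATE, for example); the sketch assigns a boundary value to the higher band, which is an assumption rather than documented behavior.

```python
LEVELS = ["LOW", "MODERATE", "HIGH", "CRITICAL"]
HERG_OVERRIDE = 0.7  # hERG normalized score that forces at least HIGH


def risk_level(cti, herg_normalized):
    """Classify a CTI value, then apply the hERG elevation rule."""
    if cti < 0.25:
        level = "LOW"
    elif cti < 0.45:
        level = "MODERATE"
    elif cti < 0.65:
        level = "HIGH"
    else:
        level = "CRITICAL"

    # Elevate to at least HIGH; never downgrade a CRITICAL result.
    if herg_normalized >= HERG_OVERRIDE and LEVELS.index(level) < LEVELS.index("HIGH"):
        level = "HIGH"
    return level


print(risk_level(0.10, herg_normalized=0.20))  # LOW
print(risk_level(0.30, herg_normalized=0.75))  # HIGH (elevated by hERG rule)
print(risk_level(0.80, herg_normalized=0.90))  # CRITICAL
```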
Confidence & replicates
Every prediction reports both where the model thinks the compound binds and how confident that prediction is. Two independent signals:
- Replicate variance. N=3 (default) independent runs with different random seeds. We report mean ± standard deviation and a 95% confidence interval (Wilson interval at small n). Wide intervals indicate the model is uncertain and the prediction should be weighted accordingly.
- pLDDT (predicted local distance difference test). Per-residue confidence in the binding pocket geometry. Bands: HIGH ≥ 0.70, MODERATE 0.55–0.70, LOW < 0.55. A high binary probability with low pLDDT means the model is confidently calling a low-confidence pocket — discount accordingly.
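The two signals can be combined in a few lines. This sketch covers only the replicate mean, the sample standard deviation, and the pLDDT banding; the interval calculation is left out, and the function names and replicate values are hypothetical.

```python
import statistics


def summarize_replicates(values):
    """Return (mean, sample standard deviation) across replicate scores."""
    if len(values) < 2:
        raise ValueError("need at least two replicates for a standard deviation")
    return statistics.mean(values), statistics.stdev(values)


def plddt_band(plddt):
    """Band a pocket pLDDT (0-1 scale) using the cutoffs quoted above."""
    if plddt >= 0.70:
        return "HIGH"
    if plddt >= 0.55:
        return "MODERATE"
    return "LOW"


# Hypothetical binding probabilities from three seeds.
mean_bp, sd_bp = summarize_replicates([0.80, 0.84, 0.88])
print(f"bp = {mean_bp:.2f} +/- {sd_bp:.2f}, pocket confidence: {plddt_band(0.52)}")
```

A result like this one, with a high mean binding probability but a LOW pLDDT band, is the case the text says to discount.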
Known limitations
This list is exhaustive to our knowledge. If you find a limitation we don't list, we want to know.
- Pre-screening tool only. ToxScreen does not replace patch-clamp hERG, microsomal CYP IC50, in-vivo pharmacokinetics, or any FDA / EMA / PMDA-mandated assay. It is intended for triage before wet-lab spend.
- Calibration n is small. v0 calibration uses 5-7 compounds per target. AUC has wide confidence intervals at this n. Treat as directional, not definitive, until v1 lands.
- Memorization risk. Boltz-2 was trained on PDB. For well-studied targets, predictions on near-training-set compounds may overstate accuracy. Reports will flag predictions where the Tanimoto distance to the nearest training compound is < 0.4 (planned for v0.2; not yet implemented).
- Domain of applicability. Predictions outside the chemical space of the calibration set may not generalize. We do not currently bound predictions by Tanimoto-to-training-distribution distance — this is on the v1 roadmap.
- Frequent hitters / aggregators. Some compound classes (notably certain flavonoids and PAINS substructures) score positive on most targets. We flag PAINS hits explicitly; aggregation counter-screen is recommended for any HIGH/CRITICAL scaffold.
- Stereochemistry sensitivity. Enantiomers can bind very differently. We canonicalize SMILES with RDKit but do not enforce stereochemistry. Submit explicit stereo for chiral compounds.
- Single-pose binding. Boltz-2 returns one (highest-confidence) pose per pair. Compounds with multiple binding modes may be misclassified.
- No metabolite prediction. ToxScreen scores the parent compound. CYP metabolites (often more or less toxic than parent) are not predicted.
- No species-specificity. hERG and CYP models are human; we do not currently offer dog/rat/monkey/zebrafish equivalents.
- No regulatory acceptance. No regulator (FDA, EMA, PMDA) or harmonization body (ICH) has formally accepted Boltz-2 or ToxScreen output as a substitute for any wet-lab assay. ToxScreen output should be treated as supplementary only.
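The planned Tanimoto flag mentioned under memorization risk could look something like the sketch below. It assumes fingerprints are represented as sets of on-bit indices and that the Tanimoto distance is 1 minus the Tanimoto similarity; the fingerprint type, the function names, and the toy bit sets are all assumptions, since this check is not yet part of the pipeline.

```python
def tanimoto_similarity(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 1.0  # convention for two empty fingerprints
    return len(a & b) / len(a | b)


def tanimoto_distance(a, b):
    return 1.0 - tanimoto_similarity(a, b)


def near_training_set(query, training, cutoff=0.4):
    """True if any training fingerprint lies within the distance cutoff."""
    return any(tanimoto_distance(query, t) < cutoff for t in training)


# Toy on-bit sets standing in for real fingerprints.
training = [{1, 2, 3, 4, 6}, {10, 11, 12}]
close_query = {1, 2, 3, 4, 5}   # shares 4 of 6 bits with the first entry
far_query = {20, 21, 22}        # shares nothing

print(near_training_set(close_query, training))
print(near_training_set(far_query, training))
```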
Reproducibility
Every report ships a methodology stamp footer with the exact configuration in use at scoring time:
Boltz-2 NIM · 3 replicates · GX10 GB10 GPU
Calibration md5: (8-char hash of the active threshold config)
Frozen: (UTC timestamp at scoring time)
Re-running the same SMILES against the same calibration config and the same replicate seeds will produce identical replicate-mean scores (deterministic up to floating-point ε in the affinity head). Variance between replicates comes from the diffusion model's stochastic sampling under different seeds.
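One way such a stamp could be produced is sketched below: serialize the threshold config canonically, take the first 8 hex characters of its MD5 digest, and attach a UTC timestamp. The config keys, the threshold values, and the exact stamp layout are hypothetical; only the 8-character hash and the UTC timestamp come from the description above.

```python
import hashlib
import json
from datetime import datetime, timezone


def calibration_hash(config):
    """First 8 hex chars of the MD5 of a canonical JSON dump of the config."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()[:8]


def methodology_stamp(config, replicates=3):
    """Build a footer line with the config hash and a UTC timestamp."""
    frozen = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"Boltz-2 NIM · {replicates} replicates · "
        f"Calibration md5: {calibration_hash(config)} · Frozen: {frozen}"
    )


# Placeholder thresholds, not the published calibration values.
config = {"hERG": 0.50, "CYP3A4": 0.50, "CYP2D6": 0.50, "CYP2C9": 0.50}
print(methodology_stamp(config))
```

Sorting the keys before hashing means two configs with the same thresholds always give the same hash regardless of key order, which is what lets a reader match a report to the calibration that produced it.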
References
- Passaro S, et al. Boltz-2: Towards accurate and efficient binding affinity prediction. bioRxiv 2025.
- Pires DEV, Blundell TL, Ascher DB. pkCSM: predicting small-molecule pharmacokinetic and toxicity properties using graph-based signatures. J. Med. Chem. 2015.
- Daina A, Michielin O, Zoete V. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep. 2017.
- Vargas HM, et al. Time for a fully integrated nonclinical-clinical risk assessment to streamline QT prolongation liability determinations. Clin. Pharmacol. Ther. 2018.
- Shekhar S, et al. Limitations of structure-based affinity prediction for novel chemical space. 2025.
- Ji et al. The OECD QSAR Toolbox. Chemosphere 2023.
Privacy & data handling
- Your structures. Submitted SMILES are stored encrypted at rest (PostgreSQL TDE), associated only with your account, and never shared with third parties for training, marketing, or analytics.
- Deletion. Email hello@toxscreen.io with your account email and we delete all associated SMILES, reports, and compute records within 7 business days.
- Compute provider. Predictions run on our GPU infrastructure (currently GX10 / GB10 on Tailscale-private network). No third-party AI provider sees your structures.
- Data residency. Currently US-only. Contact us before submission if you have EU GDPR or APAC data-residency requirements.
- No tracking. We use first-party Plausible analytics for page views; no third-party trackers, no advertising cookies, no fingerprinting.
- Account deletion. Self-service in dashboard → Account → Delete account. Wipes all jobs, reports, payment records (subject to legal retention requirements for invoicing).
Legal disclaimers
RESEARCH USE ONLY. ToxScreen output is for research and informational purposes only. It is not approved or cleared by any regulatory authority for any clinical, diagnostic, prophylactic, or therapeutic use.
NOT MEDICAL ADVICE. ToxScreen does not provide medical advice, diagnosis, or treatment recommendations. Compounds flagged "low risk" may be unsafe. Compounds flagged "high risk" may be safe in a given indication or dose.
NOT A REGULATORY SUBMISSION. ToxScreen output cannot replace IND-enabling toxicology studies, GLP packages, ICH M7 mutagenicity assessments, or any FDA/EMA/PMDA-mandated assay. Do not submit ToxScreen reports as primary safety evidence to any regulator.
NO WARRANTY. ToxScreen is provided "as is" without warranty of any kind, express or implied, including without limitation warranties of merchantability, fitness for a particular purpose, accuracy, completeness, or non-infringement. Computational predictions carry inherent uncertainty and may not generalize to compounds outside the calibration benchmark.
LIMITATION OF LIABILITY. To the maximum extent permitted by law, the operators of ToxScreen disclaim any liability for direct, indirect, incidental, consequential, punitive, or special damages arising out of or in connection with the use of ToxScreen output, including but not limited to safety, clinical, regulatory, or commercial decisions made on the basis of any prediction.
USER OBLIGATIONS. Users represent and warrant that they will not use ToxScreen output as the sole or primary basis for any decision affecting human or animal exposure, that they will independently validate any prediction with appropriate experimental assays before acting on it, and that they will not rely on ToxScreen output to satisfy any regulatory, ethical, or contractual obligation.
INDEMNIFICATION. Users agree to indemnify and hold harmless the operators of ToxScreen from any claims, damages, losses, or expenses (including reasonable attorneys' fees) arising out of or related to the user's use of ToxScreen output, including any reliance on predictions in safety, clinical, or commercial decisions.
JURISDICTION. This service is operated from the United States. Use is governed by US law. Users outside the United States agree that any disputes will be resolved in the courts of the United States.
For full Terms of Service and Privacy Policy, see Terms and Privacy.
Last updated: . Methodology evolves as we add targets, expand calibration, and incorporate user feedback. Material changes are versioned and logged in the git history.