Scoring Systems Guide

The Molecular AOP Builder uses two distinct scoring stages, kept deliberately separate:

  1. Suggestion ranking: orders pathways / GO terms by BioBERT semantic similarity to the Key Event description. Runs automatically when a curator selects a KE.
  2. Confidence assessment: a 4-question rubric the curator answers after picking a target; it produces the High / Medium / Low confidence label stored with the mapping.

This guide describes both. For the YAML knobs and exact defaults, see docs/SCORING_CONFIG.md in the repository.

Suggestion ranking — BioBERT semantic similarity

BioBERT is a transformer language model pre-trained on biomedical text (PubMed abstracts and PMC full-text articles). It captures biological meaning beyond simple word matching — for example, it treats "apoptosis" and "programmed cell death" as closely related even though they share no tokens.

How embeddings work

   "Increase, CYP2E1 expression"
              │
              ▼
    ┌─────────────────┐
    │    BioBERT      │
    │  Neural Network │
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐
    │ 768-dimensional │
    │ embedding vector│
    │  [0.23, -0.15,  │
    │   0.87, ...]    │
    └─────────────────┘
        

Pre-computed embeddings

To keep suggestions fast, KE and pathway / GO embeddings are pre-computed once and cached on disk:

Data                           Count     File
Key Event embeddings           ~1,561    data/ke_embeddings.npy
WikiPathways title embeddings  ~1,012    data/pathway_title_embeddings.npy
GO BP term embeddings          ~30,000   data/go_embeddings.npz
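
As a rough illustration of the caching idea, the sketch below writes an embedding matrix to a .npy file and reads it back with NumPy. The helper names, the (n_items, 768) float32 layout, and the random stand-in vectors are assumptions made for this example; the repository's actual cache-building code and the internal layout of data/go_embeddings.npz may differ.

```python
import tempfile
from pathlib import Path

import numpy as np

EMBEDDING_DIM = 768  # BioBERT embedding size, as shown in the diagram above


def save_embeddings(path: Path, vectors: np.ndarray) -> None:
    """Cache an (n_items, 768) matrix on disk in .npy format (assumed layout)."""
    if vectors.ndim != 2 or vectors.shape[1] != EMBEDDING_DIM:
        raise ValueError(f"expected shape (n, {EMBEDDING_DIM}), got {vectors.shape}")
    np.save(path, vectors.astype(np.float32))


def load_embeddings(path: Path) -> np.ndarray:
    """Load a cached matrix so nothing has to be re-embedded at request time."""
    return np.load(path)


# Round trip with random stand-in vectors (not real BioBERT output).
rng = np.random.default_rng(0)
fake_vectors = rng.normal(size=(5, EMBEDDING_DIM))
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "ke_embeddings.npy"
    save_embeddings(cache_file, fake_vectors)
    loaded = load_embeddings(cache_file)

print(loaded.shape, loaded.dtype)
```

Loading a cached matrix this way avoids re-running BioBERT for every suggestion request.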

Similarity calculation

cosine_similarity = dot(KE_embedding, target_embedding) / (norm(KE_embedding) × norm(target_embedding))
normalized_score  = (cosine_similarity + 1) / 2
transformed_score = normalized_score ^ power_exponent
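
The three formulas above translate almost directly into Python. The following is a minimal sketch using only the standard library; the function names are illustrative and not taken from the repository, and the default power_exponent of 4.0 is simply the value used in the examples in this guide.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def transform(cosine: float, power_exponent: float = 4.0) -> float:
    """Map a raw cosine to [0, 1], then raise it to power_exponent."""
    normalized = (cosine + 1.0) / 2.0
    return normalized ** power_exponent


def suggestion_score(ke_vec: Sequence[float], target_vec: Sequence[float],
                     power_exponent: float = 4.0) -> float:
    """Full chain: cosine similarity, normalization, power transform."""
    return transform(cosine_similarity(ke_vec, target_vec), power_exponent)
```

With power_exponent = 4.0, a raw cosine of 0.90 maps to roughly 0.81 and a raw cosine of 0.50 to roughly 0.32, so differences near the top of the range are stretched out.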

Score transformation

Raw BioBERT cosine scores cluster in the 0.8–0.95 range because biomedical texts rarely have strongly negative similarity. A power transformation spreads them out for clearer differentiation between candidates:

Raw cosine  Normalized  After transform (^4.0)
0.90        0.95        0.81
0.80        0.90        0.66
0.70        0.85        0.52
0.50        0.75        0.32

Directionality is removed. Before computing the KE embedding, directional terms ("increase", "decrease", "activation", "inhibition") are stripped from the title. This means "Increase CYP2E1" and "Decrease CYP2E1" produce the same suggestion list: the pathway biology is the same, and the direction is captured separately by the connection_type field on the mapping.

Small ontology post-combine boost. After the BioBERT score is computed, candidates whose WikiPathways ontology annotations align with the AOPs the KE belongs to receive a small additive boost. This is purely a tie-breaker and is configurable via pathway_suggestion.ontology_post_combine_boost.
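
To make the directionality step concrete, here is a small sketch of how directional terms could be stripped from a KE title before embedding. It assumes whole-word, case-insensitive matching on exactly the four terms listed above; the function name and the punctuation clean-up are illustrative, and the application's real term list and matching rules may be broader.

```python
import re

# The four terms named in this guide; the real list may contain more.
DIRECTIONAL_TERMS = ("increase", "decrease", "activation", "inhibition")

_DIRECTIONAL_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in DIRECTIONAL_TERMS) + r")\b",
    re.IGNORECASE,
)


def strip_direction(title: str) -> str:
    """Remove directional words, then tidy leftover spaces and punctuation."""
    without_terms = _DIRECTIONAL_RE.sub(" ", title)
    collapsed = re.sub(r"\s+", " ", without_terms)
    return collapsed.strip(" ,;:-")


print(strip_direction("Increase, CYP2E1 expression"))
print(strip_direction("Decrease, CYP2E1 expression"))
```

Both calls print the same text, which is why the two KEs end up with identical suggestion lists.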

Gene-overlap chip

Each suggestion card shows a small Genes: N/M chip: N = number of KE-associated genes also present in the candidate pathway; M = total KE-associated genes from AOP-Wiki.

Informational only. The gene-overlap chip helps curators sanity-check a suggestion against shared biology, but it does not influence the ranking. Two candidates with the same BioBERT score will appear in the same order regardless of how many genes they share with the KE.

Hovering the chip reveals the actual HGNC gene symbols that overlap, so the curator can spot situations like a single housekeeping gene driving an artificially high overlap.
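
The N/M chip and its hover list amount to a set intersection. The sketch below shows one way to compute them; the function name and the example gene lists are made up for illustration, and the application may normalize or deduplicate symbols differently.

```python
from typing import Iterable, List, Tuple


def gene_overlap(ke_genes: Iterable[str],
                 pathway_genes: Iterable[str]) -> Tuple[str, List[str]]:
    """Return the chip label and the sorted list of shared gene symbols."""
    ke_set = set(ke_genes)                 # M = distinct KE-associated genes
    shared = sorted(ke_set & set(pathway_genes))  # N = genes also in the pathway
    label = f"Genes: {len(shared)}/{len(ke_set)}"
    return label, shared


# Illustrative inputs only, not taken from AOP-Wiki or WikiPathways.
label, shared = gene_overlap(
    ke_genes=["CYP2E1", "NFE2L2", "KEAP1"],
    pathway_genes=["CYP2E1", "NFE2L2", "GSTA1", "HMOX1"],
)
print(label, shared)
```

The returned list is what a curator would inspect on hover; the label itself plays no part in ranking.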

Confidence assessment workflow

When the curator submits a new KE-WP mapping, a 4-question rubric converts their judgement into a High / Medium / Low confidence label:

Question 1: Relationship type

"What is the relationship between the pathway and Key Event?"

Option         Meaning
Causative      Pathway activity leads to the Key Event (Pathway → KE)
Responsive     Key Event triggers pathway activation (KE → Pathway)
Bidirectional  Both directions apply
Unclear        Relationship exists but direction uncertain

This question populates the connection_type field but does not affect the confidence score.

Question 2: Evidence basis (0–3 points)

"What is the basis for this mapping?"

Option                Points  When to select
Known connection      3       Published evidence directly supports this mapping
Likely connection     2       Strong inference from your domain knowledge
Possible connection   1       Plausible hypothesis but uncertain
Uncertain connection  0       No clear basis for the connection

Answer based on what you already know; no new literature search is required at this stage.

Question 3: Pathway specificity (0–2 points)

"How specific is the pathway to this Key Event?"

Option           Points  Example
KE-specific      2       "CYP2E1 metabolism" pathway for a KE about CYP2E1
Includes KE      1       "Xenobiotic metabolism" pathway that includes CYP2E1
Loosely related  0       Very broad pathway only tangentially related

Question 4: KE coverage (0–1.5 points)

"How much of the KE mechanism is captured by the pathway?"

Option              Points  Meaning
Complete mechanism  1.5     Pathway fully represents the KE
Key steps only      1.0     Major elements captured, some missing
Minor aspects       0.5     Only peripheral aspects represented

Biological-level bonus (+1 point)

KEs at the molecular, cellular, or tissue level receive a +1 bonus because they are closer to pathway mechanisms than organ-, individual-, or population-level KEs:

Biological level               Bonus
Molecular, Cellular, Tissue    +1.0
Organ, Individual, Population  +0.0

Final score and thresholds

Final   = Evidence (0–3) + Specificity (0–2) + Coverage (0–1.5) + Bio bonus (0–1)
Maximum = 7.5

Score range  Confidence  Interpretation
≥ 5.0        High        Strong evidence, good specificity and coverage
2.5 – 4.9    Medium      Moderate evidence with some limitations
< 2.5        Low         Weak evidence or poor pathway fit

Worked examples

High-confidence example:

  • Known connection: 3
  • KE-specific pathway: 2
  • Complete mechanism: 1.5
  • Molecular level: +1
  • Total: 7.5 → High

Medium-confidence example:

  • Likely connection: 2
  • Includes KE: 1
  • Key steps only: 1
  • Organ level: no bonus
  • Total: 4.0 → Medium

KE-GO assessment. KE-GO mapping uses the same rubric structure with GO-specific connection types (positive / negative regulation rather than causative / responsive).
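
The KE-WP rubric and thresholds described in this section can be sketched as a single function. The dictionary keys and the function name below are invented for this example; only the point values, the biological-level bonus, and the 5.0 / 2.5 thresholds come from this guide, and a deployment may override them in config/scoring_config.yaml.

```python
from typing import Tuple

EVIDENCE_POINTS = {"known": 3.0, "likely": 2.0, "possible": 1.0, "uncertain": 0.0}
SPECIFICITY_POINTS = {"ke_specific": 2.0, "includes_ke": 1.0, "loosely_related": 0.0}
COVERAGE_POINTS = {"complete": 1.5, "key_steps": 1.0, "minor": 0.5}
BONUS_LEVELS = {"molecular", "cellular", "tissue"}  # these levels earn +1

HIGH_THRESHOLD = 5.0
MEDIUM_THRESHOLD = 2.5


def assess_confidence(evidence: str, specificity: str, coverage: str,
                      biological_level: str) -> Tuple[float, str]:
    """Sum the rubric points and map the total to High / Medium / Low.

    The relationship type (Question 1) is deliberately absent: it does not
    contribute to the score.
    """
    score = (EVIDENCE_POINTS[evidence]
             + SPECIFICITY_POINTS[specificity]
             + COVERAGE_POINTS[coverage])
    if biological_level.lower() in BONUS_LEVELS:
        score += 1.0
    if score >= HIGH_THRESHOLD:
        label = "High"
    elif score >= MEDIUM_THRESHOLD:
        label = "Medium"
    else:
        label = "Low"
    return score, label


print(assess_confidence("known", "ke_specific", "complete", "Molecular"))
print(assess_confidence("likely", "includes_ke", "key_steps", "Organ"))
```

The two calls correspond to the high-confidence and medium-confidence worked examples.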

Tuning parameters

All thresholds and weights are configurable via config/scoring_config.yaml. Changes require a Flask restart to take effect.

Common tuning scenarios

Goal                              Parameter                                                     Change
Show more / fewer candidates      dynamic_thresholds.base_threshold                             Lower for more (e.g. 0.15 → 0.10), raise for fewer
Spread BioBERT scores more        embedding_based_matching.score_transformation.power_exponent  Raise (e.g. 4.0 → 5.0)
Stronger AOP-aligned tie-breaker  pathway_suggestion.ontology_post_combine_boost                Raise weights inside this block
Lenient confidence threshold      ke_pathway_assessment.confidence_thresholds.high              Lower (e.g. 5.0 → 4.5)
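
The dotted parameter names in the table read as paths into nested YAML mappings. As a hedged sketch, the helper below resolves such a path in a dictionary like the one yaml.safe_load would return. The helper name is hypothetical, and the example values are the "before" numbers from the table's examples, which may not match the shipped defaults.

```python
from collections.abc import Mapping
from typing import Any


def get_setting(config: Mapping, dotted_path: str) -> Any:
    """Walk a nested mapping using a dotted path such as 'a.b.c'."""
    node: Any = config
    for key in dotted_path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            raise KeyError(f"missing config key {dotted_path!r} (stopped at {key!r})")
        node = node[key]
    return node


# Stand-in for the parsed contents of config/scoring_config.yaml.
example_config = {
    "dynamic_thresholds": {"base_threshold": 0.15},
    "embedding_based_matching": {"score_transformation": {"power_exponent": 4.0}},
    "ke_pathway_assessment": {"confidence_thresholds": {"high": 5.0}},
}

print(get_setting(example_config, "dynamic_thresholds.base_threshold"))
```

A check like this before restarting can catch a mistyped or mis-indented key that still parses as valid YAML.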

Applying changes

# 1. Edit configuration
nano config/scoring_config.yaml

# 2. Validate YAML syntax
python -c "import yaml; yaml.safe_load(open('config/scoring_config.yaml'))"

# 3. Restart Flask (';' instead of '&&' so the app still starts when no old process was running)
pkill -f "python.*app.py"; python app.py &

# 4. Hard-reload the browser (Ctrl+Shift+R)

For full parameter reference, see docs/SCORING_CONFIG.md.