Taxonomy comparison tolerance (Tier C)

OntoLogos Tier C gates compare classification taxonomies against vendored golden baselines and, optionally, external reference reasoners (HermiT JAR, Konclude CLI). This document defines what differences are acceptable.

Baselines

| Corpus | Profile | Golden file | CI gate |
|---|---|---|---|
| pizza.owl | el | benchmarks/data/pizza-el-golden.json | Tier B (compare-pizza-el-golden.sh) |
| family.owl | dl | benchmarks/data/dl-taxonomy-golden.json | Tier C (compare-dl-taxonomy.sh), default CI |
| pizza.owl | dl | benchmarks/data/dl-taxonomy-golden.json | Tier C optional (RUN_SLOW_DL_GATES=1) |
| go-subset.owl | dl | benchmarks/data/dl-taxonomy-golden.json | Tier C optional (OBO biomedical subset) |
| galen-ians-full-undoctored.xml | el | HermiT fixture golden | Tier B (conformance fixture) |
| go-subset.owl | el / hybrid | Load + hybrid smoke only | Tier C smoke (profile detect) |

HermiT JAR cross-check: download the JAR with benchmarks/scripts/download-hermit-jar.sh, which writes benchmarks/data/hermit.jar (gitignored). Set HERMIT_JAR and run benchmarks/scripts/compare-dl-hermit-crosscheck.sh; compare-hermit-tier-c.sh also invokes it when the JAR is present. Nightly CI sets ONTOLOGOS_REQUIRE_HERMIT_JAR=1 and fails if the JAR or java is missing. The cross-check applies the external tolerance: at most 5 extra edges or 1% of the HermiT edge count, whichever is larger.

Provenance: committed goldens are generated by running the named script with UPDATE_GOLDEN=1 after a reviewed OntoLogos release build. External baselines (Konclude/HermiT) are optional cross-checks that run when KONCLUDE_BIN or HERMIT_JAR is set, either locally or in nightly jobs.

Exact match (default CI)

For vendored OntoLogos goldens, benchmarks/scripts/compare-taxonomy.py requires:

  • Zero missing subsumption edges (after filtering)
  • Zero extra subsumption edges (after filtering)
  • Direct A ⊑ owl:Thing edges are ignored on both sides (common DL artifact)
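The exact-match rules above can be sketched as a set difference over (sub, super) pairs. This is a minimal illustration, not the logic of compare-taxonomy.py itself: the function names, the in-memory edge representation, and the sample class names are assumptions made for the example.

```python
# Hypothetical sketch of the exact-match check; names are illustrative only.
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"


def filter_edges(edges):
    """Drop direct A ⊑ owl:Thing edges, which are ignored on both sides."""
    return {(sub, sup) for sub, sup in edges if sup != OWL_THING}


def exact_match(golden, actual):
    """Return (missing, extra) edge sets after filtering.

    The default CI gate passes only when both sets are empty.
    """
    g = filter_edges(golden)
    a = filter_edges(actual)
    return g - a, a - g


# Illustrative data: the golden records Pizza ⊑ owl:Thing, the run does not.
golden = {
    ("Margherita", "NamedPizza"),
    ("NamedPizza", "Pizza"),
    ("Pizza", OWL_THING),
}
actual = {
    ("Margherita", "NamedPizza"),
    ("NamedPizza", "Pizza"),
}

missing, extra = exact_match(golden, actual)
passed = not missing and not extra
```

Because the owl:Thing edge is filtered from both sides, the two taxonomies in this example compare as identical even though only one of them lists it.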

External reference tolerance (optional)

The rules in the table below apply when comparing against Konclude or HermiT output (manual or nightly runs, not default PR CI).

By default the comparison uses the superset tolerance from the table. Set ONTOLOGOS_STRICT_TAXONOMY=1 when running compare-dl-hermit-crosscheck.sh to require zero extra edges (Tier C strict mode).

| Rule | Allowed |
|---|---|
| Missing direct ⊑ owl:Thing | Ignored |
| Missing redundant transitive edges | Up to 0 (report only) |
| Extra inferred edges | Up to 5 per corpus or 1% of HermiT edge count, whichever is larger; additionally up to the namespace-filtered edge-count gap when OntoLogos is a strict superset |
| Equivalence vs mutual subsumption | Normalize to mutual before compare |
| Timeout / OOM on reference tool | Skip with logged warning; does not fail PR CI |
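The "normalize to mutual before compare" rule can be illustrated as follows. Some reasoners report A ≡ B as an equivalence group while others report the two edges A ⊑ B and B ⊑ A; expanding every group into ordered pairs makes the two forms comparable. This is a sketch under assumptions: the function name and the input shapes are hypothetical, not taken from compare-taxonomy.py.

```python
from itertools import permutations


def normalize_equivalences(edges, equivalence_groups):
    """Expand each equivalence group into mutual subsumption edges.

    edges: iterable of (sub, super) pairs.
    equivalence_groups: iterable of collections of equivalent class names.
    """
    out = set(edges)
    for group in equivalence_groups:
        # Every ordered pair (a, b) with a != b becomes an edge a ⊑ b.
        out.update(permutations(group, 2))
    return out


# One tool reports the equivalence as a group, the other as two edges.
from_groups = normalize_equivalences({("Child", "Person")}, [{"Mother", "Mum"}])
from_edges = normalize_equivalences(
    {("Child", "Person"), ("Mother", "Mum"), ("Mum", "Mother")}, []
)
same_taxonomy = from_groups == from_edges
```

After normalization both inputs yield the same edge set, so the equivalence does not show up as a spurious missing or extra edge.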

Use --max-missing and --max-extra on compare-taxonomy.py for external runs.
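As a worked example of the base rule for extra edges (5 per corpus or 1% of the HermiT edge count, whichever is larger), the threshold could be computed as below. The helper name is hypothetical, rounding the 1% figure up is an assumption the source does not specify, and the additional strict-superset allowance is not modeled here.

```python
def max_extra_edges(reference_edge_count, floor=5):
    """Allowed extra edges: the larger of `floor` and 1% of the reference count.

    Assumption: the 1% figure is rounded up to a whole edge.
    """
    one_percent = (reference_edge_count + 99) // 100  # integer ceil(count / 100)
    return max(floor, one_percent)


# Illustrative reference edge counts, not measured values.
small = max_extra_edges(59)     # 1% rounds up to 1, so the floor of 5 applies
large = max_extra_edges(4000)   # 1% is 40, which exceeds the floor
```

For small corpora the fixed floor dominates, while for larger ones the percentage takes over. The resulting number is the kind of value one would supply through --max-extra for an external run.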

Timeout policy (DL classification)

Wall times are measured with benchmarks/scripts/benchmark-dl-perf.sh against the release CLI. Re-run it locally after engine changes. The nightly job runs with RUN_SLOW_DL_GATES=1, and its timings are informational.

| Corpus | PR CI | Nightly | ROADMAP target | Baseline (release, local) |
|---|---|---|---|---|
| Family DL | Gate (compare-tier-c-gate.sh) | Gate + HermiT cross-check | <100 ms | ~0.5 s (release) |
| Pizza DL | Skip | Gate + perf snapshot | <30 s medium-DL | ~5 min (gap documented) |
| go-subset DL | Skip | Gate + perf snapshot | <10 s (best-effort DL) | ~2 min |

Pizza DL remains far from the 30 s medium-DL target; Phase 7 documents the gap rather than blocking exit. Optional slow gates use RUN_SLOW_DL_GATES=1.

Large corpora

| Corpus | PR CI | Notes |
|---|---|---|
| Pizza EL | Yes | ~84 subsumptions |
| Family DL | Yes | ~59 subsumptions, ~20 s release classify |
| Pizza DL | Optional (RUN_SLOW_DL_GATES=1) | ~4000 subsumptions, ~5 min release classify |
| go-subset DL | Optional (RUN_SLOW_DL_GATES=1) | OBO biomedical subset, ~2 min release classify |
| Full GALEN / SNOMED | Optional #[ignore] | Parser stress tests; manual download |

Regenerating goldens

UPDATE_GOLDEN=1 ./benchmarks/scripts/compare-pizza-el-golden.sh
UPDATE_GOLDEN=1 ./benchmarks/scripts/compare-dl-taxonomy.sh

Review the resulting golden diffs in the PR; goldens are regression locks and must not be updated silently.

  • Conformance
  • Benchmarks
  • benchmarks/scripts/compare-tier-c-gate.sh — PR-blocking Tier C vendored goldens
  • benchmarks/scripts/compare-hermit-tier-c.sh — Tier C orchestration
  • benchmarks/scripts/download-hermit-jar.sh — standalone HermiT CLI JAR for nightly cross-check
  • benchmarks/scripts/benchmark-dl-perf.sh — DL wall-time snapshot