About PathoIntern

Phased Approach

A two-phase strategy: validate the core concept first, then extend to full whole-slide image processing.

Phase I · Active · 6 Weeks

MVP — Patch-Based Triage

Validate the core concept with pre-extracted patches and public datasets.

Technical Specifications

Input type: Pre-extracted patches (PNG/JPG)
Dataset: Kaggle Parasite Dataset
Embedding model: CTransPath (768-dim)
Model access: Public weights, no approval required
Scoring method: k-NN anomaly detection vs normal baseline
Viewer: Static image display (patch-level)
Data volume: < 10,000 patches (MVP scale)
Deployment: Docker Compose (local / dev)
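
The scoring method listed above, k-NN anomaly detection against a normal baseline, can be illustrated with a minimal sketch. It assumes Euclidean distance and a score defined as the mean distance to the k nearest normal embeddings; neither choice is specified in this document. The function name knn_anomaly_score and the toy 3-dim vectors (standing in for 768-dim CTransPath embeddings) are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def knn_anomaly_score(query: Vector, normal_bank: Sequence[Vector], k: int = 5) -> float:
    """Mean Euclidean distance from `query` to its k nearest normal embeddings.

    Assumption: a larger mean distance means the patch looks less like the
    normal baseline. The real system may use another metric (e.g. cosine).
    """
    if not normal_bank:
        raise ValueError("normal_bank must contain at least one embedding")
    k = min(k, len(normal_bank))  # cannot use more neighbours than exist
    distances = sorted(math.dist(query, ref) for ref in normal_bank)
    return sum(distances[:k]) / k


# Toy 3-dim vectors stand in for 768-dim embeddings (illustrative data only).
normal_bank = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.0)]
typical_patch = (0.05, 0.05, 0.0)
unusual_patch = (2.0, 2.0, 2.0)

typical_score = knn_anomaly_score(typical_patch, normal_bank, k=3)
unusual_score = knn_anomaly_score(unusual_patch, normal_bank, k=3)
```

In this toy setup the patch far from the normal cluster receives the larger score. At MVP scale (under 10,000 patches) a brute-force scan like this is feasible; a vector index would be a natural replacement at larger volumes.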

Capabilities

  • Patch image upload (single or batch)
  • CTransPath embedding generation and pgvector storage
  • Criticality scoring with 4-tier classification
  • Top-k similar patch retrieval with labels
  • Prioritized worklist dashboard
  • Pathologist verdict submission (agree/disagree/modify)
  • Comprehensive audit log with full attribution
  • Non-dismissible disclaimer on every view
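
The 4-tier classification and prioritized worklist listed above might be sketched as follows. The tier names, the cut-off values, and the assumption that the criticality score has already been normalized to the range [0, 1] are all hypothetical; this document does not define them. WorklistItem, classify_tier, and build_worklist are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical tiers and cut-offs, ordered from most to least urgent.
TIER_THRESHOLDS = [(0.75, "critical"), (0.50, "high"), (0.25, "moderate")]
LOWEST_TIER = "routine"


def classify_tier(score: float) -> str:
    """Map a normalized criticality score in [0, 1] to one of four tiers."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be normalized to [0, 1]")
    for threshold, tier in TIER_THRESHOLDS:
        if score >= threshold:
            return tier
    return LOWEST_TIER


@dataclass(frozen=True)
class WorklistItem:
    patch_id: str
    score: float

    @property
    def tier(self) -> str:
        return classify_tier(self.score)


def build_worklist(items: Iterable[WorklistItem]) -> List[WorklistItem]:
    """Return items ordered so the highest-scoring patches are reviewed first."""
    return sorted(items, key=lambda item: item.score, reverse=True)


items = [WorklistItem("p1", 0.2), WorklistItem("p2", 0.9), WorklistItem("p3", 0.55)]
worklist = build_worklist(items)
```

Sorting by the raw score rather than by tier keeps the ordering stable within a tier, so two "high" patches are still ranked against each other.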

Phase II · Planned · 8–12 Weeks

Full WSI Integration

Extend to whole-slide images with OpenSlide, deep-zoom viewer, and heatmap overlays.

Technical Specifications

Input type: Whole-Slide Images (NDPI, SVS, TIFF)
Dataset: Institutional library (~150 slides, ~200 GB)
Embedding model: UNI (4096-dim, HuggingFace)
Model access: Requires HuggingFace access approval
Scoring method: Slide-level aggregated anomaly scoring
Viewer: OpenSeadragon deep-zoom viewer + heatmap overlay
Data volume: Production scale (continuous ingestion)
Deployment: Containerized cloud deployment
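
Slide-level aggregated anomaly scoring, as listed above, needs a rule for combining many patch scores into one number per slide. The document does not say which rule is planned; the sketch below assumes one common option, the mean of the top-k patch scores, so that a small suspicious region is not averaged away by thousands of normal tiles. The function name slide_score and the example values are hypothetical.

```python
from typing import Sequence


def slide_score(patch_scores: Sequence[float], top_k: int = 10) -> float:
    """Aggregate patch-level anomaly scores into a single slide-level score.

    Assumption: the slide score is the mean of the `top_k` highest patch
    scores. Other aggregations (max, percentile, learned pooling) are
    equally plausible.
    """
    if not patch_scores:
        raise ValueError("patch_scores must not be empty")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    top = sorted(patch_scores, reverse=True)[:top_k]
    return sum(top) / len(top)


# Illustrative data: mostly normal tiles with two suspicious ones.
example_scores = [0.1, 0.1, 0.9, 0.7]
aggregated = slide_score(example_scores, top_k=2)
plain_mean = sum(example_scores) / len(example_scores)
```

Here the top-k aggregate stays close to the scores of the two suspicious tiles, while the plain mean is pulled down by the normal ones.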

Capabilities

  • Whole-slide image upload and tile extraction
  • Patch-level UNI embeddings aggregated to slide-level scores
  • OpenSeadragon deep-zoom interactive viewer
  • Risk heatmap overlay on whole-slide view
  • Slide-level worklist with region annotation
  • Pathologist learning loop (verdict → model update)
  • LIS/EMR integration API
  • Multi-institution deployment architecture
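
Tile extraction and the risk heatmap overlay listed above both rest on the same tile grid. The sketch below assumes non-overlapping square tiles and discards partial tiles at the right and bottom edges; both are simplifying assumptions, and tile_origins and heatmap_grid are hypothetical names. In a real pipeline each origin would be passed to a slide reader (for example OpenSlide's read_region) to obtain the pixels, which this sketch does not do.

```python
from typing import List, Sequence, Tuple

Origin = Tuple[int, int]  # (x, y) of a tile's top-left corner, in pixels


def tile_origins(width: int, height: int, tile_size: int) -> List[Origin]:
    """Top-left corners of non-overlapping tiles, in row-major order.

    Partial tiles at the right and bottom edges are dropped (assumption).
    """
    if tile_size < 1:
        raise ValueError("tile_size must be positive")
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, tile_size)
        for x in range(0, width - tile_size + 1, tile_size)
    ]


def heatmap_grid(origins: Sequence[Origin], scores: Sequence[float],
                 tile_size: int) -> List[List[float]]:
    """Arrange per-tile scores into a 2-D grid suitable for an overlay."""
    if len(origins) != len(scores):
        raise ValueError("origins and scores must have the same length")
    if not origins:
        return []
    cols = max(x for x, _ in origins) // tile_size + 1
    rows = max(y for _, y in origins) // tile_size + 1
    grid = [[0.0] * cols for _ in range(rows)]
    for (x, y), score in zip(origins, scores):
        grid[y // tile_size][x // tile_size] = score
    return grid


# A 1000 x 500 pixel region with 256-pixel tiles (illustrative numbers).
origins = tile_origins(1000, 500, 256)
grid = heatmap_grid(origins, [0.1, 0.9, 0.3], 256)
```

Each grid cell corresponds to one tile, so a viewer overlay can scale the grid by the tile size to align it with the slide.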

Side-by-Side Comparison

Dimension | Phase I (MVP) | Phase II
Timeline | 6 weeks (1 sprint/week) | 8–12 additional weeks
Input granularity | Patch (cropped region) | Whole slide image
Embedding dimension | 768-dim (CTransPath) | 4096-dim (UNI)
Primary goal | Validate the core triage concept | Production-grade clinical assistant
Demo quality | High (fully functional end-to-end) | Clinical-grade UX
Academic rigor | Moderate (single dataset) | High (multi-dataset + metrics)
Data requirements | Public Kaggle dataset | Institutional WSI library
GPU requirement | Optional (CPU fallback works) | Required for WSI tile processing

Why Two Phases?

Whole-slide image processing requires institutional data access, high-compute infrastructure, and regulatory pathway planning, none of which can be completed in a 6-week academic project. Phase I therefore validates the core triage concept, namely that embedding similarity against a normal baseline can reliably stratify patches by anomaly risk, using publicly available data and a manageable technical stack.

Phase II then extends the validated concept to clinical-grade deployment, incorporating the feedback collected from Phase I pathologist verdicts and the additional infrastructure required for whole-slide images.