
PatchCamelyon (PCam): Histopathology Cancer Detection

This project is based on the PatchCamelyon (PCam) dataset, a large-scale benchmark dataset derived from histopathological scans of lymph node sections. The dataset is designed for binary image classification, where the task is to determine whether a small tissue patch contains metastatic cancer cells. PCam is widely used in machine learning and digital pathology research due to its clean labels and standardized splits.
Mentor Details:
Dr. Chen Sagiv
Problem Statement
Detecting metastatic cancer in histopathology slides is a critical but labor-intensive task for pathologists. Manual examination of whole-slide images is time-consuming and subject to inter-observer variability. Automated patch-level classification provides a scalable way to support cancer detection and serves as a benchmark problem for developing robust medical image analysis models.
Project Objectives
- Load and explore the PCam dataset and its predefined train/validation/test splits.
- Train convolutional neural networks to classify image patches as cancerous or non-cancerous.
- Evaluate model performance using standard classification metrics.
- Analyze model behavior and generalization on histopathology data.
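For the evaluation objective, the following is a minimal sketch of common binary classification metrics (accuracy, precision, recall, F1, and ROC AUC) in plain Python. The function names, the 0.5 decision threshold, and the convention that label 1 means "contains tumor tissue" are illustrative assumptions, not part of the project specification. In practice a library such as scikit-learn would usually be used instead.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP/FP/TN/FN for binary labels (assumed: 1 = tumor, 0 = normal)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def classification_report(y_true: Sequence[int], y_score: Sequence[float],
                          threshold: float = 0.5) -> Dict[str, float]:
    """Threshold the scores, then compute accuracy, precision, recall and F1."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    c = confusion_counts(y_true, y_pred)
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (c["tp"] + c["tn"]) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC from the rank-sum (Mann-Whitney U) statistic, with tie handling."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    pairs = sorted(zip(y_score, y_true))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of their 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

Reporting a threshold-free metric such as ROC AUC alongside the thresholded ones helps separate how well the model ranks patches from how well a particular cut-off was chosen.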
Technical Scope
- Handling large image datasets stored in HDF5 format
- Image preprocessing and normalization
- Convolutional neural network (CNN) training and evaluation
- Binary classification and performance analysis
- Optional extensions: data augmentation, transfer learning, explainability methods
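As an illustration of the HDF5 handling and normalization items above, here is a hedged sketch that streams batches from disk rather than loading a whole split into memory. It assumes the images and labels sit in two separate HDF5 files under the dataset keys "x" and "y", with images stored as uint8 arrays; check the files you actually download, since these names and layouts are assumptions here. The helper names batch_slices and iter_pcam_batches are hypothetical, and h5py and NumPy are third-party dependencies.

```python
from typing import Iterator, Tuple


def batch_slices(n_items: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that cover range(n_items) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_items, batch_size):
        yield start, min(start + batch_size, n_items)


def iter_pcam_batches(x_path: str, y_path: str, batch_size: int = 256):
    """Stream (images, labels) batches from a pair of HDF5 files.

    Assumption: images are stored under key "x" as uint8 and labels under key "y".
    """
    # Imported lazily so that batch_slices itself has no third-party dependencies.
    import h5py
    import numpy as np

    with h5py.File(x_path, "r") as fx, h5py.File(y_path, "r") as fy:
        images, labels = fx["x"], fy["y"]
        for start, stop in batch_slices(images.shape[0], batch_size):
            # Slicing an h5py dataset reads only the requested range from disk.
            x = images[start:stop].astype(np.float32) / 255.0  # scale to [0, 1]
            y = np.asarray(labels[start:stop]).reshape(-1).astype(np.int64)
            yield x, y
```

Scaling to [0, 1] is only the simplest normalization. Subtracting a per-channel mean and dividing by a per-channel standard deviation computed on the training split is a common next step, and the statistics should come from the training split only so that no information leaks from validation or test data.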
Required Knowledge and Prerequisites
Core Requirements:
- Python programming
- Basic machine learning concepts
- Familiarity with deep learning frameworks (PyTorch or TensorFlow)
Recommended Background:
- Computer vision and CNN architectures
- Medical or histopathological imaging
- Experience working with large datasets
Project Difficulty and Expected Level
Overall Difficulty: varies
This project is well suited for teams of 1–3 students.
This project can also be completed code-free using DeePathology STUDIO.
Expected Outcomes
- A trained CNN model for cancer detection on PCam patches
- Quantitative evaluation results on the test set
- A reproducible training and evaluation pipeline
- Insights into challenges of histopathology image classification
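One small ingredient of a reproducible pipeline is fixing random seeds before data shuffling and weight initialization. The sketch below is an assumption about how a team might do this, not a prescribed method: set_seed is a hypothetical helper that seeds Python's random module and, only if they happen to be installed, NumPy and PyTorch as well.

```python
import random


def set_seed(seed: int) -> None:
    """Seed the random number generators that are available in this environment."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency
        torch.manual_seed(seed)
    except ImportError:
        pass


# Usage: call once at the start of every training or evaluation script.
set_seed(42)
```

Seeding alone does not guarantee bit-identical results, since some GPU operations are non-deterministic. Recording the seed, library versions, and hyperparameters alongside each result makes runs easier to compare.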