
Identification of the Esophageal Mucosa in POEM Procedures
Computer Vision–Based Identification of Esophageal Mucosa in POEM (Peroral Endoscopic Myotomy) Procedures

Peroral Endoscopic Myotomy (POEM) is a minimally invasive endoscopic procedure used to treat esophageal achalasia. Unlike laparoscopic Heller myotomy, POEM is performed entirely through an endoscope introduced into the esophageal lumen, where a submucosal tunnel is created to access and cut the circular muscle layer.
A critical step in POEM is the continuous identification and preservation of the esophageal mucosa during submucosal dissection and myotomy. Inadvertent injury to the mucosal layer can lead to leaks, infection, and other serious complications.
The aim of this project is to develop a computer vision system that operates on POEM endoscopic video and assists in identifying, segmenting, or highlighting the esophageal mucosa, either in real time or in offline analysis. Using real endoscopic video data, students will design, implement, and evaluate vision-based methods that distinguish mucosa from the submucosal and muscular layers under challenging visual conditions.
Mentor Details:
Prof. Yoav Mintz
Problem Statement
Endoscopic POEM videos present unique challenges for computer vision systems:
Highly variable illumination and strong specular reflections
Fluid, bubbles, smoke, and debris in the endoscopic field
Narrow field of view with frequent camera rotation
Rapid tissue deformation during submucosal tunneling
Subtle visual differences between mucosa, submucosa, and muscle
In POEM, the mucosa appears visually similar to surrounding tissue and may thin or stretch during the procedure. The problem is to design a vision-based system that can reliably identify the mucosal layer across frames or video sequences, despite noise, motion, and anatomical variability.
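As one illustration of how the specular-reflection challenge listed above might be handled in preprocessing, the sketch below flags pixels that are both very bright and nearly colorless, a common heuristic for highlights on wet tissue. It is a minimal sketch, not a prescribed method: the function name `specular_mask` and the thresholds `value_thresh` and `sat_thresh` are assumptions made for illustration and would need tuning on real POEM frames.

```python
import numpy as np

def specular_mask(frame_rgb, value_thresh=0.9, sat_thresh=0.2):
    """Return a boolean H x W mask of likely specular highlights.

    frame_rgb: H x W x 3 uint8 RGB frame (values 0-255).
    Thresholds are illustrative assumptions, not validated values.
    """
    rgb = frame_rgb.astype(np.float32) / 255.0
    v = rgb.max(axis=2)        # HSV "value": brightest channel per pixel
    c_min = rgb.min(axis=2)    # darkest channel per pixel
    # HSV saturation = (max - min) / max, defined as 0 for black pixels
    sat = np.where(v > 0, (v - c_min) / np.maximum(v, 1e-6), 0.0)
    # Highlights are bright and close to white (low saturation)
    return (v >= value_thresh) & (sat <= sat_thresh)

# Toy frame: reddish tissue with a single near-white highlight pixel
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[:] = (180, 60, 50)
frame[1, 2] = (250, 245, 240)
mask = specular_mask(frame)
```

A mask like this could be used to exclude highlight pixels from training losses or to inpaint them before segmentation; which option works better on POEM video is an open design question for the project.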
Project Objectives
Students will aim to:
Analyze POEM endoscopic video data and characterize visual tissue cues
Design a computer vision pipeline for mucosa identification
Apply deep learning–based models (e.g., CNNs, Vision Transformers) to endoscopic images or video
Incorporate temporal information to improve consistency across frames
Evaluate system performance using appropriate computer vision metrics
Technical Scope
The project may include one or more of the following components:
Image or video segmentation of esophageal mucosa
Multi-class tissue classification (mucosa vs submucosa vs muscle)
Temporal modeling using optical flow, 3D CNNs, or recurrent architectures
Weakly supervised or self-supervised learning (if pixel-level annotations are limited)
Robustness analysis under varying lighting, fluids, and camera motion
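The temporal component above can start from something much simpler than optical flow or a recurrent network. The sketch below applies an exponential moving average to per-frame mucosa probability maps, which damps frame-to-frame flicker. It is offered only as a baseline under stated assumptions: the name `smooth_probabilities` and the parameter `alpha` are hypothetical, the probability maps are assumed to come from some per-frame segmentation model, and no motion compensation is performed, so the smoothing will lag during fast camera movement.

```python
import numpy as np

def smooth_probabilities(prob_maps, alpha=0.6):
    """Exponential moving average over per-frame probability maps.

    prob_maps: iterable of H x W float arrays in [0, 1], one per frame.
    alpha: weight of the current frame; lower values smooth more.
    Yields one smoothed H x W map per input frame.
    """
    state = None
    for p in prob_maps:
        p = np.asarray(p, dtype=np.float32)
        # First frame initializes the state; later frames are blended in
        state = p if state is None else alpha * p + (1.0 - alpha) * state
        yield state

# Toy sequence: a one-frame dropout in an otherwise confident prediction
frames = [np.full((2, 2), 0.9), np.full((2, 2), 0.1), np.full((2, 2), 0.9)]
smoothed = list(smooth_probabilities(frames, alpha=0.5))
```

In the toy sequence the dropout frame is pulled back toward the previous prediction instead of flipping the mask outright. Warping the previous state with optical flow before blending would be a natural next step when the camera moves.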
Required Knowledge and Prerequisites
Core Requirements
Understanding of fundamental computer vision concepts
Experience with convolutional neural networks (CNNs)
Familiarity with deep learning frameworks (e.g., PyTorch, TensorFlow)
Ability to work with image and video datasets
Recommended Background
Endoscopic image analysis
Semantic segmentation architectures (e.g., U-Net, DeepLab)
Video modeling techniques
Performance metrics (IoU, Dice, precision, recall)
No prior clinical or endoscopic knowledge is required; essential procedural background will be provided.
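For readers new to the overlap metrics named in the recommended background, the following sketch computes Dice and IoU for a pair of binary masks. The function name `dice_and_iou` and the smoothing constant `eps` are assumptions for illustration; in particular, scoring two empty masks as a perfect match is one convention among several.

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Compute Dice and IoU between two binary masks of equal shape.

    eps avoids division by zero; with it, two empty masks score 1.0.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()   # |A ∩ B|
    union = np.logical_or(pred, target).sum()    # |A ∪ B|
    total = pred.sum() + target.sum()            # |A| + |B|
    dice = (2.0 * inter + eps) / (total + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)

# Toy masks: two pixels each, overlapping in exactly one pixel
pred = [[1, 1, 0], [0, 0, 0]]
target = [[0, 1, 1], [0, 0, 0]]
dice, iou = dice_and_iou(pred, target)
```

Here the intersection is 1 pixel, the union is 3, and the two masks contain 4 foreground pixels in total, so Dice is 2·1/4 and IoU is 1/3. Dice is always at least as large as IoU for the same pair of masks, which is worth keeping in mind when comparing results reported with different metrics.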
Project Difficulty and Expected Level
Vision complexity: High (endoscopic video with severe visual artifacts)
Modeling complexity: Moderate to high
Domain knowledge: Low (clinical concepts taught during the project)
This project is well-suited for:
Teams of 2–4 students
Expected Outcomes
A working computer vision prototype for mucosa identification in POEM videos
Quantitative evaluation on real endoscopic video data
Analysis of failure cases (bleeding, bubbles, extreme deformation)
Well-documented codebase and a technical report