
NLP - Ladino
Greek Jewish Holocaust & Heritage AI: Victims/Survivors Database + Ladino LLM for Archive Accession & Retrieval

Create an AI-powered archival system for the records of Greek-Jewish Holocaust victims and survivors, built around Ladino-aware LLMs.
Mentor Details:
Prof. Ilan Tsarfaty
Problem Statement
Historical Greek-Jewish community archives are fragmented, multilingual, and written in several scripts, which makes them difficult to search.
Project Objectives
Build structured databases and semantic retrieval workflows for archival discovery.
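One possible starting point for the structured database is sketched below using Python's built-in `sqlite3` module. The project does not prescribe a schema: the table names (`source_document`, `person`, `mention`), the columns, the `status` values, and the helper functions `open_db` and `people_in_document` are all hypothetical. The `mention` table ties each person to the document and page where they appear, which is one simple way to keep provenance attached to every record.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an illustrative
# assumption, not part of the project specification.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source_document (
    id         INTEGER PRIMARY KEY,
    archive    TEXT NOT NULL,     -- holding institution or collection
    shelfmark  TEXT NOT NULL,     -- archival reference, kept for provenance
    language   TEXT,              -- e.g. BCP 47 tags 'lad', 'el', 'he', 'fr'
    script     TEXT,              -- e.g. 'Solitreo', 'Rashi', 'Greek'
    year       INTEGER CHECK (year BETWEEN 1920 AND 1995)
);
CREATE TABLE IF NOT EXISTS person (
    id             INTEGER PRIMARY KEY,
    name_original  TEXT NOT NULL, -- name exactly as written in the source
    name_latin     TEXT,          -- transliterated form used for search
    birth_year     INTEGER,
    status         TEXT CHECK (status IN ('victim', 'survivor', 'unknown'))
);
CREATE TABLE IF NOT EXISTS mention (
    person_id    INTEGER NOT NULL REFERENCES person(id),
    document_id  INTEGER NOT NULL REFERENCES source_document(id),
    page         TEXT NOT NULL,   -- page or folio where the person appears
    PRIMARY KEY (person_id, document_id, page)
);
CREATE INDEX IF NOT EXISTS idx_person_name_latin ON person(name_latin);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and apply the schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce REFERENCES clauses
    conn.executescript(SCHEMA)
    return conn


def people_in_document(conn, shelfmark):
    """Return (name_latin, page) pairs for everyone mentioned in one document."""
    rows = conn.execute(
        """
        SELECT p.name_latin, m.page
        FROM mention AS m
        JOIN person AS p ON p.id = m.person_id
        JOIN source_document AS d ON d.id = m.document_id
        WHERE d.shelfmark = ?
        ORDER BY p.name_latin, m.page
        """,
        (shelfmark,),
    )
    return rows.fetchall()
```

Keeping `name_original` next to `name_latin` means a transliteration error never overwrites what the source actually says, and the `CHECK` on `year` rejects documents dated outside the 1920–1995 range at insert time.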
Technical Scope
Build an “AI librarian” workflow that turns historical Greek-Jewish community archives into a structured, searchable database, then add a Ladino-focused LLM layer for semantic search and question answering over victim and survivor records and community documents. The archives span 1920–1995 and are multilingual and multi-script, including Ladino, Solitreo, Rashi script, Hebrew, Greek, and French.
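A lexical baseline may be useful alongside the LLM layer, both as a comparison point during evaluation and for matching spelling variants across transliterations. The sketch below is such a baseline, not the semantic LLM layer itself: it ranks documents by cosine similarity over character-trigram TF-IDF vectors after lowercasing and stripping diacritics. The class name `NgramIndex`, the helper names, and the smoothed IDF formula are assumptions chosen for illustration.

```python
import math
import unicodedata
from collections import Counter


def normalize(text):
    """Lowercase and strip combining marks (accents, diacritics)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, padding with spaces at both ends."""
    padded = f" {normalize(text)} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


class NgramIndex:
    """Character n-gram TF-IDF index ranked by cosine similarity.

    This is a lexical baseline, not a semantic (embedding or LLM) model.
    """

    def __init__(self, docs, n=3):
        self.n = n
        self.docs = list(docs)
        counts = [char_ngrams(doc, n) for doc in self.docs]
        # Document frequency: in how many documents each n-gram occurs.
        df = Counter(gram for c in counts for gram in c)
        total = len(self.docs)
        # Smoothed IDF; n-grams unseen at indexing time get weight 0 later.
        self.idf = {g: math.log((1 + total) / (1 + k)) + 1 for g, k in df.items()}
        self.vectors = [self._weigh(c) for c in counts]

    def _weigh(self, counts):
        """Turn raw n-gram counts into an L2-normalized TF-IDF vector."""
        vec = {g: tf * self.idf.get(g, 0.0) for g, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {g: w / norm for g, w in vec.items()}

    def search(self, query, k=3):
        """Return up to k (document, score) pairs with a positive score."""
        q = self._weigh(char_ngrams(query, self.n))
        scored = [
            (sum(w * vec.get(g, 0.0) for g, w in q.items()), i)
            for i, vec in enumerate(self.vectors)
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(self.docs[i], round(s, 3)) for s, i in scored[:k] if s > 0]
```

Stripping combining marks also removes Hebrew vowel points (niqqud), so pointed and unpointed spellings match; whether that is desirable for a given collection is a judgment call.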
Required Knowledge and Prerequisites
Core Requirements:
Python, NLP basics.
Recommended Background:
Digital humanities, multilingual NLP.
Project Difficulty and Expected Level
Overall Difficulty: varies
This project is well-suited for:
Teams of 2–4 students
Expected Outcomes
End-to-end applied GenAI for digital humanities: OCR + transliteration + entity extraction + retrieval pipelines + LLM evaluation.
Practical database engineering skills: ingestion, schema design, indexing, de-duplication, provenance tracking, and safe access patterns for researchers and the public.
A high-impact project with clear social value: preserving a disappearing language and enabling respectful, accurate access to community history at scale.
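De-duplication matters here because the same person may appear under different spellings across languages and scripts. A minimal sketch, assuming each record is a dict with hypothetical `id`, `name` (already transliterated to Latin script), and `birth_year` keys: names are case-folded and stripped of diacritics, compared with `difflib.SequenceMatcher`, and matching records are grouped with union-find, which amounts to single-link clustering. The function names and the 0.85 threshold are untuned assumptions.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name):
    """Case-fold, strip diacritics, treat hyphens as spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    letters = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(letters.replace("-", " ").split())


def is_probable_duplicate(a, b, threshold=0.85):
    """Duplicate if birth years agree (when both are known) and names are similar."""
    ya, yb = a.get("birth_year"), b.get("birth_year")
    if ya is not None and yb is not None and ya != yb:
        return False
    ratio = SequenceMatcher(None, normalize_name(a["name"]), normalize_name(b["name"])).ratio()
    return ratio >= threshold


def cluster_records(records, threshold=0.85):
    """Group record ids by single-link clustering over all pairs (O(n^2))."""
    parent = list(range(len(records)))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_probable_duplicate(records[i], records[j], threshold):
                parent[find(i)] = find(j)

    groups = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record["id"])
    return sorted(groups.values())
```

Single-link clustering can chain two distinct people together through an intermediate record that resembles both, so proposed merges are better treated as candidates for review than applied automatically.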