Verified Information • Last Updated Mar 2026

Preprocessing Unstructured Data for LLM Applications

Enhancing a RAG system’s performance depends on efficiently processing diverse unstructured data sources. In this course, you’ll learn techniques for representing all sorts of unstructured data, like text, images, and tables, from many different sources, and implement them to extend your LLM RAG pipeline to include Excel, Word, PowerPoint, PDF, and EPUB files.

1. How to preprocess data for your LLM application development, focusing on how to work with different document types.
2. How to extract and normalize various documents into a common JSON format and enrich it with metadata to improve search results.
3. Techniques for document image analysis, including layout detection and vision transformers, to extract and understand PDFs, images, and tables.
4. How to build a RAG bot that is able to ingest different documents like PDFs, PowerPoints, and Markdown files.

Apply the skills you’ll learn in this course to real-world scenarios, enhancing your RAG application and expanding its versatility.
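
As a rough illustration of the normalization step described above (a common JSON format enriched with metadata), here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the function name `normalize_elements`, the `(kind, text)` input shape, the element type labels, and the metadata field names are all assumptions made for this example.

```python
import json
from pathlib import Path


def normalize_elements(filename, blocks):
    """Convert (kind, text) pairs into a common JSON-ready schema.

    Hypothetical helper: the schema and field names are illustrative only.
    """
    filetype = Path(filename).suffix.lstrip(".").lower()
    elements = []
    for index, (kind, text) in enumerate(blocks):
        cleaned = " ".join(text.split())  # collapse runs of whitespace
        if not cleaned:
            continue  # skip blocks that contain no text
        elements.append({
            "type": kind,
            "text": cleaned,
            "metadata": {
                "filename": Path(filename).name,
                "filetype": filetype,
                "element_index": index,  # position in the source document
            },
        })
    return elements


# Example input: blocks already extracted from a (hypothetical) PDF.
blocks = [
    ("Title", "Quarterly  Report"),
    ("NarrativeText", "Revenue grew\nin the third quarter."),
    ("Table", "   "),
]
elements = normalize_elements("reports/q3.pdf", blocks)
print(json.dumps(elements, indent=2))
```

Because every element carries the same metadata fields regardless of the source file type, a search layer can filter on them (for example, by `filetype`) before or after retrieval.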
Duration 4 Months
Institution DeepLearning.AI
Format Online

Eligibility Criteria

Academic Foundation

A recognized Bachelor’s degree or high school equivalent is required for admission to DeepLearning.AI.

Language Proficiency

English proficiency is required. IELTS, TOEFL, or standard medium-of-instruction certificates are accepted.

Detailed Fees Breakdown

Base Tuition Fee $378
Total Est. Investment $378

Scholarships and early-bird waivers may apply. Contact admissions for exact institutional fees.

Academic Trajectory

Program Outcome

Graduates of the Preprocessing Unstructured Data for LLM Applications program at DeepLearning.AI are equipped with global perspectives, ready to excel in international markets and top-tier career opportunities.
