verified Verified Information • Last Updated Mar 2026

Build Multimodal Generative AI Applications

Ready to level up your GenAI skills? Step into the exciting world of multimodal AI, where language, images, and speech come together to build smarter, more interactive applications. In this hands-on course, you’ll learn how to build systems that work across multiple modalities, from creating AI-powered storytellers and meeting assistants to developing image captioning tools and video generation apps. You’ll gain experience with real-world tools like IBM’s Granite, OpenAI’s Whisper, Sora and DALL·E, Meta’s Llama, Mistral’s Mixtral, and Gradio. Plus, you'll explore multimodal search, question answering, and retrieval systems that combine text, speech, and visual data. By the end of the course, you’ll be able to design and build full-stack multimodal AI solutions using Python and frameworks like Flask and Gradio. If you’re looking to gain in-demand skills for building the next generation of AI applications, enroll today and power up your AI career!
Duration 4 Months
Institution IBM
Format Online

Eligibility Criteria

school

Academic Foundation

A recognized Bachelor’s degree or high school equivalent required for admission into IBM .

language

Language Proficiency

English proficiency required. IELTS, TOEFL, or standard medium-of-instruction certificates accepted.

Detailed Fees Breakdown

Base Tuition Fee $366
Total Est. Investment $366

Scholarships and early-bird waivers may apply. Contact admissions for exact institutional fees.

Academic Trajectory

Program Outcome

Graduates of the Build Multimodal Generative AI Applications program at IBM are equipped with global perspectives, ready to excel in international markets and top-tier career opportunities.

headset_mic
Get In Touch