Verified Information • Last Updated Mar 2026
Build Multimodal Generative AI Applications
Ready to level up your GenAI skills? Step into the exciting world of multimodal AI, where language, images, and speech come together to build smarter, more interactive applications.
In this hands-on course, you’ll learn how to build systems that work across multiple modalities, from creating AI-powered storytellers and meeting assistants to developing image captioning tools and video generation apps.
You’ll gain experience with real-world tools like IBM’s Granite, OpenAI’s Whisper, Sora and DALL·E, Meta’s Llama, Mistral’s Mixtral, and Gradio. Plus, you'll explore multimodal search, question answering, and retrieval systems that combine text, speech, and visual data.
By the end of the course, you’ll be able to design and build full-stack multimodal AI solutions using Python and frameworks like Flask and Gradio.
If you’re looking to gain in-demand skills for building the next generation of AI applications, enroll today and power up your AI career!
Duration
4 Months
Institution
IBM
Format
Online
Eligibility Criteria
Academic Foundation
A recognized bachelor’s degree or high school equivalent is required for admission to this IBM program.
Language Proficiency
English proficiency is required. IELTS, TOEFL, or standard medium-of-instruction certificates are accepted.
Detailed Fees Breakdown
Base Tuition Fee
$366
Total Est. Investment
$366
Scholarships and early-bird waivers may apply. Contact admissions for exact institutional fees.
Academic Trajectory
Program Outcome
Graduates of IBM’s Build Multimodal Generative AI Applications program are prepared to design and build multimodal AI applications in Python, skills that apply to AI roles in international markets.