Orchestrating 100+ AI Models Behind a Single Chat
How Musa AI Studio routes a Russian-language conversation across Suno, Veo3, Kling, Nano Banana, Recraft, and Claude — and keeps the experience simple for non-tech users.
Musa AI Studio is a Russian-language creative workspace where users write poems, songs, and stories and generate images, music, and video. There are over a hundred generation endpoints behind the scenes — Suno V5.5, Gemini Omni, Veo3, Kling, Nano Banana, GPT Image 2, Recraft, HappyHorse, plus Claude doing all the language work. The UI is a single chat. The user does not pick a model.
Why one chat beats a dashboard of tabs
Non-tech users do not want to learn what Suno is. They want to type ‘sing me a birthday song for my mom, sad but warm, in my voice’ and get a song. Every dashboard with model tabs is a quiz they were not asked to take. A chat surface lets the product own the routing.
The router lives in Claude
Intent classification, parameter extraction, and tool selection all live in a Claude prompt with a tool schema for each upstream capability — generate_song, generate_image, generate_video, edit_image, clone_voice, and so on. Claude reads the conversation, picks the tool, fills the arguments, and writes the response copy. There is no hand-rolled NLU layer in between.
This pushes a lot of edge-case handling into prompt engineering. The payoff is that adding a new generation backend takes only three small changes: a new tool definition, a new handler that calls KIE, and an updated example in the system prompt.
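To make the "tool definition plus handler" pairing concrete, here is a minimal sketch in Python. The tool dictionary follows the name / description / input_schema shape that Claude tool use expects, but the specific properties (prompt, mood), the registry, and the handler body are illustrative assumptions, not the Musa AI Studio code.

```python
from typing import Any, Callable, Dict

# Hypothetical tool definition; the argument names are assumptions.
GENERATE_SONG_TOOL: Dict[str, Any] = {
    "name": "generate_song",
    "description": "Generate a song from lyrics or a short brief.",
    "input_schema": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string"},
            "mood": {"type": "string"},
        },
        "required": ["prompt"],
    },
}

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]
HANDLERS: Dict[str, Handler] = {}


def handler(name: str) -> Callable[[Handler], Handler]:
    """Register a function as the handler for one tool name."""
    def register(fn: Handler) -> Handler:
        HANDLERS[name] = fn
        return fn
    return register


@handler("generate_song")
def handle_generate_song(args: Dict[str, Any]) -> Dict[str, Any]:
    # A real handler would enqueue a job that later calls KIE.
    # Here we only return a placeholder job record.
    return {"kind": "song", "status": "pending", "args": args}


def dispatch(tool_use: Dict[str, Any]) -> Dict[str, Any]:
    """Route one tool_use block (name + input) to its handler."""
    name = tool_use["name"]
    if name not in HANDLERS:
        raise KeyError(f"no handler registered for tool {name!r}")
    return HANDLERS[name](tool_use["input"])
```

Under these assumptions, a new backend is one more dictionary and one more decorated function; the dispatch code does not change.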
KIE as the upstream proxy
Talking to ten different model vendors directly would mean ten different SDKs, ten different auth schemes, and ten different webhook formats. KIE collapses that into one HTTP API across Suno, Gemini Omni, HappyHorse, Veo3, Kling, Nano Banana, GPT Image 2, and Recraft. Every generation is fire-and-poll: submit a job, get an id, then poll until it is done or wait for the webhook to call back.
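The polling half of that pattern can be sketched without committing to any particular HTTP client. In the Python sketch below, submit and fetch_status are injected callables standing in for the real KIE requests, which are not shown in this post; the "failed" status, the interval, and the timeout are assumptions for illustration.

```python
import time
from typing import Any, Callable, Dict, Iterable


def submit_and_poll(
    submit: Callable[[], str],
    fetch_status: Callable[[str], Dict[str, Any]],
    *,
    interval: float = 5.0,
    timeout: float = 600.0,
    terminal: Iterable[str] = ("succeeded", "failed"),  # assumed status names
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Submit a job, then poll until it reaches a terminal status.

    Raises TimeoutError if the deadline passes first.
    """
    done = set(terminal)
    job_id = submit()
    deadline = clock() + timeout
    while True:
        state = fetch_status(job_id)
        if state["status"] in done:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {state['status']!r} after {timeout}s")
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop testable without real waiting. When a webhook arrives first, the same job id lets the worker skip polling entirely.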
The job state machine
- Claude tool call → enqueue job in Postgres with status=pending
- Worker submits to KIE, status=running, kie_job_id stored
- Webhook (or polling fallback) flips status=succeeded with output URLs
- Outputs are pulled into Cloudflare R2 so vendor URLs cannot rot
- A new chat message is composed referencing the R2 URLs
- If a Telegram chat is linked, the file is also delivered there
Every step is idempotent and addressable by job id. That matters because Suno video and Veo3 sometimes take 3–5 minutes, and a user who closes the tab still needs to come back to a finished song in their chat history.
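One way to get that idempotency is to make every status change a guarded, forward-only transition keyed by job id. The Python sketch below uses an in-memory dict where the real system uses Postgres, and the function name advance and the field names are hypothetical. It only covers the three statuses named in the list above.

```python
from typing import Any, Dict

# Statuses in the order a job moves through them.
ORDER = ["pending", "running", "succeeded"]


def advance(jobs: Dict[str, Dict[str, Any]], job_id: str, new_status: str, **fields: Any) -> bool:
    """Move a job one step forward and attach extra fields.

    Returns True if the job changed, False if the call was a duplicate
    or stale delivery (for example a webhook that arrives twice).
    Raises ValueError if the call tries to skip a step.
    """
    job = jobs[job_id]
    current = ORDER.index(job["status"])
    target = ORDER.index(new_status)
    if target <= current:
        return False  # already at or past this status: safe no-op
    if target != current + 1:
        raise ValueError(f"cannot go from {job['status']!r} to {new_status!r}")
    job.update(fields)
    job["status"] = new_status
    return True
```

A short usage example:

```python
jobs = {"j1": {"status": "pending"}}
advance(jobs, "j1", "running", kie_job_id="k-9")      # True
advance(jobs, "j1", "running", kie_job_id="k-9")      # False, duplicate
advance(jobs, "j1", "succeeded", output_urls=["..."])  # True
```

In a database, the same guard would typically live in the WHERE clause of the UPDATE, so that a repeated webhook and a polling fallback cannot both apply the same change.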
What I would do differently
I would have built the R2 mirror from day one rather than week three. Two vendors changed their CDN URL format in the first two months, breaking every old message that linked directly to them. Treat vendor URLs as ephemeral; treat your own storage as the source of truth.
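A small sketch of what "own storage as the source of truth" can look like in practice: derive the storage key from the job id rather than from anything in the vendor URL, and store only that key in the chat message. The key layout (jobs/<job_id>/<index><ext>) and the injected copy callable are assumptions; the actual upload to R2 is left out.

```python
import posixpath
from typing import Callable, List
from urllib.parse import urlparse


def mirror_key(job_id: str, index: int, vendor_url: str) -> str:
    """Build a stable object key; only the file extension comes from the vendor URL."""
    ext = posixpath.splitext(urlparse(vendor_url).path)[1].lower()
    return f"jobs/{job_id}/{index}{ext}"


def mirror_outputs(job_id: str, vendor_urls: List[str], copy: Callable[[str, str], None]) -> List[str]:
    """Copy each vendor file into our own bucket and return the keys to store.

    `copy(vendor_url, key)` is a hypothetical callable that downloads the file
    and uploads it under `key`. Because the key is deterministic, re-running
    the step overwrites the same object instead of creating duplicates.
    """
    keys = []
    for index, url in enumerate(vendor_urls):
        key = mirror_key(job_id, index, url)
        copy(url, key)
        keys.append(key)
    return keys
```

If a vendor changes its CDN URL format, messages that reference these keys are unaffected, because the vendor URL is only used once at copy time.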