Generate images, videos, speech, and music using a single varg_xxx API key. No per-provider keys needed.
```bash
bun install vargai @vargai/gateway ai
```

```bash
# .env
VARG_API_KEY=varg_xxx
```

That's it. The gateway pools provider keys (fal, ElevenLabs, Higgsfield, Replicate) server-side. You only need one key.
Two ways to use the gateway:

**`VargClient`** — direct access to gateway REST endpoints. Returns job objects you poll/wait on.

```ts
import { VargClient } from "@vargai/gateway";

const client = new VargClient({
  apiKey: process.env.VARG_API_KEY!,
});
```

**`createVarg`** — implements the Vercel AI SDK provider interface. Use with `generateVideo`, `generateImage`, etc.

```ts
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });
```

Generate an image with the client:

```ts
const job = await client.createImage({
  model: "nano-banana-pro",
  prompt: "sunset over mountains, dramatic lighting",
  aspect_ratio: "16:9",
});
const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);
// https://s3.varg.ai/o/job_xxx.png
```

Or with the AI SDK interface:

```ts
import { generateImage } from "vargai/ai";

const { image } = await generateImage({
  model: varg.imageModel("nano-banana-pro"),
  prompt: "sunset over mountains, dramatic lighting",
  aspectRatio: "16:9",
});
```

Generate a video with the client:

```ts
const job = await client.createVideo({
  model: "kling-v3",
  prompt: "ocean waves crashing on rocks, cinematic slow motion",
  duration: 5,
  aspect_ratio: "16:9",
});
const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);
// https://s3.varg.ai/o/job_xxx.mp4
```

Or with the AI SDK interface:

```ts
import { generateVideo } from "vargai/ai";

const { video } = await generateVideo({
  model: varg.videoModel("kling-v3"),
  prompt: "ocean waves crashing on rocks, cinematic slow motion",
  aspectRatio: "16:9",
  duration: 5,
});
```

Upload an image first, then animate it:
```ts
// Upload
const file = await Bun.file("./photo.jpg").arrayBuffer();
const uploaded = await client.uploadFile(new Blob([file]), "image/jpeg");

// Animate
const job = await client.createVideo({
  model: "kling-v3",
  prompt: "photo comes to life, character smiles and waves",
  files: [{ url: uploaded.url }],
});
const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);
```

Generate speech:

```ts
const job = await client.createSpeech({
  model: "eleven_multilingual_v2",
  text: "Welcome to the future of AI video generation.",
  voice: "rachel",
});
const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);
// https://s3.varg.ai/o/job_xxx.mp3
```

Voices: rachel, josh, adam, sarah, domi, elli, antoni, arnold, sam
Generate music:

```ts
const job = await client.createMusic({
  model: "music_v1",
  prompt: "upbeat electronic, energetic, modern pop feel",
  duration: 30,
});
const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);
```

Use vargai/react to compose multi-scene videos with AI-generated images, video clips, voiceover, music, and captions — all stitched into a single MP4. Pass gateway models via the defaults option so every generation routes through the gateway.
```tsx
import { render, Render, Clip, Image, Video } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

const character = Image({
  prompt: "cute orange cat, big eyes, Pixar style",
  model: varg.imageModel("nano-banana-pro"),
  aspectRatio: "9:16",
});

await render(
  <Render width={1080} height={1920}>
    <Clip duration={5}>
      <Video
        prompt={{ text: "cat waves hello, bounces up and down", images: [character] }}
        model={varg.videoModel("kling-v3")}
      />
    </Clip>
  </Render>,
  { output: "output/hello.mp4" },
);
```

Run: `bun run video.tsx`
Set default models once so you don't repeat them on every element:
```tsx
import { render, Render, Clip, Image, Video, Speech, Music, Captions } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

const voiceover = Speech({
  voice: "rachel",
  children: "Hey everyone! Check out this amazing sunset.",
});

await render(
  <Render width={1080} height={1920}>
    <Music prompt="chill lo-fi beats, relaxing" volume={0.15} />
    <Clip duration={5}>
      <Image prompt="sunset over ocean, golden hour, cinematic" zoom="in" />
    </Clip>
    <Clip duration={5} transition={{ name: "fade", duration: 0.5 }}>
      <Video prompt="waves gently rolling onto sandy beach, warm light" />
    </Clip>
    <Captions src={voiceover} style="tiktok" color="#ffffff" />
  </Render>,
  {
    output: "output/sunset.mp4",
    defaults: {
      video: varg.videoModel("kling-v3"),
      image: varg.imageModel("nano-banana-pro"),
      speech: varg.speechModel("eleven_turbo_v2"),
      music: varg.musicModel("music_v1"),
    },
  },
);
```

When defaults are set, elements without an explicit model prop use the default. You can still override per-element:
```tsx
<Image prompt="high quality portrait" model={varg.imageModel("flux-pro")} />
```

```tsx
import { render, Render, Clip, Image, Video, Speech, Music, Captions, Title } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// Create character once, reuse across scenes
const character = Image({
  prompt: "friendly tech influencer, casual style, ring light",
  model: varg.imageModel("soul"),
  aspectRatio: "9:16",
});

const voiceover = Speech({
  voice: "sarah",
  children: "Three things you need to know about AI video generation.",
});

await render(
  <Render width={1080} height={1920}>
    <Music prompt="upbeat tech podcast intro" volume={0.1} />
    <Clip duration={5}>
      <Video
        prompt={{ text: "person speaking to camera, friendly wave", images: [character] }}
      />
      <Title position="bottom" color="#ffffff">3 AI Video Tips</Title>
    </Clip>
    <Clip duration={4} transition={{ name: "fade", duration: 0.5 }}>
      <Image prompt="AI neural network visualization, glowing nodes" zoom="in" />
      <Title position="center" color="#ffffff">Tip 1: Use Image-to-Video</Title>
    </Clip>
    <Clip duration={5} transition={{ name: "crossfade", duration: 0.8 }}>
      <Video
        prompt={{ text: "person smiling, nodding confidently", images: [character] }}
      />
    </Clip>
    <Captions src={voiceover} style="tiktok" color="#ffffff" activeColor="#FFD700" />
  </Render>,
  {
    output: "output/tips.mp4",
    defaults: {
      video: varg.videoModel("kling-v3"),
      image: varg.imageModel("nano-banana-pro"),
      speech: varg.speechModel("eleven_turbo_v2"),
      music: varg.musicModel("music_v1"),
    },
  },
);
```

| Component | Purpose | Key Props |
|---|---|---|
| `<Render>` | Root container | `width`, `height`, `fps` |
| `<Clip>` | Time segment / scene | `duration`, `transition`, `cutFrom`, `cutTo` |
| `<Image>` | AI or static image | `prompt`, `src`, `model`, `zoom`, `aspectRatio` |
| `<Video>` | AI or source video | `prompt`, `src`, `model`, `volume`, `cutFrom`, `cutTo` |
| `<Speech>` | Text-to-speech | `voice`, `model`, `children` (text) |
| `<Music>` | Background music | `prompt`, `src`, `model`, `volume`, `loop`, `ducking` |
| `<Title>` | Text overlay | `position`, `color`, `start`, `end` |
| `<Subtitle>` | Subtitle text | `backgroundColor` |
| `<Captions>` | Auto-generated subs | `src`, `style`, `color`, `activeColor` |
| `<Overlay>` | Positioned layer | `left`, `top`, `width`, `height` |
| `<TalkingHead>` | Animated character | `character`, `src`, `voice`, `model` |
| `<Packshot>` | End card / CTA | `background`, `logo`, `cta`, `blinkCta` |
| `<Split>` | Side-by-side layout | `direction` |
| `<Slider>` | Before/after reveal | `direction` |
| `<Swipe>` | Tinder-style cards | `direction`, `interval` |
```tsx
<Clip transition={{ name: "fade", duration: 0.5 }}>
<Clip transition={{ name: "crossfade", duration: 0.5 }}>
<Clip transition={{ name: "wipeleft", duration: 0.5 }}>
<Clip transition={{ name: "cube", duration: 0.8 }}>
```

```tsx
<Image prompt="landscape" zoom="in" />    // Ken Burns zoom in
<Image prompt="landscape" zoom="out" />   // Zoom out
<Image prompt="landscape" zoom="left" />  // Pan left
<Image prompt="landscape" zoom="right" /> // Pan right
```

```tsx
<Captions src={voiceover} style="tiktok" />     // Word-by-word highlight
<Captions src={voiceover} style="karaoke" />    // Fill left-to-right
<Captions src={voiceover} style="bounce" />     // Words bounce in
<Captions src={voiceover} style="typewriter" /> // Typing effect
```

| Ratio | Resolution | Platform |
|---|---|---|
| 9:16 | 1080x1920 | TikTok, Reels, Shorts |
| 16:9 | 1920x1080 | YouTube, Twitter |
| 1:1 | 1080x1080 | Instagram Feed |
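If you need to derive `<Render>` dimensions from a ratio string, a small lookup keeps the table above in one place. This helper is illustrative, not part of the SDK:

```typescript
// Hypothetical helper (not an SDK API): map an aspect ratio string
// to the render dimensions listed in the table above.
const RESOLUTIONS: Record<string, { width: number; height: number }> = {
  "9:16": { width: 1080, height: 1920 }, // TikTok, Reels, Shorts
  "16:9": { width: 1920, height: 1080 }, // YouTube, Twitter
  "1:1": { width: 1080, height: 1080 },  // Instagram Feed
};

function resolutionFor(ratio: string): { width: number; height: number } {
  const res = RESOLUTIONS[ratio];
  if (!res) throw new Error(`Unsupported aspect ratio: ${ratio}`);
  return res;
}
```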
```tsx
await render(<Render>...</Render>, {
  output: "video.mp4",    // Save to file
  cache: ".cache/ai",     // Custom cache directory
  defaults: { ... },      // Default models (see above)
  concurrency: 3,         // Max parallel AI calls (default: 3)
  mode: "preview",        // Use placeholders instead of real generation
});
```

A production-level example showing reference-based character consistency across 8 scenes, clip trimming, fade transitions, music with ducking, voiceover, and captions.
```tsx
import { render, Render, Clip, Image, Video, Speech, Music, Captions } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// --- REFERENCES ---
const KURT_REF = "https://uu.varg.ai/1773109560573_gbz0buz0.jpg";

// --- STYLE ---
const STYLE = "vintage color photograph, film grain texture, cool blue tones, 1990s grunge aesthetic, moody atmospheric lighting, soft focus, no sepia, no border";
const CHARACTER = "same man from reference, long blonde hair, pale complexion, introspective gaze, worn expression";
const CHARACTER_SPEAKING = `${CHARACTER}, lips slightly parted, speaking`;
```
```tsx
// --- SCENE IMAGES (all reference the same photo for character consistency) ---
const img1 = Image({
  prompt: { text: `Medium close-up portrait, ${CHARACTER_SPEAKING}, in a basement or garage setting, holding an acoustic guitar, dim lamp light, melancholic expression, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const img2 = Image({
  prompt: { text: `Close-up portrait, ${CHARACTER_SPEAKING}, in a recording studio, microphone visible, focused intensity, cool studio lighting, raw emotional expression, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const img3 = Image({
  prompt: { text: `Medium shot, ${CHARACTER}, on stage with guitar, powerful performance stance, stage lights and smoke, wild energy, hair flying, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const img4 = Image({
  prompt: { text: `Portrait, ${CHARACTER}, in quiet contemplative pose, looking downward, soft diffused light, vulnerable and fragile expression, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const img5 = Image({
  prompt: { text: `Close-up of man's anguished face, ${CHARACTER}, eyes filled with pain, brow furrowed, emotional turmoil visible, moody dark lighting, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const img6 = Image({
  prompt: { text: `Close-up hands on guitar strings, ${CHARACTER} playing, fingers on fretboard, worn hands, intimate creative moment, dim candlelit atmosphere, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const img7 = Image({
  prompt: { text: `Medium shot from below, ${CHARACTER} on stage with guitar raised, intense performance, crowd energy, stage lights, raw power, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const img8 = Image({
  prompt: { text: `Wide shot, ${CHARACTER} alone in empty room, sitting isolated, looking haunted, pale blue light through window, solitary melancholy, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
```
```tsx
// --- ANIMATE TO VIDEO ---
const vid1 = Video({
  prompt: { text: "man holding guitar, fingers moving gently on strings, sad contemplative gaze, quiet intensity", images: [img1] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});
const vid2 = Video({
  prompt: { text: "man in studio, raw emotional expression, voice trembling slightly, intense focus, vulnerability", images: [img2] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});
const vid3 = Video({
  prompt: { text: "man on stage with guitar, explosive energy, body swaying with music, wild passionate performance", images: [img3] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});
const vid4 = Video({
  prompt: { text: "man in quiet pose, head slightly tilted, troubled gaze, gentle defeated movement", images: [img4] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});
const vid5 = Video({
  prompt: { text: "man's jaw clenches, eyes show deep pain, brow furrows, emotional anguish expression", images: [img5] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});
const vid6 = Video({
  prompt: { text: "fingers playing guitar strings, hand movement, strings vibrating, intimate musical moment", images: [img6] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});
const vid7 = Video({
  prompt: { text: "man thrashing on stage with guitar, intense movement, powerful performance, crowd energy", images: [img7] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});
const vid8 = Video({
  prompt: { text: "man sits alone in empty room, slow movement, withdrawn isolated pose, melancholic atmosphere", images: [img8] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});

// --- VOICEOVER ---
const speech = Speech({
  voice: "adam",
  model: varg.speechModel("eleven_turbo_v2"),
  children: "Kurt Cobain came from nowhere. His pain became a generation's anthem. Raw, authentic, unfiltered. Nirvana changed everything. Though he burned too bright, too fast. His voice echoes forever.",
});
```
```tsx
// --- BUILD RENDER ---
await render(
  <Render width={1080} height={1920}>
    <Music
      prompt="dark melancholic grunge rock instrumental, distorted guitar, 1990s alternative rock, heavy sad atmosphere"
      model={varg.musicModel("music_v1")}
      volume={0.15}
      loop={true}
      ducking={true}
      duration={16}
    />
    <Clip duration={2} cutFrom={0.3} cutTo={2.3}>
      {vid1}
    </Clip>
    <Clip duration={2} cutFrom={0.3} cutTo={2.3} transition={{ name: "fade", duration: 0.15 }}>
      {vid2}
    </Clip>
    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid5}
    </Clip>
    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid6}
    </Clip>
    <Clip duration={2} cutFrom={0.3} cutTo={2.3} transition={{ name: "fade", duration: 0.15 }}>
      {vid3}
    </Clip>
    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid7}
    </Clip>
    <Clip duration={2} cutFrom={0.3} cutTo={2.3} transition={{ name: "fade", duration: 0.15 }}>
      {vid4}
    </Clip>
    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid8}
    </Clip>
    <Captions src={speech} style="tiktok" position="bottom" />
  </Render>,
  { output: "output/legend-kurt.mp4" },
);
```

Key techniques in this template:
- Reference image consistency: all 8 scene images pass the same `KURT_REF` photo to `nano-banana-pro/edit`, keeping the character recognizable across different settings
- `cutFrom`/`cutTo`: trims the first ~0.3s of each generated clip (removes AI "warmup" frames)
- `ducking`: automatically lowers music volume when speech is playing
- `loop`: music loops to fill the full video duration
- Non-sequential clip ordering: clips are arranged for editorial pacing (vid1, vid2, vid5, vid6, vid3, vid7, vid4, vid8), not generation order
| Type | Model | ID | Credits |
|---|---|---|---|
| Video | Kling V3 | `kling-v3` | 150 |
| Video (budget) | Kling V3 Standard | `kling-v3-standard` | 100 |
| Image | Nano Banana Pro | `nano-banana-pro` | 5 |
| Image editing | Nano Banana Pro Edit | `nano-banana-pro/edit` | 5 |
| Image (fast) | Flux Schnell | `flux-schnell` | 5 |
| Speech | Turbo | `eleven_turbo_v2` | 20 |
| Music | Music V1 | `music_v1` | 30 |
These constraints are critical — wrong values cause 422 errors:
| Model | Duration rule |
|---|---|
| `kling-v3`, `kling-v3-standard`, `kling-v2.6` | Any integer from 3 to 15 seconds |
| `kling-v2.5` and older | ONLY 5 or 10 seconds; any other value fails |
| `ltx-2-19b-distilled` | Uses `num_frames` (not `duration`) and `video_size` (not `aspect_ratio`) |
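The Kling duration rules above can be checked before submitting a job. A pre-flight validator like the following (a sketch, not part of the SDK) avoids burning a request on a 422:

```typescript
// Sketch of a local pre-flight check based on the duration table above.
// Returns null when the duration is valid, or an error message otherwise.
function validateKlingDuration(model: string, duration: number): string | null {
  if (["kling-v3", "kling-v3-standard", "kling-v2.6"].includes(model)) {
    return Number.isInteger(duration) && duration >= 3 && duration <= 15
      ? null
      : `${model} requires an integer duration between 3 and 15 seconds`;
  }
  if (model.startsWith("kling-")) {
    // kling-v2.5 and older
    return duration === 5 || duration === 10
      ? null
      : `${model} only supports durations of exactly 5 or 10 seconds`;
  }
  return null; // other models have their own rules (e.g. ltx uses num_frames)
}
```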
Pass only one image in the `Video` prompt `images` array. Multiple images cause errors.

```tsx
// CORRECT
Video({ prompt: { text: "cat walks", images: [catImage] }, ... })

// WRONG — will error
Video({ prompt: { text: "cat walks", images: [catImage, bgImage] }, ... })
```

The `<Clip duration={N}>` should match the duration of the Video inside it:

```tsx
const vid = Video({ ..., duration: 5 });

// CORRECT
<Clip duration={5}>{vid}</Clip>

// WRONG — mismatch causes timing issues
<Clip duration={3}>{vid}</Clip>
```

Always generate audio (speech/music) before video. Audio duration is unpredictable — a voiceover might be 6s or 12s depending on text length and pacing. If you generate video first, you'll either have mismatched lengths or waste credits regenerating.
For voiceover-driven videos (local render):
- Generate the voiceover first
- Check its duration
- Generate video clips to match
```tsx
import { createVarg } from "@vargai/gateway";
import { render, Render, Clip, Video, Speech, Captions } from "vargai/react";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// 1. Generate voiceover first
const voiceover = Speech({
  voice: "rachel",
  model: varg.speechModel("eleven_turbo_v2"),
  children: "Welcome to our product showcase. Here are three features you'll love.",
});

// 2. Render with `shortest` — video ends when voiceover ends
await render(
  <Render width={1080} height={1920} shortest>
    <Clip duration={15}>
      <Video
        prompt={{ text: "product showcase, smooth camera orbit", images: [productImage] }}
        model={varg.videoModel("kling-v3")}
        duration={15}
      />
    </Clip>
    <Captions src={voiceover} style="tiktok" />
  </Render>,
  { output: "output/showcase.mp4" },
);
// If voiceover is 8s, output is 8s (not 15s of mostly silent video)
```

For music: always set `duration` on `<Music>` to match the total video length. Without it, ElevenLabs generates ~60s of audio, which extends the video far beyond the intended length. If the music is longer than the video, use `shortest` as a safety net:
```tsx
// Video is 3 clips × 5s = 15s total
<Render width={1080} height={1920} shortest>
  <Music prompt="ambient electronic" volume={0.2} duration={15} />
  <Clip duration={5}>{vid1}</Clip>
  <Clip duration={5}>{vid2}</Clip>
  <Clip duration={5}>{vid3}</Clip>
</Render>
```

For gateway-only workflows (no local render): generate speech first via `VargClient`, then use the audio duration to decide how many video clips to generate:
```ts
// 1. Generate speech first
const speechJob = await client.createSpeech({
  model: "eleven_turbo_v2",
  text: "A narration script that could be any length.",
  voice: "rachel",
});
const speech = await client.waitForJob(speechJob.job_id);
// speech.output.url → use this to determine pacing

// 2. Now generate video clips knowing the audio length
const videoJob = await client.createVideo({
  model: "kling-v3",
  prompt: "matching scene for the narration",
  duration: 10, // match to speech length
});
```

| Model | Prompt format |
|---|---|
| `nano-banana-pro` | Plain string: `prompt: "a sunset over the ocean"` |
| `nano-banana-pro/edit` | Object with images: `prompt: { text: "...", images: [refUrl] }` |
| `flux-schnell`, `flux-pro`, `flux-dev` | Plain string |
| `soul` (Higgsfield) | Plain string |
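The table above can be encoded in a small normalizer so callers never pass the wrong prompt shape. This helper and its name are illustrative, not an SDK API:

```typescript
// Sketch: build the prompt shape each image model expects, per the table above.
type ImagePrompt = string | { text: string; images: string[] };

function buildImagePrompt(model: string, text: string, refs: string[] = []): ImagePrompt {
  if (model === "nano-banana-pro/edit") {
    // Edit model takes an object with reference images
    if (refs.length === 0) throw new Error("nano-banana-pro/edit needs a reference image");
    return { text, images: refs };
  }
  // nano-banana-pro, flux-*, soul take a plain string
  return text;
}
```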
Media elements (Image, Video, Speech, Music) must be defined as variables using function calls. JSX assignment does not work for these:

```tsx
// CORRECT — function call
const img = Image({ prompt: "a cat", model: varg.imageModel("nano-banana-pro") });
const vid = Video({ prompt: { text: "cat moves", images: [img] }, model: varg.videoModel("kling-v3") });

// WRONG — JSX assignment
const img = <Image prompt="a cat" />;
```

Then use the variables as children or props in JSX:

```tsx
<Clip duration={5}>{vid}</Clip>
```

When a character or product appears across multiple scenes, use this 3-step workflow:
1. Reference image — generate (or receive) a character hero shot
2. Scene images via `/edit` — use `nano-banana-pro/edit` to place the character into each scene, always passing the reference via `images: [ref]`
3. Animate via i2v — pass each scene image to `Video()` for image-to-video

Never generate scene images from scratch — always edit from the reference. Without this pattern, each clip generates a different-looking character.
```tsx
import { createVarg } from "@vargai/gateway";
import { render, Render, Clip, Image, Video } from "vargai/react";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// 1. Character reference
const ref = Image({
  prompt: "a man in a dark suit, dramatic side lighting, neutral background",
  model: varg.imageModel("nano-banana-pro"),
  aspectRatio: "9:16",
});

// 2. Scene images — place character into different environments
const scene1 = Image({
  prompt: { text: "same man sitting at a wooden desk, harsh lamp light, dark study", images: [ref] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const scene2 = Image({
  prompt: { text: "same man standing by a tall window, cold grey daylight on face", images: [ref] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const scene3 = Image({
  prompt: { text: "same man walking alone down a narrow cobblestone alley at night", images: [ref] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

// 3. Animate each scene image
const vid1 = Video({ prompt: { text: "man looks up from desk, slight head turn", images: [scene1] }, model: varg.videoModel("kling-v3"), duration: 5 });
const vid2 = Video({ prompt: { text: "man turns away from window, eyes cast down", images: [scene2] }, model: varg.videoModel("kling-v3"), duration: 5 });
const vid3 = Video({ prompt: { text: "man walks slowly forward, coat swaying", images: [scene3] }, model: varg.videoModel("kling-v3"), duration: 5 });

await render(
  <Render width={1080} height={1920}>
    <Clip duration={5}>{vid1}</Clip>
    <Clip duration={5} transition={{ name: "fade", duration: 0.3 }}>{vid2}</Clip>
    <Clip duration={5} transition={{ name: "fade", duration: 0.3 }}>{vid3}</Clip>
  </Render>,
  { output: "output/multi-scene.mp4" },
);
```

Generated video clips often have ~0.3s of "warm-up" at the start where the image is mostly static. Use `cutFrom` on `<Clip>` to trim it:
```tsx
// Trim first 0.3s of generated video for snappier cuts
<Clip duration={2} cutFrom={0.3} cutTo={2.3}>{vid}</Clip>
```

The source video duration must be >= `cutTo`.
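That constraint can be made explicit with a small check before rendering (illustrative helper, not part of the SDK):

```typescript
// Sketch: verify a trim window fits inside the generated clip.
// Valid when 0 <= cutFrom < cutTo and the source is long enough.
function validTrim(sourceDuration: number, cutFrom: number, cutTo: number): boolean {
  return cutFrom >= 0 && cutTo > cutFrom && sourceDuration >= cutTo;
}
```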
When modifying part of a render, keep unchanged prompts exactly the same (same text, same model, same parameters). This ensures unchanged assets hit the cache and are not re-generated — saving time and credits. Only change the prompts for the parts you're actually modifying.
To sync a video to speech audio, pass both video and audio in the Video prompt:

```tsx
const voiceover = Speech({
  voice: "rachel",
  model: varg.speechModel("eleven_turbo_v2"),
  children: "Hello, welcome to our product demo.",
});

const character = Video({
  prompt: { text: "person speaking, subtle movements", images: [characterImage] },
  model: varg.videoModel("kling-v3"),
  duration: 5,
});

const lipsynced = Video({
  prompt: { video: character, audio: voiceover },
  model: varg.videoModel("sync-v2-pro"),
});
```

| Model | ID | Best for |
|---|---|---|
| Kling V3 | `kling-v3` | Highest quality |
| Kling V3 Standard | `kling-v3-standard` | Quality/cost balance |
| Kling V2.6 | `kling-v2.6` | High quality + native audio |
| Kling V2.5 | `kling-v2.5` | Reliable general purpose |
| Wan 2.5 | `wan-2.5` | Characters, anime |
| Minimax | `minimax` | Alternative |
| LTX 2 | `ltx-2-19b-distilled` | Fast, with audio |
| Sync V2 Pro | `sync-v2-pro` | Lipsync |
| Lipsync | `lipsync` | Lipsync (budget) |
| Model | ID | Best for |
|---|---|---|
| Flux Schnell | `flux-schnell` | Fast, cheap |
| Flux Pro | `flux-pro` | High quality |
| Flux Dev | `flux-dev` | Development |
| Nano Banana Pro | `nano-banana-pro` | Versatile |
| Nano Banana Edit | `nano-banana-pro/edit` | Image editing, character consistency |
| Recraft V3 | `recraft-v3` | Design, illustration |
| Soul | `soul` | Character consistency (Higgsfield) |
| Model | ID |
|---|---|
| Multilingual V2 | eleven_multilingual_v2 |
| Turbo V2 | eleven_turbo_v2 |
| Flash V2.5 | eleven_flash_v2_5 |
| V3 | eleven_v3 |
| Model | ID |
|---|---|
| Music V1 | music_v1 |
Same parameters = instant cached result at zero cost. Cache is keyed on model + prompt + files + options. Cache TTL is 30 days. All outputs are persisted to s3.varg.ai.
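The gateway's actual key derivation is internal, but the idea is easy to picture: identical (model, prompt, files, options) inputs hash to the same key, so a repeated request is served from cache. A local sketch of that idea (illustrative only):

```typescript
import { createHash } from "node:crypto";

// Illustrative only: the gateway's real cache key derivation is internal.
// Identical (model, prompt, files, options) tuples produce the same key.
function cacheKey(
  model: string,
  prompt: unknown,
  files: string[] = [],
  options: object = {},
): string {
  const payload = JSON.stringify({ model, prompt, files, options });
  return createHash("sha256").update(payload).digest("hex");
}
```

This is why the "keep unchanged prompts byte-identical" advice below matters: any change to the prompt text, model, or options produces a different key and forces a fresh (billed) generation.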
- 1 credit = 1 cent. Signup gives 1,000 credits ($10).
- Cache hits are free.
- Credits are deducted after successful generation.
- Cost examples: `nano-banana-pro` = 5 credits, `kling-v3` = 150 credits, `kling-v3-standard` = 100 credits, speech = 20-25 credits, music = 30 credits.
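Since 1 credit is 1 cent, the cost of a planned render can be estimated up front. A sketch using the credit prices listed above (the helper itself is illustrative):

```typescript
// Illustrative cost estimate using the credit prices listed above.
// 1 credit = $0.01.
const CREDITS: Record<string, number> = {
  "nano-banana-pro": 5,
  "kling-v3": 150,
  "kling-v3-standard": 100,
  "eleven_turbo_v2": 20,
  "music_v1": 30,
};

function estimateUSD(jobs: string[]): number {
  const credits = jobs.reduce((sum, model) => sum + (CREDITS[model] ?? 0), 0);
  return credits / 100;
}

// e.g. one image animated into one clip:
// estimateUSD(["nano-banana-pro", "kling-v3"]) → $1.55
```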
240 requests/minute per API key.
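Large batches can exceed 240 requests/minute and start receiving 429s. One common way to handle that is exponential backoff; the retry policy below is our own suggestion, not SDK behavior:

```typescript
// Sketch: exponential backoff for rate-limited (429) or upstream (502) errors.
// The policy and helper names are our own; only the 240 req/min limit is documented.
function backoffMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const retryable = err?.statusCode === 429 || err?.statusCode === 502;
      if (!retryable || attempt >= maxRetries) throw err;
      await new Promise((r) => setTimeout(r, backoffMs(attempt)));
    }
  }
}
```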
```json
{
  "error": { "_tag": "ValidationError", "message": "prompt is required", "statusCode": 400 }
}
```

| Status | Tag | Meaning |
|---|---|---|
| 400 | `ValidationError` | Bad request |
| 401 | `AuthError` | Invalid API key |
| 402 | `InsufficientBalanceError` | Out of credits |
| 429 | `RateLimitError` | Too many requests |
| 502 | `ProviderError` | Upstream AI provider failed |
The `VargClient` throws `VargGatewayError` with `statusCode`, `message`, `field`, and `provider` properties.
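The status codes in the table above each suggest a different recovery. One way to centralize that decision (the mapping is our own suggestion, not SDK behavior):

```typescript
// Sketch: map gateway error status codes (table above) to a handling strategy.
type ErrorAction = "fix-request" | "fix-auth" | "top-up" | "retry-later" | "retry" | "unknown";

function actionFor(statusCode: number): ErrorAction {
  switch (statusCode) {
    case 400: return "fix-request";  // ValidationError
    case 401: return "fix-auth";     // AuthError
    case 402: return "top-up";       // InsufficientBalanceError
    case 429: return "retry-later";  // RateLimitError
    case 502: return "retry";        // ProviderError (upstream failed)
    default:  return "unknown";
  }
}
```

A caller would catch `VargGatewayError`, read its `statusCode`, and dispatch on `actionFor(statusCode)`.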
Chain jobs: generate an image, then animate it:

```ts
const img = await client.createImage({
  model: "nano-banana-pro",
  prompt: "product photo of white sneakers on marble surface",
  aspect_ratio: "9:16",
});
const imgResult = await client.waitForJob(img.job_id);

const vid = await client.createVideo({
  model: "kling-v3",
  prompt: "camera slowly orbits around the sneakers, cinematic lighting",
  files: [{ url: imgResult.output!.url }],
});
const vidResult = await client.waitForJob(vid.job_id);
console.log(vidResult.output!.url);
```

Generate speech and video in parallel:
```ts
const [speechJob, videoJob] = await Promise.all([
  client.createSpeech({
    model: "eleven_multilingual_v2",
    text: "Today we're looking at three key trends in AI.",
    voice: "sarah",
  }),
  client.createVideo({
    model: "kling-v3",
    prompt: "woman speaking to camera, subtle gestures, professional setting",
    duration: 5,
    aspect_ratio: "9:16",
  }),
]);

const [speech, video] = await Promise.all([
  client.waitForJob(speechJob.job_id),
  client.waitForJob(videoJob.job_id),
]);

console.log("Video:", video.output!.url);
console.log("Audio:", speech.output!.url);
```

With the AI SDK interface, write the outputs straight to disk:

```ts
import { generateVideo, generateImage } from "vargai/ai";

const { image } = await generateImage({
  model: varg.imageModel("nano-banana-pro"),
  prompt: "anime warrior girl, red hair, silver armor",
  aspectRatio: "9:16",
});
await Bun.write("output/character.png", image.uint8Array);

const { video } = await generateVideo({
  model: varg.videoModel("kling-v3"),
  prompt: "warrior draws sword dramatically",
  aspectRatio: "9:16",
});
await Bun.write("output/scene.mp4", video.uint8Array);
```

Batch-generate images by firing all jobs first, then awaiting the results:

```ts
const prompts = [
  "sunset over ocean, golden hour",
  "mountain peaks at dawn, misty",
  "city skyline at night, neon lights",
];

const jobs = await Promise.all(
  prompts.map((prompt) =>
    client.createImage({ model: "nano-banana-pro", prompt, aspect_ratio: "16:9" }),
  ),
);

const results = await Promise.all(
  jobs.map((job) => client.waitForJob(job.job_id)),
);

results.forEach((r, i) => console.log(`${prompts[i]}: ${r.output!.url}`));
```

| Method | Parameters | Returns |
|---|---|---|
| `createVideo(params)` | `{ model, prompt, duration?, aspect_ratio?, files?, provider_options? }` | `Promise<JobResponse>` |
| `createImage(params)` | `{ model, prompt, aspect_ratio?, files?, provider_options? }` | `Promise<JobResponse>` |
| `createSpeech(params)` | `{ model, text, voice?, provider_options? }` | `Promise<JobResponse>` |
| `createMusic(params)` | `{ model, prompt, duration?, provider_options? }` | `Promise<JobResponse>` |
| `uploadFile(blob, mediaType)` | `Blob \| Buffer`, `string` | `Promise<FileUploadResponse>` |
| `getJob(id)` | `string` | `Promise<JobResponse>` |
| `waitForJob(id, opts?)` | `string`, `{ pollIntervalMs?, maxAttempts? }` | `Promise<JobResponse>` |
| `cancelJob(id)` | `string` | `Promise<void>` |
| `listVoices()` | — | `Promise<VoiceListResponse>` |
The `JobResponse` shape:

```ts
{
  job_id: string;
  status: "queued" | "processing" | "completed" | "failed" | "cancelled";
  model: string;
  created_at: string;
  completed_at?: string;
  output?: { url: string; media_type: string };
  cache?: { hit: boolean; key: string };
  error?: string;
}
```

| Method | Returns |
|---|---|
| `varg.videoModel(id)` | `VideoModelV3` |
| `varg.imageModel(id)` | `ImageModelV3` |
| `varg.speechModel(id)` | `SpeechModelV3` |
| `varg.musicModel(id)` | `MusicModelV3` |