Overview
Many production apps route requests to different providers based on task type, cost, or latency. FluxGate tracks every provider under the same organization, so your Cost Breakdown by Feature view shows the true blended cost regardless of which provider served each request.
Shared FluxGate client
All three wrappers share one FluxGate instance. This ensures they all post events to the same organization.
```typescript
// lib/fluxgate.ts
import { FluxGate } from "@fluxgate/sdk";

export const fg = new FluxGate({
  apiKey: process.env.FLUXGATE_API_KEY!,
});
```
Provider clients
```typescript
// lib/openai.ts
import OpenAI from "openai";
import { createOpenAICostTracker } from "@fluxgate/openai";
import { fg } from "./fluxgate";

export const openai = createOpenAICostTracker(
  new OpenAI({ apiKey: process.env.OPENAI_API_KEY! }),
  fg,
);
```
```typescript
// lib/anthropic.ts
import Anthropic from "@anthropic-ai/sdk";
import { createAnthropicCostTracker } from "@fluxgate/anthropic";
import { fg } from "./fluxgate";

export const anthropic = createAnthropicCostTracker(
  new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! }),
  fg,
);
```
```typescript
// lib/gemini.ts
import { GoogleGenerativeAI } from "@google/generative-ai";
import { createGeminiCostTracker } from "@fluxgate/gemini";
import { fg } from "./fluxgate";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

export const geminiFlash = createGeminiCostTracker(
  genAI.getGenerativeModel({ model: "gemini-1.5-flash" }),
  fg,
);

export const geminiPro = createGeminiCostTracker(
  genAI.getGenerativeModel({ model: "gemini-1.5-pro" }),
  fg,
);
```
Feature-based provider routing
Route each product feature to the most cost-effective or capable provider.
```typescript
// actions/ai/generate.ts
"use server";
import { auth } from "@/lib/auth";
import { openai } from "@/lib/openai";
import { anthropic } from "@/lib/anthropic";
import { geminiFlash } from "@/lib/gemini";

type Task = "summarize" | "code-review" | "chat" | "translation";

export async function generateAction(task: Task, input: string) {
  const session = await auth();
  if (!session?.user) throw new Error("Unauthenticated");
  const userId = session.user.id;

  switch (task) {
    case "summarize":
      // Gemini Flash: fast and cheap for summarization
      return geminiFlash
        .withContext({ feature: "summarizer", user: userId })
        .generateContent(`Summarize in 3 sentences:\n\n${input}`)
        .then((r) => r.response.text());

    case "code-review":
      // Claude: best-in-class for nuanced code reasoning
      return anthropic
        .withContext({ feature: "code-review", user: userId })
        .messages.create({
          model: "claude-sonnet-4-6",
          max_tokens: 2048,
          messages: [
            { role: "user", content: `Review this code:\n\n${input}` },
          ],
        })
        .then((m) => (m.content[0].type === "text" ? m.content[0].text : ""));

    case "chat":
      // GPT-4o: low latency, good for interactive chat
      return openai
        .withContext({ feature: "chat", user: userId })
        .chat.completions.create({
          model: "gpt-4o",
          messages: [{ role: "user", content: input }],
        })
        .then((c) => c.choices[0].message.content ?? "");

    case "translation":
      // GPT-4o-mini: cost-efficient for structured tasks
      return openai
        .withContext({ feature: "translation", user: userId })
        .chat.completions.create({
          model: "gpt-4o-mini",
          messages: [
            {
              role: "system",
              content:
                "Translate the following text to English. Return only the translation.",
            },
            { role: "user", content: input },
          ],
        })
        .then((c) => c.choices[0].message.content ?? "");
  }
}
```
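As the list of tasks grows, the switch statement can be replaced by a table-driven route map that keeps the provider, model, and FluxGate feature tag for each task in one place. A minimal sketch of that idea (`routeFor` and the `Route` shape are illustrative helpers, not part of the FluxGate SDK; the pairings mirror the switch above):

```typescript
// Hypothetical table-driven alternative to the switch statement.
type Task = "summarize" | "code-review" | "chat" | "translation";

interface Route {
  provider: "openai" | "anthropic" | "gemini";
  model: string;
  feature: string; // used as the FluxGate context tag
}

const ROUTES: Record<Task, Route> = {
  summarize: { provider: "gemini", model: "gemini-1.5-flash", feature: "summarizer" },
  "code-review": { provider: "anthropic", model: "claude-sonnet-4-6", feature: "code-review" },
  chat: { provider: "openai", model: "gpt-4o", feature: "chat" },
  translation: { provider: "openai", model: "gpt-4o-mini", feature: "translation" },
};

export function routeFor(task: Task): Route {
  return ROUTES[task];
}
```

Adding a new task then means adding one row to `ROUTES` rather than a new case, and the `feature` strings stay in sync with what FluxGate reports.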
Fallback pattern
Fall back to a cheaper provider when the primary one is unavailable.
```typescript
// lib/ai/with-fallback.ts
import { openai } from "@/lib/openai";
import { anthropic } from "@/lib/anthropic";

export async function chatWithFallback(
  messages: { role: "user" | "assistant"; content: string }[],
  userId: string,
) {
  try {
    // Primary: GPT-4o
    const completion = await openai
      .withContext({ feature: "chat-with-fallback", user: userId })
      .chat.completions.create({ model: "gpt-4o", messages });
    return completion.choices[0].message.content ?? "";
  } catch (primaryErr) {
    console.warn("GPT-4o failed, falling back to Claude:", primaryErr);
    // Fallback: Claude Sonnet — event tracked separately in FluxGate
    const message = await anthropic
      .withContext({
        feature: "chat-with-fallback-claude",
        user: userId,
      })
      .messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 2048,
        messages,
      });
    return message.content[0].type === "text" ? message.content[0].text : "";
  }
}
```
Both attempts — the failed primary and the successful fallback — appear as separate events in FluxGate, so you can see the true cost and frequency of fallbacks over time.
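The try/catch above hard-codes a single fallback. If you later add a third provider, the same pattern generalizes to a small helper that tries candidates in priority order. A sketch, under the assumption that each candidate is wrapped as a plain async function returning text (the helper name and shape are illustrative, not a FluxGate API):

```typescript
// Hypothetical generic fallback runner: tries each candidate in order
// and returns the first successful result.
type Candidate = () => Promise<string>;

export async function firstSuccessful(candidates: Candidate[]): Promise<string> {
  let lastErr: unknown;
  for (const candidate of candidates) {
    try {
      return await candidate();
    } catch (err) {
      lastErr = err; // remember the failure and try the next provider
    }
  }
  // Every candidate failed: surface the last error to the caller.
  throw lastErr ?? new Error("no candidates provided");
}
```

Because each candidate still calls its own `withContext`-wrapped client, every attempt (failed or not) shows up as its own FluxGate event, just as in the two-provider version.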
A/B testing providers
Compare two providers on the same prompt. Both events are tracked, giving you real-world latency and cost data to inform your routing decision.
```typescript
// lib/ai/ab-test.ts
import { openai } from "@/lib/openai";
import { anthropic } from "@/lib/anthropic";

export async function abTestProviders(prompt: string, userId: string) {
  const [gptResult, claudeResult] = await Promise.allSettled([
    openai
      .withContext({ feature: "ab-test-gpt", user: userId })
      .chat.completions.create({
        model: "gpt-4o",
        messages: [{ role: "user", content: prompt }],
      }),
    anthropic
      .withContext({ feature: "ab-test-claude", user: userId })
      .messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages: [{ role: "user", content: prompt }],
      }),
  ]);

  const gptText =
    gptResult.status === "fulfilled"
      ? (gptResult.value.choices[0].message.content ?? "")
      : null;
  const claudeText =
    claudeResult.status === "fulfilled"
      ? claudeResult.value.content[0].type === "text"
        ? claudeResult.value.content[0].text
        : null
      : null;

  return { gptText, claudeText };
}
```
After a few thousand calls, open Cost Breakdown → By Feature and filter to ab-test-gpt and ab-test-claude to compare cost, latency p95, and error rate side-by-side.
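`Promise.allSettled` sends every prompt to both providers, which doubles cost. For live traffic you can instead split users deterministically between the two arms and compare the `ab-test-gpt` and `ab-test-claude` feature tags in FluxGate. A sketch of stable user bucketing (the FNV-1a hash and `abArm` helper are illustrative choices, not FluxGate APIs):

```typescript
// Hypothetical deterministic bucketing: each userId always lands in the
// same arm, so a given user's experience is consistent across requests.
function fnv1a(str: string): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return hash;
}

export function abArm(
  userId: string,
  gptShare = 0.5,
): "ab-test-gpt" | "ab-test-claude" {
  // Map the 32-bit hash onto [0, 1) and compare against the GPT share.
  return fnv1a(userId) / 0x100000000 < gptShare ? "ab-test-gpt" : "ab-test-claude";
}
```

The returned arm name can be passed straight into `withContext({ feature: abArm(userId), user: userId })`, so each user pays for only one provider per request while the dashboard still accumulates both arms.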
Required environment variables
```bash
# .env.local
FLUXGATE_API_KEY=fg_live_...
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AIza...
```