Gemini Integration

Track Google Gemini API calls — text generation, streaming, chat sessions, and embeddings — automatically with @fluxgate/gemini.

Install

npm install @fluxgate/sdk @fluxgate/gemini @google/genai

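@google/genai (≥ 2.0.0) is a peer dependency, so install it alongside the wrapper. @fluxgate/sdk is also pulled in automatically as a dependency of the wrapper; listing it explicitly in the install command is still reasonable, because the setup code imports from it directly.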

One-time setup

Create a single tracked client at module level. The model is specified per-call, not at tracker creation.

// lib/gemini.ts
import { GoogleGenAI } from "@google/genai";
import { FluxGate } from "@fluxgate/sdk";
import { createGeminiCostTracker } from "@fluxgate/gemini";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });
const fg = new FluxGate({ apiKey: process.env.FLUXGATE_API_KEY! });

export const gemini = createGeminiCostTracker(ai, fg);

Do not construct a new GoogleGenAI or FluxGate instance per request. Module-level singletons prevent unnecessary connection overhead.

Text generation

Access generation through .models.generateContent(). The model is passed in the per-call payload.

import { gemini } from "@/lib/gemini";

const result = await gemini
  .withContext({
    feature: "content-generation",
    user: { id: session.user.id, email: session.user.email },
    sessionId: session.id,
  })
  .models.generateContent({
    model: "gemini-2.5-flash",
    contents: prompt,
  });

const text = result.text;
const { cost, trackingId } = result.fluxGateCostTrackingResponse;

Streaming

const result = await gemini
  .withContext({ feature: "streaming-gen" })
  .models.generateContentStream({
    model: "gemini-2.5-flash",
    contents: longPrompt,
  });

for await (const chunk of result.stream) {
  // In @google/genai, `text` is a property (it may be undefined), not a method
  process.stdout.write(chunk.text ?? "");
}

// Tracking is finalised once the stream is consumed
console.log(result.fluxGateCostTrackingResponse);
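
The reason the tracking response is only complete after the loop is that a wrapper cannot know the final totals until the last chunk has been read. The following is a minimal sketch of that general pattern, not FluxGate's actual implementation; `trackStream`, `Chunk`, `StreamSummary`, and `fakeStream` are hypothetical names introduced for illustration.

```typescript
// Hypothetical chunk shape; real Gemini chunks carry more fields.
interface Chunk {
  text: string;
}

// Hypothetical summary produced once the stream has ended.
interface StreamSummary {
  chunkCount: number;
  totalChars: number;
}

// Wraps any async iterable and reports a summary only after the
// consumer has read past the final chunk.
function trackStream(
  source: AsyncIterable<Chunk>,
  onDone: (summary: StreamSummary) => void,
): AsyncIterable<Chunk> {
  return (async function* () {
    let chunkCount = 0;
    let totalChars = 0;
    for await (const chunk of source) {
      chunkCount += 1;
      totalChars += chunk.text.length;
      yield chunk;
    }
    // Reached only when the source is exhausted.
    onDone({ chunkCount, totalChars });
  })();
}

// Stand-in for a model stream so the sketch runs without network access.
async function* fakeStream(): AsyncGenerator<Chunk> {
  yield { text: "Hello, " };
  yield { text: "world" };
}
```

If the consumer stops iterating early, `onDone` in this sketch never fires, which is one reason to read a tracked stream to the end before inspecting its tracking data.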

Multi-turn chat sessions

chats.create() returns a TrackedChat. Each sendMessage and sendMessageStream call is tracked individually and attributed to the same context.

const chat = await gemini
  .withContext({ feature: "chatbot", user: { id: session.user.id } })
  .chats.create({ model: "gemini-2.5-flash" });

const result1 = await chat.sendMessage("What is FluxGate?");
console.log(result1.text);

const result2 = await chat.sendMessage("How do I install it?");
console.log(result2.text);
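
Because each turn is tracked individually, a per-conversation total has to be assembled by the caller. The sketch below assumes that chat results expose the same `fluxGateCostTrackingResponse` field shown for `generateContent` (this page does not state that explicitly); `TurnTracking` and `sumTurnCosts` are hypothetical helpers, not part of the package.

```typescript
// Minimal subset of the tracking response used in this sketch.
interface TurnTracking {
  cost: number | null; // USD; may be null
  trackingId: string | null;
}

// Adds up the known costs and counts the turns whose cost is null.
function sumTurnCosts(turns: TurnTracking[]): { totalUsd: number; unpriced: number } {
  let totalUsd = 0;
  let unpriced = 0;
  for (const turn of turns) {
    if (turn.cost === null) {
      unpriced += 1;
    } else {
      totalUsd += turn.cost;
    }
  }
  return { totalUsd, unpriced };
}
```

Under that assumption, a call such as `sumTurnCosts([result1.fluxGateCostTrackingResponse, result2.fluxGateCostTrackingResponse])` would give the running total for the two turns above.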

Mid-conversation context upgrade

Use .withTracking() on a TrackedChat to merge additional context — useful when a user authenticates mid-session or moves to a paid tier. withTracking() returns a new TrackedChat sharing the same underlying session history; the original chat object is unaffected.

const premiumChat = chat.withTracking({
  feature: "premium-chatbot",
  user: {
    id: currentUser.id,
    monthlyRevenue: currentUser.mrr,
  },
});

const result = await premiumChat.sendMessage("I need a detailed analysis");

New context keys override matching keys from the original context; unmatched keys are preserved.
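
The override rule reads like an object merge. Here is a minimal sketch assuming a shallow merge of top-level keys; whether nested objects such as `user` are merged field by field or replaced wholesale is not stated here, and this sketch replaces them. `Ctx` and `mergeContext` are hypothetical names, not the library's API.

```typescript
// Hypothetical, trimmed-down context type for illustration.
type Ctx = {
  feature?: string;
  user?: string | { id: string; [key: string]: unknown };
  sessionId?: string;
};

// Shallow merge: keys in `extra` win, keys only in `base` survive.
// Neither input object is mutated.
function mergeContext(base: Ctx, extra: Ctx): Ctx {
  return { ...base, ...extra };
}

const original: Ctx = { feature: "chatbot", sessionId: "s-1", user: { id: "u-1" } };
const upgraded = mergeContext(original, { feature: "premium-chatbot" });
// upgraded.feature is "premium-chatbot"; sessionId and user carry over unchanged.
```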

Multimodal (vision)

import fs from "fs";

const imageBytes = fs.readFileSync("./screenshot.jpg").toString("base64");

const result = await gemini
  .withContext({ feature: "image-analysis" })
  .models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      {
        role: "user",
        parts: [
          { text: "Describe the UI and identify any accessibility issues." },
          { inlineData: { mimeType: "image/jpeg", data: imageBytes } },
        ],
      },
    ],
  });

Thinking models

Gemini 2.5 models support extended thinking. FluxGate captures reasoning tokens separately for accurate cost attribution.

const result = await gemini
  .withContext({ feature: "reasoning-agent" })
  .models.generateContent({
    model: "gemini-2.5-pro",
    contents: complexQuery,
    config: { thinkingConfig: { thinkingBudget: 8000 } },
  });

const { cost, trackingId } = result.fluxGateCostTrackingResponse;
// cost reflects input + output + thinking tokens

Embeddings

const result = await gemini
  .withContext({ feature: "vector-search" })
  .models.embedContent({
    model: "text-embedding-004",
    contents: document,
  });

// `embeddings` is optional in the @google/genai response type, so guard the access
const vector = result.embeddings?.[0]?.values;

Safety settings

Safety-blocked responses are automatically recorded with status: "BLOCKED".

import { HarmCategory, HarmBlockThreshold } from "@google/genai";

const result = await gemini
  .withContext({ feature: "moderated-chat" })
  .models.generateContent({
    model: "gemini-2.5-flash",
    contents: userMessage,
    config: {
      safetySettings: [
        {
          category: HarmCategory.HARM_CATEGORY_HARASSMENT,
          threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        },
      ],
    },
  });

FluxGateCostTrackingResponse shape

interface FluxGateCostTrackingResponse {
  status:
    | "SUCCESS"
    | "ERROR"
    | "BLOCKED"
    | "MAX_TOKENS"
    | "CONTENT_FILTER"
    | "RECITATION"
    | "MALFORMED_REQUEST";
  cost: number | null; // USD
  trackingId: string | null;
  createdAt: number | null; // Unix timestamp in milliseconds
  errorMessage?: string;
}

Tracked automatically: input tokens, output tokens, thinking tokens (Gemini 2.5 models), cache read tokens, model name, latency (ms), stream duration, and finish reason (stop, max_tokens, safety, recitation).
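
Since `cost` and `trackingId` can be null and `status` has several non-success values, callers will likely want to branch on `status` before using the numbers. The sketch below shows one way to do that with an exhaustive switch. The grouping into `Outcome` buckets is an illustrative assumption, not a FluxGate convention, and `classifyOutcome` is a hypothetical helper.

```typescript
type TrackingStatus =
  | "SUCCESS"
  | "ERROR"
  | "BLOCKED"
  | "MAX_TOKENS"
  | "CONTENT_FILTER"
  | "RECITATION"
  | "MALFORMED_REQUEST";

// Same fields as the interface documented above.
interface FluxGateCostTrackingResponse {
  status: TrackingStatus;
  cost: number | null;
  trackingId: string | null;
  createdAt: number | null;
  errorMessage?: string;
}

// Hypothetical coarse outcome used only in this sketch.
type Outcome = "ok" | "truncated" | "refused" | "failed";

function classifyOutcome(r: FluxGateCostTrackingResponse): Outcome {
  switch (r.status) {
    case "SUCCESS":
      return "ok";
    case "MAX_TOKENS":
      return "truncated";
    case "BLOCKED":
    case "CONTENT_FILTER":
    case "RECITATION":
      return "refused";
    case "ERROR":
    case "MALFORMED_REQUEST":
      return "failed";
    default: {
      // Compile-time check that every status is handled.
      const unreachable: never = r.status;
      throw new Error(`Unhandled status: ${String(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch makes the compiler flag any status added to the union later but not handled here.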

FluxGateContext fields

All .withContext() calls accept the following fields:

{
  feature?: string                    // e.g., "chatbot", "content-gen"
  user?: string | UserSession         // End-user ID or rich object
  step?: string                       // Step within a multi-step pipeline
  sessionId?: string
  conversationId?: string
  costOverride?: GeminiCostOverride   // Custom per-token rates (see below)
  metadata?: Record<string, unknown>  // Arbitrary custom data
}

GeminiCostOverride

Supply custom rates when FluxGate does not have pricing for a model. All rates are per 1 million tokens.

await gemini
  .withContext({
    feature: "fine-tuned-gen",
    costOverride: {
      inputCostPer1MTokens: 1.25,
      outputCostPer1MTokens: 5.0,
      thinkingCostPer1MTokens: 3.5, // Gemini 2.5 thinking tokens
    },
  })
  .models.generateContent({ model: "gemini-2.5-pro", contents: prompt });
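
To make the per-1-million-token unit concrete, here is a worked sketch of the arithmetic. It assumes cost is a simple sum of token count times rate for each token class; FluxGate's exact calculation (for example, how cache read tokens are priced) is not described on this page. `TokenUsage` and `estimateCostUsd` are hypothetical names, and all three rates are treated as required for simplicity.

```typescript
// Field names follow the costOverride example above.
interface GeminiCostOverride {
  inputCostPer1MTokens: number;
  outputCostPer1MTokens: number;
  thinkingCostPer1MTokens: number;
}

// Hypothetical usage record for this sketch.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  thinkingTokens: number;
}

const TOKENS_PER_RATE_UNIT = 1_000_000;

// Sums (tokens * rate) across the three token classes, then
// converts from "per 1M tokens" to an absolute USD amount.
function estimateCostUsd(usage: TokenUsage, rates: GeminiCostOverride): number {
  const weighted =
    usage.inputTokens * rates.inputCostPer1MTokens +
    usage.outputTokens * rates.outputCostPer1MTokens +
    usage.thinkingTokens * rates.thinkingCostPer1MTokens;
  return weighted / TOKENS_PER_RATE_UNIT;
}

const estimate = estimateCostUsd(
  { inputTokens: 2_000, outputTokens: 1_000, thinkingTokens: 4_000 },
  { inputCostPer1MTokens: 1.25, outputCostPer1MTokens: 5.0, thinkingCostPer1MTokens: 3.5 },
);
```

With the rates from the example above, 2,000 input, 1,000 output, and 4,000 thinking tokens come to 0.0025 + 0.005 + 0.014, or about 0.0215 USD.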

Supported methods

Method                              Non-streaming   Streaming
models.generateContent              yes             -
models.generateContentStream        -               yes
chats.create → sendMessage          yes             -
chats.create → sendMessageStream    -               yes
models.embedContent                 yes             -

Supported models

gemini-2.5-pro, gemini-2.5-flash, gemini-1.5-pro, gemini-1.5-flash, text-embedding-004, and any other model available through @google/genai.