Install
npm install @fluxgate/sdk @fluxgate/gemini @google/genai
Both packages are on npm: @fluxgate/gemini and @fluxgate/sdk.
@google/genai (≥ 2.0.0) is a peer dependency, so install it alongside the wrapper. @fluxgate/sdk is also a dependency of the wrapper; it appears in the install command because the setup code below imports it directly.
One-time setup
Create a single tracked client at module level. The model is specified per-call, not at tracker creation.
// lib/gemini.ts
import { GoogleGenAI } from "@google/genai";
import { FluxGate } from "@fluxgate/sdk";
import { createGeminiCostTracker } from "@fluxgate/gemini";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });
const fg = new FluxGate({ apiKey: process.env.FLUXGATE_API_KEY! });
export const gemini = createGeminiCostTracker(ai, fg);
Do not construct a new GoogleGenAI or FluxGate instance per request. Module-level singletons prevent unnecessary connection overhead.
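In setups that re-evaluate modules during development (hot reload, for example), a module-level constant can still be constructed more than once. One common workaround is to cache the instance on `globalThis`. The sketch below is a generic helper and not part of either SDK: `singleton`, the `__singletons` property, and the `"fluxgate-gemini"` key are hypothetical names, and `makeClient` is a stand-in for the real `createGeminiCostTracker(ai, fg)` call.

```typescript
// Hypothetical helper (not part of @fluxgate/sdk or @google/genai):
// caches one instance per key on globalThis so that re-evaluating the
// module does not construct a second client.
const registry = globalThis as unknown as { __singletons?: Map<string, unknown> };

function singleton<T>(key: string, factory: () => T): T {
  if (!registry.__singletons) {
    registry.__singletons = new Map<string, unknown>();
  }
  const store = registry.__singletons;
  if (!store.has(key)) {
    store.set(key, factory()); // the factory runs only on the first call per key
  }
  return store.get(key) as T;
}

// Stand-in factory for illustration; in lib/gemini.ts this would be
// () => createGeminiCostTracker(ai, fg).
let constructed = 0;
const makeClient = () => ({ id: ++constructed });

const firstClient = singleton("fluxgate-gemini", makeClient);
const secondClient = singleton("fluxgate-gemini", makeClient);
// firstClient and secondClient are the same object; makeClient ran once.
```

If the runtime never re-evaluates modules, the plain module-level `export const gemini` shown above is sufficient.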
Text generation
Access generation through .models.generateContent(). The model is passed in the per-call payload.
import { gemini } from "@/lib/gemini";
const result = await gemini
  .withContext({
    feature: "content-generation",
    user: { id: session.user.id, email: session.user.email },
    sessionId: session.id,
  })
  .models.generateContent({
    model: "gemini-2.5-flash",
    contents: prompt,
  });
const text = result.text;
const { cost, trackingId } = result.fluxGateCostTrackingResponse;
Streaming
const result = await gemini
  .withContext({ feature: "streaming-gen" })
  .models.generateContentStream({
    model: "gemini-2.5-flash",
    contents: longPrompt,
  });
for await (const chunk of result.stream) {
  // In @google/genai, text is a property (not a method) and may be undefined
  process.stdout.write(chunk.text ?? "");
}
// Tracking is finalised once the stream is consumed
console.log(result.fluxGateCostTrackingResponse);
Multi-turn chat sessions
chats.create() returns a TrackedChat. Each sendMessage and sendMessageStream call is tracked individually and attributed to the same context.
const chat = await gemini
  .withContext({ feature: "chatbot", user: { id: session.user.id } })
  .chats.create({ model: "gemini-2.5-flash" });
const result1 = await chat.sendMessage("What is FluxGate?");
console.log(result1.text);
const result2 = await chat.sendMessage("How do I install it?");
console.log(result2.text);
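Because each turn is tracked individually, the cost of a whole conversation is the sum of the per-turn costs. The sketch below assumes each chat result exposes `fluxGateCostTrackingResponse` in the same way as the text generation example; `TrackedResult` and `totalCost` are hypothetical names, and the cost values are made up for illustration.

```typescript
// Minimal local shape; only the fields this helper reads.
interface TrackedResult {
  fluxGateCostTrackingResponse: { cost: number | null; trackingId: string | null };
}

// Hypothetical helper: sums per-turn costs and counts the turns that have
// no recorded cost (cost === null) instead of silently treating them as free.
function totalCost(turns: TrackedResult[]): { usd: number; untracked: number } {
  let usd = 0;
  let untracked = 0;
  for (const turn of turns) {
    const { cost } = turn.fluxGateCostTrackingResponse;
    if (cost === null) {
      untracked += 1;
    } else {
      usd += cost;
    }
  }
  return { usd, untracked };
}

// Made-up values for illustration only.
const summary = totalCost([
  { fluxGateCostTrackingResponse: { cost: 0.25, trackingId: "t1" } },
  { fluxGateCostTrackingResponse: { cost: 0.5, trackingId: "t2" } },
  { fluxGateCostTrackingResponse: { cost: null, trackingId: null } },
]);
```

With the two turns above this would be called as `totalCost([result1, result2])`.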
Mid-conversation context upgrade
Use .withTracking() on a TrackedChat to merge additional context — useful when a user authenticates mid-session or moves to a paid tier. withTracking() returns a new TrackedChat sharing the same underlying session history; the original chat object is unaffected.
const premiumChat = chat.withTracking({
  feature: "premium-chatbot",
  user: {
    id: currentUser.id,
    monthlyRevenue: currentUser.mrr,
  },
});
const result = await premiumChat.sendMessage("I need a detailed analysis");
New context keys override matching keys from the original context; unmatched keys are preserved.
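The override rule reads like a key-wise merge. The sketch below models it as a shallow merge with object spread. This is an assumption: the text does not say whether nested values such as `user` are merged field by field or replaced as a whole, and the sketch replaces them. `Ctx` and `mergeContext` are hypothetical names, not library exports.

```typescript
// Hypothetical, trimmed-down context type for illustration.
type Ctx = {
  feature?: string;
  user?: string | { id: string; [key: string]: unknown };
  sessionId?: string;
};

// Assumption: top-level keys merge shallowly, so a new `user` value
// replaces the old one as a whole. The real library may behave differently.
function mergeContext(original: Ctx, update: Ctx): Ctx {
  return { ...original, ...update };
}

const baseContext: Ctx = { feature: "chatbot", user: { id: "u_1" }, sessionId: "s_1" };
const upgradedContext = mergeContext(baseContext, {
  feature: "premium-chatbot",
  user: { id: "u_1", monthlyRevenue: 49 },
});
// upgradedContext.feature is overridden, sessionId is carried over,
// and baseContext itself is left unchanged.
```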
Multimodal (vision)
import fs from "fs";
const imageBytes = fs.readFileSync("./screenshot.jpg").toString("base64");
const result = await gemini
  .withContext({ feature: "image-analysis" })
  .models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      {
        role: "user",
        parts: [
          { text: "Describe the UI and identify any accessibility issues." },
          { inlineData: { mimeType: "image/jpeg", data: imageBytes } },
        ],
      },
    ],
  });
Thinking models
Gemini 2.5 models support extended thinking. FluxGate captures reasoning tokens separately for accurate cost attribution.
const result = await gemini
  .withContext({ feature: "reasoning-agent" })
  .models.generateContent({
    model: "gemini-2.5-pro",
    contents: complexQuery,
    config: { thinkingConfig: { thinkingBudget: 8000 } },
  });
const { cost, trackingId } = result.fluxGateCostTrackingResponse;
// cost reflects input + output + thinking tokens
Embeddings
const result = await gemini
  .withContext({ feature: "vector-search" })
  .models.embedContent({
    model: "text-embedding-004",
    contents: document,
  });
const vector = result.embeddings[0].values;
Safety settings
Safety-blocked responses are automatically recorded with status: "BLOCKED".
import { HarmCategory, HarmBlockThreshold } from "@google/genai";
const result = await gemini
  .withContext({ feature: "moderated-chat" })
  .models.generateContent({
    model: "gemini-2.5-flash",
    contents: userMessage,
    config: {
      safetySettings: [
        {
          category: HarmCategory.HARM_CATEGORY_HARASSMENT,
          threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        },
      ],
    },
  });
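Since a blocked response is recorded with status "BLOCKED", calling code can check that status before showing text to the user. The sketch below is one way to do that; `ModeratedResult` and `replyOrFallback` are hypothetical names, and the local types only mirror the fields used here.

```typescript
type TrackingStatus =
  | "SUCCESS"
  | "ERROR"
  | "BLOCKED"
  | "MAX_TOKENS"
  | "CONTENT_FILTER"
  | "RECITATION"
  | "MALFORMED_REQUEST";

// Minimal local shape; only the fields this helper reads.
interface ModeratedResult {
  text?: string;
  fluxGateCostTrackingResponse: { status: TrackingStatus };
}

// Hypothetical helper: returns the model text, or the fallback message when
// the tracked status is "BLOCKED" or no text came back.
function replyOrFallback(result: ModeratedResult, fallback: string): string {
  const tracked = result.fluxGateCostTrackingResponse;
  if (tracked.status === "BLOCKED" || !result.text) {
    return fallback;
  }
  return result.text;
}

const fallbackMessage = "Sorry, I can't help with that request.";
const blockedReply = replyOrFallback(
  { fluxGateCostTrackingResponse: { status: "BLOCKED" } },
  fallbackMessage,
);
const normalReply = replyOrFallback(
  { text: "Hello!", fluxGateCostTrackingResponse: { status: "SUCCESS" } },
  fallbackMessage,
);
```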
FluxGateCostTrackingResponse shape
interface FluxGateCostTrackingResponse {
  status:
    | "SUCCESS"
    | "ERROR"
    | "BLOCKED"
    | "MAX_TOKENS"
    | "CONTENT_FILTER"
    | "RECITATION"
    | "MALFORMED_REQUEST";
  cost: number | null; // USD
  trackingId: string | null;
  createdAt: number | null; // Unix timestamp in milliseconds
  errorMessage?: string;
}
Tracked automatically: input tokens, output tokens, thinking tokens (Gemini 2.5 models), cache read tokens, model name, latency (ms), stream duration, and finish reason (stop, max_tokens, safety, recitation).
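Because `status` is a closed union, a `switch` with a `never` check makes the compiler flag any status that is not handled. The sketch below groups the statuses into coarse outcomes for logging or alerting. The grouping and the names `Outcome` and `classifyStatus` are assumptions for illustration, not documented FluxGate semantics.

```typescript
type Status =
  | "SUCCESS"
  | "ERROR"
  | "BLOCKED"
  | "MAX_TOKENS"
  | "CONTENT_FILTER"
  | "RECITATION"
  | "MALFORMED_REQUEST";

// Hypothetical coarse buckets; the grouping is an assumption.
type Outcome = "ok" | "truncated" | "filtered" | "failed";

function classifyStatus(value: Status): Outcome {
  switch (value) {
    case "SUCCESS":
      return "ok";
    case "MAX_TOKENS":
      return "truncated";
    case "BLOCKED":
    case "CONTENT_FILTER":
    case "RECITATION":
      return "filtered";
    case "ERROR":
    case "MALFORMED_REQUEST":
      return "failed";
    default: {
      // Compile-time exhaustiveness check: this assignment stops
      // type-checking if a new status is added to the union.
      const unreachable: never = value;
      throw new Error("Unhandled status: " + String(unreachable));
    }
  }
}
```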
FluxGateContext fields
All .withContext() calls accept the following fields:
{
  feature?: string                    // e.g., "chatbot", "content-gen"
  user?: string | UserSession         // End-user ID or rich object
  step?: string                       // Step within a multi-step pipeline
  sessionId?: string
  conversationId?: string
  costOverride?: GeminiCostOverride   // Custom per-token rates (see below)
  metadata?: Record<string, unknown>  // Arbitrary custom data
}
GeminiCostOverride
Supply custom rates when FluxGate does not have pricing for a model. All rates are per 1 million tokens.
await gemini
  .withContext({
    feature: "fine-tuned-gen",
    costOverride: {
      inputCostPer1MTokens: 1.25,
      outputCostPer1MTokens: 5.0,
      thinkingCostPer1MTokens: 3.5, // Gemini 2.5 thinking tokens
    },
  })
  .models.generateContent({ model: "gemini-2.5-pro", contents: prompt });
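To make the per-1-million-token unit concrete, the sketch below works through the arithmetic for the rates above. It assumes the cost is a plain sum of token count times rate for each category; FluxGate's actual calculation (for example, how cached tokens are priced) may differ. `Usage` and `estimateCostUsd` are hypothetical names, and the token counts are made up.

```typescript
// Field names copied from the costOverride example above.
interface Rates {
  inputCostPer1MTokens: number;
  outputCostPer1MTokens: number;
  thinkingCostPer1MTokens: number;
}

// Hypothetical usage shape for illustration.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  thinkingTokens: number;
}

// Assumption: cost is the sum of (tokens / 1,000,000) * rate per category.
function estimateCostUsd(usage: Usage, rates: Rates): number {
  const millions = (tokens: number) => tokens / 1e6;
  return (
    millions(usage.inputTokens) * rates.inputCostPer1MTokens +
    millions(usage.outputTokens) * rates.outputCostPer1MTokens +
    millions(usage.thinkingTokens) * rates.thinkingCostPer1MTokens
  );
}

// 2M input, 1M output, 0.5M thinking tokens at the rates shown above:
// 2 * 1.25 + 1 * 5.0 + 0.5 * 3.5 = 2.5 + 5.0 + 1.75 = 9.25 USD
const exampleCost = estimateCostUsd(
  { inputTokens: 2000000, outputTokens: 1000000, thinkingTokens: 500000 },
  { inputCostPer1MTokens: 1.25, outputCostPer1MTokens: 5.0, thinkingCostPer1MTokens: 3.5 },
);
```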
Supported methods
| Method | Non-streaming | Streaming |
|---|---|---|
| models.generateContent | ✅ | — |
| models.generateContentStream | — | ✅ |
| chats.create → sendMessage | ✅ | — |
| chats.create → sendMessageStream | — | ✅ |
| models.embedContent | ✅ | — |
Supported models
gemini-2.5-pro, gemini-2.5-flash, gemini-1.5-pro, gemini-1.5-flash, text-embedding-004, and any other model available through @google/genai.