LLM Providers (Java)

Trace Google GenAI, Vertex AI, Azure OpenAI, Ollama, and Watsonx in Java. All use the same Traced wrapper pattern.

📝 TL;DR
  • Five LLM providers that follow the standard Traced<X>(client) pattern
  • Google GenAI and Vertex AI have countTokens() and chat session support
  • Azure OpenAI traces chat completions, embeddings, and legacy completions
  • Ollama wraps ollama4j; Watsonx uses reflection, like the Anthropic integration

Prerequisites

Complete the Java SDK setup first. All providers below require traceai-java-core and a TraceAI.init() call before use.
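Every integration on this page follows the same decorator shape: hold the native client, open a span, delegate the call, record attributes, and end the span. The sketch below illustrates that pattern with a toy `Span` and a fake client (all names here are hypothetical, not the actual SDK internals):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for the tracing machinery -- illustrative only.
class Span {
    final String name;
    final Map<String, Object> attributes = new LinkedHashMap<>();
    boolean ended;
    Span(String name) { this.name = name; }
    void setAttribute(String key, Object value) { attributes.put(key, value); }
    void end() { ended = true; }
}

// Fake native client standing in for e.g. com.google.genai.Client.
class FakeClient {
    String generateContent(String prompt) { return "response to: " + prompt; }
}

// The decorator: same surface as the native client, plus a span around each call.
class TracedFakeClient {
    private final FakeClient delegate;
    Span lastSpan; // exposed so the demo can inspect it

    TracedFakeClient(FakeClient delegate) { this.delegate = delegate; }

    String generateContent(String prompt) {
        Span span = new Span("Fake Generate Content");
        span.setAttribute("input.value", prompt);
        try {
            String result = delegate.generateContent(prompt);
            span.setAttribute("output.value", result);
            return result;
        } finally {
            span.end(); // the span closes even if the delegate throws
            lastSpan = span;
        }
    }
}

class WrapperSketch {
    public static void main(String[] args) {
        TracedFakeClient traced = new TracedFakeClient(new FakeClient());
        System.out.println(traced.generateContent("hello")); // prints "response to: hello"
        System.out.println(traced.lastSpan.ended);           // prints "true"
    }
}
```

Because the wrapper exposes the same method names as the native client, swapping it in is usually a one-line change at construction time.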


Google GenAI

Wraps the com.google.genai.Client for Google’s Gemini API.

<dependency>
    <groupId>com.github.future-agi.traceAI</groupId>
    <artifactId>traceai-java-google-genai</artifactId>
    <version>main-SNAPSHOT</version>
</dependency>
implementation 'com.github.future-agi.traceAI:traceai-java-google-genai:main-SNAPSHOT'
import ai.traceai.TraceAI;
import ai.traceai.googlegenai.TracedGenerativeModel;
import com.google.genai.Client;

TraceAI.initFromEnvironment();

Client client = Client.builder()
    .apiKey(System.getenv("GOOGLE_API_KEY"))
    .build();

// Note: model name is a constructor parameter
TracedGenerativeModel model = new TracedGenerativeModel(client, "gemini-2.0-flash");

// Simple generation
var response = model.generateContent("What is the capital of France?");
System.out.println(response.text());

// Multi-turn chat
var chat = model.startChat();
var reply = chat.sendMessage("Hello!");
System.out.println(reply.text());

// Token counting
var tokenCount = model.countTokens("How many tokens is this?");

Spans created:

  • generateContent() - “Google GenAI Generate Content” (LLM)
  • chat.sendMessage() - “Google GenAI Chat Message” (LLM)
  • countTokens() - “Google GenAI Count Tokens” (LLM)

Vertex AI

Wraps com.google.cloud.vertexai.generativeai.GenerativeModel for Google Cloud’s Vertex AI.

<dependency>
    <groupId>com.github.future-agi.traceAI</groupId>
    <artifactId>traceai-java-vertexai</artifactId>
    <version>main-SNAPSHOT</version>
</dependency>
implementation 'com.github.future-agi.traceAI:traceai-java-vertexai:main-SNAPSHOT'
import ai.traceai.TraceAI;
import ai.traceai.vertexai.TracedGenerativeModel;
import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.generativeai.GenerativeModel;

TraceAI.initFromEnvironment();

VertexAI vertexAI = new VertexAI("your-project-id", "us-central1");
GenerativeModel nativeModel = new GenerativeModel("gemini-2.0-flash", vertexAI);

TracedGenerativeModel model = new TracedGenerativeModel(nativeModel);

var response = model.generateContent("What is the capital of France?");
System.out.println(response.getCandidatesList().get(0).getContent().getParts(0).getText());

Spans created:

  • generateContent() - “Vertex AI Generate Content” (LLM)
  • countTokens() - “Vertex AI Count Tokens” (LLM)

Note: Vertex AI streaming (generateContentStream) creates a span but ends it before the stream is consumed. Use non-streaming for accurate trace data.
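One general way to fix that lifecycle problem, sketched here in plain Java (this is not part of the current wrapper, just an illustration), is to tie the span's end to the stream's closure so the span covers the time the caller spends consuming chunks:

```java
import java.util.List;
import java.util.stream.Stream;

// Illustrative sketch: end the span when the stream is closed,
// not when the stream object is first returned.
class StreamSpanSketch {
    static class Span {
        boolean ended;
        void end() { ended = true; }
    }

    // Returns a stream whose close() ends the span.
    static Stream<String> tracedStream(Span span, List<String> chunks) {
        return chunks.stream().onClose(span::end);
    }

    public static void main(String[] args) {
        Span span = new Span();
        StringBuilder out = new StringBuilder();
        try (Stream<String> s = tracedStream(span, List.of("Par", "is"))) {
            s.forEach(out::append);
        } // try-with-resources closes the stream, which ends the span
        System.out.println(out + " / span ended: " + span.ended); // prints "Paris / span ended: true"
    }
}
```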


Azure OpenAI

Wraps com.azure.ai.openai.OpenAIClient from the Azure SDK.

<dependency>
    <groupId>com.github.future-agi.traceAI</groupId>
    <artifactId>traceai-java-azure-openai</artifactId>
    <version>main-SNAPSHOT</version>
</dependency>
implementation 'com.github.future-agi.traceAI:traceai-java-azure-openai:main-SNAPSHOT'
import ai.traceai.TraceAI;
import ai.traceai.azure.openai.TracedAzureOpenAIClient;
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.ai.openai.models.ChatCompletionsOptions;
import com.azure.ai.openai.models.ChatRequestUserMessage;
import com.azure.ai.openai.models.EmbeddingsOptions;
import com.azure.core.credential.AzureKeyCredential;
import java.util.List;

TraceAI.initFromEnvironment();

OpenAIClient client = new OpenAIClientBuilder()
    .endpoint(System.getenv("AZURE_OPENAI_ENDPOINT"))
    .credential(new AzureKeyCredential(System.getenv("AZURE_OPENAI_API_KEY")))
    .buildClient();

TracedAzureOpenAIClient traced = new TracedAzureOpenAIClient(client);

// Chat completions - first arg is deployment name
var chatOptions = new ChatCompletionsOptions(List.of(
    new ChatRequestUserMessage("What is the capital of France?")
));
var response = traced.getChatCompletions("gpt-4o-mini", chatOptions);
System.out.println(response.getChoices().get(0).getMessage().getContent());

// Embeddings
var embeddingOptions = new EmbeddingsOptions(List.of("Hello world"));
var embeddings = traced.getEmbeddings("text-embedding-3-small", embeddingOptions);

Spans created:

  • getChatCompletions() - “Azure OpenAI Chat Completion” (LLM)
  • getEmbeddings() - “Azure OpenAI Embedding” (EMBEDDING)
  • getCompletions() - “Azure OpenAI Completion” (LLM, legacy API)

Azure OpenAI captures tool call attributes when the model invokes tools, and handles all message types (System, User, Assistant, Tool, Function).


Ollama

Wraps io.github.ollama4j.OllamaAPI for local Ollama models.

<dependency>
    <groupId>com.github.future-agi.traceAI</groupId>
    <artifactId>traceai-java-ollama</artifactId>
    <version>main-SNAPSHOT</version>
</dependency>
implementation 'com.github.future-agi.traceAI:traceai-java-ollama:main-SNAPSHOT'
import ai.traceai.TraceAI;
import ai.traceai.ollama.TracedOllamaAPI;
import io.github.ollama4j.OllamaAPI;
import io.github.ollama4j.models.chat.OllamaChatMessage;
import java.util.List;

TraceAI.initFromEnvironment();

OllamaAPI api = new OllamaAPI("http://localhost:11434");
TracedOllamaAPI traced = new TracedOllamaAPI(api);

// Generate
var result = traced.generate("llama3", "What is the capital of France?");
System.out.println(result.getResponse());

// Chat
var chatResult = traced.chat("llama3", List.of(
    new OllamaChatMessage("user", "Hello!")
));

// Embeddings
var embedding = traced.embed("llama3", "Hello world");

// List models
var models = traced.listModels();

Spans created:

  • generate() - “Ollama Generate” (LLM)
  • chat() - “Ollama Chat” (LLM)
  • embed() - “Ollama Embed” (EMBEDDING)
  • listModels() - “Ollama List Models” (LLM)

Ollama spans include ollama.response_time_ms from the Ollama server’s own timing.


IBM Watsonx

Wraps the Watsonx Java SDK using reflection (like Anthropic) for cross-version compatibility.

<dependency>
    <groupId>com.github.future-agi.traceAI</groupId>
    <artifactId>traceai-java-watsonx</artifactId>
    <version>main-SNAPSHOT</version>
</dependency>
implementation 'com.github.future-agi.traceAI:traceai-java-watsonx:main-SNAPSHOT'
import ai.traceai.TraceAI;
import ai.traceai.watsonx.TracedWatsonxAI;

TraceAI.initFromEnvironment();

// Create Watsonx client (your SDK version)
Object watsonxClient = /* your Watsonx client */;

// Wraps as Object - reflection-based, version-agnostic
TracedWatsonxAI traced = new TracedWatsonxAI(watsonxClient);

// Text generation
Object response = traced.generateText(textGenRequest);

// Chat
Object chatResponse = traced.chat(chatRequest);

// Embeddings
Object embedResponse = traced.embedText(embedRequest);

Spans created:

  • generateText() - “Watsonx Text Generation” (LLM)
  • chat() - “Watsonx Chat” (LLM)
  • embedText() - “Watsonx Embed” (EMBEDDING)

Watsonx spans include watsonx.project_id, watsonx.space_id, and watsonx.stop_reason.

As with the Anthropic integration, the reflection approach means the client and request objects are typed as Object. Cast the return values to your SDK’s response types.
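The reflection approach boils down to looking up a method by name on the wrapped Object and invoking it, so the wrapper compiles against no particular SDK version. A minimal illustration with java.lang.reflect (toy client and method names are hypothetical, not the wrapper's actual code):

```java
import java.lang.reflect.Method;

// Stand-in for whatever Watsonx client version happens to be on the classpath.
class ToyWatsonxClient {
    public String generateText(String request) { return "generated: " + request; }
}

class ReflectionSketch {
    // Invoke a named method on an untyped client, as a reflection-based
    // wrapper would, returning the untyped result for the caller to cast.
    static Object invoke(Object client, String methodName, Object arg) throws Exception {
        Method m = client.getClass().getMethod(methodName, arg.getClass());
        return m.invoke(client, arg);
    }

    public static void main(String[] args) throws Exception {
        Object client = new ToyWatsonxClient();          // typed as Object
        Object response = invoke(client, "generateText", "hello");
        String text = (String) response;                 // caller casts the result
        System.out.println(text);                        // prints "generated: hello"
    }
}
```

The trade-off is that typos in method names or argument types surface at runtime (as NoSuchMethodException) rather than at compile time.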


Common span attributes

All providers above capture these core attributes:

  • llm.provider - Provider name (google, azure-openai, ollama, watsonx)
  • llm.request.model - Model name from the request
  • llm.response.model - Model name from the response (if different)
  • llm.token_count.prompt - Input token count
  • llm.token_count.completion - Output token count
  • llm.token_count.total - Total token count
  • input.value / output.value - Plain text input/output
  • fi.raw_input / fi.raw_output - Full request/response as JSON
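As a plain-Java illustration (not SDK code), the attribute set above maps naturally onto a key-value map, with the total token count always being prompt plus completion:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative: the kind of attribute map a provider wrapper assembles per span.
class AttributeSketch {
    static Map<String, Object> llmAttributes(String provider, String model,
                                             int promptTokens, int completionTokens) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("llm.provider", provider);
        attrs.put("llm.request.model", model);
        attrs.put("llm.token_count.prompt", promptTokens);
        attrs.put("llm.token_count.completion", completionTokens);
        // total is always prompt + completion
        attrs.put("llm.token_count.total", promptTokens + completionTokens);
        return attrs;
    }

    public static void main(String[] args) {
        // e.g. 12 prompt tokens + 30 completion tokens -> total of 42
        System.out.println(llmAttributes("ollama", "llama3", 12, 30));
    }
}
```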