LLM Providers (Java)

Trace Google GenAI, Vertex AI, Azure OpenAI, Ollama, and Watsonx in Java. All use the same Traced wrapper pattern.

📝 TL;DR
  • Five LLM providers that follow the standard Traced<X>(client) pattern
  • Google GenAI and Vertex AI have countTokens() and chat session support
  • Azure OpenAI traces chat completions, embeddings, and legacy completions
  • Ollama wraps ollama4j; Watsonx uses reflection, like the Anthropic integration

Prerequisites

Complete the Java SDK setup first. All providers below require traceai-java-core and a TraceAI.init() call before use.
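Every integration on this page follows the same decorator shape: hold the native client, open a span, delegate the call, record attributes, and end the span. The sketch below illustrates that pattern with a toy `Span` and a fake client (all names here are hypothetical, not the actual SDK internals):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for the tracing machinery -- illustrative only.
class Span {
    final String name;
    final Map<String, Object> attributes = new LinkedHashMap<>();
    boolean ended;
    Span(String name) { this.name = name; }
    void setAttribute(String key, Object value) { attributes.put(key, value); }
    void end() { ended = true; }
}

// Fake native client standing in for e.g. com.google.genai.Client.
class FakeClient {
    String generateContent(String prompt) { return "response to: " + prompt; }
}

// The decorator: same surface as the native client, plus a span around each call.
class TracedFakeClient {
    private final FakeClient delegate;
    Span lastSpan; // exposed so the demo can inspect it

    TracedFakeClient(FakeClient delegate) { this.delegate = delegate; }

    String generateContent(String prompt) {
        Span span = new Span("Fake Generate Content");
        span.setAttribute("input.value", prompt);
        try {
            String result = delegate.generateContent(prompt);
            span.setAttribute("output.value", result);
            return result;
        } finally {
            span.end(); // the span closes even if the delegate throws
            lastSpan = span;
        }
    }
}

class WrapperSketch {
    public static void main(String[] args) {
        TracedFakeClient traced = new TracedFakeClient(new FakeClient());
        System.out.println(traced.generateContent("hello")); // prints "response to: hello"
        System.out.println(traced.lastSpan.ended);           // prints "true"
    }
}
```

Because the wrapper exposes the same method names as the native client, swapping it in is usually a one-line change at construction time.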


Google GenAI

Wraps the com.google.genai.Client for Google’s Gemini API.

<dependency>
    <groupId>com.github.future-agi.traceAI</groupId>
    <artifactId>traceai-java-google-genai</artifactId>
    <version>main-SNAPSHOT</version>
</dependency>
implementation 'com.github.future-agi.traceAI:traceai-java-google-genai:main-SNAPSHOT'
import ai.traceai.TraceAI;
import ai.traceai.googlegenai.TracedGenerativeModel;
import com.google.genai.Client;

TraceAI.initFromEnvironment();

Client client = Client.builder()
    .apiKey(System.getenv("GOOGLE_API_KEY"))
    .build();

// Note: model name is a constructor parameter
TracedGenerativeModel model = new TracedGenerativeModel(client, "gemini-2.0-flash");

// Simple generation
var response = model.generateContent("What is the capital of France?");
System.out.println(response.text());

// Multi-turn chat
var chat = model.startChat();
var reply = chat.sendMessage("Hello!");
System.out.println(reply.text());

// Token counting
var tokenCount = model.countTokens("How many tokens is this?");

Spans created:

  • generateContent() - “Google GenAI Generate Content” (LLM)
  • chat.sendMessage() - “Google GenAI Chat Message” (LLM)
  • countTokens() - “Google GenAI Count Tokens” (LLM)

Vertex AI

Wraps com.google.cloud.vertexai.generativeai.GenerativeModel for Google Cloud’s Vertex AI.

<dependency>
    <groupId>com.github.future-agi.traceAI</groupId>
    <artifactId>traceai-java-vertexai</artifactId>
    <version>main-SNAPSHOT</version>
</dependency>
implementation 'com.github.future-agi.traceAI:traceai-java-vertexai:main-SNAPSHOT'
import ai.traceai.TraceAI;
import ai.traceai.vertexai.TracedGenerativeModel;
import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.generativeai.GenerativeModel;

TraceAI.initFromEnvironment();

VertexAI vertexAI = new VertexAI("your-project-id", "us-central1");
GenerativeModel nativeModel = new GenerativeModel("gemini-2.0-flash", vertexAI);

TracedGenerativeModel model = new TracedGenerativeModel(nativeModel);

var response = model.generateContent("What is the capital of France?");
System.out.println(response.getCandidatesList().get(0).getContent().getParts(0).getText());

Spans created:

  • generateContent() - “Vertex AI Generate Content” (LLM)
  • countTokens() - “Vertex AI Count Tokens” (LLM)

Note: Vertex AI streaming (generateContentStream) creates a span but ends it before the stream is consumed. Use non-streaming for accurate trace data.
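One general way to fix that lifecycle problem, sketched here in plain Java (this is not part of the current wrapper, just an illustration), is to tie the span's end to the stream's closure so the span covers the time the caller spends consuming chunks:

```java
import java.util.List;
import java.util.stream.Stream;

// Illustrative sketch: end the span when the stream is closed,
// not when the stream object is first returned.
class StreamSpanSketch {
    static class Span {
        boolean ended;
        void end() { ended = true; }
    }

    // Returns a stream whose close() ends the span.
    static Stream<String> tracedStream(Span span, List<String> chunks) {
        return chunks.stream().onClose(span::end);
    }

    public static void main(String[] args) {
        Span span = new Span();
        StringBuilder out = new StringBuilder();
        try (Stream<String> s = tracedStream(span, List.of("Par", "is"))) {
            s.forEach(out::append);
        } // try-with-resources closes the stream, which ends the span
        System.out.println(out + " / span ended: " + span.ended); // prints "Paris / span ended: true"
    }
}
```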


Azure OpenAI

Wraps com.azure.ai.openai.OpenAIClient from the Azure SDK.

<dependency>
    <groupId>com.github.future-agi.traceAI</groupId>
    <artifactId>traceai-java-azure-openai</artifactId>
    <version>main-SNAPSHOT</version>
</dependency>
implementation 'com.github.future-agi.traceAI:traceai-java-azure-openai:main-SNAPSHOT'
import ai.traceai.TraceAI;
import ai.traceai.azure.openai.TracedAzureOpenAIClient;
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.ai.openai.models.ChatCompletionsOptions;
import com.azure.ai.openai.models.ChatRequestUserMessage;
import com.azure.ai.openai.models.EmbeddingsOptions;
import com.azure.core.credential.AzureKeyCredential;
import java.util.List;

TraceAI.initFromEnvironment();

OpenAIClient client = new OpenAIClientBuilder()
    .endpoint(System.getenv("AZURE_OPENAI_ENDPOINT"))
    .credential(new AzureKeyCredential(System.getenv("AZURE_OPENAI_API_KEY")))
    .buildClient();

TracedAzureOpenAIClient traced = new TracedAzureOpenAIClient(client);

// Chat completions - first arg is deployment name
var chatOptions = new ChatCompletionsOptions(List.of(
    new ChatRequestUserMessage("What is the capital of France?")
));
var response = traced.getChatCompletions("gpt-4o-mini", chatOptions);
System.out.println(response.getChoices().get(0).getMessage().getContent());

// Embeddings
var embeddingOptions = new EmbeddingsOptions(List.of("Hello world"));
var embeddings = traced.getEmbeddings("text-embedding-3-small", embeddingOptions);

Spans created:

  • getChatCompletions() - “Azure OpenAI Chat Completion” (LLM)
  • getEmbeddings() - “Azure OpenAI Embedding” (EMBEDDING)
  • getCompletions() - “Azure OpenAI Completion” (LLM, legacy API)

Azure OpenAI captures tool call attributes when the model invokes tools, and handles all message types (System, User, Assistant, Tool, Function).


Ollama

Wraps io.github.ollama4j.OllamaAPI for local Ollama models.

<dependency>
    <groupId>com.github.future-agi.traceAI</groupId>
    <artifactId>traceai-java-ollama</artifactId>
    <version>main-SNAPSHOT</version>
</dependency>
implementation 'com.github.future-agi.traceAI:traceai-java-ollama:main-SNAPSHOT'
import ai.traceai.TraceAI;
import ai.traceai.ollama.TracedOllamaAPI;
import io.github.ollama4j.OllamaAPI;
import io.github.ollama4j.models.chat.OllamaChatMessage;
import java.util.List;

TraceAI.initFromEnvironment();

OllamaAPI api = new OllamaAPI("http://localhost:11434");
TracedOllamaAPI traced = new TracedOllamaAPI(api);

// Generate
var result = traced.generate("llama3", "What is the capital of France?");
System.out.println(result.getResponse());

// Chat
var chatResult = traced.chat("llama3", List.of(
    new OllamaChatMessage("user", "Hello!")
));

// Embeddings
var embedding = traced.embed("llama3", "Hello world");

// List models
var models = traced.listModels();

Spans created:

  • generate() - “Ollama Generate” (LLM)
  • chat() - “Ollama Chat” (LLM)
  • embed() - “Ollama Embed” (EMBEDDING)
  • listModels() - “Ollama List Models” (LLM)

Ollama spans include ollama.response_time_ms from the Ollama server’s own timing.


IBM Watsonx

Wraps the Watsonx Java SDK using reflection (like Anthropic) for cross-version compatibility.

<dependency>
    <groupId>com.github.future-agi.traceAI</groupId>
    <artifactId>traceai-java-watsonx</artifactId>
    <version>main-SNAPSHOT</version>
</dependency>
implementation 'com.github.future-agi.traceAI:traceai-java-watsonx:main-SNAPSHOT'
import ai.traceai.TraceAI;
import ai.traceai.watsonx.TracedWatsonxAI;

TraceAI.initFromEnvironment();

// Create Watsonx client (your SDK version)
Object watsonxClient = /* your Watsonx client */;

// Wraps as Object - reflection-based, version-agnostic
TracedWatsonxAI traced = new TracedWatsonxAI(watsonxClient);

// Text generation
Object response = traced.generateText(textGenRequest);

// Chat
Object chatResponse = traced.chat(chatRequest);

// Embeddings
Object embedResponse = traced.embedText(embedRequest);

Spans created:

  • generateText() - “Watsonx Text Generation” (LLM)
  • chat() - “Watsonx Chat” (LLM)
  • embedText() - “Watsonx Embed” (EMBEDDING)

Watsonx spans include watsonx.project_id, watsonx.space_id, and watsonx.stop_reason.

As with the Anthropic integration, the reflection approach means the client and request objects are typed as Object. Cast the return values to your SDK’s response types.
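The reflection approach boils down to looking up a method by name on the wrapped Object and invoking it, so the wrapper compiles against no particular SDK version. A minimal illustration with java.lang.reflect (toy client and method names are hypothetical, not the wrapper's actual code):

```java
import java.lang.reflect.Method;

// Stand-in for whatever Watsonx client version happens to be on the classpath.
class ToyWatsonxClient {
    public String generateText(String request) { return "generated: " + request; }
}

class ReflectionSketch {
    // Invoke a named method on an untyped client, as a reflection-based
    // wrapper would, returning the untyped result for the caller to cast.
    static Object invoke(Object client, String methodName, Object arg) throws Exception {
        Method m = client.getClass().getMethod(methodName, arg.getClass());
        return m.invoke(client, arg);
    }

    public static void main(String[] args) throws Exception {
        Object client = new ToyWatsonxClient();          // typed as Object
        Object response = invoke(client, "generateText", "hello");
        String text = (String) response;                 // caller casts the result
        System.out.println(text);                        // prints "generated: hello"
    }
}
```

The trade-off is that typos in method names or argument types surface at runtime (as NoSuchMethodException) rather than at compile time.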


Common span attributes

All providers above capture these core attributes:

  • llm.provider - Provider name (google, azure-openai, ollama, watsonx)
  • llm.request.model - Model name from the request
  • llm.response.model - Model name from the response (if different)
  • llm.token_count.prompt - Input token count
  • llm.token_count.completion - Output token count
  • llm.token_count.total - Total token count
  • input.value / output.value - Plain text input/output
  • fi.raw_input / fi.raw_output - Full request/response as JSON
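As a plain-Java illustration (not SDK code), the attribute set above maps naturally onto a key-value map, with the total token count always being prompt plus completion:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative: the kind of attribute map a provider wrapper assembles per span.
class AttributeSketch {
    static Map<String, Object> llmAttributes(String provider, String model,
                                             int promptTokens, int completionTokens) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("llm.provider", provider);
        attrs.put("llm.request.model", model);
        attrs.put("llm.token_count.prompt", promptTokens);
        attrs.put("llm.token_count.completion", completionTokens);
        // total is always prompt + completion
        attrs.put("llm.token_count.total", promptTokens + completionTokens);
        return attrs;
    }

    public static void main(String[] args) {
        // e.g. 12 prompt tokens + 30 completion tokens -> total of 42
        System.out.println(llmAttributes("ollama", "llama3", 12, 30));
    }
}
```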