Chat API

The Chat API is the core endpoint that powers Alfred402's AI conversations.

Endpoint

POST /api/chat

Overview

This endpoint receives user messages, processes them through Google's Gemini AI model with web search capabilities, and streams the response back to the client in real-time.
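End to end, the route can be sketched like this (an illustrative sketch, assuming the Vercel AI SDK with the `@ai-sdk/google` provider; exact option names vary between SDK versions and this is not the actual source):

```typescript
// app/api/chat/route.ts — illustrative sketch, not the real implementation.
import { google } from "@ai-sdk/google";
import { streamText } from "ai";

export const maxDuration = 50; // seconds (see Configuration below)

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: google("gemini-2.5-flash"),
    system: "You are Alfred402, a cryptocurrency oracle...", // see AI System Prompt
    messages,
    temperature: 0.5,
    maxTokens: 4000,
  });

  // Stream the result back as the prefixed event stream described below.
  return result.toDataStreamResponse();
}
```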

Configuration

Maximum Duration

export const maxDuration = 50;

The endpoint can run for up to 50 seconds, matching the client-side 50-second rate-limit cooldown and giving long-running AI tasks time to complete. Note that 50 seconds exceeds the Vercel Hobby limit below, so this setting requires a Pro or Enterprise plan.

Platform limits:

  • Vercel Hobby: 10 seconds max

  • Vercel Pro: 60 seconds max

  • Vercel Enterprise: 900 seconds max

Request Format

Headers
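
The request needs only a JSON content type; the Google API key stays server-side, so no authentication header is sent:

```http
Content-Type: application/json
```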

Body
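
A minimal body (the message content is illustrative):

```json
{
  "messages": [
    { "role": "user", "content": "What is the current price of Bitcoin?" }
  ]
}
```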

Or with contract address:
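
The address is passed inside the message content (a placeholder address; the body shape is otherwise identical):

```json
{
  "messages": [
    {
      "role": "user",
      "content": "Analyze this token: 0x0000000000000000000000000000000000000000"
    }
  ]
}
```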

Message Structure

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `messages` | Array | Yes | Array of message objects |
| `messages[].role` | String | Yes | Either `"user"` or `"assistant"` |
| `messages[].content` | String | Yes | Message text content |

Response Format

The endpoint returns a streaming response using Server-Sent Events (SSE).

Stream Format
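
A response stream looks roughly like this (an illustrative transcript; exact payloads and token counts vary):

```
0:"Bitcoin is currently trading"
0:" around $64,000 according to recent data..."
d:{"finishReason":"stop","usage":{"promptTokens":120,"completionTokens":85}}
```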

Response Events

| Event Type | Description |
| --- | --- |
| `0:` | Text content chunks |
| `d:` | Metadata (finish reason, etc.) |
| `e:` | Error messages |
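
A tolerant parser for these prefixed lines might look like the following (a sketch; real clients should prefer the AI SDK's built-in stream helpers):

```typescript
// Parse one line of the prefixed stream into a typed event.
type StreamEvent =
  | { type: "text"; value: string }
  | { type: "metadata"; value: unknown }
  | { type: "error"; value: unknown };

function parseStreamLine(line: string): StreamEvent | null {
  const idx = line.indexOf(":");
  if (idx === -1) return null; // not a prefixed line
  const prefix = line.slice(0, idx);
  const payload = line.slice(idx + 1); // JSON-encoded payload
  switch (prefix) {
    case "0":
      return { type: "text", value: JSON.parse(payload) };
    case "d":
      return { type: "metadata", value: JSON.parse(payload) };
    case "e":
      return { type: "error", value: JSON.parse(payload) };
    default:
      return null; // unknown prefix — ignore
  }
}
```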

Finish Reasons

  • stop - Normal completion

  • length - Max tokens reached

  • content-filter - Content filtered

  • tool-calls - Tool execution required

AI Model Configuration

Model Selection

Available models:

  • gemini-2.5-flash - Fast, efficient (default)

  • gemini-2.5-pro - More capable, slower

  • gemini-1.5-flash - Previous generation

  • gemini-1.5-pro - Previous generation pro

Parameters

Temperature

Controls response creativity:

  • 0.0-0.3: Focused, deterministic

  • 0.4-0.7: Balanced (recommended)

  • 0.8-1.0: Creative, varied

Max Tokens

Maximum response length:

  • 1000: Brief answers

  • 4000: Detailed analysis (default)

  • 8000: Very comprehensive

Tools Integration

The AI has access to two Google tools:

1. Google Search

Enables the AI to:

  • Search for current cryptocurrency prices

  • Find recent news and updates

  • Access DexScreener, CoinGecko data

  • Verify contract addresses

2. URL Context

Enables the AI to:

  • Fetch and analyze specific URLs

  • Extract data from blockchain explorers

  • Read token information from DEX platforms
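
With recent versions of the `@ai-sdk/google` provider, these are exposed as provider-defined tools (a sketch; tool and option names differ across SDK versions, so treat the identifiers below as assumptions):

```typescript
import { google } from "@ai-sdk/google";
import { streamText } from "ai";

const result = streamText({
  model: google("gemini-2.5-flash"),
  tools: {
    google_search: google.tools.googleSearch({}),
    url_context: google.tools.urlContext({}),
  },
  messages: [{ role: "user", content: "Find the latest BTC price." }],
});
```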

System Prompt

The endpoint includes a comprehensive system prompt that defines:

  1. Identity: "Alfred402" cryptocurrency oracle

  2. Capabilities: Web search, token analysis, risk assessment

  3. Security directives: Prompt injection protection

  4. Personality traits: Wise, data-driven, cautious

  5. Instructions: How to analyze tokens and cite sources

See AI System Prompt for full details.

Error Handling

Common Errors

400 Bad Request

Cause: Missing or malformed messages array

401 Unauthorized

Cause: Missing GOOGLE_GENERATIVE_AI_API_KEY environment variable

429 Too Many Requests

Cause: Too many requests to Google AI API

500 Internal Server Error

Cause: Gemini API error or network issue

Error Response Format

Errors are returned as JSON:
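
For example, a 500 response body might look like this (the exact shape is an assumption):

```json
{
  "error": "Internal server error"
}
```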

Example Usage

Using Fetch API
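
A client-side sketch that sends a request and accumulates the streamed text (field names follow the request format above; for simplicity it assumes each prefixed line arrives in one chunk, which production code should not rely on):

```javascript
// Pull text parts (0: lines) out of one decoded chunk of the stream.
function extractText(chunk) {
  let text = "";
  for (const line of chunk.split("\n")) {
    if (line.startsWith("0:")) text += JSON.parse(line.slice(2));
  }
  return text;
}

// Send a chat request and return the full streamed answer.
async function streamChat(messages) {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let answer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    answer += extractText(decoder.decode(value, { stream: true }));
  }
  return answer;
}

// Usage:
// const answer = await streamChat([{ role: "user", content: "Is BTC up today?" }]);
```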

Performance Considerations

Streaming Benefits

  • Faster perceived performance: Users see responses as they generate

  • Better UX: No long waits for complete responses

  • Efficient: Reduces memory usage on server

Response Times

Typical response times:

  • Simple queries: 2-5 seconds

  • With web search: 5-15 seconds

  • Complex analysis: 10-30 seconds

Optimization Tips

  1. Use appropriate max tokens: Don't request more than needed

  2. Tune temperature: Lower values give focused, consistent answers; higher values give more varied ones

  3. Enable tool use selectively: Tools add latency

  4. Implement client-side caching: Cache common queries
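
Tip 4 can be as simple as a small time-based cache in the client (a sketch; the 60-second TTL is an arbitrary choice):

```javascript
// Tiny TTL cache for repeated queries.
function makeTtlCache(ttlMs) {
  const entries = new Map();
  return {
    get(key) {
      const hit = entries.get(key);
      if (!hit) return undefined;
      if (Date.now() - hit.at > ttlMs) {
        entries.delete(key); // expired
        return undefined;
      }
      return hit.value;
    },
    set(key, value) {
      entries.set(key, { value, at: Date.now() });
    },
  };
}

// Usage: reuse answers for a minute before hitting the API again.
const answerCache = makeTtlCache(60_000);
```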

Security Features

Prompt Injection Protection

The system prompt includes multiple layers of defense against prompt injection; see AI System Prompt for the specific directives.

Rate Limiting

  • Client-side: 50-second cooldown

  • Server-side: maxDuration limit

  • API-side: Google AI quota limits
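
The client-side cooldown can be sketched as a simple gate (function names are illustrative):

```javascript
// Allow at most one request per `ms` milliseconds.
function makeCooldown(ms) {
  let last = 0;
  return function tryAcquire() {
    const now = Date.now();
    if (now - last < ms) return false; // still cooling down
    last = now;
    return true;
  };
}

const canSend = makeCooldown(50_000); // matches the 50-second cooldown
```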

Input Validation

The endpoint validates:

  • Request structure

  • Message format

  • Content safety
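
The structure and format checks might look like this hand-rolled version (a sketch; the real endpoint may use a schema library instead, and content safety is enforced separately by the model provider):

```javascript
// Validate the request body shape described under "Message Structure".
function isValidBody(body) {
  if (!body || !Array.isArray(body.messages) || body.messages.length === 0) {
    return false;
  }
  return body.messages.every(
    (m) =>
      m &&
      (m.role === "user" || m.role === "assistant") &&
      typeof m.content === "string"
  );
}
```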

Monitoring

Track these metrics in production:

  • Request count

  • Average response time

  • Error rate

  • Token usage

  • Tool invocation frequency

Logging

Add structured logging so the metrics above can be derived — for example, request start and completion, error details, and token counts.

Cost Optimization

Gemini API Pricing

  • Free tier: 15 requests/minute

  • Paid tier: Higher limits, lower latency

Reducing Costs

  1. Lower max tokens: Use 2000 instead of 4000

  2. Implement caching: Cache frequent queries

  3. Use Flash model: Cheaper than Pro

  4. Rate limit users: Current 50s cooldown helps

Testing

Manual Testing
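
A quick smoke test from a terminal (assumes a local dev server on port 3000; `-N` disables buffering so you can watch the stream arrive):

```bash
curl -N -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello Alfred"}]}'
```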

Automated Testing

Consider testing:

  • Valid request handling

  • Error responses

  • Streaming functionality

  • Tool invocations


Need help? Check Troubleshooting or open an issue.
