# Token Counting Guide

## Overview

This bot implements comprehensive token counting for both text and images, with special handling for Discord image links stored in MongoDB with 24-hour expiration.

## Token Encoding by Model

### o200k_base (200k vocabulary) - Newer Models

Used for:

- ✅ **gpt-4o** and **gpt-4o-mini**
- ✅ **gpt-4.1**, **gpt-4.1-mini**, **gpt-4.1-nano** (NEW!)
- ✅ **gpt-5**, **gpt-5-mini**, **gpt-5-nano**, **gpt-5-chat**
- ✅ **o1**, **o1-mini**, **o1-preview**
- ✅ **o3**, **o3-mini**
- ✅ **o4**, **o4-mini**

### cl100k_base (100k vocabulary) - Older Models

Used for:

- ✅ **gpt-4** (original, not 4o or 4.1)
- ✅ **gpt-3.5-turbo**

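If you need to reproduce this mapping outside the bot (for example in a standalone script), a minimal sketch using `tiktoken` is shown below. The prefix list and the `get_encoding_for_model` helper are illustrative assumptions, not the exact logic in `token_counter.py`.

```python
import tiktoken

# Assumed prefix list for o200k_base models; token_counter.py may resolve encodings differently.
O200K_PREFIXES = ("gpt-4o", "gpt-4.1", "gpt-5", "o1", "o3", "o4")

def get_encoding_for_model(model: str) -> tiktoken.Encoding:
    name = model.split("/")[-1]  # drop a provider prefix such as "openai/"
    if name.startswith(O200K_PREFIXES):
        return tiktoken.get_encoding("o200k_base")
    return tiktoken.get_encoding("cl100k_base")  # gpt-4, gpt-3.5-turbo

# Example: count tokens for a short string with the gpt-4o encoding
print(len(get_encoding_for_model("openai/gpt-4o").encode("Hello, world!")))
```
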
## Token Counting Features

### 1. Text Token Counting

```python
from src.utils.token_counter import token_counter

# Count text tokens
tokens = token_counter.count_text_tokens("Hello, world!", "openai/gpt-4o")
print(f"Text uses {tokens} tokens")
```

### 2. Image Token Counting

Images consume tokens based on their dimensions and detail level:

#### Low Detail
- **85 tokens** (fixed cost)

#### High Detail
- **Base cost**: 170 tokens
- **Tile cost**: 170 tokens per 512x512 tile
- Images are scaled to fit 2048x2048
- Shortest side scaled to 768px
- Divided into 512x512 tiles

```python
# Count image tokens from Discord URL
tokens = await token_counter.count_image_tokens(
    image_url="https://cdn.discordapp.com/attachments/...",
    detail="auto"
)
print(f"Image uses {tokens} tokens")

# Count image tokens from bytes
with open("image.png", "rb") as f:
    image_data = f.read()
tokens = await token_counter.count_image_tokens(
    image_data=image_data,
    detail="high"
)
```

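The high-detail cost can also be estimated directly from the image dimensions. The helper below is a minimal sketch using the base and tile costs documented above; the scaling and rounding details inside `token_counter.py` may differ.

```python
import math

def high_detail_image_tokens(width: int, height: int) -> int:
    """Sketch of the tile-based cost using the constants documented above."""
    # Downscale to fit within 2048x2048 (never upscale)
    scale = min(1.0, 2048 / max(width, height))
    width, height = int(width * scale), int(height * scale)
    # Downscale so the shortest side is at most 768px
    scale = min(1.0, 768 / min(width, height))
    width, height = int(width * scale), int(height * scale)
    # Count the 512x512 tiles the scaled image covers
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 + 170 * tiles  # base cost + per-tile cost

# Example: a 1024x1024 image scales to 768x768 -> 4 tiles -> 170 + 4 * 170 = 850 tokens
print(high_detail_image_tokens(1024, 1024))
```
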
### 3. Message Token Counting

Count tokens for complete message arrays including text and images:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]

token_counts = await token_counter.count_message_tokens(messages, "openai/gpt-4o")
print(f"Total: {token_counts['total_tokens']} tokens")
print(f"Text: {token_counts['text_tokens']} tokens")
print(f"Images: {token_counts['image_tokens']} tokens")
```

### 4. Context Limit Checking

Check whether the messages fit within the model's context window:

```python
context_check = await token_counter.check_context_limit(
    messages=messages,
    model="openai/gpt-4o",
    max_output_tokens=4096
)

if not context_check["within_limit"]:
    print(f"⚠️ Messages too large: {context_check['input_tokens']} tokens")
    print(f"Maximum: {context_check['max_tokens']} tokens")
else:
    print(f"✅ Within limit. Available for output: {context_check['available_output_tokens']} tokens")
```

## Discord Image Handling

### Image Storage in MongoDB

When users send images in Discord:

1. **Image URL Captured**: Discord CDN URL is stored
2. **Timestamp Added**: Current datetime is recorded
3. **Saved to History**: Stored in message content array

```python
content = [
    {"type": "text", "text": "Look at this image"},
    {
        "type": "image_url",
        "image_url": {
            "url": "https://cdn.discordapp.com/attachments/...",
            "detail": "auto"
        },
        "timestamp": "2025-10-01T12:00:00"  # Added automatically
    }
]
```

### 24-Hour Expiration

Discord CDN links expire after ~24 hours. The system:

1. **Filters Expired Images**: When loading history, images older than 23 hours are removed
2. **Token Counting Skips Expired**: Token counter checks timestamps and skips expired images
3. **Automatic Cleanup**: Database handler filters expired images on every `get_history()` call

```python
# In db_handler.py
def _filter_expired_images(self, history: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Filter out image links that are older than 23 hours"""
    current_time = datetime.now()
    expiration_time = current_time - timedelta(hours=23)

    # Checks timestamp and removes expired images
    # ...
```

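The elided body above comes down to dropping image parts whose timestamp is at or before the cutoff. Below is a minimal, self-contained sketch of that idea; it is illustrative only, and the actual `_filter_expired_images()` in `db_handler.py` may differ.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

def filter_expired_images(history: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Illustrative sketch: drop image parts older than 23 hours, keep everything else."""
    expiration_time = datetime.now() - timedelta(hours=23)
    filtered = []
    for message in history:
        content = message.get("content")
        if isinstance(content, list):
            kept_parts = []
            for part in content:
                if part.get("type") == "image_url":
                    timestamp_str = part.get("timestamp")
                    if timestamp_str and datetime.fromisoformat(timestamp_str) <= expiration_time:
                        continue  # skip expired image parts
                kept_parts.append(part)
            message = {**message, "content": kept_parts}
        filtered.append(message)
    return filtered
```
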
### Token Counter Expiration Handling

The token counter automatically skips expired images:

```python
# In token_counter.py count_message_tokens()
timestamp_str = part.get("timestamp")
if timestamp_str:
    timestamp = datetime.fromisoformat(timestamp_str)
    if timestamp <= expiration_time:
        logging.info(f"Skipping expired image (added at {timestamp_str})")
        continue  # Don't count tokens for expired images
```

## Cost Estimation

Calculate costs based on token usage:

```python
cost = token_counter.estimate_cost(
    input_tokens=1000,
    output_tokens=500,
    model="openai/gpt-4o"
)
print(f"Estimated cost: ${cost:.6f}")
```

### Model Pricing (per 1M tokens)

| Model | Input | Output |
|-------|-------|--------|
| gpt-4o | $5.00 | $20.00 |
| gpt-4o-mini | $0.60 | $2.40 |
| gpt-4.1 | $2.00 | $8.00 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1-nano | $0.10 | $0.40 |
| gpt-5 | $1.25 | $10.00 |
| gpt-5-mini | $0.25 | $2.00 |
| gpt-5-nano | $0.05 | $0.40 |
| o1-preview | $15.00 | $60.00 |
| o1-mini | $1.10 | $4.40 |

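Because the table is quoted per 1M tokens, the cost is a linear scaling of the input and output counts. The sketch below shows that arithmetic; the dictionary shape is an assumption, and the real `MODEL_PRICING` in `token_counter.py` may be structured differently.

```python
# Illustrative only: per-1M-token pricing converted to a dollar cost.
MODEL_PRICING = {
    "openai/gpt-4o": {"input": 5.00, "output": 20.00},      # USD per 1M tokens
    "openai/gpt-4o-mini": {"input": 0.60, "output": 2.40},
}

def estimate_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    pricing = MODEL_PRICING.get(model, {"input": 0.0, "output": 0.0})
    return (input_tokens / 1_000_000) * pricing["input"] + \
           (output_tokens / 1_000_000) * pricing["output"]

# 1,000 input + 500 output tokens on gpt-4o: $0.005 + $0.01 = $0.015
print(f"${estimate_cost(1000, 500, 'openai/gpt-4o'):.6f}")
```
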
## Database Token Tracking

### Save Token Usage

```python
await db_handler.save_token_usage(
    user_id=user_id,
    model="openai/gpt-4o",
    input_tokens=1000,
    output_tokens=500,
    cost=0.0125,
    text_tokens=950,
    image_tokens=50
)
```

### Get User Statistics

```python
# Get total usage
stats = await db_handler.get_user_token_usage(user_id)
print(f"Total input: {stats['total_input_tokens']}")
print(f"Total text: {stats['total_text_tokens']}")
print(f"Total images: {stats['total_image_tokens']}")
print(f"Total cost: ${stats['total_cost']:.6f}")

# Get usage by model
model_usage = await db_handler.get_user_token_usage_by_model(user_id)
for model, usage in model_usage.items():
    print(f"{model}: {usage['requests']} requests, ${usage['cost']:.6f}")
    print(f"  Text: {usage['text_tokens']}, Images: {usage['image_tokens']}")
```

## Integration Example

Complete example of using token counting in a command:

```python
from datetime import datetime

from src.utils.token_counter import token_counter

# db_handler, openai_client, and DEFAULT_MODEL are assumed to be provided
# by the bot's own modules (database handler, OpenAI client, config).

async def process_user_message(interaction, user_message, image_urls=None):
    user_id = interaction.user.id
    model = await db_handler.get_user_model(user_id) or DEFAULT_MODEL
    history = await db_handler.get_history(user_id)

    # Build message content
    content = [{"type": "text", "text": user_message}]

    # Add images with timestamps
    if image_urls:
        for url in image_urls:
            content.append({
                "type": "image_url",
                "image_url": {"url": url, "detail": "auto"},
                "timestamp": datetime.now().isoformat()
            })

    # Add to messages
    messages = history + [{"role": "user", "content": content}]

    # Check context limit
    context_check = await token_counter.check_context_limit(messages, model)
    if not context_check["within_limit"]:
        await interaction.followup.send(
            f"⚠️ Context too large: {context_check['input_tokens']:,} tokens. "
            f"Maximum: {context_check['max_tokens']:,} tokens.",
            ephemeral=True
        )
        return

    # Count input tokens
    input_count = await token_counter.count_message_tokens(messages, model)

    # Call API
    response = await openai_client.chat.completions.create(
        model=model,
        messages=messages
    )

    reply = response.choices[0].message.content

    # Get actual usage from API
    usage = response.usage
    actual_input = usage.prompt_tokens if usage else input_count['total_tokens']
    actual_output = usage.completion_tokens if usage else token_counter.count_text_tokens(reply, model)

    # Calculate cost
    cost = token_counter.estimate_cost(actual_input, actual_output, model)

    # Save to database
    await db_handler.save_token_usage(
        user_id=user_id,
        model=model,
        input_tokens=actual_input,
        output_tokens=actual_output,
        cost=cost,
        text_tokens=input_count['text_tokens'],
        image_tokens=input_count['image_tokens']
    )

    # Send response with cost
    await interaction.followup.send(f"{reply}\n\n💰 Cost: ${cost:.6f}")
```

## Best Practices

### 1. Always Check Context Limits
Before making API calls, check if the messages fit within the model's context window.

### 2. Add Timestamps to Images
When storing images from Discord, always add a timestamp:
```python
"timestamp": datetime.now().isoformat()
```

### 3. Filter History on Load
The database handler automatically filters expired images when loading history.

### 4. Count Before API Call
Count tokens before calling the API to provide accurate estimates and warnings.

### 5. Use Actual Usage from API
Prefer `response.usage` over estimates when available:
```python
actual_input = usage.prompt_tokens if usage else estimated_tokens
```

### 6. Track Text and Image Separately
Store both text_tokens and image_tokens for detailed analytics.

### 7. Show Cost to Users
Always display the cost after operations so users are aware of usage.

## Context Window Limits

| Model | Context Limit |
|-------|--------------|
| gpt-4o | 128,000 tokens |
| gpt-4o-mini | 128,000 tokens |
| gpt-4.1 | 128,000 tokens |
| gpt-4.1-mini | 128,000 tokens |
| gpt-4.1-nano | 128,000 tokens |
| gpt-5 | 200,000 tokens |
| gpt-5-mini | 200,000 tokens |
| gpt-5-nano | 200,000 tokens |
| o1 | 200,000 tokens |
| o1-mini | 128,000 tokens |
| o3 | 200,000 tokens |
| o3-mini | 200,000 tokens |
| gpt-4 | 8,192 tokens |
| gpt-3.5-turbo | 16,385 tokens |

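If you want to reproduce the limit check yourself, the table above can be treated as a simple lookup. This is a hedged sketch of the idea behind `check_context_limit`; the dictionary, fallback value, and helper name are assumptions, not the bot's actual code.

```python
# Illustrative only: context limits as a lookup table (values from the table above).
MODEL_CONTEXT_LIMITS = {
    "openai/gpt-4o": 128_000,
    "openai/gpt-5": 200_000,
    "openai/gpt-4": 8_192,
}

def check_context_limit_sketch(input_tokens: int, model: str, max_output_tokens: int = 4096) -> dict:
    max_tokens = MODEL_CONTEXT_LIMITS.get(model, 128_000)  # assumed fallback
    return {
        "within_limit": input_tokens + max_output_tokens <= max_tokens,
        "input_tokens": input_tokens,
        "max_tokens": max_tokens,
        "available_output_tokens": max(0, max_tokens - input_tokens),
    }
```
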
## Troubleshooting

### Image Token Count Seems Wrong
- Check if the image was downloaded successfully
- Verify the image dimensions
- Remember: high-detail images use the tile-based calculation

### Expired Images Still Counted
- Check that timestamps are in ISO format
- Verify the expiration threshold (23 hours)
- Ensure `_filter_expired_images()` is called

### Cost Calculation Incorrect
- Verify the model name matches MODEL_PRICING keys exactly
- Check that pricing is per 1M tokens
- Ensure input/output tokens are correct

### Context Limit Exceeded
- Trim conversation history (keep the last N messages; see the sketch below)
- Reduce image detail level to "low"
- Remove old images from history
- Use a model with a larger context window

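A minimal sketch of the history-trimming approach from the first bullet above; the helper name and message shape are hypothetical, so adapt it to the bot's actual history format.

```python
from typing import Any, Dict, List

def trim_history(messages: List[Dict[str, Any]], keep_last: int = 20) -> List[Dict[str, Any]]:
    """Keep any system messages plus only the most recent `keep_last` other messages."""
    system_messages = [m for m in messages if m.get("role") == "system"]
    other_messages = [m for m in messages if m.get("role") != "system"]
    return system_messages + other_messages[-keep_last:]
```
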
## Cleanup

Don't forget to close the token counter session when shutting down:

```python
await token_counter.close()
```

This is typically done in the bot's cleanup/shutdown handler.