
cauvang32 9c180bdd89 Refactor OpenAI utilities and remove Python executor

- Removed the `analyze_data_file` function from tool definitions to streamline functionality.
- Enhanced the `execute_python_code` function description to clarify auto-installation of packages and file handling.
- Deleted the `python_executor.py` module to simplify the codebase and improve maintainability.
- Introduced a new `token_counter.py` module for efficient token counting for OpenAI API requests, including support for Discord image links and cost estimation.

2025-10-02 21:49:48 +07:00

Model Instructions - Code Interpreter Usage

🎯 Overview

This document explains how the AI model should use the code interpreter tool. Missing packages are installed automatically, and every file the code creates is delivered to the user.

📦 Package Auto-Installation

✅ What the Model SHOULD Do

Just import packages normally - they auto-install if missing!

# CORRECT - Just import what you need
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Even specialized libraries
import tensorflow as tf
import torch
import geopandas as gpd
import cv2  # OpenCV (pip package: opencv-python)

❌ What the Model SHOULD NOT Do

Don't check if packages are installed or ask users to install them:

# WRONG - Don't do this!
try:
    import seaborn
except ImportError:
    print("Please install seaborn")

# WRONG - Don't do this!
import subprocess
subprocess.run(['pip', 'install', 'seaborn'])

# WRONG - Don't do this!
print("First, install pandas: pip install pandas")

🔧 How Auto-Install Works

Behind the Scenes:

1. Model writes code: import seaborn as sns
2. Code executes → ModuleNotFoundError detected
3. System auto-installs: pip install seaborn
4. Code re-executes automatically → Success!
5. User gets notification: "📦 Auto-installed: seaborn"
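The sequence above can be sketched as a retry loop. This is a minimal sketch, not the project's executor. It assumes the code runs through `exec`, that the missing module is read from `ModuleNotFoundError.name`, and that installation goes through `pip`. The names `run_with_auto_install`, `pip_install`, and `PIP_NAMES` are hypothetical.

```python
import importlib
import subprocess
import sys
from typing import Callable, List

# Hypothetical mapping for modules whose pip package name differs from the import name.
PIP_NAMES = {"cv2": "opencv-python", "sklearn": "scikit-learn", "PIL": "Pillow"}


def pip_install(module: str) -> None:
    """Install the pip package that provides `module` into the running interpreter."""
    package = PIP_NAMES.get(module, module)
    subprocess.run([sys.executable, "-m", "pip", "install", package], check=True)
    # Clear import-system caches so the freshly installed package can be found.
    importlib.invalidate_caches()


def run_with_auto_install(
    code: str,
    install: Callable[[str], None] = pip_install,
    max_installs: int = 3,
) -> List[str]:
    """Execute `code`; on ModuleNotFoundError, install the module and re-run.

    Returns the top-level modules that were installed along the way.
    (Hypothetical sketch, not the project's actual implementation.)
    """
    installed: List[str] = []
    while True:
        try:
            # Fresh globals on every attempt, so each retry re-runs from the top.
            exec(code, {"__name__": "__main__"})
            return installed
        except ModuleNotFoundError as exc:
            missing = exc.name.split(".")[0] if exc.name else None
            # Give up if the module name is unknown, installing it already failed
            # to help, or the install budget is spent.
            if missing is None or missing in installed or len(installed) >= max_installs:
                raise
            install(missing)
            installed.append(missing)
```

Because each retry re-runs the code from the top, anything the code did before the failing import happens twice. Keeping imports at the top of the code, as the examples in this guide do, keeps that cost low.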

No Action Required from Model

The model doesn't need to:

Check if packages are installed
Use the install_packages parameter
Handle installation errors
Retry code execution

Everything is automatic!

📁 File Management

Loading User Files

When a user uploads a file, it is assigned a file_id:

# User uploaded "sales_data.csv" → file_id: "123456789_1696118400_abc123"

# Model's code:
import pandas as pd

# Load the file (load_file() is provided by the code interpreter environment)
df = load_file('123456789_1696118400_abc123')

print(f"Loaded {len(df)} rows")
print(df.head())

Creating Output Files

ANY file the model creates is captured and sent to the user:

import pandas as pd
import matplotlib.pyplot as plt
import json

# Create CSV export
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
df.to_csv('results.csv', index=False)  # ✅ User gets this!

# Create visualization
plt.figure(figsize=(10, 6))
plt.plot(df['x'], df['y'])
plt.title('Results')
plt.savefig('plot.png')  # ✅ User gets this!

# Create JSON report
summary = {'total': 6, 'mean': 3.5}
with open('summary.json', 'w') as f:
    json.dump(summary, f, indent=2)  # ✅ User gets this!

# Create text report
with open('report.txt', 'w') as f:
    f.write('Analysis Results\n')
    f.write('================\n')
    f.write(f'Total: {summary["total"]}\n')  # ✅ User gets this!

print('Generated 4 files: CSV, PNG, JSON, TXT')
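One way the capture step could work is to snapshot the working directory before the code runs, then collect every file that is new or changed afterwards. This is a sketch under that assumption. `snapshot` and `new_or_modified_files` are hypothetical names, and the real system may track files differently.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

Signature = Tuple[int, int]  # (size in bytes, modification time in ns)


def snapshot(workdir: Path) -> Dict[Path, Signature]:
    """Record the size and mtime of every file under `workdir` (hypothetical helper)."""
    state: Dict[Path, Signature] = {}
    for path in workdir.rglob("*"):
        if path.is_file():
            st = path.stat()
            state[path] = (st.st_size, st.st_mtime_ns)
    return state


def new_or_modified_files(
    before: Dict[Path, Signature], after: Dict[Path, Signature]
) -> List[Path]:
    """Files that are absent from `before` or whose size/mtime changed."""
    return sorted(path for path, sig in after.items() if before.get(path) != sig)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "input.csv").write_text("x\n1\n")        # existing upload: not captured
        before = snapshot(workdir)
        (workdir / "results.csv").write_text("x,y\n1,4\n")  # created by the model's code
        after = snapshot(workdir)
        print([p.name for p in new_or_modified_files(before, after)])  # ['results.csv']
```

Comparing size and modification time is cheap. However, it can miss a rewrite that leaves both unchanged. Hashing file contents would close that gap, at the cost of reading every file.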

Supported Output Files (80+ formats)

✅ Data: CSV, Excel, Parquet, JSON, XML, YAML
✅ Images: PNG, JPEG, GIF, SVG, BMP, TIFF
✅ Text: TXT, MD, LOG, HTML
✅ Code: Python, JavaScript, SQL, R
✅ Scientific: NumPy (.npy), Pickle, HDF5
✅ Archives: ZIP, TAR, GZIP
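The categories above could be enforced with an extension lookup. This minimal sketch covers only the formats named in this list, not the full 80+. `OUTPUT_CATEGORIES` and `output_category` are hypothetical names, and the real system's list and labels may differ.

```python
from pathlib import Path
from typing import Optional

# Hypothetical table built from the categories listed above (a subset of all formats).
OUTPUT_CATEGORIES = {
    "data": {".csv", ".xlsx", ".xls", ".parquet", ".json", ".xml", ".yaml", ".yml"},
    "image": {".png", ".jpg", ".jpeg", ".gif", ".svg", ".bmp", ".tif", ".tiff"},
    "text": {".txt", ".md", ".log", ".html", ".htm"},
    "code": {".py", ".js", ".sql", ".r"},
    "scientific": {".npy", ".pkl", ".pickle", ".h5", ".hdf5"},
    "archive": {".zip", ".tar", ".gz", ".tgz"},
}


def output_category(filename: str) -> Optional[str]:
    """Return the category for `filename`, or None if its extension is not listed."""
    suffix = Path(filename).suffix.lower()  # only the last extension: "a.tar.gz" -> ".gz"
    for category, extensions in OUTPUT_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None
```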

💡 Best Practices for the Model

1. Don't Over-Explain Package Installation

❌ BAD:

I'll use seaborn for visualization. First, let me check if it's installed...
<execute code with try/except>

✅ GOOD:

I'll create a correlation heatmap using seaborn.
<execute code with import seaborn>

2. Create Files Instead of Printing Long Output

❌ BAD:

# Don't print entire dataframes
print(df.to_string())  # May get truncated!

✅ GOOD:

# Save as file instead
df.to_csv('full_data.csv', index=False)
print(f"Saved {len(df)} rows to full_data.csv")

3. Use Descriptive Filenames

❌ BAD:

plt.savefig('1.png')
df.to_csv('output.csv')

✅ GOOD:

plt.savefig('sales_trend_2024.png')
df.to_csv('cleaned_customer_data.csv')
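Sometimes a filename has to be built from user-supplied text, such as a chart title or a column name. In that case, a small helper can turn the text into a safe, descriptive name. This is a sketch, and `descriptive_filename` is a hypothetical helper, not something the interpreter provides.

```python
import re


def descriptive_filename(*parts: object, ext: str) -> str:
    """Join `parts` into a lowercase, underscore-separated filename (hypothetical helper)."""
    words = []
    for part in parts:
        # Collapse every run of characters other than ASCII letters/digits into "_".
        slug = re.sub(r"[^a-z0-9]+", "_", str(part).lower()).strip("_")
        if slug:
            words.append(slug)
    return "_".join(words) + "." + ext.lstrip(".")


print(descriptive_filename("Sales Trend", 2024, ext="png"))          # sales_trend_2024.png
print(descriptive_filename("Cleaned Customer Data", ext=".csv"))     # cleaned_customer_data.csv
```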

4. Generate Multiple Output Types

✅ EXCELLENT:

# Give users data in multiple formats
df.to_csv('data.csv')           # For Excel users
df.to_json('data.json')         # For developers
df.to_parquet('data.parquet')   # For data scientists

# Create visualization
plt.savefig('chart.png')        # For viewing

# Create summary report
with open('summary.txt', 'w') as f:
    f.write('Analysis Summary\n')
    f.write(df.describe().to_string())

5. Handle Errors Gracefully

✅ GOOD:

try:
    df = load_file('file_id_here')
    # Process data
    df.to_csv('results.csv')
except Exception as e:
    print(f"Error: {e}")
    # Provide helpful message to user
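The generic `except Exception` above works. Catching the likely failures separately, though, lets the model tell the user what to fix. The following is a sketch of that pattern. `friendly_error` is a hypothetical helper, and which exceptions actually occur depends on the data and on `load_file()`.

```python
def friendly_error(exc: Exception) -> str:
    """Translate a common data-processing exception into a user message (hypothetical)."""
    if isinstance(exc, FileNotFoundError):
        return "The file could not be found. Please re-upload it and try again."
    if isinstance(exc, KeyError):
        # pandas raises KeyError when a requested column does not exist.
        column = exc.args[0] if exc.args else "?"
        return f"The data has no column named {column!r}. Please check the column names."
    if isinstance(exc, (ValueError, TypeError)):
        return f"Some values could not be processed: {exc}"
    return f"Unexpected error ({type(exc).__name__}): {exc}"


# Usage: the same try/except shape as above, with a clearer message.
try:
    record = {"date": "2024-01-01"}
    total = record["sales"]
except Exception as exc:
    print(friendly_error(exc))  # The data has no column named 'sales'. Please check the column names.
```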

📊 Complete Example: Data Analysis

User Request:

"Analyze this CSV file and show me sales trends"

Model's Approach:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # Auto-installs if needed!
import json

# Load user's uploaded file
df = load_file('user_file_id_123')

# 1. Basic exploration
print(f"Dataset: {len(df)} rows × {len(df.columns)} columns")
print(f"\nColumns: {', '.join(df.columns)}")
print("\nFirst few rows:")
print(df.head())

# 2. Save summary statistics as JSON
summary = {
    'total_rows': len(df),
    'columns': df.columns.tolist(),
    'numeric_summary': df.describe().to_dict(),
    'date_range': {
        'start': df['date'].min(),
        'end': df['date'].max()
    } if 'date' in df.columns else None
}

with open('summary_statistics.json', 'w') as f:
    # default=str turns values json can't serialize (e.g. timestamps) into strings
    json.dump(summary, f, indent=2, default=str)

# 3. Create visualizations (these plots assume 'sales', 'category' and 'product' columns exist)
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Sales trend over time
if 'date' in df.columns and 'sales' in df.columns:
    df['date'] = pd.to_datetime(df['date'])
    df = df.sort_values('date')
    axes[0, 0].plot(df['date'], df['sales'])
    axes[0, 0].set_title('Sales Trend Over Time')
    axes[0, 0].set_xlabel('Date')
    axes[0, 0].set_ylabel('Sales ($)')
    axes[0, 0].grid(True)

# Distribution
df['sales'].hist(bins=30, ax=axes[0, 1])
axes[0, 1].set_title('Sales Distribution')
axes[0, 1].set_xlabel('Sales ($)')
axes[0, 1].set_ylabel('Frequency')

# Box plot
df.boxplot(column='sales', by='category', ax=axes[1, 0])
axes[1, 0].set_title('Sales by Category')
axes[1, 0].set_xlabel('Category')
axes[1, 0].set_ylabel('Sales ($)')

# Top products
top_products = df.groupby('product')['sales'].sum().nlargest(10)
axes[1, 1].barh(top_products.index, top_products.values)
axes[1, 1].set_title('Top 10 Products by Sales')
axes[1, 1].set_xlabel('Total Sales ($)')

plt.tight_layout()
plt.savefig('sales_analysis.png', dpi=150)

# 4. Export cleaned data
df_cleaned = df.dropna()
df_cleaned.to_csv('cleaned_sales_data.csv', index=False)

# 5. Generate text report
with open('analysis_report.txt', 'w') as f:
    f.write('SALES ANALYSIS REPORT\n')
    f.write('=' * 70 + '\n\n')
    f.write(f'Dataset Size: {len(df)} rows × {len(df.columns)} columns\n')
    if summary['date_range'] is not None:
        f.write(f'Date Range: {summary["date_range"]["start"]} to {summary["date_range"]["end"]}\n')
    f.write('\n')
    f.write('Summary Statistics:\n')
    f.write('-' * 70 + '\n')
    f.write(df['sales'].describe().to_string())
    f.write('\n\n')
    f.write('Top 5 Products:\n')
    f.write('-' * 70 + '\n')
    f.write(top_products.head().to_string())

print("\n✅ Analysis complete! Generated 4 files:")
print("1. summary_statistics.json - Detailed statistics")
print("2. sales_analysis.png - Visualizations")
print("3. cleaned_sales_data.csv - Cleaned dataset")
print("4. analysis_report.txt - Full text report")

What the User Receives:

✅ Execution succeeded!

Dataset: 365 rows × 5 columns
Columns: date, product, category, sales, quantity
[... output ...]

✅ Analysis complete! Generated 4 files:
1. summary_statistics.json - Detailed statistics
2. sales_analysis.png - Visualizations
3. cleaned_sales_data.csv - Cleaned dataset
4. analysis_report.txt - Full text report

📎 Generated 4 file(s):
• summary_statistics.json (structured, 2.1 KB)
• sales_analysis.png (image, 145.2 KB)
• cleaned_sales_data.csv (data, 45.6 KB)
• analysis_report.txt (text, 3.2 KB)

[4 downloadable file attachments in Discord]

⏱️ Executed in 3.45s
📦 Auto-installed: seaborn
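The "📎 Generated 4 file(s)" block could be produced by a small formatter. This is only a sketch. It assumes sizes are shown in KB (bytes / 1024, one decimal) and that each file's kind label is already known. `format_file_summary` is a hypothetical name.

```python
from typing import List, Tuple


def format_file_summary(files: List[Tuple[str, str, int]]) -> str:
    """Build the attachment summary from (filename, kind, size_in_bytes) tuples (hypothetical)."""
    lines = [f"📎 Generated {len(files)} file(s):"]
    for name, kind, size in files:
        lines.append(f"• {name} ({kind}, {size / 1024:.1f} KB)")
    return "\n".join(lines)


print(format_file_summary([("summary_statistics.json", "structured", 2150)]))
```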

🚫 Common Model Mistakes

Mistake #1: Checking Package Availability

❌ DON'T:

import sys
if 'seaborn' not in sys.modules:
    print("Seaborn is not installed")

✅ DO:

import seaborn as sns  # Just import it!

Mistake #2: Using install_packages Parameter

❌ DON'T (install_packages is unnecessary):

{
  "code": "import pandas as pd",
  "install_packages": ["pandas"]
}

✅ DO (the code field is all that's needed):

{
  "code": "import pandas as pd"
}
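For context, the following sketch shows how the `execute_python_code` tool could be declared in the OpenAI function-calling format. Here `code` is the only required argument, and the description stresses auto-install. The description text, the exact schema, and the `validate_arguments` helper are assumptions, not the project's actual definition.

```python
import json

# Hypothetical tool definition in the OpenAI Chat Completions "tools" format.
EXECUTE_PYTHON_CODE_TOOL = {
    "type": "function",
    "function": {
        "name": "execute_python_code",
        "description": (
            "Run Python code. Missing packages are installed automatically on import, "
            "so just import what you need. Every file the code creates is sent to the user. "
            "Use load_file('file_id') to read user uploads."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python code to execute."},
                "install_packages": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Optional; rarely needed because imports auto-install.",
                },
            },
            "required": ["code"],
        },
    },
}


def validate_arguments(raw: str) -> dict:
    """Parse tool-call arguments and check that required fields are present (hypothetical)."""
    args = json.loads(raw)
    required = EXECUTE_PYTHON_CODE_TOOL["function"]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing required argument(s): {missing}")
    return args
```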

Mistake #3: Printing Instead of Saving

❌ DON'T:

print(df.to_string())  # Output gets truncated!

✅ DO:

df.to_csv('data.csv', index=False)  # User gets the full data!

Mistake #4: Not Using load_file()

❌ DON'T:

df = pd.read_csv('/path/to/file.csv')  # Won't work!

✅ DO:

df = load_file('file_id_from_user')  # Correct!

✅ Checklist for Model Developers

When updating the model's behavior:

Model knows packages auto-install (no manual checks)
Model uses load_file() for user uploads
Model creates files instead of printing long output
Model uses descriptive filenames
Model handles errors gracefully
Model generates multiple output types when useful
Tool description emphasizes auto-install feature
System prompt includes code interpreter capabilities
Examples show correct usage patterns

📚 Related Documentation

GENERATED_FILES_GUIDE.md - Complete file handling guide
CODE_INTERPRETER_GUIDE.md - Technical implementation details
NEW_FEATURES_GUIDE.md - All new features overview
code_interpreter_prompts.py - System prompt definitions

🎉 Summary

Key Message to the Model:

"Just write Python code normally. Import any approved package - it auto-installs if missing. Create files (CSV, images, reports) - they're automatically sent to users. Use load_file('file_id') to access user uploads. That's it!"

What the Model Should Remember:

✅ Packages install automatically - just import them
✅ All created files are captured - save results to files instead of printing them
✅ Use load_file() - for user uploads
✅ Be descriptive - good filenames help users
✅ Handle errors - gracefully inform users

The system handles everything else automatically! 🚀
