Code Interpreter Guide
Overview
The unified code interpreter provides ChatGPT/Claude-style code execution capabilities:
- Secure Python execution in isolated virtual environments
- File management with automatic 48-hour expiration
- Data analysis with pandas, numpy, matplotlib, seaborn, plotly
- Package installation with security validation
- Visualization generation with automatic image handling
Features
1. Code Execution
Execute Python code in an isolated environment:
```python
from src.utils.code_interpreter import execute_code

result = await execute_code(
    code="print('Hello, world!')",
    user_id=123456789
)
# Result:
# {
#     "success": True,
#     "output": "Hello, world!\n",
#     "error": "",
#     "execution_time": 0.05,
#     "return_code": 0
# }
```
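The guide does not show how `execute_code` runs code internally. The following is a minimal sketch, assuming the code runs in a child Python process with a timeout and that results use the same keys as the dictionary above. `run_python_snippet` is a hypothetical name, not part of `src.utils.code_interpreter`.

```python
import asyncio
import sys
import time


async def run_python_snippet(code: str, timeout: float = 60.0) -> dict:
    """Run `code` in a separate Python process and return a result dict.

    Hypothetical helper: mirrors the result keys shown above, not the
    real execute_code implementation.
    """
    start = time.monotonic()
    # "-c" runs the snippet in a fresh interpreter process
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        if proc.returncode is None:
            proc.kill()      # stop the runaway process
        await proc.wait()    # reap it so no zombie is left behind
        return {
            "success": False,
            "output": "",
            "error": f"Execution timeout after {timeout:g} seconds",
            "execution_time": round(time.monotonic() - start, 2),
            "return_code": -1,
        }
    return {
        "success": proc.returncode == 0,
        "output": stdout.decode(errors="replace"),
        "error": stderr.decode(errors="replace"),
        "execution_time": round(time.monotonic() - start, 2),
        "return_code": proc.returncode,
    }
```

Usage: `asyncio.run(run_python_snippet("print('Hello, world!')"))`. A separate process means a crash or infinite loop in user code cannot take down the bot itself.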
2. File Upload & Management
Upload files for code to access:
```python
from src.utils.code_interpreter import upload_file, list_user_files

# Upload a CSV file
with open('data.csv', 'rb') as f:
    result = await upload_file(
        user_id=123456789,
        file_data=f.read(),
        filename='data.csv',
        file_type='csv',
        db_handler=db
    )
file_id = result['file_id']

# List the user's files
files = await list_user_files(user_id=123456789, db_handler=db)
```
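An upload has to produce a record like the one in the `user_files` schema later in this guide. A minimal sketch, assuming an ID layout loosely modeled on the example IDs (`user_<id>_<unix time>_<random hex>`) and the 50MB and 48-hour limits stated in this guide; `make_file_record` is a hypothetical helper. It stores real `datetime` objects so that expiry comparisons work in queries.

```python
import secrets
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath
from typing import Optional

MAX_FILE_SIZE = 50 * 1024 * 1024      # 50MB per file
FILE_TTL = timedelta(hours=48)       # automatic expiration window
BASE_DIR = PurePosixPath("/tmp/bot_code_interpreter/user_files")


def make_file_record(user_id: int, filename: str, size: int,
                     now: Optional[datetime] = None) -> dict:
    """Build a user_files document (hypothetical helper)."""
    if size > MAX_FILE_SIZE:
        raise ValueError("File too large. Maximum size is 50MB")
    now = now or datetime.now(timezone.utc)
    # Owner, upload time and a random suffix keep IDs unique per user
    file_id = f"user_{user_id}_{int(now.timestamp())}_{secrets.token_hex(3)}"
    suffix = PurePosixPath(filename).suffix          # e.g. ".csv"
    return {
        "file_id": file_id,
        "user_id": user_id,
        "filename": filename,
        "file_path": str(BASE_DIR / str(user_id) / f"{file_id}{suffix}"),
        "file_size": size,
        "file_type": suffix.lstrip(".").lower(),
        "uploaded_at": now,
        "expires_at": now + FILE_TTL,
    }
```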
3. Code with File Access
Access uploaded files in code:
```python
# Upload a CSV file first
result = await upload_file(
    user_id=123,
    file_data=csv_bytes,
    filename='sales.csv',
    db_handler=db
)
file_id = result['file_id']

# Execute code that uses the file
code = """
# load_file() is automatically available
df = load_file('""" + file_id + """')
print(df.head())
print(f"Total rows: {len(df)}")
"""
result = await execute_code(
    code=code,
    user_id=123,
    user_files=[file_id],
    db_handler=db
)
```
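The Execution Flow later in this guide lists an "Inject File Access Helpers (load_file, FILES dict)" step. One way to do that, sketched under the assumption that helpers are prepended as source text before the user's code runs, is to generate a small prelude. `build_prelude`, `wrap_user_code`, and the `FILES` layout (file ID to path on disk) are hypothetical.

```python
def build_prelude(files: dict[str, str]) -> str:
    """Return Python source defining FILES and a minimal load_file().

    `files` maps file IDs to paths on disk. repr() yields a valid,
    safely quoted Python literal, so odd characters in paths cannot
    break out of the string.
    """
    return (
        f"FILES = {files!r}\n"
        "def load_file(file_id):\n"
        "    path = FILES[file_id]\n"
        "    if path.endswith('.csv'):\n"
        "        import pandas as pd\n"
        "        return pd.read_csv(path)\n"
        "    with open(path, 'rb') as fh:\n"
        "        return fh.read()\n"
    )


def wrap_user_code(code: str, files: dict[str, str]) -> str:
    """Prepend the helper prelude to the user's code."""
    return build_prelude(files) + "\n" + code
```

The wrapped string is what would be handed to the child interpreter, so user code sees `load_file` and `FILES` as ordinary globals.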
4. Package Installation
Install approved packages on-demand:
```python
result = await execute_code(
    code="""
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
plt.figure(figsize=(10, 6))
sns.scatterplot(data=tips, x='total_bill', y='tip')
plt.savefig('plot.png')
print('Plot saved!')
""",
    user_id=123,
    install_packages=['seaborn', 'matplotlib']
)
```
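Because packages are cached in the persistent venv, an install request only needs to call pip for packages that are not there yet. A minimal sketch, assuming the venv interpreter path shown in the status output below and a case-insensitive name match; `plan_install` is a hypothetical helper, while `python -m pip install` is pip's standard invocation.

```python
from typing import Optional

VENV_PYTHON = "/tmp/bot_code_interpreter/venv/bin/python"


def plan_install(requested: list[str], installed: set[str],
                 python: str = VENV_PYTHON) -> Optional[list[str]]:
    """Return a pip command for packages not yet in the venv, or None."""
    seen = {name.lower() for name in installed}   # case-insensitive match
    missing = []
    for pkg in requested:
        if pkg.lower() not in seen:
            seen.add(pkg.lower())                 # also drops duplicates
            missing.append(pkg)
    if not missing:
        return None  # everything requested is already cached
    return [python, "-m", "pip", "install", "--quiet", *missing]
```

The returned list can be passed straight to `asyncio.create_subprocess_exec(*cmd)`, which avoids shell quoting issues.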
5. Data Analysis
Automatic data loading and analysis:
```python
# The load_file() helper automatically detects file types
code = """
# Load CSV
df = load_file('file_id_here')

# Basic analysis
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(df.describe())

# Correlation analysis (numeric columns only)
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.savefig('correlation.png')
"""
result = await execute_code(code=code, user_id=123, user_files=['file_id_here'], db_handler=db)

# Visualizations are returned in result['generated_files']
for file in result.get('generated_files', []):
    print(f"Generated: {file['filename']}")
    # file['data'] contains the image bytes
```
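The guide says `load_file()` detects file types automatically but does not say how. A plausible sketch, assuming detection by file extension and dispatch to the matching pandas reader; `READERS`, `reader_for`, and this `load_file` signature are hypothetical. pandas is imported lazily so the dispatch logic itself has no dependencies.

```python
from pathlib import Path

# Extension -> pandas reader function name (hypothetical mapping)
READERS = {
    ".csv": "read_csv",
    ".tsv": "read_csv",        # called with sep="\t" below
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".json": "read_json",
    ".parquet": "read_parquet",
}


def reader_for(path: str) -> str:
    """Pick a pandas reader name from the file extension."""
    ext = Path(path).suffix.lower()
    try:
        return READERS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}") from None


def load_file(path: str):
    """Load a tabular file into a DataFrame based on its extension."""
    import pandas as pd  # imported lazily; only needed when loading

    reader = getattr(pd, reader_for(path))
    if path.lower().endswith(".tsv"):
        return reader(path, sep="\t")
    return reader(path)
```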
File Expiration
Automatic Cleanup (48 Hours)
Files automatically expire after 48 hours:
```python
from src.utils.code_interpreter import cleanup_expired_files

# Run cleanup (should be scheduled periodically)
deleted_count = await cleanup_expired_files(db_handler=db)
print(f"Cleaned up {deleted_count} expired files")
```
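Cleanup only needs the `expires_at` field from the `user_files` schema. A minimal sketch of the selection logic, assuming `expires_at` is stored as a timezone-aware datetime: `expired_query` returns a standard MongoDB `$lt` filter that could be passed to `find()` or `delete_many()`, and `split_expired` applies the same rule to in-memory records. Both names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expired_query(now: Optional[datetime] = None) -> dict:
    """MongoDB filter matching files whose expiry time has passed."""
    return {"expires_at": {"$lt": now or datetime.now(timezone.utc)}}


def split_expired(records: list[dict],
                  now: datetime) -> tuple[list[dict], list[dict]]:
    """Partition file records into (expired, still_valid)."""
    expired = [r for r in records if r["expires_at"] < now]
    valid = [r for r in records if r["expires_at"] >= now]
    return expired, valid
```

A full cleanup would also delete each expired record's `file_path` from disk, not just the database entry.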
Manual File Deletion
Delete files manually:
```python
from src.utils.code_interpreter import delete_user_file

success = await delete_user_file(
    file_id='user_123_1234567890_abc123',
    user_id=123,
    db_handler=db
)
```
Security Features
Approved Packages
Only approved packages can be installed:
- Data Science: numpy, pandas, scipy, scikit-learn, statsmodels
- Visualization: matplotlib, seaborn, plotly, bokeh, altair
- Image Processing: pillow, imageio, scikit-image
- Machine Learning: tensorflow, keras, torch, xgboost, lightgbm
- NLP: nltk, spacy, gensim, wordcloud
- Math/Science: sympy, networkx, numba
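The Troubleshooting section calls `pm.is_package_approved(name)` and unpacks `(is_approved, reason)`. A minimal allowlist check in that shape, assuming names are compared after PEP 503 normalization and that version specifiers such as `pandas==2.1` are ignored; this standalone `is_package_approved` is a sketch, not the PackageManager method.

```python
import re

# The approved list above, stored in PEP 503 normalized form
APPROVED = {
    "numpy", "pandas", "scipy", "scikit-learn", "statsmodels",
    "matplotlib", "seaborn", "plotly", "bokeh", "altair",
    "pillow", "imageio", "scikit-image",
    "tensorflow", "keras", "torch", "xgboost", "lightgbm",
    "nltk", "spacy", "gensim", "wordcloud",
    "sympy", "networkx", "numba",
}


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, runs of -_. become one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_package_approved(spec: str) -> tuple[bool, str]:
    """Check a requirement like 'Scikit_Learn>=1.3' against APPROVED."""
    # Keep only the project name: stop at extras, version or marker syntax
    name = re.split(r"[\s\[<>=!~;@]", spec.strip(), maxsplit=1)[0]
    key = normalize(name)
    if key in APPROVED:
        return True, f"'{key}' is approved"
    return False, f"Package '{name}' is not in the approved list"
```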
Blocked Operations
Code is validated against dangerous operations:
- ❌ File system writes (outside execution dir)
- ❌ Network operations (socket, requests, urllib)
- ❌ Process spawning (subprocess)
- ❌ System access (os.system, eval, exec)
- ❌ Dangerous functions (`__import__`, globals, locals)
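The security-validation error shown under Error Handling reports a regular-expression pattern (`import\s+subprocess`), which suggests validation by pattern matching. A minimal sketch in that style; the pattern list below is illustrative and hypothetical, not the module's actual list. Pattern matching is easy to evade with string tricks, so it complements rather than replaces process isolation.

```python
import re

# Illustrative blocklist (hypothetical), modeled on the categories above
BLOCKED_PATTERNS = [
    r"import\s+subprocess",
    r"from\s+subprocess\s+import",
    r"import\s+socket",
    r"os\.system\s*\(",
    r"\beval\s*\(",
    r"\bexec\s*\(",
    r"__import__\s*\(",
]


def validate_code(code: str) -> tuple[bool, str]:
    """Return (ok, message); message names the first pattern that matched."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, code):
            return False, f"Blocked unsafe operation: {pattern}"
    return True, ""
```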
Execution Limits
- Timeout: 60 seconds (configurable)
- Output Size: 100KB max (truncated if larger)
- File Size: 50MB max per file
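The 100KB output cap can be enforced after the process finishes. A minimal sketch, assuming the limit applies to the UTF-8 encoded size and that a short notice is appended when output is cut; `truncate_output` and the notice text are hypothetical.

```python
MAX_OUTPUT_BYTES = 100 * 1024  # 100KB


def truncate_output(text: str, limit: int = MAX_OUTPUT_BYTES) -> str:
    """Keep at most `limit` UTF-8 bytes of text, then append a notice."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # errors="ignore" drops a multi-byte character split at the boundary
    kept = data[:limit].decode("utf-8", errors="ignore")
    omitted = len(data) - len(kept.encode("utf-8"))
    return kept + f"\n... [output truncated, {omitted} bytes omitted]"
```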
Environment Management
Persistent Virtual Environment
The code interpreter uses a persistent venv:
- Location: /tmp/bot_code_interpreter/venv
- Cleanup: Automatically recreated every 7 days
- Packages: Cached and reused across executions
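The 7-day recreation rule can be expressed as an age check on the venv. A minimal sketch using the standard-library `venv` module, assuming the creation time is recorded in a marker file inside the venv (directory modification times are unreliable once packages are installed); `venv_is_stale`, `ensure_venv`, and the `.created_at` marker are hypothetical.

```python
import shutil
import time
import venv
from pathlib import Path
from typing import Optional

VENV_DIR = Path("/tmp/bot_code_interpreter/venv")
MAX_AGE_SECONDS = 7 * 24 * 3600   # recreate after 7 days
MARKER = ".created_at"            # hypothetical timestamp file


def venv_is_stale(venv_dir: Path, now: Optional[float] = None) -> bool:
    """True if the venv is missing its marker or is older than 7 days."""
    marker = venv_dir / MARKER
    if not marker.exists():
        return True
    created = float(marker.read_text())
    current = time.time() if now is None else now
    return current - created > MAX_AGE_SECONDS


def ensure_venv(venv_dir: Path = VENV_DIR) -> Path:
    """Create (or recreate) the venv and return its Python interpreter."""
    if venv_is_stale(venv_dir):
        shutil.rmtree(venv_dir, ignore_errors=True)
        venv.EnvBuilder(with_pip=True).create(venv_dir)
        (venv_dir / MARKER).write_text(str(time.time()))
    return venv_dir / "bin" / "python"   # POSIX layout
```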
Status Check
Get interpreter status:
```python
from src.utils.code_interpreter import get_interpreter_status

status = await get_interpreter_status(db_handler=db)
# Returns:
# {
#     "venv_exists": True,
#     "python_path": "/tmp/bot_code_interpreter/venv/bin/python",
#     "installed_packages": ["numpy", "pandas", "matplotlib", ...],
#     "package_count": 15,
#     "last_cleanup": "2024-01-15T10:30:00",
#     "total_user_files": 42,
#     "total_file_size_mb": 125.5,
#     "file_expiration_hours": 48,
#     "max_file_size_mb": 50
# }
```
Database Schema
user_files Collection
```
{
  "file_id": "user_123_1234567890_abc123",
  "user_id": 123456789,
  "filename": "sales_data.csv",
  "file_path": "/tmp/bot_code_interpreter/user_files/123456789/user_123_1234567890_abc123.csv",
  "file_size": 1024000,
  "file_type": "csv",
  "uploaded_at": "2024-01-15T10:30:00",
  "expires_at": "2024-01-17T10:30:00"  // 48 hours later
}
```
Indexes
Automatically created for performance:
```python
# Compound index for user queries
await db.user_files.create_index([("user_id", 1), ("expires_at", -1)])

# Unique index for file lookups
await db.user_files.create_index("file_id", unique=True)

# Index for cleanup queries
await db.user_files.create_index("expires_at")
```
Integration Example
Complete example integrating code interpreter:
```python
import logging

from src.utils.code_interpreter import (
    execute_code,
    upload_file,
    list_user_files,
    cleanup_expired_files
)

async def handle_user_request(user_id: int, code: str, files: list, db):
    """Handle a code execution request from a user."""
    # Upload any files the user provided
    uploaded_files = []
    for file_data, filename in files:
        result = await upload_file(
            user_id=user_id,
            file_data=file_data,
            filename=filename,
            db_handler=db
        )
        if result['success']:
            uploaded_files.append(result['file_id'])

    # Execute the code with file access
    result = await execute_code(
        code=code,
        user_id=user_id,
        user_files=uploaded_files,
        install_packages=['pandas', 'matplotlib'],
        timeout=60,
        db_handler=db
    )

    # Check for errors
    if not result['success']:
        return f"❌ Error: {result['error']}"

    # Format output
    response = f"✅ Execution completed in {result['execution_time']:.2f}s\n\n"
    if result['output']:
        response += f"**Output:**\n```\n{result['output']}\n```\n"

    # Handle generated images
    for file in result.get('generated_files', []):
        if file['type'] == 'image':
            response += f"\n📊 Generated: {file['filename']}\n"
            # file['data'] contains image bytes - save or send to Discord

    return response

# Periodic cleanup (run every hour)
async def scheduled_cleanup(db):
    """Clean up expired files."""
    deleted = await cleanup_expired_files(db_handler=db)
    if deleted > 0:
        logging.info(f"Cleaned up {deleted} expired files")
```
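The example above suggests running `scheduled_cleanup` every hour. Without a framework scheduler such as `discord.ext.tasks`, a plain asyncio loop works. This sketch takes any coroutine function and an interval; `run_periodically` is a hypothetical helper, and its `max_runs` parameter exists only to make it testable.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional


async def run_periodically(job: Callable[[], Awaitable[object]],
                           interval: float,
                           max_runs: Optional[int] = None) -> int:
    """Await `job()` every `interval` seconds; return how many runs completed."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            await job()
        except Exception:
            # Log and keep the schedule alive; one failed run must not stop cleanup
            logging.exception("Periodic job failed")
        runs += 1
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval)
    return runs
```

Usage: `asyncio.create_task(run_periodically(lambda: scheduled_cleanup(db), 3600))` from inside the bot's running event loop.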
Error Handling
Common Errors
Security Validation Failed
```python
result = {
    "success": False,
    "error": r"Security validation failed: Blocked unsafe operation: import\s+subprocess"
}
```
Timeout
```python
result = {
    "success": False,
    "error": "Execution timeout after 60 seconds",
    "execution_time": 60,
    "return_code": -1
}
```
Package Not Approved
```python
result = {
    "success": False,
    "error": "Package 'requests' is not in the approved list"
}
```
File Too Large
```python
result = {
    "success": False,
    "error": "File too large. Maximum size is 50MB"
}
```
Best Practices
- Always provide db_handler for file management
- Set reasonable timeouts for long-running code
- Handle generated_files in results (images, etc.)
- Run cleanup_expired_files() periodically (hourly recommended)
- Validate user input before passing to execute_code()
- Check result['success'] before using output
- Display execution_time to users for transparency
Architecture
Components
- FileManager: Handles file upload/download, expiration, cleanup
- PackageManager: Manages venv, installs packages, caches installations
- CodeExecutor: Executes code securely, provides file access helpers
Execution Flow
```text
User Code Request
↓
Security Validation (blocked patterns)
↓
Ensure venv Ready (create if needed)
↓
Install Packages (if requested)
↓
Create Temp Execution Dir
↓
Inject File Access Helpers (load_file, FILES dict)
↓
Execute Code (isolated subprocess)
↓
Collect Output + Generated Files
↓
Cleanup Temp Dir
↓
Return Results
```
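For the "Collect Output + Generated Files" and "Cleanup Temp Dir" steps, the result entries used elsewhere in this guide carry `filename`, `type`, and `data` keys. A minimal sketch, assuming files are read from the temporary execution directory before it is deleted and classified by extension; `collect_generated_files`, `run_in_temp_dir`, and `IMAGE_EXTS` are hypothetical.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}


def collect_generated_files(exec_dir: Path) -> list[dict]:
    """Read every file the code wrote into exec_dir (sorted by name)."""
    results = []
    for path in sorted(p for p in exec_dir.iterdir() if p.is_file()):
        results.append({
            "filename": path.name,
            "type": "image" if path.suffix.lower() in IMAGE_EXTS else "file",
            "data": path.read_bytes(),
        })
    return results


def run_in_temp_dir(write_outputs) -> list[dict]:
    """Create a temp dir, let `write_outputs(dir)` fill it, collect, clean up."""
    # The directory and its contents are removed when the with-block exits
    with tempfile.TemporaryDirectory(prefix="exec_") as tmp:
        exec_dir = Path(tmp)
        write_outputs(exec_dir)              # stand-in for running user code
        return collect_generated_files(exec_dir)
```

Reading the bytes before the `with` block exits is what lets results outlive the temporary directory.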
Comparison to Old System
Old System (3 separate files)
- code_interpreter.py - Router/dispatcher
- python_executor.py - Execution logic
- data_analyzer.py - Data analysis templates
New System (1 unified file)
- ✅ All functionality in code_interpreter.py
- ✅ 48-hour file expiration (like images)
- ✅ Persistent venv with package caching
- ✅ Better security validation
- ✅ Automatic data loading helpers
- ✅ Unified API with async/await
- ✅ MongoDB integration for file tracking
- ✅ Automatic cleanup scheduling
Troubleshooting
Venv Creation Fails
Check disk space and permissions:
```shell
df -h /tmp
ls -la /tmp/bot_code_interpreter
```
Packages Won't Install
Check if package is approved:
```python
from src.utils.code_interpreter import get_package_manager

pm = get_package_manager()
is_approved, reason = pm.is_package_approved('package_name')
print(f"Approved: {is_approved}, Reason: {reason}")
```
Files Not Found
Check expiration:
```python
from src.utils.code_interpreter import get_file_manager

fm = get_file_manager(db_handler=db)
file_meta = await fm.get_file(file_id, user_id)
if not file_meta:
    print("File expired or doesn't exist")
else:
    print(f"Expires at: {file_meta['expires_at']}")
```
Performance Issues
Check status and cleanup:
```python
from src.utils.code_interpreter import cleanup_expired_files, get_interpreter_status

status = await get_interpreter_status(db_handler=db)
print(f"Total files: {status['total_user_files']}")
print(f"Total size: {status['total_file_size_mb']} MB")

# Force cleanup
deleted = await cleanup_expired_files(db_handler=db)
print(f"Cleaned up: {deleted} files")
```
Migration from Old System
If migrating from the old 3-file system:
- Replace imports:
```python
# Old
from src.utils.python_executor import execute_python_code
from src.utils.data_analyzer import analyze_data_file

# New
from src.utils.code_interpreter import execute_code
```
- Update function calls:
```python
# Old
result = await execute_python_code({
    "code": code,
    "user_id": user_id
})

# New
result = await execute_code(
    code=code,
    user_id=user_id,
    db_handler=db
)
```
- Handle file uploads:
```python
# New file handling
result = await upload_file(
    user_id=user_id,
    file_data=file_bytes,
    filename=filename,
    db_handler=db
)
```
- Schedule cleanup:
```python
# Add to bot startup
from discord.ext import tasks

@tasks.loop(hours=1)
async def cleanup_task():
    await cleanup_expired_files(db_handler=db)

# Start the loop once the bot's event loop is running (e.g. in setup_hook)
cleanup_task.start()
```
Summary
The unified code interpreter provides:
- 🔒 Security: Validated patterns, approved packages only
- ⏱️ Expiration: Automatic 48-hour file cleanup
- 📦 Packages: Persistent venv with caching
- 📊 Analysis: Built-in data loading helpers
- 🎨 Visualizations: Automatic image generation handling
- 🔄 Integration: Clean async API with MongoDB
- 📈 Status: Real-time monitoring and metrics
All in one file: src/utils/code_interpreter.py