# Code Interpreter Guide

## Overview

The unified code interpreter provides ChatGPT/Claude-style code execution capabilities:

- **Secure Python execution** in isolated virtual environments
- **File management** with automatic 48-hour expiration
- **Data analysis** with pandas, numpy, matplotlib, seaborn, plotly
- **Package installation** with security validation
- **Visualization generation** with automatic image handling

## Features

### 1. Code Execution

Execute arbitrary Python code securely:

```python
from src.utils.code_interpreter import execute_code

result = await execute_code(
    code="print('Hello, world!')",
    user_id=123456789
)

# Result:
# {
#     "success": True,
#     "output": "Hello, world!\n",
#     "error": "",
#     "execution_time": 0.05,
#     "return_code": 0
# }
```

### 2. File Upload & Management

Upload files for code to access:

```python
from src.utils.code_interpreter import upload_file, list_user_files

# Upload a CSV file
with open('data.csv', 'rb') as f:
    result = await upload_file(
        user_id=123456789,
        file_data=f.read(),
        filename='data.csv',
        file_type='csv',
        db_handler=db
    )

file_id = result['file_id']

# List user's files
files = await list_user_files(user_id=123456789, db_handler=db)
```

### 3. Code with File Access

Access uploaded files in code:

```python
# Upload a CSV file first
result = await upload_file(
    user_id=123,
    file_data=csv_bytes,
    filename='sales.csv',
    db_handler=db
)
file_id = result['file_id']

# Execute code that uses the file
code = """
# load_file() is automatically available
df = load_file('""" + file_id + """')
print(df.head())
print(f"Total rows: {len(df)}")
"""

result = await execute_code(
    code=code,
    user_id=123,
    user_files=[file_id],
    db_handler=db
)
```

### 4. Package Installation

Install approved packages on-demand:

```python
result = await execute_code(
    code="""
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')
plt.figure(figsize=(10, 6))
sns.scatterplot(data=tips, x='total_bill', y='tip')
plt.savefig('plot.png')
print('Plot saved!')
""",
    user_id=123,
    install_packages=['seaborn', 'matplotlib']
)
```

### 5. Data Analysis

Automatic data loading and analysis:

```python
# The load_file() helper automatically detects file types
code = """
# Load CSV
df = load_file('file_id_here')

# Basic analysis
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(df.describe())

# Correlation analysis
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))
# numeric_only=True skips text columns, which would otherwise raise in pandas 2.x
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.savefig('correlation.png')
"""

result = await execute_code(code=code, user_id=123, user_files=['file_id_here'])

# Visualizations are returned in result['generated_files']
for file in result.get('generated_files', []):
    print(f"Generated: {file['filename']}")
    # file['data'] contains the image bytes
```

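The loop above only prints filenames. A minimal sketch of persisting the returned files to disk follows, assuming each entry of `result['generated_files']` is a dict with a `filename` string and a `data` bytes value as the comments above indicate. The helper name `save_generated_files` is hypothetical and not part of `code_interpreter.py`.

```python
from pathlib import Path


def save_generated_files(result: dict, out_dir: str) -> list[Path]:
    """Write every entry of result['generated_files'] into out_dir.

    Hypothetical helper: assumes each entry has 'filename' (str)
    and 'data' (bytes) keys.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for item in result.get("generated_files", []):
        # Path(...).name drops any directory component, so a filename
        # such as "../plot.png" cannot escape out_dir.
        path = target / Path(item["filename"]).name
        path.write_bytes(item["data"])
        saved.append(path)
    return saved
```

Stripping the directory part of the filename is a defensive choice here, since the names originate from user-controlled code.
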
## File Expiration

### Automatic Cleanup (48 Hours)

Files automatically expire after 48 hours:

```python
from src.utils.code_interpreter import cleanup_expired_files

# Run cleanup (should be scheduled periodically)
deleted_count = await cleanup_expired_files(db_handler=db)
print(f"Cleaned up {deleted_count} expired files")
```

### Manual File Deletion

Delete files manually:

```python
from src.utils.code_interpreter import delete_user_file

success = await delete_user_file(
    file_id='user_123_1234567890_abc123',
    user_id=123,
    db_handler=db
)
```

## Security Features

### Approved Packages

Only approved packages can be installed:

- **Data Science**: numpy, pandas, scipy, scikit-learn, statsmodels
- **Visualization**: matplotlib, seaborn, plotly, bokeh, altair
- **Image Processing**: pillow, imageio, scikit-image
- **Machine Learning**: tensorflow, keras, torch, xgboost, lightgbm
- **NLP**: nltk, spacy, gensim, wordcloud
- **Math/Science**: sympy, networkx, numba

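An allowlist check of this kind can be sketched as follows. This is an illustration under assumptions, not the module's actual code: the set below is a small subset of the list above, the name normalization is a guess, and the `(bool, reason)` return shape simply mirrors the `is_package_approved` call shown in the Troubleshooting section.

```python
# Illustrative subset only; the real list is the one documented above.
APPROVED_PACKAGES = {"numpy", "pandas", "matplotlib", "seaborn", "scikit-learn"}


def is_package_approved(name: str) -> tuple[bool, str]:
    """Return (approved, reason) for a requested package name."""
    normalized = name.strip().lower().replace("_", "-")
    # Reject anything that is not a bare package name, e.g. version
    # specifiers, URLs, or shell metacharacters.
    if not normalized or not all(c.isalnum() or c in "-." for c in normalized):
        return False, f"Invalid package name: {name!r}"
    if normalized in APPROVED_PACKAGES:
        return True, "Package is approved"
    return False, f"Package '{normalized}' is not in the approved list"
```

Checking against a fixed set before anything reaches `pip` keeps the decision independent of what the package index happens to serve.
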
### Blocked Operations

Code is validated against dangerous operations:

- ❌ File system writes (outside execution dir)
- ❌ Network operations (`socket`, `requests`, `urllib`)
- ❌ Process spawning (`subprocess`)
- ❌ System access (`os.system`, `eval`, `exec`)
- ❌ Dangerous functions (`__import__`, `globals`, `locals`)

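The error message shown later in this guide (`Blocked unsafe operation: import\s+subprocess`) suggests the validation is a regular-expression scan of the source text. A minimal sketch of such a scan is below. The pattern list and the function name `validate_code` are assumptions chosen for illustration, not the patterns the module actually ships.

```python
import re

# Hypothetical patterns covering a few of the categories listed above.
BLOCKED_PATTERNS = [
    r"import\s+subprocess",
    r"import\s+socket",
    r"os\.system\s*\(",
    r"\beval\s*\(",
    r"\bexec\s*\(",
    r"__import__\s*\(",
]


def validate_code(code: str) -> tuple[bool, str]:
    """Return (is_safe, message); message names the first matching pattern."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, code):
            return False, f"Blocked unsafe operation: {pattern}"
    return True, ""
```

A textual scan like this is a first filter rather than a sandbox: it can be bypassed by obfuscated code, so it works best alongside process isolation and the execution limits below.
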
### Execution Limits

- **Timeout**: 60 seconds (configurable)
- **Output Size**: 100KB max (truncated if larger)
- **File Size**: 50MB max per file

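As an illustration of the output-size limit, the sketch below truncates captured output on a byte budget. It assumes "100KB" means 100 × 1024 bytes of UTF-8 and that a marker is appended after truncation. Both are assumptions, and `truncate_output` is a hypothetical name.

```python
MAX_OUTPUT_BYTES = 100 * 1024  # assumed meaning of "100KB"
TRUNCATION_MARKER = "\n... [output truncated]"


def truncate_output(text: str, limit: int = MAX_OUTPUT_BYTES) -> str:
    """Clip text to at most `limit` UTF-8 bytes, then append a marker."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # errors="ignore" drops a multi-byte character that the cut split in half.
    clipped = data[:limit].decode("utf-8", errors="ignore")
    return clipped + TRUNCATION_MARKER
```

Measuring in bytes rather than characters keeps the limit meaningful for output that contains non-ASCII text.
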
## Environment Management

### Persistent Virtual Environment

The code interpreter uses a persistent venv:

- **Location**: `/tmp/bot_code_interpreter/venv`
- **Cleanup**: Automatically recreated every 7 days
- **Packages**: Cached and reused across executions

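The 7-day recreation rule amounts to a simple age check. The sketch below assumes the venv's creation time is recorded somewhere (for instance the `last_cleanup` timestamp reported by the status call). The function name is hypothetical.

```python
from datetime import datetime, timedelta

VENV_MAX_AGE = timedelta(days=7)


def venv_needs_recreation(created_at: datetime, now: datetime) -> bool:
    """True once the venv is at least 7 days old."""
    return now - created_at >= VENV_MAX_AGE
```

Passing `now` explicitly instead of calling `datetime.now()` inside the function keeps the check easy to test.
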
### Status Check

Get interpreter status:

```python
from src.utils.code_interpreter import get_interpreter_status

status = await get_interpreter_status(db_handler=db)

# Returns:
# {
#     "venv_exists": True,
#     "python_path": "/tmp/bot_code_interpreter/venv/bin/python",
#     "installed_packages": ["numpy", "pandas", "matplotlib", ...],
#     "package_count": 15,
#     "last_cleanup": "2024-01-15T10:30:00",
#     "total_user_files": 42,
#     "total_file_size_mb": 125.5,
#     "file_expiration_hours": 48,
#     "max_file_size_mb": 50
# }
```

## Database Schema

### user_files Collection

```javascript
{
    "file_id": "user_123_1234567890_abc123",
    "user_id": 123456789,
    "filename": "sales_data.csv",
    "file_path": "/tmp/bot_code_interpreter/user_files/123456789/user_123_1234567890_abc123.csv",
    "file_size": 1024000,
    "file_type": "csv",
    "uploaded_at": "2024-01-15T10:30:00",
    "expires_at": "2024-01-17T10:30:00"  // 48 hours later
}
```

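The relationship between `uploaded_at` and `expires_at` in the document above can be sketched in a few lines. The helper names are hypothetical, and the sketch assumes timestamps are stored as ISO 8601 strings, as the example document shows.

```python
from datetime import datetime, timedelta

FILE_EXPIRATION_HOURS = 48


def compute_expiry(uploaded_at: datetime) -> datetime:
    """Expiry timestamp for a file uploaded at `uploaded_at`."""
    return uploaded_at + timedelta(hours=FILE_EXPIRATION_HOURS)


def is_expired(doc: dict, now: datetime) -> bool:
    """True if the stored document's expires_at is at or before `now`."""
    return datetime.fromisoformat(doc["expires_at"]) <= now
```

For the sample document, an upload at `2024-01-15T10:30:00` yields the `2024-01-17T10:30:00` expiry shown in the schema.
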
### Indexes

Automatically created for performance:

```python
# Compound index for user queries
await db.user_files.create_index([("user_id", 1), ("expires_at", -1)])

# Unique index for file lookups
await db.user_files.create_index("file_id", unique=True)

# Index for cleanup queries
await db.user_files.create_index("expires_at")
```

## Integration Example

Complete example integrating code interpreter:

```python
import logging

from src.utils.code_interpreter import (
    execute_code,
    upload_file,
    list_user_files,
    cleanup_expired_files
)

async def handle_user_request(user_id: int, code: str, files: list, db):
    """Handle a code execution request from a user."""

    # Upload any files the user provided
    uploaded_files = []
    for file_data, filename in files:
        result = await upload_file(
            user_id=user_id,
            file_data=file_data,
            filename=filename,
            db_handler=db
        )
        if result['success']:
            uploaded_files.append(result['file_id'])

    # Execute the code with file access
    result = await execute_code(
        code=code,
        user_id=user_id,
        user_files=uploaded_files,
        install_packages=['pandas', 'matplotlib'],
        timeout=60,
        db_handler=db
    )

    # Check for errors
    if not result['success']:
        return f"❌ Error: {result['error']}"

    # Format output
    response = f"✅ Execution completed in {result['execution_time']:.2f}s\n\n"

    if result['output']:
        response += f"**Output:**\n```\n{result['output']}\n```\n"

    # Handle generated images
    for file in result.get('generated_files', []):
        if file['type'] == 'image':
            response += f"\n📊 Generated: {file['filename']}\n"
            # file['data'] contains image bytes - save or send to Discord

    return response

# Periodic cleanup (run every hour)
async def scheduled_cleanup(db):
    """Clean up expired files."""
    deleted = await cleanup_expired_files(db_handler=db)
    if deleted > 0:
        logging.info(f"Cleaned up {deleted} expired files")
```

## Error Handling

### Common Errors

**Security Validation Failed**
```python
result = {
    "success": False,
    "error": "Security validation failed: Blocked unsafe operation: import\s+subprocess"
}
```

**Timeout**
```python
result = {
    "success": False,
    "error": "Execution timeout after 60 seconds",
    "execution_time": 60,
    "return_code": -1
}
```

**Package Not Approved**
```python
result = {
    "success": False,
    "error": "Package 'requests' is not in the approved list"
}
```

**File Too Large**
```python
result = {
    "success": False,
    "error": "File too large. Maximum size is 50MB"
}
```

## Best Practices

1. **Always provide `db_handler`** for file management
2. **Set reasonable timeouts** for long-running code
3. **Handle `generated_files`** in results (images, etc.)
4. **Run `cleanup_expired_files()`** periodically (hourly recommended)
5. **Validate user input** before passing to `execute_code()`
6. **Check `result['success']`** before using output
7. **Display `execution_time`** to users for transparency

## Architecture

### Components

1. **FileManager**: Handles file upload/download, expiration, cleanup
2. **PackageManager**: Manages venv, installs packages, caches installations
3. **CodeExecutor**: Executes code securely, provides file access helpers

### Execution Flow

```
User Code Request
    ↓
Security Validation (blocked patterns)
    ↓
Ensure venv Ready (create if needed)
    ↓
Install Packages (if requested)
    ↓
Create Temp Execution Dir
    ↓
Inject File Access Helpers (load_file, FILES dict)
    ↓
Execute Code (isolated subprocess)
    ↓
Collect Output + Generated Files
    ↓
Cleanup Temp Dir
    ↓
Return Results
```

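The middle of this flow (temp directory, subprocess, timeout, result dict, cleanup) can be sketched with the standard library alone. This is a simplified illustration, not the `CodeExecutor` implementation: it runs the current interpreter (`sys.executable`) rather than the venv's Python, skips validation, package installation, and helper injection, and the name `run_isolated` is hypothetical.

```python
import asyncio
import sys
import tempfile
import time
from pathlib import Path


async def run_isolated(code: str, timeout: float = 60.0) -> dict:
    """Run `code` in a child Python process inside a throwaway directory."""
    start = time.monotonic()
    # The directory and everything written into it is removed on exit.
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "main.py"
        script.write_text(code, encoding="utf-8")
        proc = await asyncio.create_subprocess_exec(
            sys.executable, str(script),
            cwd=workdir,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
        except asyncio.TimeoutError:
            # Stop the child and reap it before reporting the timeout.
            proc.kill()
            await proc.wait()
            return {
                "success": False,
                "output": "",
                "error": f"Execution timeout after {timeout:g} seconds",
                "execution_time": time.monotonic() - start,
                "return_code": -1,
            }
    return {
        "success": proc.returncode == 0,
        "output": stdout.decode(errors="replace"),
        "error": stderr.decode(errors="replace"),
        "execution_time": time.monotonic() - start,
        "return_code": proc.returncode,
    }
```

The result keys mirror the dictionaries shown in the Code Execution and Error Handling sections, so a caller can treat the sketch and the real API alike. For example, `asyncio.run(run_isolated("print('Hello, world!')"))` returns a dict whose `success` is `True` when the child exits with code 0.
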
## Comparison to Old System

### Old System (3 separate files)
- `code_interpreter.py` - Router/dispatcher
- `python_executor.py` - Execution logic
- `data_analyzer.py` - Data analysis templates

### New System (1 unified file)
- ✅ All functionality in `code_interpreter.py`
- ✅ 48-hour file expiration (like images)
- ✅ Persistent venv with package caching
- ✅ Better security validation
- ✅ Automatic data loading helpers
- ✅ Unified API with async/await
- ✅ MongoDB integration for file tracking
- ✅ Automatic cleanup scheduling

## Troubleshooting

### Venv Creation Fails

Check disk space and permissions:
```bash
df -h /tmp
ls -la /tmp/bot_code_interpreter
```

### Packages Won't Install

Check if package is approved:
```python
from src.utils.code_interpreter import get_package_manager

pm = get_package_manager()
is_approved, reason = pm.is_package_approved('package_name')
print(f"Approved: {is_approved}, Reason: {reason}")
```

### Files Not Found

Check expiration:
```python
from src.utils.code_interpreter import get_file_manager

fm = get_file_manager(db_handler=db)
file_meta = await fm.get_file(file_id, user_id)

if not file_meta:
    print("File expired or doesn't exist")
else:
    print(f"Expires at: {file_meta['expires_at']}")
```

### Performance Issues

Check status and cleanup:
```python
from src.utils.code_interpreter import cleanup_expired_files, get_interpreter_status

status = await get_interpreter_status(db_handler=db)
print(f"Total files: {status['total_user_files']}")
print(f"Total size: {status['total_file_size_mb']} MB")

# Force cleanup
deleted = await cleanup_expired_files(db_handler=db)
print(f"Cleaned up: {deleted} files")
```

## Migration from Old System

If migrating from the old 3-file system:

1. **Replace imports**:
   ```python
   # Old
   from src.utils.python_executor import execute_python_code
   from src.utils.data_analyzer import analyze_data_file

   # New
   from src.utils.code_interpreter import execute_code
   ```

2. **Update function calls**:
   ```python
   # Old
   result = await execute_python_code({
       "code": code,
       "user_id": user_id
   })

   # New
   result = await execute_code(
       code=code,
       user_id=user_id,
       db_handler=db
   )
   ```

3. **Handle file uploads**:
   ```python
   # New file handling
   result = await upload_file(
       user_id=user_id,
       file_data=file_bytes,  # raw bytes of the uploaded file
       filename=name,
       db_handler=db
   )
   ```

4. **Schedule cleanup**:
   ```python
   from discord.ext import tasks

   # Add to bot startup; call cleanup_task.start() once the bot is ready
   @tasks.loop(hours=1)
   async def cleanup_task():
       await cleanup_expired_files(db_handler=db)
   ```

## Summary

The unified code interpreter provides:

- 🔒 **Security**: Validated patterns, approved packages only
- ⏱️ **Expiration**: Automatic 48-hour file cleanup
- 📦 **Packages**: Persistent venv with caching
- 📊 **Analysis**: Built-in data loading helpers
- 🎨 **Visualizations**: Automatic image generation handling
- 🔄 **Integration**: Clean async API with MongoDB
- 📈 **Status**: Real-time monitoring and metrics

All in one file: `src/utils/code_interpreter.py`