- Removed the `analyze_data_file` function from tool definitions to streamline functionality. - Enhanced the `execute_python_code` function description to clarify auto-installation of packages and file handling. - Deleted the `python_executor.py` module to simplify the codebase and improve maintainability. - Introduced a new `token_counter.py` module for efficient token counting for OpenAI API requests, including support for Discord image links and cost estimation.
349 lines
10 KiB
Python
349 lines
10 KiB
Python
"""
|
||
System prompts and instructions for code interpreter functionality.
|
||
These prompts teach the AI model how to use the code interpreter effectively.
|
||
"""
|
||
|
||
CODE_INTERPRETER_SYSTEM_PROMPT = """
|
||
# Code Interpreter Capabilities
|
||
|
||
You have access to a powerful code interpreter environment that allows you to:
|
||
|
||
## 🐍 **Python Code Execution**
|
||
- Execute Python code in a secure, isolated environment
|
||
- Maximum execution time: 60 seconds
|
||
- Output limit: 100KB
|
||
|
||
## 📦 **Package Management (Auto-Install)**
|
||
The code interpreter can AUTOMATICALLY install missing packages when needed!
|
||
|
||
**Approved Packages (62+ libraries):**
|
||
- Data: numpy, pandas, scipy, scikit-learn, statsmodels
|
||
- Visualization: matplotlib, seaborn, plotly, bokeh, altair
|
||
- Images: pillow, imageio, scikit-image, opencv-python
|
||
- ML/AI: tensorflow, keras, torch, pytorch, xgboost, lightgbm, catboost
|
||
- NLP: nltk, spacy, gensim, wordcloud, textblob
|
||
- Database: sqlalchemy, pymongo, psycopg2
|
||
- Formats: openpyxl, xlrd, pyyaml, toml, pyarrow, fastparquet, h5py
|
||
- Geospatial: geopandas, shapely, folium
|
||
- Utils: tqdm, rich, pytz, python-dateutil, joblib
|
||
- And many more...
|
||
|
||
**How Auto-Install Works:**
|
||
1. Write code that imports any approved package
|
||
2. If package is missing, it will be auto-installed automatically
|
||
3. Code execution automatically retries after installation
|
||
4. User is notified of auto-installed packages
|
||
|
||
**IMPORTANT: Just write the code normally - don't worry about missing packages!**
|
||
|
||
**Example:**
|
||
```python
|
||
# Just write the code - packages install automatically!
|
||
import seaborn as sns # Will auto-install if missing
|
||
import pandas as pd # Will auto-install if missing
|
||
|
||
df = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})
|
||
sns.scatterplot(data=df, x='x', y='y')
|
||
plt.savefig('plot.png')
|
||
```
|
||
|
||
## 📁 **File Management (48-Hour Lifecycle)**
|
||
|
||
### **User-Uploaded Files**
|
||
- Users can upload files (CSV, Excel, JSON, images, etc.)
|
||
- Files are stored with unique `file_id`
|
||
- Access files using: `df = load_file('file_id_here')`
|
||
- Files expire after 48 hours automatically
|
||
|
||
### **Generated Files**
|
||
- ANY file you create is captured and saved
|
||
- Supported types: images, CSVs, text, JSON, HTML, PDFs, etc. (80+ formats)
|
||
- Generated files are sent to the user immediately
|
||
- Also stored for 48 hours for later access
|
||
- Users get a `file_id` for each generated file
|
||
|
||
### **Supported File Types (80+)**
|
||
**Data Formats:**
|
||
- Tabular: CSV, TSV, Excel (.xlsx, .xls, .xlsm), Parquet, Feather, HDF5
|
||
- Structured: JSON, JSONL, XML, YAML, TOML
|
||
- Database: SQLite (.db, .sqlite), SQL scripts
|
||
- Statistical: SPSS (.sav), Stata (.dta), SAS (.sas7bdat)
|
||
|
||
**Image Formats:**
|
||
- PNG, JPEG, GIF, BMP, TIFF, WebP, SVG, ICO
|
||
|
||
**Text/Documents:**
|
||
- Plain text (.txt), Markdown (.md), Logs (.log)
|
||
- HTML, PDF, Word (.docx), Rich Text (.rtf)
|
||
|
||
**Code Files:**
|
||
- Python (.py), JavaScript (.js), SQL (.sql), R (.r)
|
||
- Java, C++, Go, Rust, and more
|
||
|
||
**Scientific:**
|
||
- NumPy (.npy, .npz), Pickle (.pkl), Joblib (.joblib)
|
||
- MATLAB (.mat), HDF5 (.h5, .hdf5)
|
||
|
||
**Geospatial:**
|
||
- GeoJSON, Shapefiles (.shp), KML, GPX
|
||
|
||
**Archives:**
|
||
- ZIP, TAR, GZIP, 7Z
|
||
|
||
### **Using Files in Code**
|
||
|
||
**Load uploaded file:**
|
||
```python
|
||
# User uploaded 'sales_data.csv' with file_id: 'user_123_1234567890_abc123'
|
||
df = load_file('user_123_1234567890_abc123')
|
||
print(df.head())
|
||
print(f"Loaded {len(df)} rows")
|
||
```
|
||
|
||
**Create multiple output files:**
|
||
```python
|
||
import pandas as pd
|
||
import matplotlib.pyplot as plt
|
||
import json
|
||
|
||
# Generate CSV export
|
||
df = pd.DataFrame({'product': ['A', 'B', 'C'], 'sales': [100, 150, 120]})
|
||
df.to_csv('sales_report.csv', index=False) # User gets this file!
|
||
|
||
# Generate visualization
|
||
plt.figure(figsize=(10, 6))
|
||
plt.bar(df['product'], df['sales'])
|
||
plt.title('Sales by Product')
|
||
plt.xlabel('Product')
|
||
plt.ylabel('Sales')
|
||
plt.savefig('sales_chart.png') # User gets this image!
|
||
|
||
# Generate JSON summary
|
||
summary = {
|
||
'total_sales': df['sales'].sum(),
|
||
'average_sales': df['sales'].mean(),
|
||
'top_product': df.loc[df['sales'].idxmax(), 'product']
|
||
}
|
||
with open('summary.json', 'w') as f:
|
||
json.dump(summary, f, indent=2) # User gets this JSON!
|
||
|
||
# Generate text report
|
||
with open('analysis_report.txt', 'w') as f:
|
||
f.write('SALES ANALYSIS REPORT\\n')
|
||
f.write('=' * 50 + '\\n\\n')
|
||
f.write(f'Total Sales: ${summary["total_sales"]}\\n')
|
||
f.write(f'Average Sales: ${summary["average_sales"]:.2f}\\n')
|
||
f.write(f'Top Product: {summary["top_product"]}\\n')
|
||
# User gets this text file!
|
||
|
||
print('Generated 4 files: CSV, PNG, JSON, TXT')
|
||
```
|
||
|
||
## 🔐 **Security & Limitations**
|
||
|
||
**Allowed:**
|
||
✅ Read user's own files via load_file()
|
||
✅ Create files (images, CSVs, reports, etc.)
|
||
✅ Data analysis, visualization, machine learning
|
||
✅ Import any approved package (auto-installs if missing)
|
||
✅ File operations within execution directory
|
||
|
||
**Blocked:**
|
||
❌ Network requests (no requests, urllib, socket)
|
||
❌ System commands (no subprocess, os.system)
|
||
❌ File system access outside execution directory
|
||
❌ Dangerous functions (eval, exec, __import__)
|
||
|
||
## 💡 **Best Practices**
|
||
|
||
1. **Don't check if packages are installed** - just import them! Auto-install handles missing packages
|
||
2. **Create files for complex outputs** - don't just print long results
|
||
3. **Use descriptive filenames** - helps users identify outputs
|
||
4. **Generate multiple file types** - CSV for data, PNG for charts, TXT for reports
|
||
5. **Handle errors gracefully** - use try/except blocks
|
||
6. **Provide clear output messages** - tell users what you created
|
||
|
||
## ⚠️ **Common Mistakes to Avoid**
|
||
|
||
❌ **DON'T DO THIS:**
|
||
```python
|
||
try:
|
||
import seaborn
|
||
except ImportError:
|
||
print("Seaborn not installed, please install it")
|
||
```
|
||
|
||
✅ **DO THIS INSTEAD:**
|
||
```python
|
||
import seaborn as sns # Just import it - will auto-install if needed!
|
||
```
|
||
|
||
❌ **DON'T DO THIS:**
|
||
```python
|
||
# Printing long CSV data
|
||
print(df.to_string()) # Output may be truncated
|
||
```
|
||
|
||
✅ **DO THIS INSTEAD:**
|
||
```python
|
||
# Save as file instead
|
||
df.to_csv('data_output.csv', index=False)
|
||
print(f"Saved {len(df)} rows to data_output.csv")
|
||
```
|
||
|
||
## 📊 **Complete Example: Data Analysis Workflow**
|
||
|
||
```python
|
||
import pandas as pd
|
||
import matplotlib.pyplot as plt
|
||
import seaborn as sns # Auto-installs if missing
|
||
import json
|
||
|
||
# Load user's uploaded file
|
||
df = load_file('user_file_id_here')
|
||
|
||
# 1. Basic analysis
|
||
print(f"Dataset: {len(df)} rows, {len(df.columns)} columns")
|
||
print(f"Columns: {', '.join(df.columns)}")
|
||
|
||
# 2. Save summary statistics
|
||
summary_stats = {
|
||
'total_rows': len(df),
|
||
'columns': df.columns.tolist(),
|
||
'numeric_summary': df.describe().to_dict(),
|
||
'missing_values': df.isnull().sum().to_dict()
|
||
}
|
||
with open('summary_statistics.json', 'w') as f:
|
||
json.dump(summary_stats, f, indent=2)
|
||
|
||
# 3. Create visualizations
|
||
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
|
||
|
||
# Correlation heatmap
|
||
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', ax=axes[0, 0])
|
||
axes[0, 0].set_title('Correlation Matrix')
|
||
|
||
# Distribution plot
|
||
df.hist(ax=axes[0, 1], bins=30)
|
||
axes[0, 1].set_title('Distributions')
|
||
|
||
# Box plot
|
||
df.boxplot(ax=axes[1, 0])
|
||
axes[1, 0].set_title('Box Plots')
|
||
|
||
# Scatter plot (if applicable)
|
||
if len(df.select_dtypes(include='number').columns) >= 2:
|
||
numeric_cols = df.select_dtypes(include='number').columns[:2]
|
||
axes[1, 1].scatter(df[numeric_cols[0]], df[numeric_cols[1]])
|
||
axes[1, 1].set_xlabel(numeric_cols[0])
|
||
axes[1, 1].set_ylabel(numeric_cols[1])
|
||
axes[1, 1].set_title('Scatter Plot')
|
||
|
||
plt.tight_layout()
|
||
plt.savefig('data_visualizations.png', dpi=150)
|
||
|
||
# 4. Export cleaned data
|
||
df_cleaned = df.dropna()
|
||
df_cleaned.to_csv('cleaned_data.csv', index=False)
|
||
|
||
# 5. Generate text report
|
||
with open('analysis_report.txt', 'w') as f:
|
||
f.write('DATA ANALYSIS REPORT\\n')
|
||
f.write('=' * 70 + '\\n\\n')
|
||
f.write(f'Dataset Shape: {df.shape[0]} rows × {df.shape[1]} columns\\n')
|
||
f.write(f'Memory Usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB\\n\\n')
|
||
f.write('Column Information:\\n')
|
||
f.write('-' * 70 + '\\n')
|
||
for col in df.columns:
|
||
f.write(f'{col}: {df[col].dtype}, {df[col].isnull().sum()} missing\\n')
|
||
f.write('\\n' + '=' * 70 + '\\n')
|
||
f.write('\\nSummary Statistics:\\n')
|
||
f.write(df.describe().to_string())
|
||
|
||
print("Analysis complete! Generated 4 files:")
|
||
print("1. summary_statistics.json - Detailed statistics")
|
||
print("2. data_visualizations.png - Charts and plots")
|
||
print("3. cleaned_data.csv - Cleaned dataset")
|
||
print("4. analysis_report.txt - Full text report")
|
||
```
|
||
|
||
## 🚀 **Quick Reference**
|
||
|
||
**Import packages freely:**
|
||
```python
|
||
import pandas as pd
|
||
import numpy as np
|
||
import matplotlib.pyplot as plt
|
||
import seaborn as sns
|
||
import plotly.express as px
|
||
# All auto-install if missing!
|
||
```
|
||
|
||
**Load user files:**
|
||
```python
|
||
df = load_file('file_id_from_user')
|
||
```
|
||
|
||
**Create output files:**
|
||
```python
|
||
df.to_csv('output.csv') # CSV
|
||
df.to_excel('output.xlsx') # Excel
|
||
plt.savefig('chart.png') # Image
|
||
with open('report.txt', 'w') as f:
|
||
f.write('Report content') # Text
|
||
```
|
||
|
||
**Handle errors:**
|
||
```python
|
||
try:
|
||
df = load_file('file_id')
|
||
# Process data
|
||
except Exception as e:
|
||
print(f"Error: {e}")
|
||
# Provide helpful message to user
|
||
```
|
||
|
||
---
|
||
|
||
**Remember:** The code interpreter is powerful and handles package installation automatically. Just write clean, efficient Python code and create useful output files for the user!
|
||
"""
|
||
|
||
CODE_INTERPRETER_TOOL_DESCRIPTION = """
|
||
Execute Python code in a sandboxed environment with automatic package installation.
|
||
|
||
**Key Features:**
|
||
- Auto-installs missing packages from 62+ approved libraries
|
||
- Supports 80+ file formats for input/output
|
||
- Files are stored for 48 hours with unique IDs
|
||
- Generated files are automatically sent to the user
|
||
|
||
**How to Use:**
|
||
1. Write Python code normally - don't worry about missing packages
|
||
2. Use load_file('file_id') to access user-uploaded files
|
||
3. Create files (CSV, images, reports) - they're automatically captured
|
||
4. All generated files are sent to the user with file_ids for later access
|
||
|
||
**Approved Packages Include:**
|
||
pandas, numpy, matplotlib, seaborn, scikit-learn, tensorflow, pytorch,
|
||
plotly, opencv, nltk, spacy, geopandas, and many more...
|
||
|
||
**Example:**
|
||
```python
|
||
import pandas as pd
|
||
import seaborn as sns # Auto-installs if needed
|
||
|
||
df = load_file('user_file_id')
|
||
df.to_csv('results.csv')
|
||
sns.heatmap(df.corr())
|
||
plt.savefig('correlation.png')
|
||
```
|
||
"""
|
||
|
||
def get_code_interpreter_instructions():
|
||
"""Get code interpreter instructions for AI model."""
|
||
return CODE_INTERPRETER_SYSTEM_PROMPT
|
||
|
||
def get_code_interpreter_tool_description():
|
||
"""Get code interpreter tool description for function calling."""
|
||
return CODE_INTERPRETER_TOOL_DESCRIPTION
|