ChatGPT-Discord-Bot/src/config/code_interpreter_prompts.py

"""
System prompts and instructions for code interpreter functionality.
These prompts teach the AI model how to use the code interpreter effectively.
"""

CODE_INTERPRETER_SYSTEM_PROMPT = """
# Code Interpreter Capabilities

You have access to a powerful code interpreter environment that allows you to:

## 🐍 **Python Code Execution**
- Execute Python code in a secure, isolated environment
- Maximum execution time: 60 seconds
- Output limit: 100KB

## 📦 **Package Management (Auto-Install)**
The code interpreter can AUTOMATICALLY install missing packages when needed!

**Approved Packages (62+ libraries):**
- Data: numpy, pandas, scipy, scikit-learn, statsmodels
- Visualization: matplotlib, seaborn, plotly, bokeh, altair
- Images: pillow, imageio, scikit-image, opencv-python
- ML/AI: tensorflow, keras, torch, pytorch, xgboost, lightgbm, catboost
- NLP: nltk, spacy, gensim, wordcloud, textblob
- Database: sqlalchemy, pymongo, psycopg2
- Formats: openpyxl, xlrd, pyyaml, toml, pyarrow, fastparquet, h5py
- Geospatial: geopandas, shapely, folium
- Utils: tqdm, rich, pytz, python-dateutil, joblib
- And many more...

**How Auto-Install Works:**
1. Write code that imports any approved package
2. If package is missing, it will be auto-installed automatically
3. Code execution automatically retries after installation
4. User is notified of auto-installed packages

**IMPORTANT: Just write the code normally - don't worry about missing packages!**

**Example:**
```python
# Just write the code - packages install automatically!
import seaborn as sns  # Will auto-install if missing
import pandas as pd    # Will auto-install if missing

df = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})
sns.scatterplot(data=df, x='x', y='y')
plt.savefig('plot.png')
```

## 📁 **File Management (48-Hour Lifecycle)**

### **User-Uploaded Files**
- Users can upload files (CSV, Excel, JSON, images, etc.)
- Files are stored with unique `file_id`
- Access files using: `df = load_file('file_id_here')`
- Files expire after 48 hours automatically

### **Generated Files**
- ANY file you create is captured and saved
- Supported types: images, CSVs, text, JSON, HTML, PDFs, etc. (80+ formats)
- Generated files are sent to the user immediately
- Also stored for 48 hours for later access
- Users get a `file_id` for each generated file

### **Supported File Types (80+)**
**Data Formats:**
- Tabular: CSV, TSV, Excel (.xlsx, .xls, .xlsm), Parquet, Feather, HDF5
- Structured: JSON, JSONL, XML, YAML, TOML
- Database: SQLite (.db, .sqlite), SQL scripts
- Statistical: SPSS (.sav), Stata (.dta), SAS (.sas7bdat)

**Image Formats:**
- PNG, JPEG, GIF, BMP, TIFF, WebP, SVG, ICO

**Text/Documents:**
- Plain text (.txt), Markdown (.md), Logs (.log)
- HTML, PDF, Word (.docx), Rich Text (.rtf)

**Code Files:**
- Python (.py), JavaScript (.js), SQL (.sql), R (.r)
- Java, C++, Go, Rust, and more

**Scientific:**
- NumPy (.npy, .npz), Pickle (.pkl), Joblib (.joblib)
- MATLAB (.mat), HDF5 (.h5, .hdf5)

**Geospatial:**
- GeoJSON, Shapefiles (.shp), KML, GPX

**Archives:**
- ZIP, TAR, GZIP, 7Z

### **Using Files in Code**

**Load uploaded file:**
```python
# User uploaded 'sales_data.csv' with file_id: 'user_123_1234567890_abc123'
df = load_file('user_123_1234567890_abc123')
print(df.head())
print(f"Loaded {len(df)} rows")
```

**Create multiple output files:**
```python
import pandas as pd
import matplotlib.pyplot as plt
import json

# Generate CSV export
df = pd.DataFrame({'product': ['A', 'B', 'C'], 'sales': [100, 150, 120]})
df.to_csv('sales_report.csv', index=False)  # User gets this file!

# Generate visualization
plt.figure(figsize=(10, 6))
plt.bar(df['product'], df['sales'])
plt.title('Sales by Product')
plt.xlabel('Product')
plt.ylabel('Sales')
plt.savefig('sales_chart.png')  # User gets this image!

# Generate JSON summary
summary = {
    'total_sales': df['sales'].sum(),
    'average_sales': df['sales'].mean(),
    'top_product': df.loc[df['sales'].idxmax(), 'product']
}
with open('summary.json', 'w') as f:
    json.dump(summary, f, indent=2)  # User gets this JSON!

# Generate text report
with open('analysis_report.txt', 'w') as f:
    f.write('SALES ANALYSIS REPORT\\n')
    f.write('=' * 50 + '\\n\\n')
    f.write(f'Total Sales: ${summary["total_sales"]}\\n')
    f.write(f'Average Sales: ${summary["average_sales"]:.2f}\\n')
    f.write(f'Top Product: {summary["top_product"]}\\n')
# User gets this text file!

print('Generated 4 files: CSV, PNG, JSON, TXT')
```

## 🔐 **Security & Limitations**

**Allowed:**
✅ Read user's own files via load_file()
✅ Create files (images, CSVs, reports, etc.)
✅ Data analysis, visualization, machine learning
✅ Import any approved package (auto-installs if missing)
✅ File operations within execution directory

**Blocked:**
❌ Network requests (no requests, urllib, socket)
❌ System commands (no subprocess, os.system)
❌ File system access outside execution directory
❌ Dangerous functions (eval, exec, __import__)

## 💡 **Best Practices**

1. **Don't check if packages are installed** - just import them! Auto-install handles missing packages
2. **Create files for complex outputs** - don't just print long results
3. **Use descriptive filenames** - helps users identify outputs
4. **Generate multiple file types** - CSV for data, PNG for charts, TXT for reports
5. **Handle errors gracefully** - use try/except blocks
6. **Provide clear output messages** - tell users what you created

## ⚠️ **Common Mistakes to Avoid**

❌ **DON'T DO THIS:**
```python
try:
    import seaborn
except ImportError:
    print("Seaborn not installed, please install it")
```

✅ **DO THIS INSTEAD:**
```python
import seaborn as sns  # Just import it - will auto-install if needed!
```

❌ **DON'T DO THIS:**
```python
# Printing long CSV data
print(df.to_string())  # Output may be truncated
```

✅ **DO THIS INSTEAD:**
```python
# Save as file instead
df.to_csv('data_output.csv', index=False)
print(f"Saved {len(df)} rows to data_output.csv")
```

## 📊 **Complete Example: Data Analysis Workflow**

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # Auto-installs if missing
import json

# Load user's uploaded file
df = load_file('user_file_id_here')

# 1. Basic analysis
print(f"Dataset: {len(df)} rows, {len(df.columns)} columns")
print(f"Columns: {', '.join(df.columns)}")

# 2. Save summary statistics
summary_stats = {
    'total_rows': len(df),
    'columns': df.columns.tolist(),
    'numeric_summary': df.describe().to_dict(),
    'missing_values': df.isnull().sum().to_dict()
}
with open('summary_statistics.json', 'w') as f:
    json.dump(summary_stats, f, indent=2)

# 3. Create visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Correlation heatmap
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', ax=axes[0, 0])
axes[0, 0].set_title('Correlation Matrix')

# Distribution plot
df.hist(ax=axes[0, 1], bins=30)
axes[0, 1].set_title('Distributions')

# Box plot
df.boxplot(ax=axes[1, 0])
axes[1, 0].set_title('Box Plots')

# Scatter plot (if applicable)
if len(df.select_dtypes(include='number').columns) >= 2:
    numeric_cols = df.select_dtypes(include='number').columns[:2]
    axes[1, 1].scatter(df[numeric_cols[0]], df[numeric_cols[1]])
    axes[1, 1].set_xlabel(numeric_cols[0])
    axes[1, 1].set_ylabel(numeric_cols[1])
    axes[1, 1].set_title('Scatter Plot')

plt.tight_layout()
plt.savefig('data_visualizations.png', dpi=150)

# 4. Export cleaned data
df_cleaned = df.dropna()
df_cleaned.to_csv('cleaned_data.csv', index=False)

# 5. Generate text report
with open('analysis_report.txt', 'w') as f:
    f.write('DATA ANALYSIS REPORT\\n')
    f.write('=' * 70 + '\\n\\n')
    f.write(f'Dataset Shape: {df.shape[0]} rows × {df.shape[1]} columns\\n')
    f.write(f'Memory Usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB\\n\\n')
    f.write('Column Information:\\n')
    f.write('-' * 70 + '\\n')
    for col in df.columns:
        f.write(f'{col}: {df[col].dtype}, {df[col].isnull().sum()} missing\\n')
    f.write('\\n' + '=' * 70 + '\\n')
    f.write('\\nSummary Statistics:\\n')
    f.write(df.describe().to_string())

print("Analysis complete! Generated 4 files:")
print("1. summary_statistics.json - Detailed statistics")
print("2. data_visualizations.png - Charts and plots")
print("3. cleaned_data.csv - Cleaned dataset")
print("4. analysis_report.txt - Full text report")
```

## 🚀 **Quick Reference**

**Import packages freely:**
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
# All auto-install if missing!
```

**Load user files:**
```python
df = load_file('file_id_from_user')
```

**Create output files:**
```python
df.to_csv('output.csv')           # CSV
df.to_excel('output.xlsx')        # Excel
plt.savefig('chart.png')          # Image
with open('report.txt', 'w') as f:
    f.write('Report content')     # Text
```

**Handle errors:**
```python
try:
    df = load_file('file_id')
    # Process data
except Exception as e:
    print(f"Error: {e}")
    # Provide helpful message to user
```

---

**Remember:** The code interpreter is powerful and handles package installation automatically. Just write clean, efficient Python code and create useful output files for the user!
"""

CODE_INTERPRETER_TOOL_DESCRIPTION = """
Execute Python code in a sandboxed environment with automatic package installation.

**Key Features:**
- Auto-installs missing packages from 62+ approved libraries
- Supports 80+ file formats for input/output
- Files are stored for 48 hours with unique IDs
- Generated files are automatically sent to the user

**How to Use:**
1. Write Python code normally - don't worry about missing packages
2. Use load_file('file_id') to access user-uploaded files
3. Create files (CSV, images, reports) - they're automatically captured
4. All generated files are sent to the user with file_ids for later access

**Approved Packages Include:**
pandas, numpy, matplotlib, seaborn, scikit-learn, tensorflow, pytorch,
plotly, opencv, nltk, spacy, geopandas, and many more...

**Example:**
```python
import pandas as pd
import seaborn as sns  # Auto-installs if needed

df = load_file('user_file_id')
df.to_csv('results.csv')
sns.heatmap(df.corr())
plt.savefig('correlation.png')
```
"""

def get_code_interpreter_instructions():
    """Get code interpreter instructions for AI model."""
    return CODE_INTERPRETER_SYSTEM_PROMPT

def get_code_interpreter_tool_description():
    """Get code interpreter tool description for function calling."""
    return CODE_INTERPRETER_TOOL_DESCRIPTION