feat: Implement dynamic message length handling for Discord to prevent exceeding character limits
This commit is contained in:

152
docs/DISCORD_MESSAGE_LENGTH_FIX.md
Normal file

@@ -0,0 +1,152 @@
# Discord Message Length Fix

## Problem

Discord has a **2000 character limit** for messages. The bot was displaying code execution results without properly checking the total message length, causing this error:

```
400 Bad Request (error code: 50035): Invalid Form Body
In content: Must be 2000 or fewer in length.
```

## Root Cause

The code was truncating individual parts (code, output, errors) but not checking the **combined total length** before sending. Even with truncated parts, the message could exceed 2000 characters when combined.

### Example of the Issue:

```python
# Each part was truncated individually:
execution_display += packages           # 100 chars
execution_display += input_data[:500]   # 500 chars
execution_display += code               # 800 chars
execution_display += output[:1000]      # 1000 chars
# Total: 2400 chars → EXCEEDS LIMIT! ❌
```

## Solution

Implemented **dynamic length calculation** that:

1. **Calculates remaining space** before adding output/errors
2. **Adjusts content length** based on what's already in the message
3. **Performs a final safety check** to ensure the total message stays under 2000 chars

### Changes Made

**File**: `src/module/message_handler.py`

#### Before:

```python
# Fixed-size truncation without considering total length
execution_display += output[:1000]  # ❌ Doesn't consider existing content
```

#### After:

```python
# Dynamic truncation based on remaining space
remaining = 1900 - len(execution_display)  # ✅ Calculate available space
if remaining > 100:
    execution_display += output[:remaining]
    if len(output) > remaining:
        execution_display += "\n... (output truncated)"
else:
    execution_display += "(output too long)"

# Final safety check
if len(execution_display) > 1990:
    execution_display = execution_display[:1980] + "\n...(truncated)"
```

## Implementation Details

### Two Display Scenarios:

#### 1. **Normal Display** (code < 3000 chars)

```python
execution_display = "🐍 Python Code Execution\n\n"
    + packages      (if any)
    + input_data    (max 500 chars)
    + code          (full, up to 3000 chars)
    + output        (remaining space, min 100 chars)
    + final_check   (ensure < 2000 total)
```

#### 2. **File Attachment Display** (code >= 3000 chars)

```python
execution_display = "🐍 Python Code Execution\n\n"
    + packages      (if any)
    + input_data    (max 500 chars)
    + "Code: *Attached as file*"
    + output        (remaining space, min 100 chars)
    + final_check   (ensure < 2000 total)
# Code sent as separate .py file attachment
```

### Smart Truncation Strategy:

1. **Priority Order** (most to least important):
   - Header & metadata (packages, input info)
   - Code (inline or file attachment)
   - Output/Errors (dynamically sized)

2. **Space Allocation**:
   - Reserve 1900 chars (100 char buffer)
   - Calculate: `remaining = 1900 - len(current_content)`
   - Only add output/errors if `remaining > 100`

3. **Safety Net**:
   - Final check: `if len(message) > 1990`
   - Hard truncate at 1980 with "...(truncated)"
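The strategy above can be sketched as a pair of small helpers (names and constants here are illustrative, not the bot's actual functions):

```python
SAFE_BUDGET = 1900  # working budget: 100-char buffer under Discord's 2000 limit
MIN_SPACE = 100     # don't bother appending tiny fragments

def append_within_budget(display: str, content: str) -> str:
    """Append content only if meaningful space remains, truncating it to fit."""
    remaining = SAFE_BUDGET - len(display)
    if remaining > MIN_SPACE:
        display += content[:remaining]
        if len(content) > remaining:
            display += "\n... (output truncated)"
    else:
        display += "(output too long)"
    return display

def finalize(display: str) -> str:
    """Final safety net: hard-truncate just under the 2000-char limit."""
    if len(display) > 1990:
        display = display[:1980] + "\n...(truncated)"
    return display
```

Because `append_within_budget` already respects the budget, `finalize` only fires when markdown closers pushed the total over the line.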
## Benefits

✅ **No More Discord Errors**: Messages never exceed 2000 char limit
✅ **Smart Truncation**: Prioritizes most important information
✅ **Better UX**: Users see as much as possible within limits
✅ **Graceful Degradation**: Long content becomes file attachments
✅ **Clear Indicators**: Shows when content is truncated

## Testing

To test the fix:

1. **Short code + long output**: Should display inline with truncated output
2. **Long code + short output**: Code as file, output inline
3. **Long code + long output**: Code as file, output truncated
4. **Very long error messages**: Should truncate gracefully

Example test case:

```python
# Generate long output
for i in range(1000):
    print(f"Line {i}: " + "x" * 100)
```

Before: ❌ Discord 400 error
After: ✅ Displays with "(output truncated)" indicator

## Related Files

- `src/module/message_handler.py` (Lines 400-480)
  - Fixed both normal display and file attachment display
  - Added dynamic length calculation
  - Added final safety check

## Prevention

To prevent similar issues in the future:

1. **Always calculate remaining space** before adding variable-length content
2. **Use final safety check** before sending to Discord
3. **Test with extreme cases** (very long code, output, errors)
4. **Consider file attachments** for content that might exceed limits

## Discord Limits Reference

- **Message content**: 2000 characters max
- **Embed description**: 4096 characters max
- **Embed field value**: 1024 characters max
- **Code blocks**: Count toward message limit

**Note**: We use 1990 as safe limit (10 char buffer) to account for markdown formatting and edge cases.
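When a message must be delivered in full rather than truncated, an alternative is to split it across several sends. A splitter along these lines (a hypothetical helper, not part of this fix) keeps every chunk under the limit:

```python
def chunk_message(text: str, limit: int = 2000) -> list:
    """Split text into Discord-sized chunks, preferring newline boundaries."""
    chunks = []
    while len(text) > limit:
        # Break at the last newline that still fits, else hard-cut at the limit.
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

Note that splitting inside a code block breaks its fencing, so chunking suits plain log output better than formatted displays.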
132
docs/FILE_ACCESS_FIX.md
Normal file

@@ -0,0 +1,132 @@

# File Access Fix - Database Type Mismatch

## Problem

Users were uploading files successfully, but when the AI tried to execute code using `load_file()`, it would get the error:

```
ValueError: File 'xxx' not found or not accessible.
No files are currently accessible. Make sure to upload a file first.
```

## Root Cause

**Data Type Mismatch in Database Query**

The issue was in `src/database/db_handler.py` in the `get_user_files()` method:

### What Was Happening:

1. **File Upload** (`code_interpreter.py`):

   ```python
   expires_at = (datetime.now() + timedelta(hours=48)).isoformat()
   # Result: "2025-10-04T22:26:25.044108" (ISO string)
   ```

2. **Database Query** (`db_handler.py`):

   ```python
   current_time = datetime.now()  # datetime object
   files = await self.db.user_files.find({
       "user_id": user_id,
       "$or": [
           {"expires_at": {"$gt": current_time}},  # Comparing string > datetime ❌
           {"expires_at": None}
       ]
   }).to_list(length=1000)
   ```

3. **Result**: MongoDB couldn't compare the ISO string with the datetime object, so the query returned 0 files.

### Logs Showing the Issue:

```
2025-10-02 22:26:25,106 - [DEBUG] Saved file metadata to database: 878573881449906208_1759418785_112e8587
2025-10-02 22:26:34,964 - [DEBUG] Fetched 0 files from DB for user 878573881449906208 ❌
2025-10-02 22:26:34,964 - [DEBUG] No files found in database for user 878573881449906208 ❌
```

## Solution

**Changed the database query to use ISO string format for the time comparison:**

```python
# Before:
current_time = datetime.now()  # datetime object

# After:
current_time = datetime.now().isoformat()  # ISO string
```

This ensures both values are ISO strings, making the MongoDB comparison work correctly.
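This works because ISO-8601 timestamps in a consistent format (same precision, same timezone handling) sort lexicographically in chronological order, so MongoDB's string `$gt` behaves like a date comparison:

```python
from datetime import datetime, timedelta

now = datetime.now()
stored = (now + timedelta(hours=48)).isoformat()  # what the upload wrote
query_time = now.isoformat()                      # what the query now uses

# String comparison matches chronological order for same-format timestamps
assert query_time < stored
```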
## Files Modified

1. **`src/database/db_handler.py`** (Line 344)
   - Changed `current_time = datetime.now()` to `current_time = datetime.now().isoformat()`
   - Added debug logging to show query results

2. **`src/module/message_handler.py`** (Lines 327-339)
   - Added comprehensive debug logging to trace file fetching

3. **`src/utils/code_interpreter.py`** (Lines 153-160)
   - Changed `insert_one` to `update_one` with `upsert=True` to avoid duplicate key errors
   - Added debug logging for database saves

4. **`src/module/message_handler.py`** (Lines 637-680, 716-720)
   - Updated data analysis feature to use `load_file()` with file IDs
   - Added `user_files` parameter to `execute_code()` call

## Testing

After the fix, the flow should work correctly:

1. **Upload File**:

   ```
   ✅ Saved file metadata to database: 878573881449906208_1759418785_112e8587
   ```

2. **Fetch Files**:

   ```
   ✅ [DEBUG] Query returned 1 files for user 878573881449906208
   ✅ Code execution will have access to 1 file(s) for user 878573881449906208
   ```

3. **Execute Code**:

   ```
   ✅ Processing 1 file(s) for code execution
   ✅ Added file to execution context: 878573881449906208_1759418785_112e8587 -> /path/to/file
   ✅ Total files accessible in execution: 1
   ```

4. **Load File in Code**:

   ```python
   df = pd.read_excel(load_file('878573881449906208_1759418785_112e8587'))
   # ✅ Works!
   ```

## Restart Required

**Yes, restart the bot** to apply the changes:

```bash
# Stop the bot (Ctrl+C)
# Then restart:
python3 bot.py
```

## Prevention

To prevent similar issues in the future:

1. **Consistent date handling**: Always use the same format (ISO strings or datetime objects) throughout the codebase
2. **Add debug logging**: Log database queries and results to catch data type mismatches
3. **Test file access**: After any database schema changes, test the full file upload → execution flow
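For item 1, one defensive option is to normalize every timestamp to an ISO string at a single choke point before it touches the database (illustrative helper, not in the codebase):

```python
from datetime import datetime

def to_iso(value) -> str:
    """Normalize timestamps to ISO-8601 strings so stored and queried
    values always share one comparable type."""
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, str):
        # Round-trip to validate the format early instead of at query time.
        return datetime.fromisoformat(value).isoformat()
    raise TypeError(f"Unsupported timestamp type: {type(value)!r}")
```

Routing both the upload path and the query path through one such function makes the string-vs-datetime mismatch structurally impossible.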
## Related Issues

- File upload was working ✅
- Database saving was working ✅
- Database query was failing due to type mismatch ❌
- Code execution couldn't find files ❌

All issues now resolved! ✅
@@ -125,13 +125,33 @@ Tools:

✅ Approved: pandas, numpy, matplotlib, seaborn, scikit-learn, tensorflow, pytorch, plotly, opencv, scipy, statsmodels, pillow, openpyxl, geopandas, folium, xgboost, lightgbm, bokeh, altair, and 80+ more.

📂 File Access: User files are AUTOMATICALLY available via load_file('file_id'). The system tells you when files are uploaded with their file_id. Just use load_file() - it auto-detects file type (CSV→DataFrame, Excel→DataFrame, JSON→dict, etc.)
📂 File Access: When users upload files, you'll receive the file_id in the conversation context (e.g., "File ID: abc123_xyz"). Use load_file('file_id') to access them. The function auto-detects file types:
- CSV/TSV → pandas DataFrame
- Excel (.xlsx, .xls) → pandas ExcelFile object (use .sheet_names and .parse('Sheet1'))
- JSON → dict or DataFrame
- Images → PIL Image object
- Text → string content
- And 200+ more formats...

📊 Excel Files: load_file() returns ExcelFile object for multi-sheet support:
excel_file = load_file('file_id')
sheets = excel_file.sheet_names  # Get all sheet names
df = excel_file.parse('Sheet1')  # Read specific sheet
# Or: df = pd.read_excel(excel_file, sheet_name='Sheet1')
# Check if sheet has data: if not df.empty and len(df.columns) > 0

⚠️ IMPORTANT:
- If load_file() fails, error lists available file IDs - use the correct one
- Always check if DataFrames are empty before operations like .describe()
- Excel files may have empty sheets - skip or handle them gracefully

💾 Output Files: ALL generated files (CSV, images, JSON, text, plots, etc.) are AUTO-CAPTURED and sent to user. Files stored for 48h (configurable). Just create files - they're automatically shared!

✅ DO:
- Import packages directly (auto-installs!)
- Use load_file('file_id') for user uploads
- Use load_file('file_id') with the EXACT file_id from context
- Check if DataFrames are empty: if not df.empty and len(df.columns) > 0
- Handle errors gracefully (empty sheets, missing data, etc.)
- Create output files with descriptive names
- Generate visualizations (plt.savefig, etc.)
- Return multiple files (data + plots + reports)

@@ -141,6 +161,7 @@ Tools:
- Use install_packages parameter
- Print large datasets (create CSV instead)
- Manually handle file paths
- Guess file_ids - use the exact ID from the upload message

Example:
```python

@@ -148,16 +169,26 @@ import pandas as pd
import seaborn as sns  # Auto-installs!
import matplotlib.pyplot as plt

# Load user's file (file_id provided in context)
df = load_file('abc123')  # Auto-detects CSV/Excel/JSON/etc
# Load user's file (file_id from upload message: "File ID: 123456_abc")
data = load_file('123456_abc')  # Auto-detects type

# Process and analyze
summary = df.describe()
summary.to_csv('summary_stats.csv')
# For Excel files:
if hasattr(data, 'sheet_names'):  # It's an ExcelFile
    for sheet in data.sheet_names:
        df = data.parse(sheet)
        if not df.empty and len(df.columns) > 0:
            # Process non-empty sheets
            summary = df.describe()
            summary.to_csv(f'{sheet}_summary.csv')
else:  # It's already a DataFrame (CSV, etc.)
    df = data
    summary = df.describe()
    summary.to_csv('summary_stats.csv')

# Create visualization
sns.heatmap(df.corr(), annot=True)
plt.savefig('correlation_plot.png')
if not df.empty:
    sns.heatmap(df.corr(), annot=True)
    plt.savefig('correlation_plot.png')

# Everything is automatically sent to user!
```
@@ -341,7 +341,7 @@ class DatabaseHandler:
    async def get_user_files(self, user_id: int) -> List[Dict[str, Any]]:
        """Get all files for a specific user"""
        try:
            current_time = datetime.now()
            current_time = datetime.now().isoformat()  # Use ISO string for comparison
            files = await self.db.user_files.find({
                "user_id": user_id,
                "$or": [

@@ -349,6 +349,7 @@ class DatabaseHandler:
                    {"expires_at": None}  # Never expires
                ]
            }).to_list(length=1000)
            logging.info(f"[DEBUG] Query returned {len(files)} files for user {user_id}")
            return files
        except Exception as e:
            logging.error(f"Error getting user files: {e}")

@@ -328,9 +328,15 @@ class MessageHandler:
        if user_id:
            try:
                db_files = await self.db.get_user_files(user_id)
                logging.info(f"[DEBUG] Fetched {len(db_files) if db_files else 0} files from DB for user {user_id}")
                if db_files:
                    for f in db_files:
                        logging.info(f"[DEBUG] DB file: {f.get('file_id', 'NO_ID')} - {f.get('filename', 'NO_NAME')}")
                user_files = [f['file_id'] for f in db_files if 'file_id' in f]
                if user_files:
                    logging.info(f"Code execution will have access to {len(user_files)} file(s) for user {user_id}")
                    logging.info(f"Code execution will have access to {len(user_files)} file(s) for user {user_id}: {user_files}")
                else:
                    logging.warning(f"[DEBUG] No files found in database for user {user_id}")
            except Exception as e:
                logging.warning(f"Could not fetch user files: {e}")
@@ -405,17 +411,31 @@ class MessageHandler:

            if output and output.strip():
                execution_display += "**📤 Output:**\n```\n"
                execution_display += output[:2000]  # More space for output when code is attached
                if len(output) > 2000:
                    execution_display += "\n... (output truncated)"
                # Calculate remaining space (2000 - current length - markdown)
                remaining = 1900 - len(execution_display)
                if remaining > 100:
                    execution_display += output[:remaining]
                    if len(output) > remaining:
                        execution_display += "\n... (output truncated)"
                else:
                    execution_display += "(output too long)"
                execution_display += "\n```"
            else:
                execution_display += "**📤 Output:** *(No output)*"
        else:
            error_msg = execute_result.get("error", "Unknown error") if execute_result else "Execution failed"
            execution_display += f"**❌ Error:**\n```\n{error_msg[:1000]}\n```"
            if len(error_msg) > 1000:
                execution_display += "*(Error message truncated)*"
            # Calculate remaining space
            remaining = 1900 - len(execution_display)
            if remaining > 100:
                execution_display += f"**❌ Error:**\n```\n{error_msg[:remaining]}\n```"
                if len(error_msg) > remaining:
                    execution_display += "*(Error message truncated)*"
            else:
                execution_display += "**❌ Error:** *(Error too long - see logs)*"

        # Final safety check: ensure total length < 2000
        if len(execution_display) > 1990:
            execution_display = execution_display[:1980] + "\n...(truncated)"

        # Send with file attachment
        await discord_message.channel.send(execution_display, file=code_file)

@@ -450,17 +470,31 @@ class MessageHandler:

            if output and output.strip():
                execution_display += "**📤 Output:**\n```\n"
                execution_display += output[:1000]  # Limit output length for Discord
                if len(output) > 1000:
                    execution_display += "\n... (output truncated)"
                # Calculate remaining space (2000 - current length - markdown)
                remaining = 1900 - len(execution_display)
                if remaining > 100:
                    execution_display += output[:remaining]
                    if len(output) > remaining:
                        execution_display += "\n... (output truncated)"
                else:
                    execution_display += "(output too long)"
                execution_display += "\n```"
            else:
                execution_display += "**📤 Output:** *(No output)*"
        else:
            error_msg = execute_result.get("error", "Unknown error") if execute_result else "Execution failed"
            execution_display += f"**❌ Error:**\n```\n{error_msg[:800]}\n```"
            if len(error_msg) > 800:
                execution_display += "*(Error message truncated)*"
            # Calculate remaining space
            remaining = 1900 - len(execution_display)
            if remaining > 100:
                execution_display += f"**❌ Error:**\n```\n{error_msg[:remaining]}\n```"
                if len(error_msg) > remaining:
                    execution_display += "*(Error message truncated)*"
            else:
                execution_display += "**❌ Error:** *(Error too long - see logs)*"

        # Final safety check: ensure total length < 2000
        if len(execution_display) > 1990:
            execution_display = execution_display[:1980] + "\n...(truncated)"

        # Send the execution display to Discord as a separate message
        await discord_message.channel.send(execution_display)
@@ -636,24 +670,41 @@ class MessageHandler:
                )

                if upload_result['success']:
                    # Use the new file path
                    # Get file_id for new load_file() system
                    file_id = upload_result['file_id']
                    file_path = upload_result['file_path']
                    logging.info(f"Migrated file to code interpreter: {file_path}")
                    logging.info(f"Migrated file to code interpreter: {file_path} (ID: {file_id})")
            except Exception as e:
                logging.warning(f"Could not migrate file to code interpreter: {e}")
                file_id = None
        else:
            # File is already in new system, get file_id from args
            file_id = args.get("file_id")

        # Generate analysis code based on the request
        # Detect file type
        file_ext = os.path.splitext(file_path)[1].lower()

        if file_ext in ['.xlsx', '.xls']:
            load_statement = f"df = pd.read_excel('{file_path}')"
        elif file_ext == '.json':
            load_statement = f"df = pd.read_json('{file_path}')"
        elif file_ext == '.parquet':
            load_statement = f"df = pd.read_parquet('{file_path}')"
        else:  # Default to CSV
            load_statement = f"df = pd.read_csv('{file_path}')"
        # Use load_file() if we have a file_id, otherwise use direct path
        if file_id:
            if file_ext in ['.xlsx', '.xls']:
                load_statement = f"df = pd.read_excel(load_file('{file_id}'))"
            elif file_ext == '.json':
                load_statement = f"df = pd.read_json(load_file('{file_id}'))"
            elif file_ext == '.parquet':
                load_statement = f"df = pd.read_parquet(load_file('{file_id}'))"
            else:  # Default to CSV
                load_statement = f"df = pd.read_csv(load_file('{file_id}'))"
        else:
            # Fallback to direct path for legacy support
            if file_ext in ['.xlsx', '.xls']:
                load_statement = f"df = pd.read_excel('{file_path}')"
            elif file_ext == '.json':
                load_statement = f"df = pd.read_json('{file_path}')"
            elif file_ext == '.parquet':
                load_statement = f"df = pd.read_parquet('{file_path}')"
            else:  # Default to CSV
                load_statement = f"df = pd.read_csv('{file_path}')"

        analysis_code = f"""
import pandas as pd

@@ -695,9 +746,13 @@ print("\\n=== Correlation Analysis ===")
"""

        # Execute the analysis code
        # Pass file_id as user_files if available
        user_files_for_analysis = [file_id] if file_id else []

        result = await execute_code(
            code=analysis_code,
            user_id=user_id,
            user_files=user_files_for_analysis,
            db_handler=self.db
        )
@@ -71,9 +71,10 @@ APPROVED_PACKAGES = {
}

# Blocked patterns
# Note: We allow open() for writing to enable saving plots and outputs
# The sandboxed environment restricts file access to safe directories
BLOCKED_PATTERNS = [
    r'\bopen\s*\([^)]*[\'"]w',
    r'\bopen\s*\([^)]*[\'"]a',
    # Dangerous system modules
    r'import\s+os\b(?!\s*\.path)',
    r'from\s+os\s+import\s+(?!path)',
    r'import\s+shutil\b',

@@ -90,12 +91,14 @@ BLOCKED_PATTERNS = [
    r'from\s+requests\s+import',
    r'import\s+aiohttp\b',
    r'from\s+aiohttp\s+import',
    # Dangerous code execution
    r'__import__\s*\(',
    r'\beval\s*\(',
    r'\bexec\s*\(',
    r'\bcompile\s*\(',
    r'\bglobals\s*\(',
    r'\blocals\s*\(',
    # File system operations (dangerous)
    r'\.unlink\s*\(',
    r'\.rmdir\s*\(',
    r'\.remove\s*\(',

@@ -151,7 +154,13 @@ class FileManager:
        }

        if self.db:
            await self.db.db.user_files.insert_one(metadata)
            # Use update_one with upsert to avoid duplicate key errors
            await self.db.db.user_files.update_one(
                {"file_id": file_id},
                {"$set": metadata},
                upsert=True
            )
            logger.info(f"[DEBUG] Saved file metadata to database: {file_id}")

        expiration_msg = "never expires" if FILE_EXPIRATION_HOURS == -1 else f"expires in {FILE_EXPIRATION_HOURS}h"
        logger.info(f"Saved file (unknown) for user {user_id}: {file_id} ({expiration_msg})")
@@ -732,10 +741,21 @@ class CodeExecutor:
        try:
            file_paths_map = {}
            if user_files:
                logger.info(f"Processing {len(user_files)} file(s) for code execution")
                for file_id in user_files:
                    file_meta = await self.file_manager.get_file(file_id, user_id)
                    if file_meta:
                        file_paths_map[file_id] = file_meta['file_path']
                        logger.info(f"Added file to execution context: {file_id} -> {file_meta['file_path']}")
                    else:
                        logger.warning(f"File {file_id} not found or expired for user {user_id}")

                if file_paths_map:
                    logger.info(f"Total files accessible in execution: {len(file_paths_map)}")
                else:
                    logger.warning(f"No files found for user {user_id} despite {len(user_files)} file_ids provided")
            else:
                logger.debug("No user files provided for code execution")

            env_setup = f"""
import sys

@@ -747,9 +767,36 @@ def load_file(file_id):
    '''
    Load a file automatically based on its extension.
    Supports 200+ file types with smart auto-detection.

    Args:
        file_id: The file ID provided when the file was uploaded

    Returns:
        Loaded file data (varies by file type):
        - CSV/TSV: pandas DataFrame
        - Excel (.xlsx, .xls): pandas ExcelFile object
        - JSON: pandas DataFrame or dict
        - Parquet/Feather: pandas DataFrame
        - Text files: string content
        - Images: PIL Image object
        - And 200+ more formats...

    Excel file usage examples:
        excel_file = load_file('file_id')
        sheet_names = excel_file.sheet_names
        df = excel_file.parse('Sheet1')
        df2 = pd.read_excel(excel_file, sheet_name='Sheet1')

    Available files: {{', '.join(FILES.keys()) if FILES else 'None'}}
    '''
    if file_id not in FILES:
        raise ValueError(f"File {{file_id}} not found or not accessible")
        available_files = list(FILES.keys())
        error_msg = f"File '{{file_id}}' not found or not accessible.\\n"
        if available_files:
            error_msg += f"Available file IDs: {{', '.join(available_files)}}"
        else:
            error_msg += "No files are currently accessible. Make sure to upload a file first."
        raise ValueError(error_msg)
    file_path = FILES[file_id]

    # Import common libraries (they'll auto-install if needed)

@@ -763,9 +810,12 @@ def load_file(file_id):
    if ext == 'csv':
        return pd.read_csv(file_path)
    elif ext in ['xlsx', 'xls', 'xlsm', 'xlsb']:
        return pd.read_excel(file_path)
        # Return ExcelFile object for multi-sheet access
        # Users can: excel_file.sheet_names, excel_file.parse('Sheet1'), or pd.read_excel(excel_file, sheet_name='Sheet1')
        return pd.ExcelFile(file_path)
    elif ext == 'ods':
        return pd.read_excel(file_path, engine='odf')
        # Return ExcelFile object for ODS multi-sheet access
        return pd.ExcelFile(file_path, engine='odf')
    elif ext == 'tsv' or ext == 'tab':
        return pd.read_csv(file_path, sep='\\t')