feat: Implement dynamic message length handling for Discord to prevent exceeding character limits

Date:   2025-10-02 23:09:10 +07:00
Parent: 9c180bdd89
Commit: 42274e6ad5
6 changed files with 460 additions and 39 deletions


@@ -0,0 +1,152 @@
# Discord Message Length Fix
## Problem
Discord has a **2000 character limit** for messages. The bot was displaying code execution results without properly checking the total message length, causing this error:
```
400 Bad Request (error code: 50035): Invalid Form Body
In content: Must be 2000 or fewer in length.
```
## Root Cause
The code was truncating individual parts (code, output, errors) but not checking the **combined total length** before sending. Even with truncated parts, the message could exceed 2000 characters when combined.
### Example of the Issue:
```python
# Each part was truncated individually:
execution_display += packages # 100 chars
execution_display += input_data[:500] # 500 chars
execution_display += code # 800 chars
execution_display += output[:1000] # 1000 chars
# Total: 2400 chars → EXCEEDS LIMIT! ❌
```
## Solution
Implemented **dynamic length calculation** that:
1. **Calculates remaining space** before adding output/errors
2. **Adjusts content length** based on what's already in the message
3. **Final safety check** ensures total message < 2000 chars
### Changes Made
**File**: `src/module/message_handler.py`
#### Before:
```python
# Fixed truncation without considering total length
execution_display += output[:1000] # ❌ Doesn't consider existing content
```
#### After:
```python
# Dynamic truncation based on remaining space
remaining = 1900 - len(execution_display) # ✅ Calculate available space
if remaining > 100:
execution_display += output[:remaining]
if len(output) > remaining:
execution_display += "\n... (output truncated)"
else:
execution_display += "(output too long)"
# Final safety check
if len(execution_display) > 1990:
execution_display = execution_display[:1980] + "\n...(truncated)"
```
## Implementation Details
### Two Display Scenarios:
#### 1. **Normal Display** (code < 3000 chars)
```
execution_display = "🐍 Python Code Execution\n\n"
+ packages (if any)
+ input_data (max 500 chars)
+ code (full, up to 3000 chars)
+ output (remaining space, min 100 chars)
+ final_check (ensure < 2000 total)
```
#### 2. **File Attachment Display** (code >= 3000 chars)
```
execution_display = "🐍 Python Code Execution\n\n"
+ packages (if any)
+ input_data (max 500 chars)
+ "Code: *Attached as file*"
+ output (remaining space, min 100 chars)
+ final_check (ensure < 2000 total)
# Code sent as separate .py file attachment
```
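How the two paths might be selected, as a minimal sketch: `build_display` is a hypothetical stand-in for the assembly logic above, and the 3000-character threshold mirrors the rule that chooses between inline code and a file attachment.
```python
import io
import discord  # discord.py

CODE_INLINE_LIMIT = 3000  # threshold that picks the display scenario

async def send_execution_result(channel, code: str, output: str):
    """Sketch of the scenario dispatch; build_display is hypothetical."""
    if len(code) < CODE_INLINE_LIMIT:
        # Scenario 1: code shown inline, output sized to remaining space
        await channel.send(build_display(code=code, output=output))
    else:
        # Scenario 2: code attached as a .py file, output shown inline
        display = build_display(code=None, output=output)
        code_file = discord.File(io.BytesIO(code.encode("utf-8")),
                                 filename="code.py")
        await channel.send(display, file=code_file)
```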
### Smart Truncation Strategy:
1. **Priority Order** (most to least important):
- Header & metadata (packages, input info)
- Code (inline or file attachment)
- Output/Errors (dynamically sized)
2. **Space Allocation**:
- Reserve 1900 chars (100 char buffer)
- Calculate: `remaining = 1900 - len(current_content)`
- Only add output/errors if `remaining > 100`
3. **Safety Net** (condensed into the sketch below):
- Final check: `if len(message) > 1990`
- Hard truncate at 1980 with "...(truncated)"
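The allocation and safety-net rules condense into one small helper. This is a minimal sketch under the numbers above, not the shipped code; `append_truncated` is a hypothetical name.
```python
RESERVE = 1900    # keep a ~100-char buffer below Discord's 2000 limit
MIN_USEFUL = 100  # don't append fewer than 100 chars of content
HARD_CAP = 1990   # final safety threshold

def append_truncated(display: str, content: str) -> str:
    """Append content only if meaningful space remains (hypothetical helper)."""
    remaining = RESERVE - len(display)
    if remaining > MIN_USEFUL:
        display += content[:remaining]
        if len(content) > remaining:
            display += "\n... (output truncated)"
    else:
        display += "(output too long)"
    # Safety net: hard-truncate anything that still slipped past
    if len(display) > HARD_CAP:
        display = display[:1980] + "\n...(truncated)"
    return display
```
Both display paths can then funnel output and error text through the same function instead of repeating the arithmetic.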
## Benefits
- **No More Discord Errors**: Messages never exceed the 2000-character limit
- **Smart Truncation**: Prioritizes the most important information
- **Better UX**: Users see as much as possible within the limit
- **Graceful Degradation**: Long content becomes file attachments
- **Clear Indicators**: Shows when content is truncated
## Testing
To test the fix:
1. **Short code + long output**: Should display inline with truncated output
2. **Long code + short output**: Code as file, output inline
3. **Long code + long output**: Code as file, output truncated
4. **Very long error messages**: Should truncate gracefully
Example test case:
```python
# Generate long output
for i in range(1000):
    print(f"Line {i}: " + "x" * 100)
```
Before: ❌ Discord 400 error
After: ✅ Displays with "(output truncated)" indicator
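As a quick regression check for case 1, reusing the hypothetical `append_truncated` helper sketched earlier:
```python
# Short code, very long output: the final message must stay under 2000 chars
display = "🐍 Python Code Execution\n\n**📤 Output:**\n```\n"
long_output = "\n".join(f"Line {i}: " + "x" * 100 for i in range(1000))
display = append_truncated(display, long_output) + "\n```"
assert len(display) <= 2000
assert "truncated" in display
```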
## Related Files
- `src/module/message_handler.py` (Lines 400-480)
- Fixed both normal display and file attachment display
- Added dynamic length calculation
- Added final safety check
## Prevention
To prevent similar issues in the future:
1. **Always calculate remaining space** before adding variable-length content
2. **Use final safety check** before sending to Discord
3. **Test with extreme cases** (very long code, output, errors)
4. **Consider file attachments** for content that might exceed limits
## Discord Limits Reference
- **Message content**: 2000 characters max
- **Embed description**: 4096 characters max
- **Embed field value**: 1024 characters max
- **Code blocks**: Count toward message limit
**Note**: We use 1990 as the safe limit (a 10-character buffer) to account for markdown formatting and edge cases.

docs/FILE_ACCESS_FIX.md (new file)

@@ -0,0 +1,132 @@
# File Access Fix - Database Type Mismatch
## Problem
Users were uploading files successfully, but when the AI tried to execute code using `load_file()`, it would get the error:
```
ValueError: File 'xxx' not found or not accessible.
No files are currently accessible. Make sure to upload a file first.
```
## Root Cause
**Data Type Mismatch in Database Query**
The issue was in `src/database/db_handler.py` in the `get_user_files()` method:
### What Was Happening:
1. **File Upload** (`code_interpreter.py`):
```python
expires_at = (datetime.now() + timedelta(hours=48)).isoformat()
# Result: "2025-10-04T22:26:25.044108" (ISO string)
```
2. **Database Query** (`db_handler.py`):
```python
current_time = datetime.now() # datetime object
files = await self.db.user_files.find({
"user_id": user_id,
"$or": [
{"expires_at": {"$gt": current_time}}, # Comparing string > datetime ❌
{"expires_at": None}
]
}).to_list(length=1000)
```
3. **Result**: MongoDB never matches a string field against a `datetime` in a range comparison (they are different BSON types), so the `$gt` filter silently excluded every document and the query returned 0 files.
### Logs Showing the Issue:
```
2025-10-02 22:26:25,106 - [DEBUG] Saved file metadata to database: 878573881449906208_1759418785_112e8587
2025-10-02 22:26:34,964 - [DEBUG] Fetched 0 files from DB for user 878573881449906208 ❌
2025-10-02 22:26:34,964 - [DEBUG] No files found in database for user 878573881449906208 ❌
```
## Solution
**Changed the database query to use an ISO string for the time comparison:**
```python
# Before:
current_time = datetime.now() # datetime object
# After:
current_time = datetime.now().isoformat() # ISO string
```
This ensures both values are ISO strings, making the MongoDB comparison work correctly.
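Why the string form works: same-format ISO-8601 timestamps sort lexicographically in chronological order, so a plain string `$gt` comparison gives the right answer. A minimal demonstration:
```python
from datetime import datetime, timedelta

current_time = datetime.now().isoformat()
expires_at = (datetime.now() + timedelta(hours=48)).isoformat()

# Lexicographic order equals chronological order for these strings,
# so {"expires_at": {"$gt": current_time}} now matches unexpired files.
assert expires_at > current_time
```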
## Files Modified
1. **`src/database/db_handler.py`** (Line 344)
- Changed `current_time = datetime.now()` to `current_time = datetime.now().isoformat()`
- Added debug logging to show query results
2. **`src/module/message_handler.py`** (Lines 327-339)
- Added comprehensive debug logging to trace file fetching
3. **`src/utils/code_interpreter.py`** (Lines 153-160)
- Changed `insert_one` to `update_one` with `upsert=True` to avoid duplicate key errors
- Added debug logging for database saves
4. **`src/module/message_handler.py`** (Lines 637-680, 716-720)
- Updated data analysis feature to use `load_file()` with file IDs
- Added `user_files` parameter to `execute_code()` call
## Testing
After the fix, the flow should work correctly:
1. **Upload File**:
```
✅ Saved file metadata to database: 878573881449906208_1759418785_112e8587
```
2. **Fetch Files**:
```
✅ [DEBUG] Query returned 1 files for user 878573881449906208
✅ Code execution will have access to 1 file(s) for user 878573881449906208
```
3. **Execute Code**:
```
✅ Processing 1 file(s) for code execution
✅ Added file to execution context: 878573881449906208_1759418785_112e8587 -> /path/to/file
✅ Total files accessible in execution: 1
```
4. **Load File in Code**:
```python
df = pd.read_excel(load_file('878573881449906208_1759418785_112e8587'))
# ✅ Works!
```
## Restart Required
**Yes, restart the bot** to apply the changes:
```bash
# Stop the bot (Ctrl+C)
# Then restart:
python3 bot.py
```
## Prevention
To prevent similar issues in the future:
1. **Consistent date handling**: Always use the same format (ISO strings or datetime objects) throughout the codebase, ideally via a single shared helper (sketched below)
2. **Add debug logging**: Log database queries and results to catch data type mismatches
3. **Test file access**: After any database schema changes, test the full file upload → execution flow
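For point 1, one option is a single timestamp helper shared by the writer and the query side. A sketch, with a hypothetical module path and names:
```python
# e.g. src/utils/timestamps.py (hypothetical module)
from datetime import datetime, timedelta

def now_iso() -> str:
    """The one place that decides how timestamps are serialized."""
    return datetime.now().isoformat()

def expiry_iso(hours: int) -> str:
    return (datetime.now() + timedelta(hours=hours)).isoformat()

# code_interpreter.py would store:  metadata["expires_at"] = expiry_iso(48)
# db_handler.py would query with:   {"expires_at": {"$gt": now_iso()}}
```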
## Related Issues
- File upload was working ✅
- Database saving was working ✅
- Database query was failing due to type mismatch ❌
- Code execution couldn't find files ❌
All issues now resolved! ✅


@@ -125,13 +125,33 @@ Tools:
✅ Approved: pandas, numpy, matplotlib, seaborn, scikit-learn, tensorflow, pytorch, plotly, opencv, scipy, statsmodels, pillow, openpyxl, geopandas, folium, xgboost, lightgbm, bokeh, altair, and 80+ more.
📂 File Access: User files are AUTOMATICALLY available via load_file('file_id'). The system tells you when files are uploaded with their file_id. Just use load_file() - it auto-detects file type (CSV→DataFrame, Excel→DataFrame, JSON→dict, etc.)
📂 File Access: When users upload files, you'll receive the file_id in the conversation context (e.g., "File ID: abc123_xyz"). Use load_file('file_id') to access them. The function auto-detects file types:
- CSV/TSV → pandas DataFrame
- Excel (.xlsx, .xls) → pandas ExcelFile object (use .sheet_names and .parse('Sheet1'))
- JSON → dict or DataFrame
- Images → PIL Image object
- Text → string content
- And 200+ more formats...
📊 Excel Files: load_file() returns ExcelFile object for multi-sheet support:
excel_file = load_file('file_id')
sheets = excel_file.sheet_names # Get all sheet names
df = excel_file.parse('Sheet1') # Read specific sheet
# Or: df = pd.read_excel(excel_file, sheet_name='Sheet1')
# Check if sheet has data: if not df.empty and len(df.columns) > 0
⚠️ IMPORTANT:
- If load_file() fails, error lists available file IDs - use the correct one
- Always check if DataFrames are empty before operations like .describe()
- Excel files may have empty sheets - skip or handle them gracefully
💾 Output Files: ALL generated files (CSV, images, JSON, text, plots, etc.) are AUTO-CAPTURED and sent to user. Files stored for 48h (configurable). Just create files - they're automatically shared!
✅ DO:
- Import packages directly (auto-installs!)
- Use load_file('file_id') for user uploads
- Use load_file('file_id') with the EXACT file_id from context
- Check if DataFrames are empty: if not df.empty and len(df.columns) > 0
- Handle errors gracefully (empty sheets, missing data, etc.)
- Create output files with descriptive names
- Generate visualizations (plt.savefig, etc.)
- Return multiple files (data + plots + reports)
@@ -141,6 +161,7 @@ Tools:
- Use install_packages parameter
- Print large datasets (create CSV instead)
- Manually handle file paths
- Guess file_ids - use the exact ID from the upload message
Example:
```python
@@ -148,16 +169,26 @@ import pandas as pd
import seaborn as sns # Auto-installs!
import matplotlib.pyplot as plt
# Load user's file (file_id provided in context)
df = load_file('abc123') # Auto-detects CSV/Excel/JSON/etc
# Load user's file (file_id from upload message: "File ID: 123456_abc")
data = load_file('123456_abc') # Auto-detects type
# Process and analyze
summary = df.describe()
summary.to_csv('summary_stats.csv')
# For Excel files:
if hasattr(data, 'sheet_names'): # It's an ExcelFile
for sheet in data.sheet_names:
df = data.parse(sheet)
if not df.empty and len(df.columns) > 0:
# Process non-empty sheets
summary = df.describe()
summary.to_csv(f'{sheet}_summary.csv')
else: # It's already a DataFrame (CSV, etc.)
df = data
summary = df.describe()
summary.to_csv('summary_stats.csv')
# Create visualization
sns.heatmap(df.corr(), annot=True)
plt.savefig('correlation_plot.png')
if not df.empty:
sns.heatmap(df.corr(), annot=True)
plt.savefig('correlation_plot.png')
# Everything is automatically sent to user!
```


@@ -341,7 +341,7 @@ class DatabaseHandler:
async def get_user_files(self, user_id: int) -> List[Dict[str, Any]]:
"""Get all files for a specific user"""
try:
current_time = datetime.now()
current_time = datetime.now().isoformat() # Use ISO string for comparison
files = await self.db.user_files.find({
"user_id": user_id,
"$or": [
@@ -349,6 +349,7 @@ class DatabaseHandler:
{"expires_at": None} # Never expires
]
}).to_list(length=1000)
logging.info(f"[DEBUG] Query returned {len(files)} files for user {user_id}")
return files
except Exception as e:
logging.error(f"Error getting user files: {e}")


@@ -328,9 +328,15 @@ class MessageHandler:
if user_id:
try:
db_files = await self.db.get_user_files(user_id)
logging.info(f"[DEBUG] Fetched {len(db_files) if db_files else 0} files from DB for user {user_id}")
if db_files:
for f in db_files:
logging.info(f"[DEBUG] DB file: {f.get('file_id', 'NO_ID')} - {f.get('filename', 'NO_NAME')}")
user_files = [f['file_id'] for f in db_files if 'file_id' in f]
if user_files:
logging.info(f"Code execution will have access to {len(user_files)} file(s) for user {user_id}")
logging.info(f"Code execution will have access to {len(user_files)} file(s) for user {user_id}: {user_files}")
else:
logging.warning(f"[DEBUG] No files found in database for user {user_id}")
except Exception as e:
logging.warning(f"Could not fetch user files: {e}")
@@ -405,17 +411,31 @@ class MessageHandler:
if output and output.strip():
execution_display += "**📤 Output:**\n```\n"
execution_display += output[:2000] # More space for output when code is attached
if len(output) > 2000:
execution_display += "\n... (output truncated)"
# Calculate remaining space (2000 - current length - markdown)
remaining = 1900 - len(execution_display)
if remaining > 100:
execution_display += output[:remaining]
if len(output) > remaining:
execution_display += "\n... (output truncated)"
else:
execution_display += "(output too long)"
execution_display += "\n```"
else:
execution_display += "**📤 Output:** *(No output)*"
else:
error_msg = execute_result.get("error", "Unknown error") if execute_result else "Execution failed"
execution_display += f"**❌ Error:**\n```\n{error_msg[:1000]}\n```"
if len(error_msg) > 1000:
execution_display += "*(Error message truncated)*"
# Calculate remaining space
remaining = 1900 - len(execution_display)
if remaining > 100:
execution_display += f"**❌ Error:**\n```\n{error_msg[:remaining]}\n```"
if len(error_msg) > remaining:
execution_display += "*(Error message truncated)*"
else:
execution_display += "**❌ Error:** *(Error too long - see logs)*"
# Final safety check: ensure total length < 2000
if len(execution_display) > 1990:
execution_display = execution_display[:1980] + "\n...(truncated)"
# Send with file attachment
await discord_message.channel.send(execution_display, file=code_file)
@@ -450,17 +470,31 @@ class MessageHandler:
if output and output.strip():
execution_display += "**📤 Output:**\n```\n"
execution_display += output[:1000] # Limit output length for Discord
if len(output) > 1000:
execution_display += "\n... (output truncated)"
# Calculate remaining space (2000 - current length - markdown)
remaining = 1900 - len(execution_display)
if remaining > 100:
execution_display += output[:remaining]
if len(output) > remaining:
execution_display += "\n... (output truncated)"
else:
execution_display += "(output too long)"
execution_display += "\n```"
else:
execution_display += "**📤 Output:** *(No output)*"
else:
error_msg = execute_result.get("error", "Unknown error") if execute_result else "Execution failed"
execution_display += f"**❌ Error:**\n```\n{error_msg[:800]}\n```"
if len(error_msg) > 800:
execution_display += "*(Error message truncated)*"
# Calculate remaining space
remaining = 1900 - len(execution_display)
if remaining > 100:
execution_display += f"**❌ Error:**\n```\n{error_msg[:remaining]}\n```"
if len(error_msg) > remaining:
execution_display += "*(Error message truncated)*"
else:
execution_display += "**❌ Error:** *(Error too long - see logs)*"
# Final safety check: ensure total length < 2000
if len(execution_display) > 1990:
execution_display = execution_display[:1980] + "\n...(truncated)"
# Send the execution display to Discord as a separate message
await discord_message.channel.send(execution_display)
@@ -636,24 +670,41 @@ class MessageHandler:
)
if upload_result['success']:
# Use the new file path
# Get file_id for new load_file() system
file_id = upload_result['file_id']
file_path = upload_result['file_path']
logging.info(f"Migrated file to code interpreter: {file_path}")
logging.info(f"Migrated file to code interpreter: {file_path} (ID: {file_id})")
except Exception as e:
logging.warning(f"Could not migrate file to code interpreter: {e}")
file_id = None
else:
# File is already in new system, get file_id from args
file_id = args.get("file_id")
# Generate analysis code based on the request
# Detect file type
file_ext = os.path.splitext(file_path)[1].lower()
if file_ext in ['.xlsx', '.xls']:
load_statement = f"df = pd.read_excel('{file_path}')"
elif file_ext == '.json':
load_statement = f"df = pd.read_json('{file_path}')"
elif file_ext == '.parquet':
load_statement = f"df = pd.read_parquet('{file_path}')"
else: # Default to CSV
load_statement = f"df = pd.read_csv('{file_path}')"
# Use load_file() if we have a file_id, otherwise use direct path
if file_id:
if file_ext in ['.xlsx', '.xls']:
load_statement = f"df = pd.read_excel(load_file('{file_id}'))"
elif file_ext == '.json':
load_statement = f"df = pd.read_json(load_file('{file_id}'))"
elif file_ext == '.parquet':
load_statement = f"df = pd.read_parquet(load_file('{file_id}'))"
else: # Default to CSV
load_statement = f"df = pd.read_csv(load_file('{file_id}'))"
else:
# Fallback to direct path for legacy support
if file_ext in ['.xlsx', '.xls']:
load_statement = f"df = pd.read_excel('{file_path}')"
elif file_ext == '.json':
load_statement = f"df = pd.read_json('{file_path}')"
elif file_ext == '.parquet':
load_statement = f"df = pd.read_parquet('{file_path}')"
else: # Default to CSV
load_statement = f"df = pd.read_csv('{file_path}')"
analysis_code = f"""
import pandas as pd
@@ -695,9 +746,13 @@ print("\\n=== Correlation Analysis ===")
"""
# Execute the analysis code
# Pass file_id as user_files if available
user_files_for_analysis = [file_id] if file_id else []
result = await execute_code(
code=analysis_code,
user_id=user_id,
user_files=user_files_for_analysis,
db_handler=self.db
)


@@ -71,9 +71,10 @@ APPROVED_PACKAGES = {
}
# Blocked patterns
# Note: We allow open() for writing to enable saving plots and outputs
# The sandboxed environment restricts file access to safe directories
BLOCKED_PATTERNS = [
r'\bopen\s*\([^)]*[\'"]w',
r'\bopen\s*\([^)]*[\'"]a',
# Dangerous system modules
r'import\s+os\b(?!\s*\.path)',
r'from\s+os\s+import\s+(?!path)',
r'import\s+shutil\b',
@@ -90,12 +91,14 @@ BLOCKED_PATTERNS = [
r'from\s+requests\s+import',
r'import\s+aiohttp\b',
r'from\s+aiohttp\s+import',
# Dangerous code execution
r'__import__\s*\(',
r'\beval\s*\(',
r'\bexec\s*\(',
r'\bcompile\s*\(',
r'\bglobals\s*\(',
r'\blocals\s*\(',
# File system operations (dangerous)
r'\.unlink\s*\(',
r'\.rmdir\s*\(',
r'\.remove\s*\(',
@@ -151,7 +154,13 @@ class FileManager:
}
if self.db:
await self.db.db.user_files.insert_one(metadata)
# Use update_one with upsert to avoid duplicate key errors
await self.db.db.user_files.update_one(
{"file_id": file_id},
{"$set": metadata},
upsert=True
)
logger.info(f"[DEBUG] Saved file metadata to database: {file_id}")
expiration_msg = "never expires" if FILE_EXPIRATION_HOURS == -1 else f"expires in {FILE_EXPIRATION_HOURS}h"
logger.info(f"Saved file {filename} for user {user_id}: {file_id} ({expiration_msg})")
@@ -732,10 +741,21 @@ class CodeExecutor:
try:
file_paths_map = {}
if user_files:
logger.info(f"Processing {len(user_files)} file(s) for code execution")
for file_id in user_files:
file_meta = await self.file_manager.get_file(file_id, user_id)
if file_meta:
file_paths_map[file_id] = file_meta['file_path']
logger.info(f"Added file to execution context: {file_id} -> {file_meta['file_path']}")
else:
logger.warning(f"File {file_id} not found or expired for user {user_id}")
if file_paths_map:
logger.info(f"Total files accessible in execution: {len(file_paths_map)}")
else:
logger.warning(f"No files found for user {user_id} despite {len(user_files)} file_ids provided")
else:
logger.debug("No user files provided for code execution")
env_setup = f"""
import sys
@@ -747,9 +767,36 @@ def load_file(file_id):
'''
Load a file automatically based on its extension.
Supports 200+ file types with smart auto-detection.
Args:
file_id: The file ID provided when the file was uploaded
Returns:
Loaded file data (varies by file type):
- CSV/TSV: pandas DataFrame
- Excel (.xlsx, .xls): pandas ExcelFile object
- JSON: pandas DataFrame or dict
- Parquet/Feather: pandas DataFrame
- Text files: string content
- Images: PIL Image object
- And 200+ more formats...
Excel file usage examples:
excel_file = load_file('file_id')
sheet_names = excel_file.sheet_names
df = excel_file.parse('Sheet1')
df2 = pd.read_excel(excel_file, sheet_name='Sheet1')
Available files: {{', '.join(FILES.keys()) if FILES else 'None'}}
'''
if file_id not in FILES:
raise ValueError(f"File {{file_id}} not found or not accessible")
available_files = list(FILES.keys())
error_msg = f"File '{{file_id}}' not found or not accessible.\\n"
if available_files:
error_msg += f"Available file IDs: {{', '.join(available_files)}}"
else:
error_msg += "No files are currently accessible. Make sure to upload a file first."
raise ValueError(error_msg)
file_path = FILES[file_id]
# Import common libraries (they'll auto-install if needed)
@@ -763,9 +810,12 @@ def load_file(file_id):
if ext == 'csv':
return pd.read_csv(file_path)
elif ext in ['xlsx', 'xls', 'xlsm', 'xlsb']:
return pd.read_excel(file_path)
# Return ExcelFile object for multi-sheet access
# Users can: excel_file.sheet_names, excel_file.parse('Sheet1'), or pd.read_excel(excel_file, sheet_name='Sheet1')
return pd.ExcelFile(file_path)
elif ext == 'ods':
return pd.read_excel(file_path, engine='odf')
# Return ExcelFile object for ODS multi-sheet access
return pd.ExcelFile(file_path, engine='odf')
elif ext == 'tsv' or ext == 'tab':
return pd.read_csv(file_path, sep='\\t')