Files
ChatGPT-Discord-Bot/docs/QUICK_REFERENCE_GENERATED_FILES.md
cauvang32 9c180bdd89 Refactor OpenAI utilities and remove Python executor
- Removed the `analyze_data_file` function from tool definitions to streamline functionality.
- Enhanced the `execute_python_code` function description to clarify auto-installation of packages and file handling.
- Deleted the `python_executor.py` module to simplify the codebase and improve maintainability.
- Introduced a new `token_counter.py` module for efficient token counting for OpenAI API requests, including support for Discord image links and cost estimation.
2025-10-02 21:49:48 +07:00

5.9 KiB

Generated Files - Quick Reference

🎯 What Changed?

ALL file types are now captured (not just images)
48-hour expiration for generated files
file_id for accessing files later
80+ file extensions supported


📊 Execution Result Structure

result = {
    "success": True,
    "output": "Analysis complete!",
    "error": "",
    "execution_time": 2.5,
    "return_code": 0,
    "generated_files": [          # Immediate data for Discord
        {
            "filename": "report.txt",
            "data": b"...",         # Binary content
            "type": "text",          # File category
            "size": 1234,           # Bytes
            "file_id": "123_..."    # For later access ← NEW!
        }
    ],
    "generated_file_ids": [       # Quick reference ← NEW!
        "123_1696118400_abc123",
        "123_1696118401_def456"
    ]
}

🔧 Key Functions

Execute Code

result = await execute_code(
    code="df.to_csv('data.csv')",
    user_id=123,
    db_handler=db
)
# Generated files automatically saved with 48h expiration

Load Generated File (Within 48h)

file_data = await load_file(
    file_id="123_1696118400_abc123",
    user_id=123,
    db_handler=db
)
# Returns: {"success": True, "data": b"...", "filename": "data.csv"}

List All Files

files = await list_user_files(user_id=123, db_handler=db)
# Returns all non-expired files (uploaded + generated)

Use File in Code

code = """
# Load previously generated file
df = load_file('123_1696118400_abc123')
print(f'Loaded {len(df)} rows')
"""

result = await execute_code(
    code=code,
    user_id=123,
    user_files=["123_1696118400_abc123"]
)

📁 Supported File Types (80+)

Type Extensions Category
Images .png, .jpg, .gif, .svg "image"
Data .csv, .xlsx, .parquet, .feather "data"
Text .txt, .md, .log "text"
Structured .json, .xml, .yaml "structured"
Code .py, .js, .sql, .r "code"
Archive .zip, .tar, .gz "archive"
Scientific .npy, .pickle, .hdf5 Various
HTML .html, .htm "html"
PDF .pdf "pdf"

Full list: See GENERATED_FILES_GUIDE.md


File Lifecycle

Create → Save → Available 48h → Auto-Delete
  ↓       ↓          ↓              ↓
Code   Database   Use file_id    Cleanup
runs    record    to access       task

Timeline Example:

  • Day 1, 10:00 AM: File created
  • Day 1-3: File accessible via file_id
  • Day 3, 10:01 AM: File expires and is auto-deleted

💡 Common Patterns

Pattern 1: Multi-Format Export

code = """
df.to_csv('data.csv')
df.to_json('data.json')
df.to_excel('data.xlsx')
print('Exported to 3 formats!')
"""

Pattern 2: Reuse Generated File

# Step 1: Generate
result1 = await execute_code(
    code="df.to_csv('results.csv')",
    user_id=123
)
file_id = result1["generated_file_ids"][0]

# Step 2: Reuse (within 48h)
result2 = await execute_code(
    code=f"df = load_file('{file_id}')",
    user_id=123,
    user_files=[file_id]
)

Pattern 3: Multi-Step Analysis

# Day 1: Generate dataset
code1 = "df.to_parquet('dataset.parquet')"
result1 = await execute_code(code1, user_id=123)

# Day 2: Analyze (file still valid)
code2 = """
df = load_file('123_...')  # Use file_id from result1
# Perform analysis
"""
result2 = await execute_code(code2, user_id=123, user_files=['123_...'])

🎨 Discord Integration

# Send files to user
for gen_file in result["generated_files"]:
    file_bytes = io.BytesIO(gen_file["data"])
    discord_file = discord.File(file_bytes, filename=gen_file["filename"])
    
    # Include file_id for user reference
    await message.channel.send(
        f"📎 `{gen_file['filename']}` (ID: `{gen_file['file_id']}`)",
        file=discord_file
    )

User sees:

📎 analysis.csv (ID: 123_1696118400_abc123) [downloadable]
📊 chart.png (ID: 123_1696118401_def456) [downloadable]
📝 report.txt (ID: 123_1696118402_ghi789) [downloadable]

💾 Files available for 48 hours

🧹 Cleanup

Automatic (Every Hour):

# In bot.py
cleanup_task = create_discord_cleanup_task(bot, db_handler)

@bot.event
async def on_ready():
    cleanup_task.start()

Manual:

deleted = await cleanup_expired_files(db_handler)
print(f"Deleted {deleted} expired files")

🔒 Security

User isolation (can't access other users' files)
50MB max file size
48-hour auto-expiration
User-specific directories
No permanent storage


📚 Full Documentation

  • GENERATED_FILES_GUIDE.md - Complete usage guide
  • GENERATED_FILES_UPDATE_SUMMARY.md - Technical changes
  • CODE_INTERPRETER_GUIDE.md - General code interpreter docs
  • NEW_FEATURES_GUIDE.md - All new features

Status

  • All file types captured
  • 48-hour persistence implemented
  • file_id system working
  • Database integration complete
  • Automatic cleanup configured
  • Documentation created
  • Ready for production testing!

🚀 Quick Start

# 1. Execute code that generates files
result = await execute_code(
    code="""
    import pandas as pd
    df = pd.DataFrame({'x': [1,2,3]})
    df.to_csv('data.csv')
    df.to_json('data.json')
    print('Files created!')
    """,
    user_id=123,
    db_handler=db
)

# 2. Files are automatically:
#    - Saved to database (48h expiration)
#    - Sent to Discord
#    - Accessible via file_id

# 3. Use later (within 48h)
code2 = f"df = load_file('{result['generated_file_ids'][0]}')"
result2 = await execute_code(code2, user_id=123, user_files=[...])

That's it! Your code interpreter now handles all file types with 48-hour persistence! 🎉