Strategies: Optimization & Management

Basic concepts

Managing context is like cleaning up a hard drive…

You have a 1 TB hard drive (the context window), but you want to store 4K movies (RAG docs), games (tools), and photos (history). What do you do?

Delete junk files → Selection
Compress into a ZIP → Compression
Split across several drives → Isolation

1. Context Selection (smart filtering)

The problem

Stuff in too much information → the AI “drowns” in it (Lost in the Middle: models tend to overlook information placed in the middle of a long context).

Solution: RAG for Everything

Use RAG not only for documents but also for:

Tools: 100 tools? Retrieve the 5 most relevant.
History: 1,000 messages? Semantic search for the top 10.
Examples: 500 few-shot examples? Take the 3 most similar.


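# RAG for tool selection
# (vector_db, user_query and context are placeholders: your vector store client,
#  the incoming user message, and the prompt context being assembled)
relevant_tools = vector_db.search(
    query=user_query,
    collection="tool_definitions",
    top_k=5
)
context["tools"] = relevant_tools  # only the 5 most relevant tool definitions enter the prompt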
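The same top-k idea applies to tools, history, and few-shot examples alike. The sketch below is minimal and self-contained. To run without any external service, it swaps the vector database for a toy bag-of-words cosine similarity. A real system would use an embedding model and a vector store. `select_top_k` and the sample tool descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_top_k(query: str, items: list[str], top_k: int = 5) -> list[str]:
    """Return the top_k items most similar to the query (tools, history, or examples)."""
    q = _vectorize(query)
    ranked = sorted(items, key=lambda item: _cosine(q, _vectorize(item)), reverse=True)
    return ranked[:top_k]


# Hypothetical tool descriptions; in practice, the text of each tool definition
tools = [
    "get_weather: current weather forecast for a city",
    "book_table: reserve a restaurant table for a date and time",
    "send_email: send an email message to a recipient",
    "convert_currency: convert an amount between currencies",
]

print(select_top_k("book a table at a restaurant on Saturday", tools, top_k=2))
```

Only the selected items go into the prompt. The other tools never consume context tokens.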

2. Context Compression (compress without losing meaning)

Summarization

Before (500 tokens):


User: I'd like to book a table for 4
AI: Certainly, which day would you like to book for?
User: This Saturday
AI: We have tables available at 11:00, 12:00, 18:00, 19:00, and 20:00. Which time would you prefer?
User: 19:00
AI: Got it, confirming a table for 4 on Saturday at 19:00. May I have your...
(50 more messages follow)

After (50 tokens):


<conversation_summary>
User booked a table: 4 people, Saturday, 19:00. Confirmed name: Nguyễn Văn A, phone: 0901234567.
Currently handling: user is asking about the vegetarian menu.
</conversation_summary>

Auto-Compact (as in Claude Code)

When the context reaches 95% of capacity:


[COMPACTING CONTEXT...]
- Summarized: 2000 tokens → 200 tokens
- Preserved: Key decisions, code snippets, current task
- Removed: Casual chat, repeated explanations
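Below is a minimal sketch of the trigger logic, built on two assumptions:

- Token counts use a crude 4-characters-per-token estimate. A real implementation would use the model's tokenizer.
- You supply a `summarize` callable, typically an LLM call.

`estimate_tokens`, `auto_compact`, and `keep_recent` are hypothetical names for illustration. This is not Claude Code's actual implementation.

```python
from typing import Callable

Message = dict[str, str]


def estimate_tokens(messages: list[Message]) -> int:
    """Rough token count, assuming ~4 characters per token (not a real tokenizer)."""
    return sum(len(m["content"]) for m in messages) // 4


def auto_compact(
    messages: list[Message],
    capacity: int,
    summarize: Callable[[list[Message]], str],
    threshold: float = 0.95,
    keep_recent: int = 5,
) -> list[Message]:
    """Replace older messages with one summary once usage reaches threshold * capacity.

    The keep_recent newest messages (the current task) are kept verbatim.
    """
    if estimate_tokens(messages) < threshold * capacity or len(messages) <= keep_recent:
        return messages  # under budget, or nothing old enough to compact
    split = len(messages) - keep_recent
    summary = summarize(messages[:split])
    summary_msg = {"role": "system", "content": f"<conversation_summary>{summary}</conversation_summary>"}
    return [summary_msg, *messages[split:]]


# Usage with a stand-in summarizer (a real one would call an LLM)
history = [{"role": "user", "content": "x" * 400} for _ in range(10)]  # ~1000 estimated tokens
compacted = auto_compact(history, capacity=1000, summarize=lambda old: f"{len(old)} older messages")
print(len(compacted))  # 6: 1 summary + 5 recent messages
```

Keeping the newest messages verbatim mirrors the "Preserved: current task" line above. Everything older is collapsed into a single summary message.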

3. Context Isolation (divide and conquer)

When do you need isolation?

When a single agent has to handle too many different tasks → its context gets diluted.

Solution: Multi-Agent Architecture


┌─────────────────┐
│  Router Agent   │ (Context: User intent + Agent list)
└────────┬────────┘
         │
    ┌────┴────┬─────────┐
    ▼         ▼         ▼
┌───────┐ ┌───────┐ ┌───────┐
│ Coder │ │ Writer│ │ Search│
│ Agent │ │ Agent │ │ Agent │
└───────┘ └───────┘ └───────┘
(Code docs) (Style guide) (RAG docs)

Each agent has its own separate context, focused on its own task.
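Below is a minimal sketch of the routing step. A simple keyword router stands in for an LLM-based Router Agent. The agent names mirror the diagram. The `Agent` class, keywords, and bracketed context placeholders are hypothetical. The point is that each agent's prompt contains only its own context plus the user request, never the other agents' material.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    context: str  # this agent's private context only
    keywords: set[str] = field(default_factory=set)

    def build_prompt(self, user_query: str) -> str:
        # Isolation: the prompt holds this agent's context and the request, nothing else
        return f"{self.context}\n\nTask: {user_query}"


AGENTS = [
    Agent("coder", "[Code docs]", {"code", "bug", "function", "python"}),
    Agent("writer", "[Style guide]", {"write", "blog", "email", "rewrite"}),
    Agent("search", "[RAG docs]", {"find", "search", "lookup", "news"}),
]


def route(user_query: str, agents: list[Agent] = AGENTS) -> Agent:
    """Pick the agent whose keywords overlap most with the query.

    Stand-in for the Router Agent; ties (including zero overlap) go to the first agent listed.
    """
    words = set(user_query.lower().split())
    return max(agents, key=lambda agent: len(agent.keywords & words))


query = "fix the bug in this python function"
agent = route(query)
print(agent.name)  # coder
print(agent.build_prompt(query))
```

The router itself only needs the user intent and the agent list, matching the diagram. The heavy material (code docs, style guide, RAG docs) stays inside each specialist.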

Strategy comparison

Strategy	When to use	Pros	Cons
Selection	Many data sources	Fast, precise	Requires a vector DB
Compression	Long conversations	Saves tokens	May lose details
Isolation	Complex tasks	Specialization	Adds system complexity

Hands-on exercise 🧪

Goal

Implement summarization for conversation history.

Code


from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarize_history(messages: list[dict], max_tokens: int = 200) -> str:
    """Summarize the conversation history."""
    history_text = "\n".join(
        f"{m['role']}: {m['content']}"
        for m in messages
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # a cheap model is enough for summarization
        messages=[{
            "role": "system",
            "content": "Summarize the following conversation in 2-3 sentences. Keep the important information.",
        }, {
            "role": "user",
            "content": history_text,
        }],
        max_tokens=max_tokens,
    )

    return response.choices[0].message.content


# Usage
messages = [...]  # full conversation history (~100 {"role": ..., "content": ...} dicts)
old_messages, recent_messages = messages[:-5], messages[-5:]
summary = summarize_history(old_messages)
# New context = the summary as a system message + the 5 most recent messages verbatim
new_context = [{"role": "system", "content": f"<summary>{summary}</summary>"}] + recent_messages

Challenge

Add logic so that summarization only runs when len(messages) > 20.
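Below is one possible approach to the challenge, sketched under a few assumptions:

- The summarizer is passed in as a parameter. In the exercise it would be `summarize_history`, which keeps the sketch testable without an API key.
- The threshold of 20 comes from the challenge.
- Keeping the last 5 messages verbatim follows the exercise's usage code.

`build_context` is a hypothetical helper name.

```python
from typing import Callable

Message = dict[str, str]


def build_context(
    messages: list[Message],
    summarize: Callable[[list[Message]], str],
    max_messages: int = 20,
    keep_recent: int = 5,
) -> list[Message]:
    """Summarize older messages only when the history has more than max_messages."""
    if len(messages) <= max_messages:
        return messages  # short history: send as-is, no summarizer call
    split = len(messages) - keep_recent
    summary = summarize(messages[:split])
    return [{"role": "system", "content": f"<summary>{summary}</summary>"}, *messages[split:]]


# Stand-in summarizer that records how many messages it was asked to summarize
calls: list[int] = []


def fake_summarize(msgs: list[Message]) -> str:
    calls.append(len(msgs))
    return f"summary of {len(msgs)} messages"


short_history = [{"role": "user", "content": f"msg {i}"} for i in range(20)]
long_history = [{"role": "user", "content": f"msg {i}"} for i in range(30)]
print(len(build_context(short_history, fake_summarize)))  # 20: unchanged
print(len(build_context(long_history, fake_summarize)))   # 6: 1 summary + 5 recent
```

Skipping the call for short histories also saves the cost and latency of an extra LLM request on every turn.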