
Commit a5caf46

feat: Add automatic compaction of historical messages for agents
Implements #338 - Agent self-managed message compaction:

1. Enhanced LLM abstraction to track token limits for all providers
2. Added status update mechanism to inform agents about resource usage
3. Created compactHistory tool for summarizing older messages
4. Updated agent documentation and system prompt
5. Added tests for the new functionality
6. Created documentation for the message compaction feature

This feature helps prevent context window overflow errors by giving agents awareness of their token usage and tools to manage their context window.
1 parent 6c0deaa · commit a5caf46

15 files changed (+708, -6 lines changed)

README.md (+1)

```diff
@@ -12,6 +12,7 @@ Command-line interface for AI-powered coding tasks. Full details available on th
 - 👤 **Human Compatible**: Uses README.md, project files and shell commands to build its own context
 - 🌐 **GitHub Integration**: GitHub mode for working with issues and PRs as part of workflow
 - 📄 **Model Context Protocol**: Support for MCP to access external context sources
+- 🧠 **Message Compaction**: Automatic management of context window for long-running agents

 Please join the MyCoder.ai discord for support: https://discord.gg/5K6TYrHGHt
```

docs/features/message-compaction.md (+101, new file)

# Message Compaction

When agents run for extended periods, they accumulate a large history of messages that eventually fills up the LLM's context window, causing errors when the token limit is exceeded. The message compaction feature helps prevent this by providing agents with awareness of their token usage and tools to manage their context window.

## Features

### 1. Token Usage Tracking

The LLM abstraction now tracks and returns:
- Total tokens used in the current completion request
- Maximum allowed tokens for the model/provider

This information is used to monitor context window usage and trigger appropriate actions.
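As a rough illustration (not code from this commit; the import path and helper name are assumptions), the two tracked values can be combined into the usage figure that later appears in status updates:

```typescript
import type { LLMResponse } from '../core/llm/types.js'; // path assumed for illustration

// Sketch: turn the tracked token counts into a "used/max (pct%)" figure.
function describeTokenUsage(response: LLMResponse): string {
  const { totalTokens, maxTokens } = response;
  if (totalTokens === undefined || maxTokens === undefined) {
    return 'Token Usage: unknown';
  }
  const percent = Math.round((totalTokens / maxTokens) * 100);
  return `Token Usage: ${totalTokens.toLocaleString()}/${maxTokens.toLocaleString()} (${percent}%)`;
}
```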
### 2. Status Updates

Agents receive periodic status updates (every 5 interactions) with information about:
- Current token usage and percentage of the maximum
- Cost so far
- Active sub-agents and their status
- Active shell processes and their status
- Active browser sessions and their status

Example status update:

```
--- STATUS UPDATE ---
Token Usage: 45,235/100,000 (45%)
Cost So Far: $0.23

Active Sub-Agents: 2
- sa_12345: Analyzing project structure and dependencies
- sa_67890: Implementing unit tests for compactHistory tool

Active Shell Processes: 3
- sh_abcde: npm test
- sh_fghij: npm run watch
- sh_klmno: git status

Active Browser Sessions: 1
- bs_12345: https://www.typescriptlang.org/docs/handbook/utility-types.html

If token usage is high (>70%), consider using the 'compactHistory' tool to reduce context size.
--- END STATUS ---
```

### 3. Message Compaction Tool

The `compactHistory` tool allows agents to compact their message history by summarizing older messages while preserving recent context. This tool:

1. Takes a parameter for how many recent messages to preserve unchanged
2. Summarizes all older messages into a single, concise summary
3. Replaces the original messages with the summary and preserved messages
4. Reports on the reduction in context size
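The actual implementation ships in the new `compactHistory` tool; the snippet below is only a simplified sketch of the four steps above, with the `Message` shape and the `summarize` callback as illustrative assumptions:

```typescript
// Simplified sketch of the compaction flow (not the actual tool source).
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

async function compactMessages(
  messages: Message[],
  preserveRecentMessages: number,
  summarize: (text: string) => Promise<string>, // e.g. an LLM call that produces the summary
): Promise<Message[]> {
  if (messages.length <= preserveRecentMessages) {
    return messages; // nothing old enough to compact
  }

  const older = messages.slice(0, messages.length - preserveRecentMessages);
  const recent = messages.slice(messages.length - preserveRecentMessages);

  // Summarize all older messages into a single, concise summary message.
  const summaryText = await summarize(
    older.map((m) => `${m.role}: ${m.content}`).join('\n'),
  );

  return [{ role: 'system', content: `[COMPACTED HISTORY] ${summaryText}` }, ...recent];
}
```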
## Usage

Agents are instructed to monitor their token usage through status updates and use the `compactHistory` tool when token usage approaches 70% of the maximum:

```javascript
// Example of agent using the compactHistory tool
{
  name: "compactHistory",
  preserveRecentMessages: 10,
  customPrompt: "Focus on summarizing our key decisions and current tasks."
}
```
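For reference, the parameters in the call above map naturally onto a schema definition. The following is a sketch assuming a zod-style schema (a common pattern for tool parameters); it is not necessarily the exact schema used by the tool:

```typescript
import { z } from 'zod';

// Assumed sketch of a parameter schema matching the usage example above.
const compactHistoryParameters = z.object({
  preserveRecentMessages: z
    .number()
    .int()
    .positive()
    .default(10)
    .describe('Number of recent messages to keep unchanged'),
  customPrompt: z
    .string()
    .optional()
    .describe('Optional custom prompt to guide the summarization'),
});

type CompactHistoryParameters = z.infer<typeof compactHistoryParameters>;
```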
## Configuration

The message compaction feature is enabled by default with reasonable defaults:
- Status updates every 5 agent interactions
- Recommendation to compact at 70% token usage
- Default preservation of 10 recent messages when compacting

## Model Token Limits

The system includes token limits for various models:

### Anthropic Models
- claude-3-opus-20240229: 200,000 tokens
- claude-3-sonnet-20240229: 200,000 tokens
- claude-3-haiku-20240307: 200,000 tokens
- claude-2.1: 100,000 tokens

### OpenAI Models
- gpt-4o: 128,000 tokens
- gpt-4-turbo: 128,000 tokens
- gpt-3.5-turbo: 16,385 tokens

### Ollama Models
- llama2: 4,096 tokens
- mistral: 8,192 tokens
- mixtral: 32,768 tokens

## Benefits

- Prevents context window overflow errors
- Maintains important context for agent operation
- Enables longer-running agent sessions
- Makes the system more robust for complex tasks
- Gives agents self-awareness of resource usage

example-status-update.md (+50, new file)

# Example Status Update

This is an example of what the status update looks like for the agent:

```
--- STATUS UPDATE ---
Token Usage: 45,235/100,000 (45%)
Cost So Far: $0.23

Active Sub-Agents: 2
- sa_12345: Analyzing project structure and dependencies
- sa_67890: Implementing unit tests for compactHistory tool

Active Shell Processes: 3
- sh_abcde: npm test -- --watch packages/agent/src/tools/utility
- sh_fghij: npm run watch
- sh_klmno: git status

Active Browser Sessions: 1
- bs_12345: https://www.typescriptlang.org/docs/handbook/utility-types.html

If token usage is high (>70%), consider using the 'compactHistory' tool to reduce context size.
--- END STATUS ---
```

## About Status Updates

Status updates are sent periodically to the agent (every 5 interactions) to provide awareness of:

1. **Token Usage**: Current usage and percentage of maximum context window
2. **Cost**: Estimated cost of the session so far
3. **Active Sub-Agents**: Running background agents and their tasks
4. **Active Shell Processes**: Running shell commands
5. **Active Browser Sessions**: Open browser sessions and their URLs

When token usage gets high (>70%), the agent is reminded to use the `compactHistory` tool to reduce context size by summarizing older messages.
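For illustration, a status update like the one above is just a formatted string assembled from the agent's tracked state. The sketch below is not the implementation from this commit; the `AgentStatus` shape and field names are assumptions:

```typescript
// Illustrative only: builds a status string in the format shown above.
interface AgentStatus {
  totalTokens: number;
  maxTokens: number;
  costSoFar: number;
  subAgents: { id: string; description: string }[];
  shellProcesses: { id: string; command: string }[];
  browserSessions: { id: string; url: string }[];
}

function formatStatusUpdate(status: AgentStatus): string {
  const percent = Math.round((status.totalTokens / status.maxTokens) * 100);
  return [
    '--- STATUS UPDATE ---',
    `Token Usage: ${status.totalTokens.toLocaleString()}/${status.maxTokens.toLocaleString()} (${percent}%)`,
    `Cost So Far: $${status.costSoFar.toFixed(2)}`,
    '',
    `Active Sub-Agents: ${status.subAgents.length}`,
    ...status.subAgents.map((a) => `- ${a.id}: ${a.description}`),
    '',
    `Active Shell Processes: ${status.shellProcesses.length}`,
    ...status.shellProcesses.map((p) => `- ${p.id}: ${p.command}`),
    '',
    `Active Browser Sessions: ${status.browserSessions.length}`,
    ...status.browserSessions.map((s) => `- ${s.id}: ${s.url}`),
    '',
    "If token usage is high (>70%), consider using the 'compactHistory' tool to reduce context size.",
    '--- END STATUS ---',
  ].join('\n');
}
```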
## Using the compactHistory Tool

The agent can use the compactHistory tool like this:

```javascript
{
  name: "compactHistory",
  preserveRecentMessages: 10,
  customPrompt: "Optional custom summarization prompt"
}
```

This will summarize all but the 10 most recent messages into a single summary message, significantly reducing token usage while preserving important context.

packages/agent/src/core/llm/providers/anthropic.ts (+27, -3)

```diff
@@ -81,13 +81,33 @@ function addCacheControlToMessages(
   });
 }
 
-function tokenUsageFromMessage(message: Anthropic.Message) {
+// Define model context window sizes for Anthropic models
+const ANTHROPIC_MODEL_LIMITS: Record<string, number> = {
+  'claude-3-opus-20240229': 200000,
+  'claude-3-sonnet-20240229': 200000,
+  'claude-3-haiku-20240307': 200000,
+  'claude-3-7-sonnet-20250219': 200000,
+  'claude-2.1': 100000,
+  'claude-2.0': 100000,
+  'claude-instant-1.2': 100000,
+  // Add other models as needed
+};
+
+function tokenUsageFromMessage(message: Anthropic.Message, model: string) {
   const usage = new TokenUsage();
   usage.input = message.usage.input_tokens;
   usage.cacheWrites = message.usage.cache_creation_input_tokens ?? 0;
   usage.cacheReads = message.usage.cache_read_input_tokens ?? 0;
   usage.output = message.usage.output_tokens;
-  return usage;
+
+  const totalTokens = usage.input + usage.output;
+  const maxTokens = ANTHROPIC_MODEL_LIMITS[model] || 100000; // Default fallback
+
+  return {
+    usage,
+    totalTokens,
+    maxTokens,
+  };
 }
 
 /**
@@ -175,10 +195,14 @@ export class AnthropicProvider implements LLMProvider {
         };
       });
 
+      const tokenInfo = tokenUsageFromMessage(response, this.model);
+
       return {
         text: content,
         toolCalls: toolCalls,
-        tokenUsage: tokenUsageFromMessage(response),
+        tokenUsage: tokenInfo.usage,
+        totalTokens: tokenInfo.totalTokens,
+        maxTokens: tokenInfo.maxTokens,
       };
     } catch (error) {
       throw new Error(
```

packages/agent/src/core/llm/providers/ollama.ts (+27)

```diff
@@ -13,6 +13,22 @@ import {
 
 import { TokenUsage } from '../../tokens.js';
 import { ToolCall } from '../../types.js';
+// Define model context window sizes for Ollama models
+// These are approximate and may vary based on specific model configurations
+const OLLAMA_MODEL_LIMITS: Record<string, number> = {
+  'llama2': 4096,
+  'llama2-uncensored': 4096,
+  'llama2:13b': 4096,
+  'llama2:70b': 4096,
+  'mistral': 8192,
+  'mistral:7b': 8192,
+  'mixtral': 32768,
+  'codellama': 16384,
+  'phi': 2048,
+  'phi2': 2048,
+  'openchat': 8192,
+  // Add other models as needed
+};
 import { LLMProvider } from '../provider.js';
 import {
   GenerateOptions,
@@ -114,11 +130,22 @@ export class OllamaProvider implements LLMProvider {
     const tokenUsage = new TokenUsage();
     tokenUsage.output = response.eval_count || 0;
     tokenUsage.input = response.prompt_eval_count || 0;
+
+    // Calculate total tokens and get max tokens for the model
+    const totalTokens = tokenUsage.input + tokenUsage.output;
+
+    // Extract the base model name without specific parameters
+    const baseModelName = this.model.split(':')[0];
+    const maxTokens = OLLAMA_MODEL_LIMITS[this.model] ||
+      OLLAMA_MODEL_LIMITS[baseModelName] ||
+      4096; // Default fallback
 
     return {
       text: content,
       toolCalls: toolCalls,
       tokenUsage: tokenUsage,
+      totalTokens,
+      maxTokens,
     };
   }
```
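The limit lookup above tries the exact model name first (e.g. `llama2:13b`), then the base name before the `:` tag, then a 4,096-token default. A small standalone sketch (not part of this commit) of the same resolution order:

```typescript
// Mirrors the fallback order used in the provider: exact name, base name, default.
const LIMITS: Record<string, number> = { llama2: 4096, mistral: 8192, mixtral: 32768 };

function ollamaContextLimit(model: string): number {
  const baseModelName = model.split(':')[0];
  return LIMITS[model] || LIMITS[baseModelName] || 4096;
}

// 'mistral:latest' is not listed, so it resolves via 'mistral' to 8192;
// an unknown model such as 'gemma' falls back to 4096.
```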

packages/agent/src/core/llm/providers/openai.ts (+19)

```diff
@@ -5,6 +5,19 @@ import OpenAI from 'openai';
 
 import { TokenUsage } from '../../tokens.js';
 import { ToolCall } from '../../types';
+
+// Define model context window sizes for OpenAI models
+const OPENAI_MODEL_LIMITS: Record<string, number> = {
+  'gpt-4o': 128000,
+  'gpt-4-turbo': 128000,
+  'gpt-4-0125-preview': 128000,
+  'gpt-4-1106-preview': 128000,
+  'gpt-4': 8192,
+  'gpt-4-32k': 32768,
+  'gpt-3.5-turbo': 16385,
+  'gpt-3.5-turbo-16k': 16385,
+  // Add other models as needed
+};
 import { LLMProvider } from '../provider.js';
 import {
   GenerateOptions,
@@ -116,11 +129,17 @@ export class OpenAIProvider implements LLMProvider {
     const tokenUsage = new TokenUsage();
     tokenUsage.input = response.usage?.prompt_tokens || 0;
     tokenUsage.output = response.usage?.completion_tokens || 0;
+
+    // Calculate total tokens and get max tokens for the model
+    const totalTokens = tokenUsage.input + tokenUsage.output;
+    const maxTokens = OPENAI_MODEL_LIMITS[this.model] || 8192; // Default fallback
 
     return {
       text: content,
       toolCalls,
       tokenUsage,
+      totalTokens,
+      maxTokens,
     };
   } catch (error) {
     throw new Error(`Error calling OpenAI API: ${(error as Error).message}`);
```

packages/agent/src/core/llm/types.ts (+3)

```diff
@@ -80,6 +80,9 @@ export interface LLMResponse {
   text: string;
   toolCalls: ToolCall[];
   tokenUsage: TokenUsage;
+  // Add new fields for context window tracking
+  totalTokens?: number; // Total tokens used in this request
+  maxTokens?: number; // Maximum allowed tokens for this model
 }
 
 /**
```
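Because both fields are optional, downstream code has to guard before comparing against the 70% threshold described in the docs. A minimal sketch (the helper and its wiring are assumptions, not part of this commit):

```typescript
import type { LLMResponse } from './types.js'; // path assumed for illustration

// Sketch: decide whether to nudge the agent toward the compactHistory tool.
function shouldSuggestCompaction(response: LLMResponse): boolean {
  if (response.totalTokens === undefined || response.maxTokens === undefined) {
    return false; // provider did not report context-window information
  }
  return response.totalTokens / response.maxTokens > 0.7;
}
```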

0 commit comments