Memory and Conversation Context#
How Vaani remembers and understands conversations.
Overview#
Vaani’s memory system enables:
Context Awareness - Understanding references to previous messages
Multi-turn Dialogue - Natural back-and-forth conversations
Entity Memory - Remembering people, places, topics mentioned
Conversation State - Tracking the current discussion
This makes conversations feel natural rather than disjointed.
Memory Architecture#
Three-Layer Memory System
Layer 1: Short-term Buffer (Last 5-10 messages)
├─ Most recent exchanges
├─ Used for immediate context
└─ Rapidly fades if not referenced
Layer 2: Active Context (Last 20-50 messages)
├─ Current conversation thread
├─ Entities and topics mentioned
└─ Used for understanding references
Layer 3: Session History (Entire session)
├─ All previous messages
├─ Available if needed
└─ Cleared when Vaani restarts
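One way to picture the three layers is as progressively larger windows over a single message list. The sketch below is illustrative only: `layered_view` is a hypothetical helper, and the window sizes (10 and 50, the upper ends of the ranges above) are assumptions rather than Vaani's actual implementation.

```python
def layered_view(messages, short_term=10, active=50):
    """Split one message list into the three layers described above.

    Hypothetical helper: the window sizes are assumptions taken from the
    upper ends of the documented ranges.
    """
    return {
        "short_term": messages[-short_term:],  # most recent exchanges
        "active": messages[-active:],          # current conversation thread
        "session": list(messages),             # everything since startup
    }


history = [f"message {i}" for i in range(1, 61)]  # 60 placeholder messages
layers = layered_view(history)
print(len(layers["short_term"]), len(layers["active"]), len(layers["session"]))
```

Because every layer is a slice of the same list, nothing is copied into separate stores; the layers differ only in how far back they look.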
Memory Storage
Each session is held in RAM as a structure like this:
{
  "session_id": "1234567890",
  "created": "2024-01-15 14:00:00",
  "messages": [
    {
      "timestamp": "2024-01-15 14:00:05",
      "user": "What's the weather in New York?",
      "vaani": "It's 72°F and sunny",
      "intent": "WEATHER_QUERY",
      "entities": [{"type": "LOCATION", "value": "New York"}]
    },
    {
      "timestamp": "2024-01-15 14:00:12",
      "user": "How about Tokyo?",
      "vaani": "In Tokyo it's 28°C with clouds",
      "intent": "WEATHER_QUERY",
      "entities": [{"type": "LOCATION", "value": "Tokyo"}]
    }
  ]
}
Context Usage#
Example: Weather Conversation
Turn 1 (Context is empty)
User: "What's the weather in New York?"
Vaani: Searches for "weather New York"
Vaani: "It's 72°F and sunny in New York"
[Context updated: location=New York, query=weather]
Turn 2 (Context includes previous location)
User: "How about Tokyo?"
Vaani: Recognizes "Tokyo" as a location
Vaani: Searches for "weather Tokyo" (not "weather how about Tokyo")
Vaani: "In Tokyo it's 28°C with clouds"
[Context updated: location=Tokyo]
Turn 3 (Context remembers current focus)
User: "Will it rain?"
Vaani: Knows you're asking about Tokyo (from context)
Vaani: Searches for "Tokyo rain forecast"
Vaani: "The forecast shows 20% chance of rain in Tokyo tomorrow"
How Context Helps
Without context:
Q: "How about Tokyo?"
Vaani: "How about Tokyo what? Can you clarify?"
Q: "Will it rain?"
Vaani: "Will what rain? Where?"
With context:
Q: "How about Tokyo?"
Vaani: "In Tokyo it's 28°C with clouds"
Q: "Will it rain?"
Vaani: "The forecast shows 20% chance of rain in Tokyo"
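The carry-over behaviour in the weather example could be sketched as follows. This is a minimal illustration, not Vaani's code: `build_query`, `TOPIC_KEYWORDS`, and `KNOWN_LOCATIONS` are hypothetical names, and simple keyword matching stands in for real intent and entity recognition.

```python
# Assumed keyword-to-topic map and location list, for illustration only.
TOPIC_KEYWORDS = {"weather": "weather", "rain": "rain forecast"}
KNOWN_LOCATIONS = ["New York", "Tokyo"]


def build_query(user_text, context):
    """Update the context from the new message, then build a search query.

    Anything the message does not mention (topic or location) is carried
    over from earlier turns via the shared context dict.
    """
    text = user_text.lower()
    for keyword, topic in TOPIC_KEYWORDS.items():
        if keyword in text:
            context["topic"] = topic
    for place in KNOWN_LOCATIONS:
        if place.lower() in text:
            context["location"] = place
    parts = [context.get("topic"), context.get("location")]
    return " ".join(part for part in parts if part)


context = {}
turns = ["What's the weather in New York?", "How about Tokyo?", "Will it rain?"]
queries = [build_query(turn, context) for turn in turns]
print(queries)
```

The second and third turns never name both a topic and a place, yet each query is complete because the missing piece comes from the context.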
Entity Recognition and Tracking#
Entity Types
Vaani tracks different entity types:
PERSON: Names and references
"Who is Stephen Hawking?" → Entity: Stephen Hawking
LOCATION: Places
"What's in Paris?" → Entity: Paris
ORGANIZATION: Companies and institutions
"Tell me about Google" → Entity: Google
DATE/TIME: When something happens
"What's tomorrow's weather?" → Entity: tomorrow
TOPIC: Subject of discussion
"Tell me about Python" → Entity: Python
Pronoun Resolution
Vaani resolves pronouns using context:
Turn 1:
User: "Who's Albert Einstein?"
Vaani: "Albert Einstein was a theoretical physicist..."
[Entity added: Albert Einstein]
Turn 2:
User: "When was he born?"
Vaani: Sees "he" refers to Albert Einstein (from context)
Vaani: "Albert Einstein was born on March 14, 1879"
Reference Resolution
Vaani understands indirect references:
Turn 1:
User: "What's Python?"
Vaani: "Python is a programming language..."
[Entity added: Python, type: TOPIC]
Turn 2:
User: "How do I learn it?"
Vaani: Sees "it" refers to Python
Vaani: "You can learn Python through..."
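Both kinds of resolution above amount to substituting the most recently mentioned entity for a pronoun. The sketch below shows that idea under strong simplifying assumptions: `resolve_pronouns` and the `PRONOUNS` set are hypothetical, and the code ignores gender, number, and possessive forms that a real resolver would need to handle.

```python
import re

# Assumed pronoun list; deliberately small for the sketch.
PRONOUNS = {"he", "she", "it", "they"}


def resolve_pronouns(user_text, recent_entities):
    """Replace pronouns with the most recently mentioned entity.

    recent_entities is ordered oldest to newest; the last item is treated
    as the antecedent. Returns the text unchanged if nothing is tracked.
    """
    if not recent_entities:
        return user_text
    antecedent = recent_entities[-1]

    def swap(match):
        word = match.group(0)
        return antecedent if word.lower() in PRONOUNS else word

    return re.sub(r"[A-Za-z']+", swap, user_text)


print(resolve_pronouns("When was he born?", ["Albert Einstein"]))
print(resolve_pronouns("How do I learn it?", ["Python"]))
```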
Conversation Flow#
Natural Continuation
Context enables natural conversation flow:
User: "Tell me about the solar system"
Vaani: "The solar system consists of the Sun and 8 planets..."
User: "How far is Earth from the Sun?"
Vaani: (Context: discussing solar system, Earth mentioned)
"Earth is about 93 million miles from the Sun"
User: "What about Mars?"
Vaani: (Context: discussing distances in solar system)
"Mars is about 142 million miles from the Sun"
Topic Switching
When users switch topics, context updates:
[Discussing solar system...]
User: "Actually, tell me about dinosaurs"
Vaani: (Topic change detected)
"Dinosaurs were remarkable prehistoric creatures..."
[Context shifted: solar system → dinosaurs]
User: "When did they go extinct?"
Vaani: (Context: dinosaur discussion)
"Dinosaurs went extinct about 66 million years ago"
Memory Limits#
Size Limitations
Context is limited to prevent:
Performance Degradation - Large context slows API calls
Cost Increase - More context = more tokens = higher cost
Confusion - Irrelevant old context interfering
Default Limits
# Maximum number of messages kept in active context
MAX_CONTEXT_MESSAGES = 50
# Maximum total context size, in tokens (roughly 6,000 words of English text)
MAX_CONTEXT_SIZE = 8000
# Age limit on old messages, in minutes
CONTEXT_TIMEOUT = 30
What Gets Dropped
When limits are reached:
Oldest messages are dropped first
Important entities are retained longer
Recent messages are always kept
User can explicitly clear memory if needed
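The "oldest first, recent always kept" policy could look something like the sketch below. It is an assumption-laden illustration: `trim_context`, `estimate_tokens`, and the `keep_recent` parameter are hypothetical, word counting stands in for a real tokenizer, and the longer retention of important entities is not modelled.

```python
def estimate_tokens(message):
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(message["user"].split()) + len(message["vaani"].split())


def trim_context(messages, max_messages=50, max_tokens=8000, keep_recent=5):
    """Drop the oldest messages until both limits are satisfied.

    The newest keep_recent messages are never dropped, even if they
    exceed the token budget on their own.
    """
    kept = list(messages)[-max_messages:]          # enforce the message cap
    total = sum(estimate_tokens(m) for m in kept)
    while len(kept) > keep_recent and total > max_tokens:
        total -= estimate_tokens(kept.pop(0))      # oldest message goes first
    return kept


# 60 exchanges of 12 "tokens" each, trimmed to a 120-token budget.
messages = [{"user": f"question {i}", "vaani": "answer " * 10} for i in range(60)]
kept = trim_context(messages, max_messages=50, max_tokens=120, keep_recent=5)
print(len(kept), kept[0]["user"], kept[-1]["user"])
```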
Configure Memory Limits
# .env configuration
echo "MAX_CONTEXT_MESSAGES=100" >> .env
echo "MAX_CONTEXT_SIZE=16000" >> .env
echo "CONTEXT_TIMEOUT=60" >> .env # 1 hour
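On the Python side, settings like these are typically read from the environment with a fallback to the defaults. The sketch below assumes the `.env` values have already been exported into the process environment (how Vaani loads `.env` is not stated here); `load_memory_limits` is a hypothetical helper.

```python
import os


def load_memory_limits(env=None):
    """Read the three limits from a mapping, falling back to the defaults.

    env defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    return {
        "max_messages": int(env.get("MAX_CONTEXT_MESSAGES", 50)),
        "max_tokens": int(env.get("MAX_CONTEXT_SIZE", 8000)),
        "timeout_minutes": int(env.get("CONTEXT_TIMEOUT", 30)),
    }


limits = load_memory_limits({"MAX_CONTEXT_MESSAGES": "100", "CONTEXT_TIMEOUT": "60"})
print(limits)
```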
Context Reset#
When Context Is Cleared
Context resets when:
Vaani Restarts
- Session ends
- Process terminates
- System reboots
Explicit Reset
- User says “Clear memory” or “Forget everything”
- Vaani detects new session start
- System is shut down
Timeout (if configured)
- 30+ minutes of inactivity
- Vaani automatically clears old context
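The inactivity timeout reduces to comparing the time since the last message against a threshold. A minimal sketch, assuming a hypothetical `is_context_expired` helper and the 30-minute default mentioned above:

```python
from datetime import datetime, timedelta


def is_context_expired(last_activity, now, timeout_minutes=30):
    """Return True once the idle period reaches the configured timeout."""
    return now - last_activity >= timedelta(minutes=timeout_minutes)


last_message = datetime(2024, 1, 15, 14, 0, 0)
print(is_context_expired(last_message, datetime(2024, 1, 15, 14, 10, 0)))
print(is_context_expired(last_message, datetime(2024, 1, 15, 14, 31, 0)))
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the function, keeps the check deterministic and easy to test.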
Clearing Memory Manually
# Will clear at next restart
rm -f vaani_assistant/memory.db
# Or tell Vaani
# "Clear your memory" or "Forget everything"
After Memory Clear
Turn 1 (Before clear):
User: "My name is Alice"
Vaani: "Nice to meet you, Alice"
Turn 2 (After clear/restart):
User: "Do you remember my name?"
Vaani: "I'm sorry, I don't have memory of previous sessions.
Could you remind me?"
Context in Multi-Language#
Language Tracking
Context includes language information:
Message 1: English
"What's the weather?"
Message 2: Spanish
"¿Cómo estás?" (How are you?)
Context: Detects language switch
Vaani: Responds in Spanish
Code-Switching
Vaani handles code-switching (mixing languages):
User: "Hola, what's the weather?" (Spanish + English mix)
Vaani: (Detects mix)
Responds: "The weather is... hace buen tiempo" (mixed response)
Conversation State#
State Tracking
Vaani tracks conversation state:
state = {
    "topic": "weather",
    "location": "New York",
    "entities": ["New York", "weather", "tomorrow"],
    "sentiment": "neutral",
    "urgency": "low",
    "user_satisfaction": None,
    "conversation_length": 3,
    "last_search_time": "2024-01-15 14:05:23"
}
State-Based Responses
Vaani adjusts based on state:
Low conversation_length (early in chat):
Vaani: "What would you like to know about?"
High conversation_length (long chat):
Vaani: "We've been chatting about this for a while.
Anything else you'd like to know?"
Negative sentiment:
Vaani: Uses more apologetic/helpful tone
High urgency (user seems in hurry):
Vaani: Provides shorter, quicker responses
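The adjustments above can be read as a small decision rule over the state dictionary. The sketch below is one possible ordering, not Vaani's actual logic: `choose_style`, the style labels, and the 20-message threshold for a "long chat" are all assumptions.

```python
def choose_style(state, long_chat_threshold=20):
    """Pick a response style from the tracked conversation state.

    Checks run in priority order: urgency first, then sentiment, then
    conversation length. The threshold is an assumed value.
    """
    if state.get("urgency") == "high":
        return "brief"
    if state.get("sentiment") == "negative":
        return "apologetic"
    if state.get("conversation_length", 0) >= long_chat_threshold:
        return "check_in"
    return "default"


print(choose_style({"urgency": "low", "sentiment": "neutral", "conversation_length": 3}))
print(choose_style({"urgency": "high", "sentiment": "negative", "conversation_length": 40}))
```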
Technical Implementation#
Memory Storage
Vaani stores memory in RAM (fast, temporary):
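from datetime import datetime


class Memory:
    """In-memory store for one session; contents are lost when the process exits."""

    def __init__(self):
        self.messages = []   # List of all messages
        self.entities = {}   # Map of entities to references
        self.state = {}      # Current conversation state
        self.created_at = datetime.now()

    def add_message(self, user_input, vaani_response):
        # Record one user/assistant exchange, then refresh the entity index
        self.messages.append({
            "user": user_input,
            "vaani": vaani_response,
            "timestamp": datetime.now()
        })
        self._update_entities(user_input)

    def _update_entities(self, user_input):
        # Placeholder: entity extraction is not shown in this excerpt
        pass

    def get_context(self, max_messages=50):
        # Return recent context for API
        return self.messages[-max_messages:]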
Context Formatting
Memory is formatted for API calls:
[Assistant]: Hi, I'm Vaani, how can I help?
[User]: What's the weather in New York?
[Assistant]: It's 72°F and sunny
[User]: How about Tokyo?
[Assistant]: In Tokyo it's 28°C with clouds
This transcript is sent to the Gemini API so the model can interpret the newest message in context.
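Producing that transcript from stored messages is a simple join. The sketch below assumes messages are dicts with `user` and `vaani` keys, as in the storage example; `format_context` and its optional `greeting` parameter are hypothetical.

```python
def format_context(messages, greeting=None):
    """Render stored exchanges as the [User]/[Assistant] transcript format."""
    lines = []
    if greeting:
        lines.append(f"[Assistant]: {greeting}")
    for message in messages:
        lines.append(f"[User]: {message['user']}")
        lines.append(f"[Assistant]: {message['vaani']}")
    return "\n".join(lines)


exchanges = [
    {"user": "What's the weather in New York?", "vaani": "It's 72°F and sunny"},
    {"user": "How about Tokyo?", "vaani": "In Tokyo it's 28°C with clouds"},
]
transcript = format_context(exchanges, greeting="Hi, I'm Vaani, how can I help?")
print(transcript)
```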
Entity Indexing
Entities are indexed for quick lookup:
entities = {
    "New York": {"type": "LOCATION", "mentions": 2, "first_mentioned": 14000},
    "Tokyo": {"type": "LOCATION", "mentions": 1, "first_mentioned": 14012},
    "weather": {"type": "INTENT", "mentions": 3}
}
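An index of this shape can be maintained by creating an entry on first mention and incrementing a counter afterwards. A minimal sketch, assuming a hypothetical `record_mention` helper and string timestamps (the encoding of `first_mentioned` in the example above is not specified):

```python
def record_mention(index, name, entity_type, timestamp):
    """Add an entity on first sight, otherwise bump its mention count."""
    entry = index.setdefault(
        name,
        {"type": entity_type, "mentions": 0, "first_mentioned": timestamp},
    )
    entry["mentions"] += 1
    return entry


index = {}
record_mention(index, "New York", "LOCATION", "2024-01-15 14:00:05")
record_mention(index, "Tokyo", "LOCATION", "2024-01-15 14:00:12")
record_mention(index, "New York", "LOCATION", "2024-01-15 14:01:30")
print(index["New York"])
```

Keying the dictionary by entity name gives constant-time lookup, and `first_mentioned` is written only once because `setdefault` leaves an existing entry untouched.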
Advanced Features#
Conversation Summarization
For long conversations, Vaani can summarize:
Last 50 messages summarized to:
"User asked about weather in multiple cities (New York, Tokyo,
Paris). Current topic is weather forecasting for these locations."
Relevant Context Selection
Vaani selects only relevant context:
Q: "Play music"
Uses context: [previous music requests]
Ignores context: [weather discussion from 20 minutes ago]
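A simple form of this selection filters stored messages by intent before sending them as context. The sketch below assumes each message carries an `intent` field, as in the storage example; `select_relevant`, the `MUSIC_REQUEST` label, and the `limit` parameter are hypothetical, and a real implementation may also weigh recency or entity overlap.

```python
def select_relevant(messages, intent, limit=5):
    """Keep only the most recent messages that share the current intent."""
    matching = [m for m in messages if m.get("intent") == intent]
    return matching[-limit:]


stored = [
    {"user": "Play some jazz", "intent": "MUSIC_REQUEST"},
    {"user": "What's the weather in Paris?", "intent": "WEATHER_QUERY"},
    {"user": "Play the next song", "intent": "MUSIC_REQUEST"},
]
relevant = select_relevant(stored, "MUSIC_REQUEST")
print([m["user"] for m in relevant])
```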
Entity Linking
Vaani links similar entities:
"Albert" + "Albert Einstein" → Same entity
"NYC" + "New York" → Same entity
"States" + "United States" → Same entity
Debugging Memory Issues#
View Current Context
# Enable debug logging
LOG_LEVEL=DEBUG python3 main.py
# Check memory in logs
tail -n 100 logs/error.log | grep -i memory
Clear Memory
# Force clear memory at startup
rm -f vaani_assistant/memory_*.json
# Then restart
python3 main.py
Check Memory Size
# See how many messages are stored
python3 << 'EOF'
from vaani_assistant.core.memory import get_memory
memory = get_memory()
print(f"Messages in memory: {len(memory.messages)}")
print(f"Entities tracked: {len(memory.entities)}")
EOF
Memory Statistics
python3 << 'EOF'
from vaani_assistant.core.memory import get_memory
memory = get_memory()
if memory.messages:  # avoid division by zero and IndexError on an empty session
    total_size = sum(len(str(m)) for m in memory.messages)
    print(f"Total context size: {total_size} characters")
    print(f"Average message size: {total_size / len(memory.messages):.0f} characters")
    print(f"Oldest message: {memory.messages[0]['timestamp']}")
    print(f"Newest message: {memory.messages[-1]['timestamp']}")
else:
    print("No messages in memory")
EOF
Limitations#
What Memory Cannot Do
Persist - Resets when Vaani restarts
Learn - Doesn’t improve understanding over sessions
Share - Doesn’t sync across devices
Disambiguate - Struggles with truly ambiguous references
Forget Selectively - All-or-nothing clearing
Why These Limitations Exist
Privacy - Don’t store data long-term
Simplicity - Simpler architecture to maintain
Cost - Persistent storage costs money
Complexity - Cross-device sync is complicated
Safety - Limited memory reduces error propagation
Best Practices#
For Users
Be Specific - Use full names and place names rather than pronouns
Provide Context - Explain connections between queries
Start Fresh - Clear memory between unrelated topics
Check Understanding - Ask Vaani to confirm understanding
For Developers
Test Context - Test multi-turn conversations
Monitor Size - Keep memory within limits
Handle Edge Cases - Ambiguous references, code-switching
Log Wisely - Don’t log sensitive memory content
Configuration#
Memory Behavior Configuration
# How many messages to keep
echo "MEMORY_MAX_MESSAGES=50" >> .env
# Maximum tokens in context
echo "MEMORY_MAX_TOKENS=8000" >> .env
# Clear memory after this many minutes idle
echo "MEMORY_IDLE_TIMEOUT=30" >> .env
# Enable memory compression (summarization)
echo "MEMORY_COMPRESSION_ENABLED=true" >> .env
# Save memory to disk (experimental)
echo "MEMORY_PERSISTENCE_ENABLED=false" >> .env
Performance Impact#
Memory Overhead
Current implementation:
RAM Usage - ~1MB per 100 messages
API Cost - ~200 tokens per message in context
Response Time - +100-200ms for large context
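These per-message figures make it easy to estimate overhead for a given context size. The sketch below simply applies the rough numbers above; `estimate_overhead` and `messages_within_budget` are hypothetical helpers, and real token counts vary with message length and language.

```python
TOKENS_PER_MESSAGE = 200     # rough figure from the list above
MB_PER_100_MESSAGES = 1.0    # rough figure from the list above


def estimate_overhead(message_count):
    """Estimate context tokens and RAM use for a number of messages."""
    return {
        "tokens": message_count * TOKENS_PER_MESSAGE,
        "ram_mb": message_count * MB_PER_100_MESSAGES / 100,
    }


def messages_within_budget(max_tokens=8000):
    """How many average-sized messages fit in a token budget."""
    return max_tokens // TOKENS_PER_MESSAGE


print(estimate_overhead(50))
print(messages_within_budget(8000))
```

At these estimates, 50 messages would come to about 10,000 tokens, so an 8000-token budget would be reached at roughly 40 messages, before a 50-message cap takes effect.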
Optimization
For slow systems:
# Reduce context size
echo "MEMORY_MAX_MESSAGES=20" >> .env
# Enable compression
echo "MEMORY_COMPRESSION_ENABLED=true" >> .env
# Disable entity tracking
echo "ENTITY_TRACKING_ENABLED=false" >> .env
Next Steps#
See How Vaani Thinks for AI response details
Read customization for memory tuning
Check project_structure for code implementation
Review Troubleshooting for memory issues