Real-Time Form Discovery System
Overview
The LiveKit agent uses a real-time-only form discovery system that never relies on cached selectors. Every form field lookup is performed live with MCP tools, so the selectors it uses always reflect the page as it is currently rendered.
Key Principles
🚫 NO CACHE POLICY
- Zero cached selectors - every request gets fresh selectors
- Real-time discovery only - uses MCP tools on every call
- No hardcoded selectors - all elements discovered dynamically
- Fresh page analysis - adapts to dynamic content changes
🔄 Real-Time MCP Tools
- chrome_get_interactive_elements - Gets current form elements
- chrome_get_content_web_form - Analyzes form structure
- chrome_get_web_content - Content analysis for field discovery
- Live selector testing - Validates selectors before use
How Real-Time Discovery Works
1. Voice Command Processing
When a user says: "fill email with john@example.com"
```python
# No cache lookup: go straight to real-time discovery
field_name = "email"
value = "john@example.com"

# Step 1: real-time MCP discovery
result = await client._discover_form_fields_dynamically(field_name, value)

# Step 2: enhanced detection with retry, only if Step 1 failed
# (assumes each step returns a dict with a boolean "success" key)
if not result.get("success"):
    result = await client._enhanced_field_detection_with_retry(field_name, value)

# Step 3: direct MCP element search as the final fallback
if not result.get("success"):
    result = await client._direct_mcp_element_search(field_name, value)
```
2. Real-Time Discovery Process
Strategy 1: Interactive Elements Discovery
```python
# Get ALL current interactive elements
interactive_result = await client._call_mcp_tool("chrome_get_interactive_elements", {
    "types": ["input", "textarea", "select"]
})
# Assumed response shape: a dict with an "elements" list
elements = interactive_result.get("elements", [])

# Match field name to current elements
for element in elements:
    if client._is_field_match(element, field_name):
        selector = client._extract_best_selector(element)
        # Try to fill immediately with the fresh selector, then stop
        break
```
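Strategy 1 names its matching step but does not define it. Below is a minimal sketch of what a field match could check. It assumes each element is a dict with an "attributes" dict, the shape _extract_best_selector reads, plus an optional "text" entry holding the element's visible label. is_field_match, _normalize and the "text" key are hypothetical and are not the agent's actual implementation.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and keep only letters and digits ("user_email" -> "useremail")."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def is_field_match(element: dict, field_name: str) -> bool:
    """Hypothetical stand-in for client._is_field_match.

    Returns True when the normalized field name appears in one of the
    element's identifying attributes or in its (assumed) "text" label.
    """
    target = _normalize(field_name)
    if not target:
        return False
    attrs = element.get("attributes", {})
    candidates = [
        attrs.get("id", ""),
        attrs.get("name", ""),
        attrs.get("placeholder", ""),
        attrs.get("aria-label", ""),
        attrs.get("type", ""),
        element.get("text", ""),
    ]
    # Skip empty values and look for the field name inside each candidate
    return any(target in _normalize(str(v)) for v in candidates if v)
```

Normalizing both sides lets "email" match attributes such as name="user_email" or id="Email-Address" without any per-site configuration.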
Strategy 2: Form Content Analysis
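```python
# Get current form structure
form_result = await client._call_mcp_tool("chrome_get_content_web_form", {})
# Assumed response shape: the form content under a "content" key
form_content = form_result.get("content", "")
# Parse form content for field patterns
selector = client._parse_form_content_for_field(form_content, field_name)
# Test and use the selector immediately
```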
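Here is a minimal sketch of the parsing step. It assumes the form tool returns raw HTML. parse_form_content_for_field is a hypothetical stand-in for _parse_form_content_for_field, and it uses Python's standard html.parser module to collect form controls.

```python
from html.parser import HTMLParser
from typing import Optional


class _FieldCollector(HTMLParser):
    """Records (tag, attributes) for every input, textarea and select tag."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "textarea", "select"):
            # attrs is a list of (name, value) pairs; value may be None
            self.fields.append((tag, {k: v or "" for k, v in attrs}))


def parse_form_content_for_field(form_content: str, field_name: str) -> Optional[str]:
    """Hypothetical stand-in for _parse_form_content_for_field.

    Returns a selector for the first form control whose id or name
    contains field_name (case-insensitive), or None if nothing matches.
    """
    needle = field_name.strip().lower()
    if not needle:
        return None
    collector = _FieldCollector()
    collector.feed(form_content)
    for tag, attrs in collector.fields:
        if needle in attrs.get("id", "").lower():
            return f"#{attrs['id']}"
        if needle in attrs.get("name", "").lower():
            return f"{tag}[name='{attrs['name']}']"
    return None
```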
Strategy 3: Direct Element Search
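```python
# Exhaustive search through ALL elements (no type filter)
search_result = await client._call_mcp_tool("chrome_get_interactive_elements", {})
# Assumed response shape: a dict with an "elements" list
all_elements = search_result.get("elements", [])
# Very flexible matching for any possible match
for element in all_elements:
    if client._is_very_flexible_match(element, field_name):
        # Generate and test a selector immediately, then stop
        selector = client._extract_best_selector(element)
        break
```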
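A looser matcher could work roughly as follows. This is a hypothetical sketch of _is_very_flexible_match. It assumes elements are dicts with an "attributes" dict and an optional "text" entry for the visible label. It accepts an element if any word of the field name appears anywhere in its attribute values or text.

```python
import re


def _tokens(text: str) -> set:
    """Split text into lowercase alphanumeric tokens ("search box" -> {"search", "box"})."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_very_flexible_match(element: dict, field_name: str) -> bool:
    """Hypothetical stand-in for client._is_very_flexible_match.

    Scans every attribute value plus the (assumed) "text" label and returns
    True if any token of the field name occurs there as a substring.
    """
    wanted = _tokens(field_name)
    if not wanted:
        return False
    values = list(element.get("attributes", {}).values())
    values.append(element.get("text", ""))
    haystack = " ".join(str(v) for v in values if v).lower()
    return any(token in haystack for token in wanted)
```

The trade-off is false positives. For example, the token "box" also matches "checkbox". This is why such a matcher belongs in the final fallback rather than the first strategy.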
3. Real-Time Selector Generation
The system generates selectors in real-time based on current element attributes:
```python
def _extract_best_selector(self, element):
    attrs = element.get("attributes", {})
    # Priority order for reliability: most specific selector first
    if attrs.get("id"):
        return f"#{attrs['id']}"
    # type + name is more specific than name alone, so check it first
    if attrs.get("type") and attrs.get("name"):
        return f"input[type='{attrs['type']}'][name='{attrs['name']}']"
    if attrs.get("name"):
        return f"input[name='{attrs['name']}']"
    # ... more patterns
    return None
```
API Reference
Real-Time Functions
fill_field_by_name(field_name: str, value: str) -> str
Real-time only - no cache; performs fresh discovery on every call.
fill_field_realtime_only(field_name: str, value: str) -> str
Guaranteed real-time - Explicit real-time discovery function.
get_realtime_form_fields() -> str
Live form discovery - Gets current form fields using only MCP tools.
_discover_form_fields_dynamically(field_name: str, value: str) -> dict
Pure real-time discovery - Uses chrome_get_interactive_elements and chrome_get_content_web_form.
_direct_mcp_element_search(field_name: str, value: str) -> dict
Exhaustive real-time search - Final fallback using comprehensive MCP element search.
Real-Time Matching Algorithms
_is_field_match(element: dict, field_name: str) -> bool
Standard real-time field matching using current element attributes.
_is_very_flexible_match(element: dict, field_name: str) -> bool
Very flexible real-time matching for challenging cases.
_generate_common_selectors(field_name: str) -> list
Generates common CSS selectors based on field name patterns.
Usage Examples
Voice Commands (All Real-Time)
User: "fill email with john@example.com"
Agent: [Uses chrome_get_interactive_elements] ✓ Filled 'email' field using real-time discovery
User: "enter password secret123"
Agent: [Uses chrome_get_content_web_form] ✓ Filled 'password' field using form content analysis
User: "type hello in search box"
Agent: [Uses direct MCP search] ✓ Filled 'search' field using exhaustive element search
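Before a spoken command can reach the field_name/value pair used in Voice Command Processing, it has to be parsed, and this section does not describe that step. Below is a minimal sketch that assumes simple regular expressions are enough for the three phrasings shown above. parse_fill_command and its patterns are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns covering the three example phrasings above
_COMMAND_PATTERNS = [
    # "fill email with john@example.com"
    re.compile(r"^fill (?P<field>.+?) with (?P<value>.+)$", re.IGNORECASE),
    # "type hello in search box" (a trailing "box"/"field" is dropped)
    re.compile(r"^type (?P<value>.+?) in (?:the )?(?P<field>.+?)(?: box| field)?$",
               re.IGNORECASE),
    # "enter password secret123" (single-word field names only)
    re.compile(r"^enter (?P<field>\S+) (?P<value>.+)$", re.IGNORECASE),
]


def parse_fill_command(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (field_name, value) for a recognized command, else None."""
    text = utterance.strip()
    for pattern in _COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("field").strip(), match.group("value").strip()
    return None
```

The resulting pair would then be handed to fill_field_by_name. A real voice interface would need to cover far more phrasing variants than these three patterns.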
Programmatic Usage
```python
# All these functions use ONLY real-time discovery
result = await client.fill_field_by_name("email", "user@example.com")
result = await client.fill_field_realtime_only("search", "python")
result = await client._discover_form_fields_dynamically("username", "john_doe")
```
Real-Time Discovery Strategies
1. Interactive Elements Strategy
- Uses chrome_get_interactive_elements to get current form elements
- Matches field names to element attributes in real time
- Tests selectors immediately before use
2. Form Content Strategy
- Uses chrome_get_content_web_form for form-specific analysis
- Parses current form structure for field patterns
- Generates selectors based on live content
3. Direct Search Strategy
- Exhaustive search through ALL current page elements
- Very flexible matching criteria
- Tests multiple selector patterns
4. Common Selector Strategy
- Generates candidate selectors from common field-name patterns
- Tests each selector against current page
- Uses type-specific patterns for common fields
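The common selector strategy could generate its candidates roughly as follows. This is a hypothetical sketch of _generate_common_selectors. The type-hint table and the selector patterns are assumptions chosen to illustrate type-specific patterns, not the agent's actual list.

```python
from typing import List

# Hypothetical input-type hints for a few well-known field names
_TYPE_HINTS = {
    "email": "email",
    "password": "password",
    "search": "search",
    "phone": "tel",
    "url": "url",
}


def generate_common_selectors(field_name: str) -> List[str]:
    """Hypothetical stand-in for _generate_common_selectors.

    Returns candidate CSS selectors, most specific first. Every candidate
    must still be tested against the live page before it is used.
    """
    words = " ".join(field_name.lower().split())  # normalize case and spacing
    key = words.replace(" ", "_")
    selectors = []
    if key in _TYPE_HINTS:
        selectors.append(f"input[type='{_TYPE_HINTS[key]}']")
    selectors += [
        f"#{key}",
        f"[name='{key}']",
        f"[name*='{key}' i]",  # the "i" flag makes the match case-insensitive
        f"[placeholder*='{words}' i]",
        f"[aria-label*='{words}' i]",
    ]
    return selectors
```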
Benefits of Real-Time Discovery
🎯 Accuracy
- Always current - reflects actual page state
- No stale selectors - eliminates cached selector failures
- Dynamic adaptation - handles page changes automatically
🔄 Reliability
- Fresh discovery - every request gets new selectors
- Multiple strategies - comprehensive fallback methods
- Live validation - selectors tested before use
🌐 Compatibility
- No per-site setup - no pre-configured selectors needed
- Handles dynamic content - adapts to JavaScript-generated forms
- Framework-agnostic - works on the rendered page regardless of the technology that produced it
🛠️ Maintainability
- No selector database - nothing to update when sites change
- Self-adapting - picks up site changes on the next request
- Resilient to redesigns - selectors are derived from the current page instead of being stored
Testing Real-Time Discovery
Run the real-time test suite:
```bash
python test_realtime_form_discovery.py
```
This tests:
- Real-time discovery on Google search
- Form field discovery on GitHub
- Direct MCP element search
- Very flexible matching algorithms
- Cross-website compatibility
Performance Considerations
Real-Time vs Speed
- Slower than cached selectors (by design), because every call makes fresh MCP tool round trips
- More reliable than cached approaches
- Eliminates cache invalidation issues
- Prevents stale selector errors
Optimization Strategies
- Parallel discovery - multiple strategies run concurrently
- Early termination - stops on first successful match
- Intelligent prioritization - most likely selectors first
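Parallel discovery with early termination can be sketched with asyncio. The sketch assumes each strategy is a coroutine function taking (field_name, value) and returning a dict with a "success" key. first_successful and the Strategy alias are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical strategy signature: returns a dict with an assumed "success" key
Strategy = Callable[[str, str], Awaitable[dict]]


async def first_successful(
    strategies: List[Strategy], field_name: str, value: str
) -> Optional[dict]:
    """Run all strategies concurrently and return the first successful result.

    As soon as one strategy succeeds, the others are cancelled (early
    termination). Returns None if no strategy succeeds.
    """
    tasks = [asyncio.create_task(s(field_name, value)) for s in strategies]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                result = await finished
            except Exception:
                continue  # one failing strategy should not stop the others
            if result.get("success"):
                return result
        return None
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
        # Wait for cancellations to settle so no task is left pending
        await asyncio.gather(*tasks, return_exceptions=True)
```

This is one way to combine the two optimizations. When strategy order matters more than latency, the sequential chain described under Error Handling is the alternative.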
Error Handling
Graceful Degradation
- Interactive elements → Form content → Direct search → Common selectors
- Detailed logging of each attempt
- Clear error messages about what was tried
- No silent failures - always reports what happened
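The degradation order above can be sketched as an ordered loop. The loop logs every attempt and, on failure, returns a message naming each strategy it tried. It assumes each strategy is a coroutine function returning a dict with a "success" key and an optional "error" message. fill_with_degradation and the logger name are illustrative, not the agent's code.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Tuple

logger = logging.getLogger("form_discovery")  # hypothetical logger name

# Hypothetical strategy signature: returns a dict with an assumed "success" key
Strategy = Callable[[str, str], Awaitable[dict]]


async def fill_with_degradation(
    strategies: List[Tuple[str, Strategy]], field_name: str, value: str
) -> dict:
    """Try named strategies in order, logging each attempt.

    Returns the first successful result tagged with the strategy name, or a
    failure dict whose error message lists every strategy that was tried.
    """
    tried = []
    for name, strategy in strategies:
        try:
            result = await strategy(field_name, value)
        except Exception as exc:  # report errors instead of failing silently
            result = {"success": False, "error": str(exc)}
        logger.info("strategy=%s field=%s success=%s",
                    name, field_name, bool(result.get("success")))
        if result.get("success"):
            return {**result, "strategy": name}
        tried.append(f"{name}: {result.get('error', 'no match')}")
    return {
        "success": False,
        "error": f"Could not fill '{field_name}'. Tried: " + "; ".join(tried),
    }


if __name__ == "__main__":
    async def no_match(field_name, value):
        return {"success": False}

    async def found(field_name, value):
        return {"success": True}

    logging.basicConfig(level=logging.INFO)
    chain = [("interactive_elements", no_match), ("form_content", found)]
    print(asyncio.run(fill_with_degradation(chain, "email", "a@b.c")))
```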
Retry Mechanism
- Multiple attempts with increasing flexibility
- Different strategies on each retry
- Configurable retry count (default: 3)
- Delay between retries to handle loading
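Below is a sketch of the retry loop. It assumes strategies are ordered from strict to flexible and that each returns a dict with a "success" key. The default of 3 attempts follows the text. The 1-second delay and the name fill_with_retry are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical strategy signature: returns a dict with an assumed "success" key
Strategy = Callable[[str, str], Awaitable[dict]]


async def fill_with_retry(
    strategies: List[Strategy],
    field_name: str,
    value: str,
    max_retries: int = 3,   # matches the documented default retry count
    delay: float = 1.0,     # assumed default; the section does not specify one
) -> dict:
    """Retry discovery with a different strategy on each attempt.

    Attempt i uses strategies[i], reusing the last strategy once the list is
    exhausted, so flexibility increases with each retry. Sleeps `delay`
    seconds between attempts so late-loading content has time to appear.
    """
    if not strategies:
        raise ValueError("at least one strategy is required")
    result: dict = {"success": False}
    for attempt in range(max_retries):
        strategy = strategies[min(attempt, len(strategies) - 1)]
        result = await strategy(field_name, value)
        if result.get("success"):
            return {**result, "attempt": attempt + 1}
        if attempt < max_retries - 1:
            await asyncio.sleep(delay)
    return result
```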
Future Enhancements
Advanced Real-Time Features
- Visual element detection using screenshots
- Machine learning field recognition
- Context-aware field relationships
- Performance optimization for faster discovery
Real-Time Analytics
- Discovery success rates by strategy
- Performance metrics for each method
- Field matching accuracy tracking
- Site compatibility reporting
Migration from Cached System
Automatic Migration
- No code changes required for existing voice commands
- Backward compatibility maintained
- Enhanced reliability with real-time discovery
- Same API with improved implementation
Benefits of Migration
- Eliminates cache issues - no more stale selectors
- Improves accuracy - always uses current page state
- Reduces maintenance - no cache management needed
- Increases reliability - works on dynamic sites
The real-time discovery system ensures that the LiveKit agent always works with the current page state, which makes form filling more reliable and more broadly compatible across websites.