first commit
This commit is contained in:
264
agent-livekit/REALTIME_FORM_DISCOVERY.md
Normal file
264
agent-livekit/REALTIME_FORM_DISCOVERY.md
Normal file
@@ -0,0 +1,264 @@
|
||||
# Real-Time Form Discovery System
|
||||
|
||||
## Overview
|
||||
|
||||
The LiveKit agent now features a **REAL-TIME ONLY** form discovery system that **NEVER uses cached selectors**. Every form field discovery is performed live using MCP tools, ensuring the most current and accurate form element detection.
|
||||
|
||||
## Key Principles
|
||||
|
||||
### 🚫 NO CACHE POLICY
|
||||
- **Zero cached selectors** - every request gets fresh selectors
|
||||
- **Real-time discovery only** - uses MCP tools on every call
|
||||
- **No hardcoded selectors** - all elements discovered dynamically
|
||||
- **Fresh page analysis** - adapts to dynamic content changes
|
||||
|
||||
### 🔄 Real-Time MCP Tools
|
||||
- **chrome_get_interactive_elements** - Gets current form elements
|
||||
- **chrome_get_content_web_form** - Analyzes form structure
|
||||
- **chrome_get_web_content** - Content analysis for field discovery
|
||||
- **Live selector testing** - Validates selectors before use
|
||||
|
||||
## How Real-Time Discovery Works
|
||||
|
||||
### 1. Voice Command Processing
|
||||
|
||||
When a user says: `"fill email with john@example.com"`
|
||||
|
||||
```python
|
||||
# NO cache lookup - goes straight to real-time discovery
|
||||
field_name = "email"
|
||||
value = "john@example.com"
|
||||
|
||||
# Step 1: Real-time MCP discovery
|
||||
discovery_result = await client._discover_form_fields_dynamically(field_name, value)
|
||||
|
||||
# Step 2: Enhanced detection with retry (if needed)
|
||||
enhanced_result = await client._enhanced_field_detection_with_retry(field_name, value)
|
||||
|
||||
# Step 3: Direct MCP element search (final fallback)
|
||||
direct_result = await client._direct_mcp_element_search(field_name, value)
|
||||
```
|
||||
|
||||
### 2. Real-Time Discovery Process
|
||||
|
||||
#### Strategy 1: Interactive Elements Discovery
|
||||
```python
|
||||
# Get ALL current interactive elements
|
||||
interactive_result = await client._call_mcp_tool("chrome_get_interactive_elements", {
|
||||
"types": ["input", "textarea", "select"]
|
||||
})
|
||||
|
||||
# Match field name to current elements
|
||||
for element in elements:
|
||||
if client._is_field_match(element, field_name):
|
||||
selector = client._extract_best_selector(element)
|
||||
# Try to fill immediately with fresh selector
|
||||
```
|
||||
|
||||
#### Strategy 2: Form Content Analysis
|
||||
```python
|
||||
# Get current form structure
|
||||
form_result = await client._call_mcp_tool("chrome_get_content_web_form", {})
|
||||
|
||||
# Parse form content for field patterns
|
||||
selector = client._parse_form_content_for_field(form_content, field_name)
|
||||
|
||||
# Test and use selector immediately
|
||||
```
|
||||
|
||||
#### Strategy 3: Direct Element Search
|
||||
```python
|
||||
# Exhaustive search through ALL elements
|
||||
all_elements = await client._call_mcp_tool("chrome_get_interactive_elements", {})
|
||||
|
||||
# Very flexible matching for any possible match
|
||||
for element in all_elements:
|
||||
if client._is_very_flexible_match(element, field_name):
|
||||
# Generate and test selector immediately
|
||||
```
|
||||
|
||||
### 3. Real-Time Selector Generation
|
||||
|
||||
The system generates selectors in real-time based on current element attributes:
|
||||
|
||||
```python
|
||||
def _extract_best_selector(element):
|
||||
attrs = element.get("attributes", {})
|
||||
|
||||
# Priority order for reliability
|
||||
if attrs.get("id"):
|
||||
return f"#{attrs['id']}"
|
||||
if attrs.get("name"):
|
||||
return f"input[name='{attrs['name']}']"
|
||||
if attrs.get("type") and attrs.get("name"):
|
||||
return f"input[type='{attrs['type']}'][name='{attrs['name']}']"
|
||||
# ... more patterns
|
||||
```
|
||||
|
||||
## API Reference
|
||||
|
||||
### Real-Time Functions
|
||||
|
||||
#### `fill_field_by_name(field_name: str, value: str) -> str`
|
||||
**NOW REAL-TIME ONLY** - No cache, fresh discovery every call.
|
||||
|
||||
#### `fill_field_realtime_only(field_name: str, value: str) -> str`
|
||||
**Guaranteed real-time** - Explicit real-time discovery function.
|
||||
|
||||
#### `get_realtime_form_fields() -> str`
|
||||
**Live form discovery** - Gets current form fields using only MCP tools.
|
||||
|
||||
#### `_discover_form_fields_dynamically(field_name: str, value: str) -> dict`
|
||||
**Pure real-time discovery** - Uses chrome_get_interactive_elements and chrome_get_content_web_form.
|
||||
|
||||
#### `_direct_mcp_element_search(field_name: str, value: str) -> dict`
|
||||
**Exhaustive real-time search** - Final fallback using comprehensive MCP element search.
|
||||
|
||||
### Real-Time Matching Algorithms
|
||||
|
||||
#### `_is_field_match(element: dict, field_name: str) -> bool`
|
||||
Standard real-time field matching using current element attributes.
|
||||
|
||||
#### `_is_very_flexible_match(element: dict, field_name: str) -> bool`
|
||||
Very flexible real-time matching for challenging cases.
|
||||
|
||||
#### `_generate_common_selectors(field_name: str) -> list`
|
||||
Generates common CSS selectors based on field name patterns.
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Voice Commands (All Real-Time)
|
||||
```
|
||||
User: "fill email with john@example.com"
|
||||
Agent: [Uses chrome_get_interactive_elements] ✓ Filled 'email' field using real-time discovery
|
||||
|
||||
User: "enter password secret123"
|
||||
Agent: [Uses chrome_get_content_web_form] ✓ Filled 'password' field using form content analysis
|
||||
|
||||
User: "type hello in search box"
|
||||
Agent: [Uses direct MCP search] ✓ Filled 'search' field using exhaustive element search
|
||||
```
|
||||
|
||||
### Programmatic Usage
|
||||
```python
|
||||
# All these functions use ONLY real-time discovery
|
||||
result = await client.fill_field_by_name("email", "user@example.com")
|
||||
result = await client.fill_field_realtime_only("search", "python")
|
||||
result = await client._discover_form_fields_dynamically("username", "john_doe")
|
||||
```
|
||||
|
||||
## Real-Time Discovery Strategies
|
||||
|
||||
### 1. Interactive Elements Strategy
|
||||
- Uses `chrome_get_interactive_elements` to get current form elements
|
||||
- Matches field names to element attributes in real-time
|
||||
- Tests selectors immediately before use
|
||||
|
||||
### 2. Form Content Strategy
|
||||
- Uses `chrome_get_content_web_form` for form-specific analysis
|
||||
- Parses current form structure for field patterns
|
||||
- Generates selectors based on live content
|
||||
|
||||
### 3. Direct Search Strategy
|
||||
- Exhaustive search through ALL current page elements
|
||||
- Very flexible matching criteria
|
||||
- Tests multiple selector patterns
|
||||
|
||||
### 4. Common Selector Strategy
|
||||
- Generates intelligent selectors based on field name
|
||||
- Tests each selector against current page
|
||||
- Uses type-specific patterns for common fields
|
||||
|
||||
## Benefits of Real-Time Discovery
|
||||
|
||||
### 🎯 Accuracy
|
||||
- **Always current** - reflects actual page state
|
||||
- **No stale selectors** - eliminates cached selector failures
|
||||
- **Dynamic adaptation** - handles page changes automatically
|
||||
|
||||
### 🔄 Reliability
|
||||
- **Fresh discovery** - every request gets new selectors
|
||||
- **Multiple strategies** - comprehensive fallback methods
|
||||
- **Live validation** - selectors tested before use
|
||||
|
||||
### 🌐 Compatibility
|
||||
- **Works on any site** - no pre-configuration needed
|
||||
- **Handles dynamic content** - adapts to JavaScript-generated forms
|
||||
- **Cross-platform** - works with any web technology
|
||||
|
||||
### 🛠️ Maintainability
|
||||
- **Zero maintenance** - no selector databases to update
|
||||
- **Self-adapting** - automatically handles site changes
|
||||
- **Future-proof** - works with new web technologies
|
||||
|
||||
## Testing Real-Time Discovery
|
||||
|
||||
Run the real-time test suite:
|
||||
|
||||
```bash
|
||||
python test_realtime_form_discovery.py
|
||||
```
|
||||
|
||||
This tests:
|
||||
- Real-time discovery on Google search
|
||||
- Form field discovery on GitHub
|
||||
- Direct MCP element search
|
||||
- Very flexible matching algorithms
|
||||
- Cross-website compatibility
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Real-Time vs Speed
|
||||
- **Slightly slower** than cached selectors (by design)
|
||||
- **More reliable** than cached approaches
|
||||
- **Eliminates cache invalidation** issues
|
||||
- **Prevents stale selector errors**
|
||||
|
||||
### Optimization Strategies
|
||||
- **Parallel discovery** - multiple strategies run concurrently
|
||||
- **Early termination** - stops on first successful match
|
||||
- **Intelligent prioritization** - most likely selectors first
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Graceful Degradation
|
||||
1. **Interactive elements** → **Form content** → **Direct search** → **Common selectors**
|
||||
2. **Detailed logging** of each attempt
|
||||
3. **Clear error messages** about what was tried
|
||||
4. **No silent failures** - always reports what happened
|
||||
|
||||
### Retry Mechanism
|
||||
- **Multiple attempts** with increasing flexibility
|
||||
- **Different strategies** on each retry
|
||||
- **Configurable retry count** (default: 3)
|
||||
- **Delay between retries** to handle loading
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
### Advanced Real-Time Features
|
||||
- **Visual element detection** using screenshots
|
||||
- **Machine learning** field recognition
|
||||
- **Context-aware** field relationships
|
||||
- **Performance optimization** for faster discovery
|
||||
|
||||
### Real-Time Analytics
|
||||
- **Discovery success rates** by strategy
|
||||
- **Performance metrics** for each method
|
||||
- **Field matching accuracy** tracking
|
||||
- **Site compatibility** reporting
|
||||
|
||||
## Migration from Cached System
|
||||
|
||||
### Automatic Migration
|
||||
- **No code changes** required for existing voice commands
|
||||
- **Backward compatibility** maintained
|
||||
- **Enhanced reliability** with real-time discovery
|
||||
- **Same API** with improved implementation
|
||||
|
||||
### Benefits of Migration
|
||||
- **Eliminates cache issues** - no more stale selectors
|
||||
- **Improves accuracy** - always uses current page state
|
||||
- **Reduces maintenance** - no cache management needed
|
||||
- **Increases reliability** - works on dynamic sites
|
||||
|
||||
The real-time discovery system ensures that the LiveKit agent always works with the most current page state, providing maximum reliability and compatibility across all websites.
|
Reference in New Issue
Block a user