# Real-Time Form Discovery System

## Overview

The LiveKit agent now features a **REAL-TIME ONLY** form discovery system that **NEVER uses cached selectors**. Every form field discovery is performed live using MCP tools, ensuring the most current and accurate form element detection.

## Key Principles

### 🚫 NO CACHE POLICY
- **Zero cached selectors** - every request gets fresh selectors
- **Real-time discovery only** - uses MCP tools on every call
- **No hardcoded selectors** - all elements discovered dynamically
- **Fresh page analysis** - adapts to dynamic content changes

### 🔄 Real-Time MCP Tools
- **chrome_get_interactive_elements** - Gets current form elements
- **chrome_get_content_web_form** - Analyzes form structure
- **chrome_get_web_content** - Content analysis for field discovery
- **Live selector testing** - Validates selectors before use

## How Real-Time Discovery Works

### 1. Voice Command Processing

When a user says: `"fill email with john@example.com"`

```python
# NO cache lookup - goes straight to real-time discovery
field_name = "email"
value = "john@example.com"

# Step 1: Real-time MCP discovery
result = await client._discover_form_fields_dynamically(field_name, value)

# Step 2: Enhanced detection with retry (only if step 1 did not fill the field)
# Note: the "success" key is an assumed result shape; adjust to the real dict
if not result.get("success"):
    result = await client._enhanced_field_detection_with_retry(field_name, value)

# Step 3: Direct MCP element search (final fallback)
if not result.get("success"):
    result = await client._direct_mcp_element_search(field_name, value)
```

### 2. Real-Time Discovery Process

#### Strategy 1: Interactive Elements Discovery
```python
# Get ALL current interactive form elements
interactive_result = await client._call_mcp_tool("chrome_get_interactive_elements", {
    "types": ["input", "textarea", "select"]
})

# Match field name to current elements
# (`elements` stands for the element list taken from interactive_result)
for element in elements:
    if client._is_field_match(element, field_name):
        selector = client._extract_best_selector(element)
        # Try to fill immediately with fresh selector
```

#### Strategy 2: Form Content Analysis
```python
# Get current form structure
form_result = await client._call_mcp_tool("chrome_get_content_web_form", {})

# Parse form content for field patterns
# (`form_content` stands for the content extracted from form_result)
selector = client._parse_form_content_for_field(form_content, field_name)

# Test and use selector immediately
```

#### Strategy 3: Direct Element Search
```python
# Exhaustive search through ALL interactive elements (no type filter)
all_elements = await client._call_mcp_tool("chrome_get_interactive_elements", {})

# Very flexible matching for any possible match
for element in all_elements:
    if client._is_very_flexible_match(element, field_name):
        # Generate and test selector immediately
        selector = client._extract_best_selector(element)
```

### 3. Real-Time Selector Generation

The system generates selectors in real-time based on current element attributes:

```python
def _extract_best_selector(element):
    attrs = element.get("attributes", {})

    # Priority order for reliability
    if attrs.get("id"):
        return f"#{attrs['id']}"
    # Most specific first: type + name, then name alone
    if attrs.get("type") and attrs.get("name"):
        return f"input[type='{attrs['type']}'][name='{attrs['name']}']"
    if attrs.get("name"):
        return f"input[name='{attrs['name']}']"
    # ... more patterns
```

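Attribute values interpolated directly into a selector can break it: an id such as `user.email` written as `#user.email` parses as an id plus a class. The sketch below is one possible hardened variant, not the agent's actual code. The helper names `css_string` and `best_selector` and the `"tag"` key on the element dict are assumptions made for illustration.

```python
from typing import Optional


def css_string(value: str) -> str:
    """Quote a value as a CSS string, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def best_selector(element: dict) -> Optional[str]:
    """Build a selector from id, then type + name, then name alone."""
    attrs = element.get("attributes", {})
    tag = element.get("tag", "input")  # assumed key; defaults to "input"
    if attrs.get("id"):
        # Attribute form avoids having to escape "." or ":" inside an id
        return f"[id={css_string(attrs['id'])}]"
    if attrs.get("type") and attrs.get("name"):
        return (
            f"{tag}[type={css_string(attrs['type'])}]"
            f"[name={css_string(attrs['name'])}]"
        )
    if attrs.get("name"):
        return f"{tag}[name={css_string(attrs['name'])}]"
    return None  # caller falls through to the next strategy
```

Returning `None` rather than raising lets the caller move on to the next discovery strategy when an element carries no usable attributes.
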
## API Reference

### Real-Time Functions

#### `fill_field_by_name(field_name: str, value: str) -> str`
**NOW REAL-TIME ONLY** - No cache, fresh discovery every call.

#### `fill_field_realtime_only(field_name: str, value: str) -> str`
**Guaranteed real-time** - Explicit real-time discovery function.

#### `get_realtime_form_fields() -> str`
**Live form discovery** - Gets current form fields using only MCP tools.

#### `_discover_form_fields_dynamically(field_name: str, value: str) -> dict`
**Pure real-time discovery** - Uses chrome_get_interactive_elements and chrome_get_content_web_form.

#### `_direct_mcp_element_search(field_name: str, value: str) -> dict`
**Exhaustive real-time search** - Final fallback using comprehensive MCP element search.

### Real-Time Matching Algorithms

#### `_is_field_match(element: dict, field_name: str) -> bool`
Standard real-time field matching using current element attributes.

#### `_is_very_flexible_match(element: dict, field_name: str) -> bool`
Very flexible real-time matching for challenging cases.

#### `_generate_common_selectors(field_name: str) -> list`
Generates common CSS selectors based on field name patterns.

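The difference between the standard and the very flexible matcher can be pictured as exact versus substring comparison over an element's attributes. The following is a minimal sketch under stated assumptions, not the agent's implementation: the function names, the list of attribute keys in `MATCH_KEYS`, and the normalization rule are all hypothetical, and the element shape (`{"attributes": {...}}`) follows `_extract_best_selector`.

```python
import re

# Attribute keys assumed to identify a field (hypothetical choice)
MATCH_KEYS = ("id", "name", "placeholder", "aria-label", "type")


def _normalize(text: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def is_field_match(element: dict, field_name: str) -> bool:
    """Standard match: some attribute equals the field name after normalization."""
    target = _normalize(field_name)
    attrs = element.get("attributes", {})
    return bool(target) and any(
        _normalize(str(attrs.get(key, ""))) == target for key in MATCH_KEYS
    )


def is_very_flexible_match(element: dict, field_name: str) -> bool:
    """Flexible match: any word of the field name occurs inside any attribute."""
    attrs = element.get("attributes", {})
    haystack = [_normalize(str(attrs.get(key, ""))) for key in MATCH_KEYS]
    words = [_normalize(word) for word in field_name.split()]
    return any(word and word in text for word in words for text in haystack)
```

With this split, an element whose only identifier is `id="user_email_input"` is rejected by the standard matcher for the field name `email` but accepted by the flexible one, which is the kind of case a fallback pass is meant to catch.
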
## Usage Examples

### Voice Commands (All Real-Time)
```
User: "fill email with john@example.com"
Agent: [Uses chrome_get_interactive_elements] ✓ Filled 'email' field using real-time discovery

User: "enter password secret123"
Agent: [Uses chrome_get_content_web_form] ✓ Filled 'password' field using form content analysis

User: "type hello in search box"
Agent: [Uses direct MCP search] ✓ Filled 'search' field using exhaustive element search
```

### Programmatic Usage
```python
# All these functions use ONLY real-time discovery
result = await client.fill_field_by_name("email", "user@example.com")
result = await client.fill_field_realtime_only("search", "python")
result = await client._discover_form_fields_dynamically("username", "john_doe")
```

## Real-Time Discovery Strategies

### 1. Interactive Elements Strategy
- Uses `chrome_get_interactive_elements` to get current form elements
- Matches field names to element attributes in real-time
- Tests selectors immediately before use

### 2. Form Content Strategy
- Uses `chrome_get_content_web_form` for form-specific analysis
- Parses current form structure for field patterns
- Generates selectors based on live content

### 3. Direct Search Strategy
- Exhaustive search through ALL current page elements
- Very flexible matching criteria
- Tests multiple selector patterns

### 4. Common Selector Strategy
- Generates candidate selectors from the field name
- Tests each selector against the current page
- Uses type-specific patterns for common fields

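The common selector strategy can be illustrated with a small generator that turns a field name into an ordered list of candidates. This is a hedged sketch only: the function name mirrors `_generate_common_selectors` but the body, the `TYPE_HINTS` table, and the chosen patterns are assumptions, and the field name is assumed to contain no quote characters.

```python
# Hypothetical mapping from well-known field names to input types
TYPE_HINTS = {
    "email": "email",
    "password": "password",
    "search": "search",
    "phone": "tel",
}


def generate_common_selectors(field_name: str) -> list:
    """Return candidate CSS selectors for a field name, in the order to test them."""
    key = field_name.strip().lower().replace(" ", "_")
    selectors = [
        f"#{key}",
        f"input[name='{key}']",
        f"textarea[name='{key}']",
    ]
    # Type-specific pattern for common fields
    input_type = TYPE_HINTS.get(key)
    if input_type:
        selectors.append(f"input[type='{input_type}']")
    # Case-insensitive substring match on the placeholder, tried last
    selectors.append(f"input[placeholder*='{key}' i]")
    return selectors
```

Each candidate would still be tested against the live page before use; the list only fixes the order in which they are tried.
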
## Benefits of Real-Time Discovery

### 🎯 Accuracy
- **Always current** - reflects actual page state
- **No stale selectors** - eliminates cached selector failures
- **Dynamic adaptation** - handles page changes automatically

### 🔄 Reliability
- **Fresh discovery** - every request gets new selectors
- **Multiple strategies** - comprehensive fallback methods
- **Live validation** - selectors tested before use

### 🌐 Compatibility
- **Works on any site** - no pre-configuration needed
- **Handles dynamic content** - adapts to JavaScript-generated forms
- **Cross-platform** - works with any web technology

### 🛠️ Maintainability
- **Zero maintenance** - no selector databases to update
- **Self-adapting** - automatically handles site changes
- **Future-proof** - works with new web technologies

## Testing Real-Time Discovery

Run the real-time test suite:

```bash
python test_realtime_form_discovery.py
```

This tests:
- Real-time discovery on Google search
- Form field discovery on GitHub
- Direct MCP element search
- Very flexible matching algorithms
- Cross-website compatibility

## Performance Considerations

### Real-Time vs Speed
- **Slightly slower** than cached selectors (by design)
- **More reliable** than cached approaches
- **Eliminates cache invalidation** issues
- **Prevents stale selector errors**

### Optimization Strategies
- **Parallel discovery** - multiple strategies run concurrently
- **Early termination** - stops on first successful match
- **Intelligent prioritization** - most likely selectors first

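Parallel discovery with early termination can be sketched with `asyncio`: every strategy starts at once, the first one to produce a selector wins, and the rest are cancelled. The code below is an illustration under assumptions, not the agent's implementation. `first_successful_selector`, `slow_miss`, and `fast_hit` are hypothetical names, and each strategy is assumed to be a coroutine function returning a selector string or `None`.

```python
import asyncio
from typing import Optional


async def first_successful_selector(field_name: str, strategies: list) -> Optional[str]:
    """Run all strategies concurrently and return the first non-None selector."""
    tasks = [asyncio.create_task(strategy(field_name)) for strategy in strategies]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                selector = await finished
            except Exception:
                continue  # a failed strategy should not block the others
            if selector is not None:
                return selector  # early termination on the first match
        return None
    finally:
        # Cancel whatever is still running and wait for the cancellations to settle
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


# Stand-in strategies for demonstration only
async def slow_miss(field_name: str) -> Optional[str]:
    await asyncio.sleep(0.05)
    return None


async def fast_hit(field_name: str) -> Optional[str]:
    await asyncio.sleep(0.01)
    return f"input[name='{field_name}']"
```

Note the trade-off: with concurrent execution the winner is whichever strategy finishes first, not necessarily the one highest in the fallback order, so a strict priority order would need the strategies to run sequentially instead.
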
## Error Handling

### Graceful Degradation
1. **Interactive elements** → **Form content** → **Direct search** → **Common selectors**
2. **Detailed logging** of each attempt
3. **Clear error messages** about what was tried
4. **No silent failures** - always reports what happened

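The degradation chain above can be sketched as an ordered loop that logs every attempt and reports what was tried when nothing matches. This is a simplified, synchronous illustration; the real strategies are `async` and call MCP tools. The function name `discover_with_fallback`, the logger name, and the shape of the returned dict are assumptions.

```python
import logging

logger = logging.getLogger("form_discovery")  # hypothetical logger name


def discover_with_fallback(field_name: str, strategies: list) -> dict:
    """Try (name, function) pairs in order and record every attempt."""
    attempts = []
    for name, find in strategies:
        try:
            selector = find(field_name)
        except Exception as exc:  # a failing strategy must not abort the chain
            logger.warning("%s raised %r", name, exc)
            attempts.append(f"{name}: error ({exc})")
            continue
        logger.info("%s returned %r", name, selector)
        if selector:
            attempts.append(f"{name}: matched")
            return {"success": True, "selector": selector, "attempts": attempts}
        attempts.append(f"{name}: no match")
    # Nothing matched: report every strategy that was tried, never fail silently
    return {
        "success": False,
        "error": f"No field matching '{field_name}'; tried: " + "; ".join(attempts),
        "attempts": attempts,
    }
```

Keeping the attempt list in the result, in addition to the log, lets the agent tell the user which strategies ran before the lookup gave up.
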
### Retry Mechanism
- **Multiple attempts** with increasing flexibility
- **Different strategies** on each retry
- **Configurable retry count** (default: 3)
- **Delay between retries** to handle loading

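A retry loop with these properties might look like the sketch below. It is not the body of `_enhanced_field_detection_with_retry`; the name `retry_discovery`, the `delay_seconds` parameter, and its default value are assumptions, while the default of three attempts follows the list above. The attempt number is passed to the callback so it can switch to a more flexible strategy on each retry.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def retry_discovery(
    attempt: Callable[[int], Awaitable[Optional[str]]],
    max_retries: int = 3,
    delay_seconds: float = 0.5,
) -> Optional[str]:
    """Call attempt(n) up to max_retries times, pausing between failed tries."""
    for n in range(1, max_retries + 1):
        selector = await attempt(n)
        if selector is not None:
            return selector
        if n < max_retries:
            await asyncio.sleep(delay_seconds)  # give the page time to load
    return None


# Stand-in attempt for demonstration: succeeds only on the third try
attempt_log = []


async def flaky_attempt(n: int) -> Optional[str]:
    attempt_log.append(n)
    return "#email" if n == 3 else None
```

The loop skips the delay after the final attempt, so a lookup that fails every time returns as soon as the last strategy has been tried.
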
## Future Enhancements

### Advanced Real-Time Features
- **Visual element detection** using screenshots
- **Machine learning** field recognition
- **Context-aware** field relationships
- **Performance optimization** for faster discovery

### Real-Time Analytics
- **Discovery success rates** by strategy
- **Performance metrics** for each method
- **Field matching accuracy** tracking
- **Site compatibility** reporting

## Migration from Cached System

### Automatic Migration
- **No code changes** required for existing voice commands
- **Backward compatibility** maintained
- **Enhanced reliability** with real-time discovery
- **Same API** with improved implementation

### Benefits of Migration
- **Eliminates cache issues** - no more stale selectors
- **Improves accuracy** - always uses current page state
- **Reduces maintenance** - no cache management needed
- **Increases reliability** - works on dynamic sites

The real-time discovery system ensures that the LiveKit agent always works with the most current page state, providing maximum reliability and compatibility across all websites.