broswer-automation/agent-livekit/REALTIME_FORM_DISCOVERY.md
nasir@endelospay.com d97cad1736 first commit
2025-08-12 02:54:17 +05:00

# Real-Time Form Discovery System
## Overview
The LiveKit agent now features a **REAL-TIME ONLY** form discovery system that **NEVER uses cached selectors**. Every form field discovery is performed live using MCP tools, ensuring the most current and accurate form element detection.
## Key Principles
### 🚫 NO CACHE POLICY
- **Zero cached selectors** - every request gets fresh selectors
- **Real-time discovery only** - uses MCP tools on every call
- **No hardcoded selectors** - all elements discovered dynamically
- **Fresh page analysis** - adapts to dynamic content changes
### 🔄 Real-Time MCP Tools
- **chrome_get_interactive_elements** - Gets current form elements
- **chrome_get_content_web_form** - Analyzes form structure
- **chrome_get_web_content** - Content analysis for field discovery
- **Live selector testing** - Validates selectors before use
## How Real-Time Discovery Works
### 1. Voice Command Processing
When a user says: `"fill email with john@example.com"`
```python
# NO cache lookup - goes straight to real-time discovery
field_name = "email"
value = "john@example.com"
# Step 1: Real-time MCP discovery
discovery_result = await client._discover_form_fields_dynamically(field_name, value)
# Step 2: Enhanced detection with retry (if needed)
enhanced_result = await client._enhanced_field_detection_with_retry(field_name, value)
# Step 3: Direct MCP element search (final fallback)
direct_result = await client._direct_mcp_element_search(field_name, value)
```
### 2. Real-Time Discovery Process
#### Strategy 1: Interactive Elements Discovery
```python
# Get ALL current interactive elements
interactive_result = await client._call_mcp_tool("chrome_get_interactive_elements", {
    "types": ["input", "textarea", "select"]
})
# Assumed result shape: a dict carrying the element list under "elements"
elements = interactive_result.get("elements", [])
# Match field name to current elements
for element in elements:
    if client._is_field_match(element, field_name):
        selector = client._extract_best_selector(element)
        # Try to fill immediately with fresh selector
```
#### Strategy 2: Form Content Analysis
```python
# Get current form structure
form_result = await client._call_mcp_tool("chrome_get_content_web_form", {})
# Parse form content for field patterns
selector = client._parse_form_content_for_field(form_content, field_name)
# Test and use selector immediately
```
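The form content strategy could be sketched as follows. This is an illustration only: it assumes `chrome_get_content_web_form` yields HTML text, which is not specified here, and `parse_form_content_for_field` is a hypothetical stand-in for `_parse_form_content_for_field`, not its actual implementation.

```python
from html.parser import HTMLParser
from typing import Optional


class _FieldFinder(HTMLParser):
    """Collects the attributes of every input, textarea, and select tag."""

    FORM_TAGS = {"input", "textarea", "select"}

    def __init__(self) -> None:
        super().__init__()
        self.fields = []  # list of (tag, attrs) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.FORM_TAGS:
            # attrs is a list of (name, value) pairs; value may be None
            self.fields.append((tag, {k: v or "" for k, v in attrs}))


def parse_form_content_for_field(form_html: str, field_name: str) -> Optional[str]:
    """Return a selector for the first field whose id, name, or placeholder
    contains field_name (case-insensitive), or None if nothing matches."""
    finder = _FieldFinder()
    finder.feed(form_html)
    needle = field_name.lower()
    for tag, attrs in finder.fields:
        haystack = " ".join(attrs.get(k, "") for k in ("id", "name", "placeholder")).lower()
        if needle in haystack:
            if attrs.get("id"):
                return f"#{attrs['id']}"
            if attrs.get("name"):
                return f"{tag}[name='{attrs['name']}']"
            # Matched on placeholder only, with no id or name: keep looking
    return None
```

Because the HTML is parsed on every call, the selector always reflects the form as it exists at that moment.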
#### Strategy 3: Direct Element Search
```python
# Exhaustive search through ALL elements
all_elements = await client._call_mcp_tool("chrome_get_interactive_elements", {})
# Very flexible matching for any possible match
for element in all_elements:
    if client._is_very_flexible_match(element, field_name):
        # Generate and test selector immediately
        selector = client._extract_best_selector(element)
```
### 3. Real-Time Selector Generation
The system generates selectors in real-time based on current element attributes:
```python
def _extract_best_selector(element):
    attrs = element.get("attributes", {})
    # Priority order for reliability: most specific selector first
    if attrs.get("id"):
        return f"#{attrs['id']}"
    # Check type + name before name alone; otherwise this branch is unreachable
    if attrs.get("type") and attrs.get("name"):
        return f"input[type='{attrs['type']}'][name='{attrs['name']}']"
    if attrs.get("name"):
        return f"input[name='{attrs['name']}']"
    # ... more patterns
```
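One caveat with ID selectors: an `id` such as `user.email` breaks the `#id` form, because CSS reads `#user.email` as id `user` plus class `email`. A quoted attribute selector avoids this. The sketch below is a hypothetical variant, not the agent's code; `css_quote` and `build_selector` are illustrative names, and the `tag` parameter assumes the element's tag name is available from the discovery result.

```python
from typing import Optional


def css_quote(value: str) -> str:
    """Quote a string for use as a CSS attribute value."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_selector(tag: str, attrs: dict) -> Optional[str]:
    """Build a tag-aware selector from the most specific attribute present."""
    tag = tag or "input"
    # Same priority idea as above: id first, then progressively weaker hints
    for attr in ("id", "name", "placeholder", "aria-label"):
        value = attrs.get(attr)
        if value:
            return f"{tag}[{attr}={css_quote(value)}]"
    return None
```

Including the tag name also keeps `textarea` and `select` elements from being addressed as `input`.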
## API Reference
### Real-Time Functions
#### `fill_field_by_name(field_name: str, value: str) -> str`
**NOW REAL-TIME ONLY** - No cache, fresh discovery every call.
#### `fill_field_realtime_only(field_name: str, value: str) -> str`
**Guaranteed real-time** - Explicit real-time discovery function.
#### `get_realtime_form_fields() -> str`
**Live form discovery** - Gets current form fields using only MCP tools.
#### `_discover_form_fields_dynamically(field_name: str, value: str) -> dict`
**Pure real-time discovery** - Uses chrome_get_interactive_elements and chrome_get_content_web_form.
#### `_direct_mcp_element_search(field_name: str, value: str) -> dict`
**Exhaustive real-time search** - Final fallback using comprehensive MCP element search.
### Real-Time Matching Algorithms
#### `_is_field_match(element: dict, field_name: str) -> bool`
Standard real-time field matching using current element attributes.
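A minimal sketch of what standard matching might look like, assuming elements carry an `attributes` dict as in `_extract_best_selector`. The attribute list and the substring rule are assumptions for illustration; the real `_is_field_match` may differ.

```python
def is_field_match(element: dict, field_name: str) -> bool:
    """Return True if field_name appears in one of the identifying attributes."""
    attrs = element.get("attributes", {})
    needle = field_name.strip().lower()
    if not needle:
        return False
    # Case-insensitive substring check against each identifying attribute
    for key in ("id", "name", "placeholder", "aria-label", "type"):
        if needle in str(attrs.get(key, "")).lower():
            return True
    return False
```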
#### `_is_very_flexible_match(element: dict, field_name: str) -> bool`
Very flexible real-time matching for challenging cases.
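Flexible matching could, for example, combine a synonym table with fuzzy string comparison. The sketch below uses `difflib.SequenceMatcher` from the standard library; the `SYNONYMS` table, the 0.75 threshold, and the function name are all illustrative assumptions rather than the agent's actual rules.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym table; extend per application
SYNONYMS = {
    "email": {"mail"},
    "password": {"pass", "pwd"},
    "search": {"q", "query"},
}


def is_very_flexible_match(element: dict, field_name: str, threshold: float = 0.75) -> bool:
    """Match on synonyms or near-misses in any attribute value."""
    attrs = element.get("attributes", {})
    needle = field_name.strip().lower()
    if not needle:
        return False
    candidates = {needle} | SYNONYMS.get(needle, set())
    # Split attribute values into tokens: "user_email-input" -> user, email, input
    tokens = []
    for value in attrs.values():
        tokens.extend(t for t in re.split(r"[^a-z0-9]+", str(value).lower()) if t)
    for token in tokens:
        for cand in candidates:
            if token == cand or SequenceMatcher(None, token, cand).ratio() >= threshold:
                return True
    return False
```

A lower threshold catches more misspelled or abbreviated attribute names at the cost of more false positives, which is why this kind of matching suits a last-resort fallback.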
#### `_generate_common_selectors(field_name: str) -> list`
Generates common CSS selectors based on field name patterns.
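As a rough illustration, a generator of this kind might emit type-specific patterns first and broader attribute patterns after. The `type_map` table and the exact pattern list below are assumptions; the `i` flag on attribute selectors (case-insensitive matching) is part of Selectors Level 4.

```python
def generate_common_selectors(field_name: str) -> list:
    """Return candidate CSS selectors for a field name, most specific first."""
    name = field_name.strip().lower().replace(" ", "_")
    selectors = []
    # Type-specific patterns for well-known fields (illustrative table)
    type_map = {"email": "email", "password": "password", "search": "search", "phone": "tel"}
    if name in type_map:
        selectors.append(f"input[type='{type_map[name]}']")
    selectors += [
        f"#{name}",
        f"[name='{name}']",
        f"[name*='{name}' i]",
        f"[id*='{name}' i]",
        f"[placeholder*='{name}' i]",
        f"[aria-label*='{name}' i]",
    ]
    return selectors
```

Each candidate would still need to be tested against the current page before use, since nothing here guarantees the element exists.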
## Usage Examples
### Voice Commands (All Real-Time)
```
User: "fill email with john@example.com"
Agent: [Uses chrome_get_interactive_elements] ✓ Filled 'email' field using real-time discovery
User: "enter password secret123"
Agent: [Uses chrome_get_content_web_form] ✓ Filled 'password' field using form content analysis
User: "type hello in search box"
Agent: [Uses direct MCP search] ✓ Filled 'search' field using exhaustive element search
```
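Before discovery can start, the utterance has to be split into a field name and a value. The following is a minimal sketch of that step for the three phrasings above; the patterns and the `parse_fill_command` name are hypothetical, and the agent's actual command parsing is not described here.

```python
import re
from typing import Optional, Tuple

# Illustrative patterns; each captures the named groups "field" and "value"
_PATTERNS = [
    re.compile(r"^fill\s+(?P<field>.+?)\s+with\s+(?P<value>.+)$", re.IGNORECASE),
    re.compile(
        r"^type\s+(?P<value>.+?)\s+in(?:to)?\s+(?P<field>.+?)(?:\s+(?:box|field))?$",
        re.IGNORECASE,
    ),
    re.compile(r"^enter\s+(?P<field>\S+)\s+(?P<value>.+)$", re.IGNORECASE),
]


def parse_fill_command(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (field_name, value) for a recognized fill command, else None."""
    text = utterance.strip()
    for pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            # Field names are normalized to lowercase; values are kept verbatim
            return match.group("field").lower(), match.group("value")
    return None
```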
### Programmatic Usage
```python
# All these functions use ONLY real-time discovery
result = await client.fill_field_by_name("email", "user@example.com")
result = await client.fill_field_realtime_only("search", "python")
result = await client._discover_form_fields_dynamically("username", "john_doe")
```
## Real-Time Discovery Strategies
### 1. Interactive Elements Strategy
- Uses `chrome_get_interactive_elements` to get current form elements
- Matches field names to element attributes in real-time
- Tests selectors immediately before use
### 2. Form Content Strategy
- Uses `chrome_get_content_web_form` for form-specific analysis
- Parses current form structure for field patterns
- Generates selectors based on live content
### 3. Direct Search Strategy
- Exhaustive search through ALL current page elements
- Very flexible matching criteria
- Tests multiple selector patterns
### 4. Common Selector Strategy
- Generates intelligent selectors based on field name
- Tests each selector against current page
- Uses type-specific patterns for common fields
## Benefits of Real-Time Discovery
### 🎯 Accuracy
- **Always current** - reflects actual page state
- **No stale selectors** - eliminates cached selector failures
- **Dynamic adaptation** - handles page changes automatically
### 🔄 Reliability
- **Fresh discovery** - every request gets new selectors
- **Multiple strategies** - comprehensive fallback methods
- **Live validation** - selectors tested before use
### 🌐 Compatibility
- **Works on any site** - no pre-configuration needed
- **Handles dynamic content** - adapts to JavaScript-generated forms
- **Framework-agnostic** - matches on rendered element attributes, regardless of how the page was built
### 🛠️ Maintainability
- **Zero maintenance** - no selector databases to update
- **Self-adapting** - automatically handles site changes
- **Future-proof** - works with new web technologies
## Testing Real-Time Discovery
Run the real-time test suite:
```bash
python test_realtime_form_discovery.py
```
This tests:
- Real-time discovery on Google search
- Form field discovery on GitHub
- Direct MCP element search
- Very flexible matching algorithms
- Cross-website compatibility
## Performance Considerations
### Real-Time vs Speed
- **Slightly slower** than cached selectors (by design), since every fill makes at least one MCP tool call
- **More reliable** than cached approaches
- **Eliminates cache invalidation** issues
- **Prevents stale selector errors**
### Optimization Strategies
- **Parallel discovery** - multiple strategies run concurrently
- **Early termination** - stops on first successful match
- **Intelligent prioritization** - most likely selectors first
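Parallel discovery with early termination can be sketched with `asyncio`. This is an assumption about how such a scheme could be built, not a description of the agent's internals: each strategy is modeled as a hypothetical coroutine that takes a field name and returns a selector or `None`.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# A strategy takes a field name and resolves to a selector, or None on no match
Strategy = Callable[[str], Awaitable[Optional[str]]]


async def first_successful(strategies: Sequence[Strategy], field_name: str) -> Optional[str]:
    """Run all strategies concurrently and return the first non-None selector."""
    tasks = [asyncio.create_task(s(field_name)) for s in strategies]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                selector = await finished
            except Exception:
                continue  # one failing strategy should not abort the others
            if selector is not None:
                return selector
        return None
    finally:
        # Early termination: cancel whatever is still running
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Note that with concurrent execution the winner is whichever strategy finishes first, not necessarily the highest-priority one; a strictly ordered fallback needs sequential execution instead.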
## Error Handling
### Graceful Degradation
1. **Interactive elements** → **Form content** → **Direct search** → **Common selectors**
2. **Detailed logging** of each attempt
3. **Clear error messages** about what was tried
4. **No silent failures** - always reports what happened
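The degradation chain above could look roughly like this. It is a sketch under assumptions: strategies are hypothetical coroutines returning a selector or `None`, and `FieldNotFoundError`, `discover_with_fallback`, and the per-strategy timeout are illustrative names and choices, not the agent's API.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Optional, Tuple

logger = logging.getLogger("form_discovery")

# A strategy takes a field name and resolves to a selector, or None on no match
Strategy = Callable[[str], Awaitable[Optional[str]]]


class FieldNotFoundError(Exception):
    """Raised when every strategy fails; the message lists what was tried."""


async def discover_with_fallback(
    strategies: List[Tuple[str, Strategy]],
    field_name: str,
    timeout: float = 5.0,
) -> str:
    """Try each (label, strategy) pair in order and return the first selector."""
    attempts = []
    for label, strategy in strategies:
        try:
            selector = await asyncio.wait_for(strategy(field_name), timeout)
        except Exception as exc:
            attempts.append(f"{label}: error ({type(exc).__name__}: {exc})")
            logger.warning("%s failed for %r: %s", label, field_name, exc)
            continue
        if selector:
            logger.info("%s found %r for %r", label, selector, field_name)
            return selector
        attempts.append(f"{label}: no match")
        logger.info("%s found no match for %r", label, field_name)
    # No silent failure: report every attempt in the error message
    raise FieldNotFoundError(
        f"Could not find field {field_name!r}. Tried: " + "; ".join(attempts)
    )
```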
### Retry Mechanism
- **Multiple attempts** with increasing flexibility
- **Different strategies** on each retry
- **Configurable retry count** (default: 3)
- **Delay between retries** to handle loading
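A retry loop along these lines might look like the sketch below. The `attempt` callback, the linear delay, and the `retry_discovery` name are assumptions for illustration; the callback receives the attempt index so it can switch to a different or more flexible strategy on each try.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def retry_discovery(
    attempt: Callable[[int], Awaitable[Optional[str]]],
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> Optional[str]:
    """Call attempt(n) for n = 0 .. max_retries - 1, pausing between tries."""
    for n in range(max_retries):
        selector = await attempt(n)
        if selector is not None:
            return selector
        if n < max_retries - 1:
            # Give the page time to finish loading; the delay grows linearly
            await asyncio.sleep(base_delay * (n + 1))
    return None
```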
## Future Enhancements
### Advanced Real-Time Features
- **Visual element detection** using screenshots
- **Machine learning** field recognition
- **Context-aware** field relationships
- **Performance optimization** for faster discovery
### Real-Time Analytics
- **Discovery success rates** by strategy
- **Performance metrics** for each method
- **Field matching accuracy** tracking
- **Site compatibility** reporting
## Migration from Cached System
### Automatic Migration
- **No code changes** required for existing voice commands
- **Backward compatibility** maintained
- **Enhanced reliability** with real-time discovery
- **Same API** with improved implementation
### Benefits of Migration
- **Eliminates cache issues** - no more stale selectors
- **Improves accuracy** - always uses current page state
- **Reduces maintenance** - no cache management needed
- **Increases reliability** - works on dynamic sites
The real-time discovery system ensures that the LiveKit agent always works with the current page state, which avoids stale-selector failures and removes the need for per-site configuration.