broswer-automation/agent-livekit/DYNAMIC_FORM_FILLING.md
nasir@endelospay.com d97cad1736 first commit
2025-08-12 02:54:17 +05:00


# Dynamic Form Filling System
## Overview
The LiveKit agent now includes a dynamic form filling system that automatically discovers and fills web form fields based on user voice commands. The system is designed to be robust and adaptive, and it never relies on hardcoded selectors.
## Key Features
### 🔄 Dynamic Discovery
- **Real-time element discovery** using MCP tools (`chrome_get_interactive_elements`, `chrome_get_content_web_form`)
- **No hardcoded selectors** - all form elements are discovered dynamically
- **Adaptive to different websites** - works across various web platforms
### 🔁 Retry Mechanism
- **Automatic retry** when fields are not found on first attempt
- **Multiple discovery strategies** with increasing flexibility
- **Fallback methods** for challenging form structures
### 🗣️ Natural Language Processing
- **Intelligent field mapping** from natural language to form elements
- **Voice command processing** for hands-free form filling
- **Flexible matching** that understands field variations
## How It Works
### 1. Voice Command Processing
When a user says something like:
- "fill email with john@example.com"
- "enter password secret123"
- "type hello in search box"

The system parses each command into a field name and a value, then triggers dynamic discovery:
```python
# Voice command is parsed to extract field name and value
field_name = "email"
value = "john@example.com"
# Dynamic discovery is triggered
result = await client.fill_field_by_name(field_name, value)
```
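The document does not show how a spoken command is split into a field name and a value. One minimal approach, assuming only the three phrasings listed above, is a small set of regular expressions. `parse_fill_command` and its patterns are hypothetical and are not the agent's actual parser.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns covering the three example phrasings above.
_COMMAND_PATTERNS = [
    # "fill email with john@example.com"
    re.compile(r"^(?:fill|set)\s+(?P<field>.+?)\s+with\s+(?P<value>.+)$", re.IGNORECASE),
    # "type hello in search box" / "type hello into the search box"
    re.compile(r"^(?:type|write)\s+(?P<value>.+?)\s+(?:in|into)\s+(?:the\s+)?(?P<field>.+)$", re.IGNORECASE),
    # "enter password secret123": the first word is the field, the rest is the value
    re.compile(r"^(?:enter|input)\s+(?P<field>\S+)\s+(?P<value>.+)$", re.IGNORECASE),
]


def parse_fill_command(command: str) -> Optional[Tuple[str, str]]:
    """Return (field_name, value) for a recognized command, else None."""
    text = command.strip()
    for pattern in _COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            field = match.group("field").strip().lower()
            # Drop a trailing generic noun: "search box" -> "search"
            field = re.sub(r"\s+(?:box|field|input)$", "", field)
            # The value keeps its original case (emails, passwords)
            return field, match.group("value").strip()
    return None
```

The parser lowercases field names but leaves values as spoken. Commands that match no pattern return `None`, so the agent can ask the user to rephrase.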
### 2. Dynamic Discovery Process
The system follows a multi-step discovery process:
#### Step 1: Cached Fields Check
- First checks if the field is already in the cache
- Uses previously discovered selectors for speed
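Here is a sketch of the Step 1 cache. It assumes selectors are stored per page URL and normalized field name. The document does not say how the cache is keyed, so the URL component is an assumption, and `FieldSelectorCache` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


class FieldSelectorCache:
    """Hypothetical cache of selectors found by earlier discovery runs.

    Keys include the page URL (an assumption) so a selector learned on one
    site is never reused on another.
    """

    def __init__(self) -> None:
        self._selectors: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def _key(url: str, field_name: str) -> Tuple[str, str]:
        # Normalize so "Email", " email " and "email" share one entry
        return url, field_name.strip().lower()

    def get(self, url: str, field_name: str) -> Optional[str]:
        return self._selectors.get(self._key(url, field_name))

    def put(self, url: str, field_name: str, selector: str) -> None:
        self._selectors[self._key(url, field_name)] = selector

    def invalidate(self, url: str, field_name: str) -> None:
        # Call when a cached selector fails, so the next lookup falls through to Step 2
        self._selectors.pop(self._key(url, field_name), None)
```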
#### Step 2: Dynamic MCP Discovery
- Uses `chrome_get_interactive_elements` to get fresh form elements
- Analyzes element attributes (name, id, placeholder, aria-label, etc.)
- Matches field descriptions to actual form elements
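When several elements mention the requested name, Step 2 needs some way to rank them. The sketch below makes two assumptions. First, it treats each result from `chrome_get_interactive_elements` as a dict of attribute strings; the real response shape may differ. Second, it uses made-up weights that favor stable identifiers (`id`, `name`) over human-facing text. `score_element` and `best_candidate` are hypothetical helpers.

```python
from typing import Dict, List, Optional

# Assumed weights: stable identifiers outrank human-facing text.
ATTRIBUTE_WEIGHTS = {"id": 4, "name": 4, "aria-label": 3, "placeholder": 2, "class": 1}


def score_element(element: Dict[str, str], field_name: str) -> int:
    """Sum the weights of attributes that mention the field name."""
    target = field_name.strip().lower()
    score = 0
    for attr, weight in ATTRIBUTE_WEIGHTS.items():
        value = str(element.get(attr) or "").lower()
        if not value:
            continue
        if value == target:
            score += weight * 2  # an exact match counts double
        elif target in value:
            score += weight
    return score


def best_candidate(elements: List[Dict[str, str]], field_name: str) -> Optional[Dict[str, str]]:
    """Return the highest-scoring element, or None if nothing mentions the field."""
    scored = [(score_element(el, field_name), i, el) for i, el in enumerate(elements)]
    scored = [entry for entry in scored if entry[0] > 0]
    if not scored:
        return None
    # Highest score wins; on a tie, the element listed first on the page wins
    return max(scored, key=lambda entry: (entry[0], -entry[1]))[2]
```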
#### Step 3: Enhanced Detection with Retry
- If initial discovery fails, retries with more flexible matching
- Each retry attempt becomes more permissive in matching criteria
- Up to 3 retry attempts with different strategies
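One way to make matching more permissive on each retry is a tiered rule keyed on the attempt number. `flexible_match` is a hypothetical stand-in for `_is_flexible_field_match`, and the three tiers are assumptions chosen for illustration.

```python
import re
from typing import Set


def _tokens(text: str) -> Set[str]:
    """Split text into lowercase alphanumeric word tokens."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def _squash(text: str) -> str:
    """Remove spaces, underscores and hyphens: "search box" -> "searchbox"."""
    return re.sub(r"[\s_-]+", "", text)


def flexible_match(attr_value: str, field_name: str, attempt: int) -> bool:
    """Hypothetical tiered matcher that relaxes with each retry attempt.

    attempt 0:  exact match after lowercasing
    attempt 1:  squashed field name is a substring of the squashed attribute
    attempt 2+: the two share at least one word token
    """
    value = attr_value.strip().lower()
    field = field_name.strip().lower()
    if not value or not field:
        return False
    if attempt == 0:
        return value == field
    if attempt == 1:
        return _squash(field) in _squash(value)
    return bool(_tokens(value) & _tokens(field))
```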
#### Step 4: Content Analysis
- As a final fallback, analyzes page content
- Generates candidate selectors from common field-name patterns
- Tests generated selectors for validity
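For Step 4, candidate selectors can be derived mechanically from the field name and then tested against the page. `candidate_selectors` is a hypothetical helper. It does not escape special characters, so a real implementation should validate field names first. The case-insensitive `i` flag on attribute selectors comes from CSS Selectors Level 4, which current Chrome supports.

```python
from typing import List


def candidate_selectors(field_name: str) -> List[str]:
    """Hypothetical Step 4 helper: guess CSS selectors from the field name.

    Each guess still has to be tested against the live page before use.
    """
    base = " ".join(field_name.lower().split())  # normalize whitespace
    # Identifier-style spellings: "first name" -> firstname, first_name, first-name
    ident_forms: List[str] = []
    for form in (base.replace(" ", ""), base.replace(" ", "_"), base.replace(" ", "-")):
        if form not in ident_forms:
            ident_forms.append(form)
    selectors: List[str] = []
    for form in ident_forms:
        selectors.append(f"#{form}")
        selectors.append(f'[name="{form}"]')
    # Case-insensitive substring matches on human-facing text
    selectors.append(f'[placeholder*="{base}" i]')
    selectors.append(f'[aria-label*="{base}" i]')
    # A few field names map directly onto HTML input types
    if base in ("email", "password", "search", "tel", "url"):
        selectors.append(f'input[type="{base}"]')
    return selectors
```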
### 3. Field Matching Algorithm
The matcher compares the requested field name against several element attributes and applies type-specific rules:
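```python
def _is_field_match(element: dict, field_name: str) -> bool:
    """Return True if a discovered element plausibly matches field_name."""
    # Attributes inspected on each discovered element
    attributes_to_check = [
        "name", "id", "placeholder",
        "aria-label", "class", "type",
    ]
    # Normalized variations of the requested field name
    field = field_name.strip().lower()
    variations = {
        field,
        field.replace(" ", ""),
        field.replace("_", ""),
        # ... more variations
    }
    # Special type handling: "email"/"mail" matches <input type="email">
    # (named element_type to avoid shadowing the built-in type())
    element_type = str(element.get("type") or "").lower()
    if field in ("email", "mail") and element_type == "email":
        return True
    # ... more type-specific logic
    # Generic fallback (simplified): any variation appears in any checked attribute
    for attr in attributes_to_check:
        attr_value = str(element.get(attr) or "").lower()
        if attr_value and any(v in attr_value for v in variations):
            return True
    return False
```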
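The matching code leaves its variation list open-ended. The sketch below shows one way to expand a spoken name into common markup spellings: concatenated, snake_case, kebab-case and camelCase. It also adds a small synonym table. `field_name_variations` and `SYNONYMS` are hypothetical, and the synonym entries are assumptions.

```python
from typing import Dict, List, Set

# Hypothetical synonym table; entries are illustrative assumptions.
SYNONYMS: Dict[str, List[str]] = {
    "email": ["mail", "e-mail"],
    "phone": ["tel", "mobile"],
    "search": ["q", "query"],
}


def field_name_variations(field_name: str) -> Set[str]:
    """Expand a spoken field name into spellings commonly used in markup."""
    words = field_name.lower().split()
    if not words:
        return set()
    joined = " ".join(words)
    variations = {
        joined,
        "".join(words),   # firstname
        "_".join(words),  # first_name
        "-".join(words),  # first-name
        words[0] + "".join(w.capitalize() for w in words[1:]),  # firstName
    }
    variations.update(SYNONYMS.get(joined, []))
    return variations
```

If the matcher lowercases attribute values before comparing, the camelCase form collapses into the concatenated one. The camelCase form is mainly useful when building exact selectors such as `[name="firstName"]`.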
## Usage Examples
### Basic Voice Commands
```
User: "fill email with john@example.com"
Agent: ✓ Filled 'email' field using dynamic discovery
User: "enter password secret123"
Agent: ✓ Filled 'password' field using cached data
User: "type hello world in search box"
Agent: ✓ Filled 'search' field using enhanced detection
```
### Programmatic Usage
```python
# Direct field filling
result = await client.fill_field_by_name("email", "user@example.com")
# Voice command processing
result = await client.execute_voice_command("fill search with python")
# Pure dynamic discovery (no cache)
result = await client._discover_form_fields_dynamically("username", "john_doe")
```
## API Reference
### Main Methods
#### `fill_field_by_name(field_name: str, value: str) -> str`
Main method for filling form fields with dynamic discovery.
#### `_discover_form_fields_dynamically(field_name: str, value: str) -> dict`
Pure dynamic discovery using MCP tools without cache.
#### `_enhanced_field_detection_with_retry(field_name: str, value: str, max_retries: int) -> dict`
Enhanced detection with configurable retry mechanism.
#### `_analyze_page_content_for_field(field_name: str, value: str) -> dict`
Content analysis fallback method.
### Helper Methods
#### `_is_field_match(element: dict, field_name: str) -> bool`
Determines if an element matches the requested field name.
#### `_extract_best_selector(element: dict) -> str`
Extracts the most reliable CSS selector for an element.
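A plausible priority order for such a method is `id`, then `name`, then `aria-label`, then `placeholder`. `extract_best_selector` is a hypothetical sketch that assumes the element dict carries a `tag` key. When an `id` is not a simple CSS identifier, it falls back to an attribute selector, and it escapes backslashes and quotes in attribute values.

```python
import re
from typing import Dict, Optional

# Conservative pattern for ids that are safe to use as "#id"
_SIMPLE_ID = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]*$")


def _css_string(value: str) -> str:
    """Escape backslashes and double quotes for a CSS string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def extract_best_selector(element: Dict[str, str]) -> Optional[str]:
    """Hypothetical priority: id > name > aria-label > placeholder."""
    tag = (element.get("tag") or "").lower()  # the "tag" key is an assumption
    element_id = element.get("id")
    if element_id:
        if _SIMPLE_ID.match(element_id):
            return f"#{element_id}"
        # Ids starting with a digit etc. are not valid after "#"
        return f'{tag}[id="{_css_string(element_id)}"]'
    for attr in ("name", "aria-label", "placeholder"):
        value = element.get(attr)
        if value:
            return f'{tag}[{attr}="{_css_string(value)}"]'
    return None
```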
#### `_is_flexible_field_match(element: dict, field_name: str, attempt: int) -> bool`
Flexible matching that becomes more permissive with each retry.
## Configuration
### MCP Tools Required
- `chrome_get_interactive_elements`
- `chrome_get_content_web_form`
- `chrome_get_web_content`
- `chrome_fill_or_select`
- `chrome_click_element`
### Retry Settings
```python
max_retries = 3 # Number of retry attempts
retry_delay = 1 # Seconds between retries
```
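The two settings above suggest a loop like the following sketch. It reads `max_retries` as the total number of discovery attempts and passes the attempt index to the callee so it can loosen its matching. Both readings are assumptions, and `retry_discovery` is a hypothetical name.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def retry_discovery(
    attempt_fn: Callable[[int], Awaitable[Optional[T]]],
    max_retries: int = 3,
    retry_delay: float = 1.0,
) -> Optional[T]:
    """Call attempt_fn(0), attempt_fn(1), ... until one returns a non-None result."""
    for attempt in range(max_retries):
        result = await attempt_fn(attempt)
        if result is not None:
            return result
        if attempt < max_retries - 1:
            # Give dynamically rendered pages time to settle before retrying
            await asyncio.sleep(retry_delay)
    return None
```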
## Error Handling
The system handles errors at several levels:
1. **Graceful degradation** - falls back to simpler methods if advanced ones fail
2. **Detailed logging** - logs all discovery attempts for debugging
3. **User feedback** - provides clear messages about what was attempted
4. **Exception safety** - catches and handles all exceptions gracefully
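Graceful degradation, logging, and user feedback can be combined in one ordered cascade. Each strategy is tried in turn, exceptions are logged and skipped, and the final message lists what was attempted. `fill_with_fallbacks` and the strategy signature are hypothetical. The success message mirrors the agent replies shown under Basic Voice Commands.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Optional, Tuple

logger = logging.getLogger("form_filling")

# A strategy takes (field_name, value) and returns the selector it filled, or None.
Strategy = Callable[[str, str], Awaitable[Optional[str]]]


async def fill_with_fallbacks(
    field_name: str, value: str, strategies: List[Tuple[str, Strategy]]
) -> str:
    """Try each named strategy in order; one failure never aborts the rest."""
    attempted = []
    for label, strategy in strategies:
        attempted.append(label)
        try:
            selector = await strategy(field_name, value)
        except Exception:  # degrade to the next strategy instead of crashing
            logger.exception("Strategy %r failed for field %r", label, field_name)
            continue
        if selector:
            logger.info("Filled %r via %s (%s)", field_name, label, selector)
            return f"✓ Filled '{field_name}' field using {label}"
    # User feedback: report what was tried rather than failing silently
    return f"✗ Could not find '{field_name}' field (tried: {', '.join(attempted)})"


if __name__ == "__main__":
    async def _cached(field: str, value: str) -> Optional[str]:
        return None  # simulated cache miss

    async def _dynamic(field: str, value: str) -> Optional[str]:
        return "#email"  # simulated successful discovery

    print(asyncio.run(fill_with_fallbacks(
        "email", "john@example.com",
        [("cached data", _cached), ("dynamic discovery", _dynamic)],
    )))
```

Run directly, the demo prints `✓ Filled 'email' field using dynamic discovery`.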
## Testing
Run the test suite to verify functionality:
```bash
python test_dynamic_form_filling.py
```
This will test:
- Dynamic field discovery
- Retry mechanisms
- Voice command processing
- Field matching algorithms
- Cross-website compatibility
## Benefits
### For Users
- **Natural interaction** - speak naturally about form fields
- **Reliable filling** - works across different websites
- **No setup required** - automatically adapts to new sites
### For Developers
- **No hardcoded selectors** - eliminates brittle selector maintenance
- **Robust error handling** - graceful failure and recovery
- **Extensible design** - easy to add new discovery strategies
## Future Enhancements
- **Machine learning** field recognition
- **Visual element detection** using screenshots
- **Form structure analysis** for better field relationships
- **User preference learning** for improved matching accuracy