# Dynamic Form Filling System

## Overview

The LiveKit agent includes a dynamic form filling system that discovers and fills web form fields in response to user voice commands. Instead of relying on hardcoded selectors, it locates form elements at runtime with MCP browser tools, so it can adapt to pages it has not seen before.

## Key Features

### 🔄 Dynamic Discovery

- **Real-time element discovery** using MCP tools (`chrome_get_interactive_elements`, `chrome_get_content_web_form`)
- **No hardcoded selectors** - all form elements are discovered at runtime
- **Adaptive to different websites** - works across sites without per-site configuration

### 🔁 Retry Mechanism

- **Automatic retry** when a field is not found on the first attempt
- **Multiple discovery strategies** with increasingly flexible matching
- **Fallback methods** for challenging form structures

### 🗣️ Natural Language Processing

- **Intelligent field mapping** from natural-language field names to form elements
- **Voice command processing** for hands-free form filling
- **Flexible matching** that tolerates variations in field names (e.g. "first name" vs. `first_name`)

## How It Works

### 1. Voice Command Processing

When a user says something like:

- "fill email with john@example.com"
- "enter password secret123"
- "type hello in search box"

The system processes each command in stages. The command is first parsed into a field name and a value, and these are then passed to the discovery pipeline:

```python
# Voice command is parsed to extract the field name and value
field_name = "email"
value = "john@example.com"

# Dynamic discovery is triggered
# (`client` is the agent's form-filling client; its setup is not shown here)
result = await client.fill_field_by_name(field_name, value)
```

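The parsing step can be sketched with one regular expression per phrasing. This is a minimal illustration that covers only the three example commands above. `parse_fill_command` and its patterns are hypothetical, and the agent's real parser is not shown in this document.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for the three example phrasings only.
_PATTERNS = [
    # "fill email with john@example.com"
    re.compile(r"^fill\s+(?P<field>.+?)\s+with\s+(?P<value>.+)$", re.IGNORECASE),
    # "type hello in search box" (a trailing "box"/"field" is dropped)
    re.compile(
        r"^type\s+(?P<value>.+?)\s+in(?:to)?\s+(?P<field>.+?)(?:\s+(?:box|field))?$",
        re.IGNORECASE,
    ),
    # "enter password secret123" (first word is the field, the rest is the value)
    re.compile(r"^enter\s+(?P<field>\S+)\s+(?P<value>.+)$", re.IGNORECASE),
]


def parse_fill_command(command: str) -> Optional[Tuple[str, str]]:
    """Return (field_name, value) for a recognised command, else None."""
    text = command.strip()
    for pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            # Field names are lower-cased for matching; values are kept verbatim
            return match.group("field").strip().lower(), match.group("value").strip()
    return None
```

Keeping the value verbatim matters: passwords and email addresses must reach the form exactly as spoken and transcribed.
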
### 2. Dynamic Discovery Process

The system follows a multi-step discovery process:

#### Step 1: Cached Fields Check

- First checks whether the field is already in the cache
- Reuses previously discovered selectors for speed

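A cache for this step can be as small as a dictionary keyed by a normalised field name. The sketch below is hypothetical. Two behaviours in it are assumptions, not documented features of the agent: the normalisation rule, and clearing the cache after navigation.

```python
from typing import Dict, Optional


class FieldCache:
    """Hypothetical cache mapping a normalised field name to a CSS selector."""

    def __init__(self) -> None:
        self._selectors: Dict[str, str] = {}

    @staticmethod
    def _key(field_name: str) -> str:
        # "First Name", "first_name" and "first-name" all share one key
        return field_name.lower().replace(" ", "").replace("_", "").replace("-", "")

    def get(self, field_name: str) -> Optional[str]:
        return self._selectors.get(self._key(field_name))

    def put(self, field_name: str, selector: str) -> None:
        self._selectors[self._key(field_name)] = selector

    def invalidate(self, field_name: str) -> None:
        # Drop a stale entry, e.g. when filling with the cached selector fails
        self._selectors.pop(self._key(field_name), None)

    def clear(self) -> None:
        # Assumed policy: after navigation, old selectors no longer apply
        self._selectors.clear()
```
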
#### Step 2: Dynamic MCP Discovery

- Uses `chrome_get_interactive_elements` to fetch fresh form elements
- Analyzes element attributes (name, id, placeholder, aria-label, etc.)
- Matches field descriptions to actual form elements

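When several elements could plausibly match, one way to choose between them is to score each by the attributes that mention the field name. This is a hedged sketch. It assumes the `chrome_get_interactive_elements` result has already been converted into a list of flat attribute dicts (the real response shape is not documented here), and `score_element`, `pick_best_element` and the weights are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical weights: identifying attributes count more than descriptive ones.
_WEIGHTS = {"name": 3, "id": 3, "aria-label": 2, "placeholder": 2, "class": 1}


def _normalise(text: str) -> str:
    return text.lower().replace(" ", "").replace("_", "").replace("-", "")


def score_element(element: Dict[str, str], field_name: str) -> int:
    """Sum the weights of attributes whose value contains the field name."""
    target = _normalise(field_name)
    score = 0
    for attr, weight in _WEIGHTS.items():
        value = _normalise(str(element.get(attr, "")))
        if value and target in value:
            score += weight
    return score


def pick_best_element(
    elements: List[Dict[str, str]], field_name: str
) -> Optional[Dict[str, str]]:
    """Return the highest-scoring element, or None if nothing matches at all."""
    best, best_score = None, 0
    for element in elements:
        s = score_element(element, field_name)
        if s > best_score:  # strict '>' keeps the earliest element on ties
            best, best_score = element, s
    return best
```

Scoring instead of returning the first boolean match helps on pages where, for example, a newsletter box and a login form both contain an "email" field.
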
#### Step 3: Enhanced Detection with Retry

- If initial discovery fails, retries with more flexible matching
- Each retry attempt becomes more permissive in its matching criteria
- Up to 3 retry attempts, each with a different strategy

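The retry loop can be sketched as an async function that passes the attempt number to a discovery callback, so the callback can loosen its matching on each pass. The callback signature and `detect_with_retry` are hypothetical. The sketch treats `max_retries` as the total number of attempts at this stage and uses `retry_delay` as described under Configuration.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical callback: discover(field_name, attempt) -> selector or None
Discover = Callable[[str, int], Awaitable[Optional[str]]]


async def detect_with_retry(
    discover: Discover,
    field_name: str,
    max_retries: int = 3,
    retry_delay: float = 1.0,
) -> Optional[str]:
    """Call discover() up to max_retries times, passing the attempt number."""
    for attempt in range(1, max_retries + 1):
        selector = await discover(field_name, attempt)
        if selector is not None:
            return selector
        if attempt < max_retries:
            # Give dynamically rendered forms time to appear before retrying
            await asyncio.sleep(retry_delay)
    return None
```

The attempt number is the only state that changes between passes. This keeps each strategy a pure function of (field, attempt) and makes the loop easy to test with a fake callback.
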
#### Step 4: Content Analysis

- As a final fallback, analyzes the page content
- Generates candidate selectors from common field-name patterns
- Tests the generated selectors for validity

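Selector generation from a field name might look like the following sketch. The candidate list is an assumption: exact `id`/`name` matches first, then case-insensitive substring matches on `name`, `placeholder` and `aria-label`, then type-based guesses. `generate_candidate_selectors` is a hypothetical name. The `i` flag on attribute selectors comes from CSS Selectors Level 4 and is supported by Chrome's `querySelector`.

```python
from typing import List


def generate_candidate_selectors(field_name: str) -> List[str]:
    """Build CSS selectors that might locate a field from its name alone.

    Candidates are ordered from most to least specific; each one still has
    to be tested against the live page before use.
    """
    raw = " ".join(field_name.lower().split())  # collapse repeated whitespace
    compact = raw.replace(" ", "")
    # Identifier-style variants: "firstname", "first_name", "first-name"
    ids = list(dict.fromkeys([compact, raw.replace(" ", "_"), raw.replace(" ", "-")]))

    selectors: List[str] = []
    for v in ids:
        selectors += [f"#{v}", f'[name="{v}"]']
    for v in ids:
        selectors.append(f'[name*="{v}" i]')
    # Human-readable text may keep its spaces ("first name")
    for text in dict.fromkeys([raw, *ids]):
        selectors += [f'[placeholder*="{text}" i]', f'[aria-label*="{text}" i]']
    if compact in ("email", "mail"):
        selectors.append('input[type="email"]')
    if compact == "password":
        selectors.append('input[type="password"]')
    return list(dict.fromkeys(selectors))  # drop duplicates, keep order
```
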
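### 3. Field Matching Algorithm

The matcher compares the requested field name against several element attributes, tries common variations of the name, and applies type-specific rules:

```python
def _is_field_match(element: dict, field_name: str) -> bool:
    # `element` is assumed to be a flat dict of attribute names to values
    field_name = field_name.lower().strip()

    # Check multiple attributes
    attributes_to_check = [
        "name", "id", "placeholder",
        "aria-label", "class", "type",
    ]

    # Field name variations
    variations = [
        field_name,
        field_name.replace(" ", ""),
        field_name.replace("_", ""),
        # ... more variations
    ]

    # Special type handling (`element_type` avoids shadowing the built-in `type`)
    element_type = str(element.get("type", "")).lower()
    if field_name in ["email", "mail"] and element_type == "email":
        return True
    # ... more type-specific logic

    # Generic substring check (assumed completion of the elided logic)
    for attr in attributes_to_check:
        attr_value = str(element.get(attr, "")).lower()
        if attr_value and any(v in attr_value for v in variations):
            return True
    return False
```
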
## Usage Examples

### Basic Voice Commands

```
User: "fill email with john@example.com"
Agent: ✓ Filled 'email' field using dynamic discovery

User: "enter password secret123"
Agent: ✓ Filled 'password' field using cached data

User: "type hello world in search box"
Agent: ✓ Filled 'search' field using enhanced detection
```

### Programmatic Usage

```python
# Direct field filling
result = await client.fill_field_by_name("email", "user@example.com")

# Voice command processing
result = await client.execute_voice_command("fill search with python")

# Pure dynamic discovery (bypasses the cache)
result = await client._discover_form_fields_dynamically("username", "john_doe")
```

## API Reference

### Main Methods

#### `fill_field_by_name(field_name: str, value: str) -> str`

Main entry point for filling a form field with dynamic discovery. It is a coroutine (call it with `await`) and returns a status string.

#### `_discover_form_fields_dynamically(field_name: str, value: str) -> dict`

Pure dynamic discovery using MCP tools, bypassing the cache. Also a coroutine.

#### `_enhanced_field_detection_with_retry(field_name: str, value: str, max_retries: int) -> dict`

Enhanced detection with a configurable retry mechanism.

#### `_analyze_page_content_for_field(field_name: str, value: str) -> dict`

Content-analysis fallback method.

### Helper Methods

#### `_is_field_match(element: dict, field_name: str) -> bool`

Determines whether an element matches the requested field name.

#### `_extract_best_selector(element: dict) -> str`

Extracts the most reliable CSS selector for an element.

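One plausible order of preference for `_extract_best_selector` is `id` first, then `name`, `aria-label`, `placeholder` and `type`, falling back to the bare tag. The sketch below follows that order. The function name, the `tag` key, and the preference order are assumptions, since the actual implementation is not shown in this document.

```python
import re
from typing import Dict

# An id this simple can be written as "#id" without CSS escaping
_SIMPLE_ID = re.compile(r"^[A-Za-z_][\w-]*$")


def _quote(value: str) -> str:
    # Escape backslashes and double quotes inside a CSS attribute value
    return value.replace("\\", "\\\\").replace('"', '\\"')


def extract_best_selector(element: Dict[str, str]) -> str:
    """Hypothetical sketch: prefer stable attributes over descriptive ones."""
    tag = str(element.get("tag") or "input").lower()
    element_id = element.get("id")
    if element_id:
        if _SIMPLE_ID.match(element_id):
            return f"#{element_id}"
        return f'{tag}[id="{_quote(element_id)}"]'
    for attr in ("name", "aria-label", "placeholder", "type"):
        value = element.get(attr)
        if value:
            return f'{tag}[{attr}="{_quote(value)}"]'
    return tag
```

Falling back to an attribute selector for unusual ids avoids building an invalid `#...` selector when the id contains spaces or starts with a digit.
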
#### `_is_flexible_field_match(element: dict, field_name: str, attempt: int) -> bool`

Flexible matching that becomes more permissive with each retry attempt.

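Tiered permissiveness for `_is_flexible_field_match` could work like this hypothetical sketch. Attempt 1 requires an exact match on `name` or `id`. Attempt 2 allows substring matches that also cover `placeholder` and `aria-label`. Later attempts accept any word of the field name (three or more characters) in any attribute. These tiers are illustrative assumptions, not the agent's documented rules.

```python
from typing import Dict


def _norm(text: str) -> str:
    return text.lower().replace(" ", "").replace("_", "").replace("-", "")


def is_flexible_field_match(element: Dict[str, str], field_name: str, attempt: int) -> bool:
    """Hypothetical tiers: each attempt widens what counts as a match."""
    target = _norm(field_name)
    if not target:
        return False
    if attempt <= 1:
        # Tier 1: exact match on the identifying attributes only
        return any(_norm(str(element.get(a, ""))) == target for a in ("name", "id"))
    if attempt == 2:
        # Tier 2: substring match, now including descriptive attributes
        attrs = ("name", "id", "placeholder", "aria-label")
        return any(target in _norm(str(element.get(a, ""))) for a in attrs)
    # Tier 3+: any word of 3+ characters from the field name, in any attribute value
    words = [w for w in field_name.lower().split() if len(w) > 2]
    haystack = " ".join(str(v).lower() for v in element.values())
    return any(w in haystack for w in words)
```
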
## Configuration

### MCP Tools Required

- `chrome_get_interactive_elements`
- `chrome_get_content_web_form`
- `chrome_get_web_content`
- `chrome_fill_or_select`
- `chrome_click_element`

### Retry Settings

```python
max_retries = 3  # Number of retry attempts
retry_delay = 1  # Seconds between retries
```

## Error Handling

The system handles errors at several levels:

1. **Graceful degradation** - falls back to simpler methods if advanced ones fail
2. **Detailed logging** - logs every discovery attempt for debugging
3. **User feedback** - reports clearly what was attempted
4. **Exception safety** - catches and handles all exceptions gracefully

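The four points above can be combined in a small strategy chain. In this sketch, strategies are tried in order, each failure is logged, an exception in one strategy does not stop the others, and the caller gets a readable message. `locate_with_fallbacks` and the `(label, coroutine)` strategy format are hypothetical. The sketch catches `Exception` rather than `BaseException`, so task cancellation (`asyncio.CancelledError`) still propagates.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Optional, Tuple

logger = logging.getLogger("form_filling")

# Hypothetical strategy: (label, coroutine returning a selector or None)
Strategy = Tuple[str, Callable[[str], Awaitable[Optional[str]]]]


async def locate_with_fallbacks(field_name: str, strategies: List[Strategy]) -> str:
    """Try strategies in order; log each outcome and return a user-facing message."""
    attempted: List[str] = []
    for label, strategy in strategies:
        attempted.append(label)
        try:
            selector = await strategy(field_name)
        except Exception:
            # One failing strategy must not abort the remaining fallbacks
            logger.exception("Strategy %r raised for field %r", label, field_name)
            continue
        if selector:
            logger.info("Found %r via %s: %s", field_name, label, selector)
            return f"✓ Found '{field_name}' field using {label}"
        logger.debug("Strategy %r found nothing for %r", label, field_name)
    return f"✗ Could not find '{field_name}' field (tried: {', '.join(attempted)})"


if __name__ == "__main__":
    async def _cache(name: str) -> Optional[str]:
        return None  # simulated cache miss

    async def _discovery(name: str) -> Optional[str]:
        return "#email"  # simulated successful discovery

    logging.basicConfig(level=logging.INFO)
    strategies = [("cached data", _cache), ("dynamic discovery", _discovery)]
    print(asyncio.run(locate_with_fallbacks("email", strategies)))
```

Listing the attempted strategies in the failure message covers the "user feedback" point without exposing stack traces to the user. The traceback still goes to the log.
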
## Testing

Run the test suite to verify functionality:

```bash
python test_dynamic_form_filling.py
```

The suite covers:

- Dynamic field discovery
- Retry mechanisms
- Voice command processing
- Field matching algorithms
- Cross-website compatibility

## Benefits

### For Users

- **Natural interaction** - speak naturally about form fields
- **Reliable filling** - works across different websites
- **No setup required** - automatically adapts to new sites

### For Developers

- **No hardcoded selectors** - eliminates brittle selector maintenance
- **Robust error handling** - graceful failure and recovery
- **Extensible design** - easy to add new discovery strategies

## Future Enhancements

- **Machine learning** field recognition
- **Visual element detection** using screenshots
- **Form structure analysis** for better field relationships
- **User preference learning** for improved matching accuracy