Dynamic Form Filling System
Overview
The LiveKit agent includes a dynamic form filling system that discovers and fills web forms in response to user voice commands. Form elements are located at runtime rather than through hardcoded selectors, so the system can adapt to sites it has not been configured for.
Key Features
🔄 Dynamic Discovery
- Real-time element discovery using MCP tools (chrome_get_interactive_elements, chrome_get_content_web_form)
- No hardcoded selectors - all form elements are discovered dynamically
- Adaptive to different websites - works across various web platforms
🔁 Retry Mechanism
- Automatic retry when fields are not found on first attempt
- Multiple discovery strategies with increasing flexibility
- Fallback methods for challenging form structures
🗣️ Natural Language Processing
- Intelligent field mapping from natural language to form elements
- Voice command processing for hands-free form filling
- Flexible matching that understands field variations
How It Works
1. Voice Command Processing
When a user says something like:
- "fill email with john@example.com"
- "enter password secret123"
- "type hello in search box"
The system processes these commands through multiple stages:
# Voice command is parsed to extract field name and value
field_name = "email"
value = "john@example.com"
# Dynamic discovery is triggered
result = await client.fill_field_by_name(field_name, value)
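The parsing step itself is not shown in this document. As a minimal sketch, assuming a small set of regular expressions is enough for the three example phrasings above, the extraction could look like the following. The function name parse_fill_command and the patterns are hypothetical and are not part of the agent's API.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns; the agent's real command grammar is not documented here.
_COMMAND_PATTERNS = [
    # "fill email with john@example.com"
    re.compile(r"^(?:fill|set)\s+(?P<field>.+?)\s+with\s+(?P<value>.+)$", re.IGNORECASE),
    # "type hello in search box"
    re.compile(r"^(?:type|enter)\s+(?P<value>.+?)\s+in(?:to)?\s+(?P<field>.+)$", re.IGNORECASE),
    # "enter password secret123" (first word is the field, the rest is the value)
    re.compile(r"^(?:enter|type)\s+(?P<field>\S+)\s+(?P<value>.+)$", re.IGNORECASE),
]


def parse_fill_command(command: str) -> Optional[Tuple[str, str]]:
    """Return (field_name, value) for a fill-style command, or None."""
    text = command.strip()
    for pattern in _COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            field = match.group("field").strip().lower()
            # Drop a trailing "box"/"field" so "search box" maps to "search".
            field = re.sub(r"\s+(?:box|field)$", "", field)
            return field, match.group("value").strip()
    return None
```

Patterns are tried in order, so a value that itself contains the word "in" (for example a password phrase) would be misread by the second pattern. A real parser would need a way to disambiguate such cases.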
2. Dynamic Discovery Process
The system follows a multi-step discovery process:
Step 1: Cached Fields Check
- First checks if the field is already in the cache
- Uses previously discovered selectors for speed
Step 2: Dynamic MCP Discovery
- Uses chrome_get_interactive_elements to get fresh form elements
- Analyzes element attributes (name, id, placeholder, aria-label, etc.)
- Matches field descriptions to actual form elements
Step 3: Enhanced Detection with Retry
- If initial discovery fails, retries with more flexible matching
- Each retry attempt becomes more permissive in matching criteria
- Up to 3 retry attempts with different strategies
Step 4: Content Analysis
- As a final fallback, analyzes page content
- Generates intelligent selectors based on field name patterns
- Tests generated selectors for validity
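The four steps above amount to an ordered chain of strategies where the first success wins. The sketch below illustrates that control flow only. The function fill_with_fallbacks, the Strategy type, and the two stand-in strategies are assumptions made for illustration. The real client calls MCP tools instead of looking up dictionaries.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

# A strategy takes (field_name, value) and returns a result dict, or None if
# it could not locate the field. This shape is an assumption for the sketch.
Strategy = Callable[[str, str], Awaitable[Optional[Dict]]]


async def fill_with_fallbacks(
    field_name: str,
    value: str,
    strategies: List[Tuple[str, Strategy]],
) -> Dict:
    """Try each named strategy in order and stop at the first success."""
    attempted = []
    for name, strategy in strategies:
        attempted.append(name)
        result = await strategy(field_name, value)
        if result is not None:
            return {"success": True, "method": name, "attempted": attempted, **result}
    return {"success": False, "method": None, "attempted": attempted}


# Stand-in strategies so the sketch runs without a browser.
async def cached_lookup(field_name: str, value: str) -> Optional[Dict]:
    cache = {"password": "#password"}
    selector = cache.get(field_name)
    return {"selector": selector} if selector else None


async def dynamic_discovery(field_name: str, value: str) -> Optional[Dict]:
    discovered = {"email": "input[name='email']"}
    selector = discovered.get(field_name)
    return {"selector": selector} if selector else None


PIPELINE = [
    ("cached data", cached_lookup),
    ("dynamic discovery", dynamic_discovery),
]

if __name__ == "__main__":
    print(asyncio.run(fill_with_fallbacks("email", "john@example.com", PIPELINE)))
```

Keeping the strategies in a list means a new discovery method can be added by appending one entry, and the "attempted" list gives the user-facing message something concrete to report.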
3. Field Matching Algorithm
Field matching compares several element attributes against variations of the requested field name, with special handling for typed inputs:
def _is_field_match(element, field_name):
    # Attributes compared against the requested field name
    attributes_to_check = [
        "name", "id", "placeholder",
        "aria-label", "class", "type"
    ]
    # Field name variations
    variations = [
        field_name,
        field_name.replace(" ", ""),
        field_name.replace("_", ""),
        # ... more variations
    ]
    # Special type handling: read the element's own "type" attribute
    # (the bare name `type` would refer to Python's builtin)
    if field_name in ["email", "mail"] and element.get("type") == "email":
        return True
    # ... more type-specific logic
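The excerpt above elides the generic comparison between the variations and the attributes. One plausible way to complete it, assuming elements arrive as plain dicts of attribute strings and that a case-insensitive substring test is acceptable, is sketched below. The helper field_name_variations and the extra underscore and hyphen variations are illustrative additions, not taken from the actual implementation.

```python
from typing import Dict, List


def field_name_variations(field_name: str) -> List[str]:
    """Lowercased spellings of a field name to look for in element attributes."""
    base = field_name.strip().lower()
    candidates = [
        base,
        base.replace(" ", ""),
        base.replace("_", ""),
        base.replace(" ", "_"),
        base.replace(" ", "-"),
    ]
    # Preserve order while dropping duplicates.
    return list(dict.fromkeys(candidates))


def is_field_match(element: Dict[str, str], field_name: str) -> bool:
    """Return True if any checked attribute contains a variation of the name."""
    attributes_to_check = ["name", "id", "placeholder", "aria-label", "class", "type"]
    variations = field_name_variations(field_name)
    for attribute in attributes_to_check:
        attr_value = str(element.get(attribute) or "").lower()
        if attr_value and any(v in attr_value for v in variations):
            return True
    return False
```

Substring matching is deliberately loose: a request for "name" would also match an element whose name attribute is "username". Stricter comparison on the first pass, with looser matching reserved for retries, is one way to limit such false positives.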
Usage Examples
Basic Voice Commands
User: "fill email with john@example.com"
Agent: ✓ Filled 'email' field using dynamic discovery
User: "enter password secret123"
Agent: ✓ Filled 'password' field using cached data
User: "type hello world in search box"
Agent: ✓ Filled 'search' field using enhanced detection
Programmatic Usage
# Direct field filling
result = await client.fill_field_by_name("email", "user@example.com")
# Voice command processing
result = await client.execute_voice_command("fill search with python")
# Pure dynamic discovery (no cache)
result = await client._discover_form_fields_dynamically("username", "john_doe")
API Reference
Main Methods
fill_field_by_name(field_name: str, value: str) -> str
Main method for filling form fields with dynamic discovery.
_discover_form_fields_dynamically(field_name: str, value: str) -> dict
Pure dynamic discovery using MCP tools without cache.
_enhanced_field_detection_with_retry(field_name: str, value: str, max_retries: int) -> dict
Enhanced detection with configurable retry mechanism.
_analyze_page_content_for_field(field_name: str, value: str) -> dict
Content analysis fallback method.
Helper Methods
_is_field_match(element: dict, field_name: str) -> bool
Determines if an element matches the requested field name.
_extract_best_selector(element: dict) -> str
Extracts the most reliable CSS selector for an element.
_is_flexible_field_match(element: dict, field_name: str, attempt: int) -> bool
Flexible matching that becomes more permissive with each retry.
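To make the two less obvious helpers concrete, here is a rough sketch of what selector extraction and attempt-based matching might look like. Both functions are illustrative stand-ins written for this document: the attribute priority order, the "tag" key on the element dict, and the three strictness levels are all assumptions rather than the behaviour of the real _extract_best_selector and _is_flexible_field_match.

```python
from typing import Dict


def _css_string(text: str) -> str:
    """Quote a value for use inside a CSS attribute selector."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def extract_best_selector(element: Dict[str, str]) -> str:
    """Build a CSS selector from the most specific attribute available."""
    tag = element.get("tag", "")
    # Checked from most to least likely to be unique on the page (an assumed order).
    for attribute in ("id", "name", "aria-label", "placeholder"):
        attr_value = element.get(attribute)
        if attr_value:
            return f"{tag}[{attribute}={_css_string(attr_value)}]"
    # Fall back to the bare tag name (possibly empty); callers should treat
    # this as ambiguous.
    return tag


def is_flexible_field_match(element: Dict[str, str], field_name: str, attempt: int) -> bool:
    """Match more loosely as `attempt` grows (1 = strict, 3 or more = loosest)."""
    target = field_name.strip().lower()
    values = [
        str(element.get(attribute) or "").lower()
        for attribute in ("name", "id", "placeholder", "aria-label")
    ]
    values = [v for v in values if v]
    if attempt <= 1:
        # Strict: an attribute equals the field name exactly.
        return target in values
    if attempt == 2:
        # Looser: the field name appears somewhere inside an attribute.
        return any(target in v for v in values)
    # Loosest: any single word of the field name appears in an attribute.
    words = target.split()
    return any(word in v for word in words for v in values)
```

Attribute selectors such as input[id="email"] are used instead of #email so that unusual id values do not need CSS identifier escaping.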
Configuration
MCP Tools Required
chrome_get_interactive_elements
chrome_get_content_web_form
chrome_get_web_content
chrome_fill_or_select
chrome_click_element
Retry Settings
max_retries = 3 # Number of retry attempts
retry_delay = 1 # Seconds between retries
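These two settings would typically drive a loop like the one sketched below. This is an illustration rather than the actual body of _enhanced_field_detection_with_retry: the name detect_with_retry and the Detector callback shape are hypothetical, and the sketch treats max_retries as the total number of attempts, which the document does not specify.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Hypothetical signature: a detector receives the 1-based attempt number and
# returns a result dict on success or None when the field was not found.
Detector = Callable[[int], Awaitable[Optional[Dict]]]


async def detect_with_retry(
    detect: Detector,
    max_retries: int = 3,
    retry_delay: float = 1.0,
) -> Dict:
    """Call `detect` up to `max_retries` times, pausing between attempts."""
    for attempt in range(1, max_retries + 1):
        result = await detect(attempt)
        if result is not None:
            return {"success": True, "attempts": attempt, **result}
        if attempt < max_retries:
            # Give the page time to finish rendering before the next attempt.
            await asyncio.sleep(retry_delay)
    return {"success": False, "attempts": max_retries}
```

Passing the attempt number to the detector lets it loosen its matching criteria on each pass, and skipping the sleep after the final attempt avoids a pointless delay before reporting failure.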
Error Handling
The system provides comprehensive error handling:
- Graceful degradation - falls back to simpler methods if advanced ones fail
- Detailed logging - logs all discovery attempts for debugging
- User feedback - provides clear messages about what was attempted
- Exception safety - catches and handles all exceptions gracefully
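As an illustration of how these properties can be combined, the sketch below wraps a single discovery strategy so that an exception is logged and converted into "not found", which lets the next strategy run. The names run_strategy_safely and describe_failure, the logger name, and the message wording are assumptions and do not come from the agent's code.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, List, Optional

logger = logging.getLogger("form_filling")


async def run_strategy_safely(
    name: str,
    strategy: Callable[[], Awaitable[Optional[Dict]]],
) -> Optional[Dict]:
    """Run one discovery strategy; log and swallow failures so the next can run."""
    try:
        result = await strategy()
    except asyncio.CancelledError:
        # Let task cancellation propagate instead of treating it as a failure.
        raise
    except Exception:
        # logger.exception records the traceback for later debugging.
        logger.exception("Strategy %r raised", name)
        return None
    if result is None:
        logger.info("Strategy %r did not find the field", name)
    return result


def describe_failure(field_name: str, attempted: List[str]) -> str:
    """Build the user-facing message listing what was tried."""
    tried = ", ".join(attempted) if attempted else "no strategies"
    return f"Could not find the '{field_name}' field (tried: {tried})"
```

Returning None for both "not found" and "raised" keeps the calling code simple, while the log preserves the distinction for debugging.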
Testing
Run the test suite to verify functionality:
python test_dynamic_form_filling.py
This will test:
- Dynamic field discovery
- Retry mechanisms
- Voice command processing
- Field matching algorithms
- Cross-website compatibility
Benefits
For Users
- Natural interaction - speak naturally about form fields
- Reliable filling - works across different websites
- No setup required - automatically adapts to new sites
For Developers
- No hardcoded selectors - eliminates brittle selector maintenance
- Robust error handling - graceful failure and recovery
- Extensible design - easy to add new discovery strategies
Future Enhancements
- Machine learning field recognition
- Visual element detection using screenshots
- Form structure analysis for better field relationships
- User preference learning for improved matching accuracy