broswer-automation/agent-livekit/DYNAMIC_FORM_FILLING.md
nasir@endelospay.com d97cad1736 first commit
2025-08-12 02:54:17 +05:00

Dynamic Form Filling System

Overview

The LiveKit agent includes a dynamic form filling system that discovers and fills web forms in response to user voice commands. It is designed to adapt to different websites: selectors are discovered at runtime (and cached for reuse) rather than hardcoded.

Key Features

🔄 Dynamic Discovery

  • Real-time element discovery using MCP tools (chrome_get_interactive_elements, chrome_get_content_web_form)
  • No hardcoded selectors - all form elements are discovered dynamically
  • Adaptive to different websites - works across various web platforms

🔁 Retry Mechanism

  • Automatic retry when fields are not found on first attempt
  • Multiple discovery strategies with increasing flexibility
  • Fallback methods for challenging form structures

🗣️ Natural Language Processing

  • Intelligent field mapping from natural language to form elements
  • Voice command processing for hands-free form filling
  • Flexible matching that understands field variations

How It Works

1. Voice Command Processing

When a user says something like:

  • "fill email with john@example.com"
  • "enter password secret123"
  • "type hello in search box"

The system processes these commands through multiple stages:

# Voice command is parsed to extract field name and value
field_name = "email"
value = "john@example.com"

# Dynamic discovery is triggered
result = await client.fill_field_by_name(field_name, value)
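The parsing step can be pictured with a small regular-expression sketch. This is only an illustration of how the three example phrasings above could be split into a field name and a value; `parse_fill_command` and its patterns are hypothetical and are not taken from the agent's source.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns covering the three example phrasings in this document.
_COMMAND_PATTERNS = [
    # "fill email with john@example.com"
    re.compile(r"^(?:fill|enter|type)\s+(?P<field>.+?)\s+with\s+(?P<value>.+)$", re.IGNORECASE),
    # "type hello in search box"
    re.compile(r"^(?:type|enter|put)\s+(?P<value>.+?)\s+in(?:to)?\s+(?P<field>.+)$", re.IGNORECASE),
    # "enter password secret123"
    re.compile(r"^(?:fill|enter|type)\s+(?P<field>\S+)\s+(?P<value>.+)$", re.IGNORECASE),
]


def parse_fill_command(command: str) -> Optional[Tuple[str, str]]:
    """Return (field_name, value) for a fill-style command, or None if unrecognized."""
    text = command.strip()
    for pattern in _COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            field = match.group("field").strip().lower()
            # Drop trailing filler words such as "box" or "field"
            field = re.sub(r"\s+(?:box|field)$", "", field)
            return field, match.group("value").strip()
    return None
```

The patterns are tried in order, so the more specific "with" and "in" forms win over the bare "verb field value" form.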

2. Dynamic Discovery Process

The system follows a multi-step discovery process:

Step 1: Cached Fields Check

  • First checks if the field is already in the cache
  • Uses previously discovered selectors for speed
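
A minimal sketch of what such a cache might look like follows. The class name `FieldSelectorCache` and the choice to key entries by page URL are assumptions made for illustration; the document does not describe how the real cache is keyed or invalidated.

```python
from typing import Dict, Optional, Tuple


class FieldSelectorCache:
    """Hypothetical cache mapping (page URL, field name) to a discovered selector."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def _key(page_url: str, field_name: str) -> Tuple[str, str]:
        # Normalize case and whitespace so "Email " and "email" share an entry
        return page_url, " ".join(field_name.lower().split())

    def get(self, page_url: str, field_name: str) -> Optional[str]:
        return self._entries.get(self._key(page_url, field_name))

    def put(self, page_url: str, field_name: str, selector: str) -> None:
        self._entries[self._key(page_url, field_name)] = selector

    def invalidate_page(self, page_url: str) -> None:
        # Selectors may go stale after navigation, so drop everything for that page
        for key in [k for k in self._entries if k[0] == page_url]:
            del self._entries[key]
```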

Step 2: Dynamic MCP Discovery

  • Uses chrome_get_interactive_elements to get fresh form elements
  • Analyzes element attributes (name, id, placeholder, aria-label, etc.)
  • Matches field descriptions to actual form elements

Step 3: Enhanced Detection with Retry

  • If initial discovery fails, retries with more flexible matching
  • Each retry attempt becomes more permissive in matching criteria
  • Up to 3 retry attempts with different strategies

Step 4: Content Analysis

  • As a final fallback, analyzes page content
  • Generates intelligent selectors based on field name patterns
  • Tests generated selectors for validity
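
To make the selector-generation idea in Step 4 concrete, here is a rough sketch that derives candidate CSS selectors from a field name. The function `generate_candidate_selectors` and the specific patterns are hypothetical. Testing each candidate against the live page would still have to happen through the browser tools and is not shown here.

```python
import re
from typing import List

# Input types that map directly from a spoken field name (an assumption for this sketch)
_TYPED_FIELDS = {"email", "password", "search", "tel", "url"}


def generate_candidate_selectors(field_name: str) -> List[str]:
    """Build candidate CSS selectors for a field name, most specific first."""
    words = re.findall(r"[a-z0-9]+", field_name.lower())
    if not words:
        return []

    # Common spellings of a multi-word name: "firstname", "first_name", "first-name"
    variants: List[str] = []
    for joined in ("".join(words), "_".join(words), "-".join(words)):
        if joined not in variants:
            variants.append(joined)

    selectors: List[str] = []
    if variants[0] in _TYPED_FIELDS:
        selectors.append(f'input[type="{variants[0]}"]')
    for variant in variants:
        for attr in ("name", "id", "placeholder", "aria-label"):
            # The trailing "i" makes the attribute comparison case-insensitive
            selectors.append(f'[{attr}*="{variant}" i]')
    return selectors
```

Because the variants are built only from letters and digits, the generated attribute values need no further escaping.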

3. Field Matching Algorithm

Field matching considers several element attributes, common variations of the field name, and the input type:

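def _is_field_match(element, field_name):
    """Return True if the element plausibly corresponds to field_name."""
    # Check multiple attributes
    attributes_to_check = [
        "name", "id", "placeholder",
        "aria-label", "class", "type"
    ]

    # Field name variations (compared case-insensitively)
    target = field_name.lower().strip()
    variations = [
        target,
        target.replace(" ", ""),
        target.replace("_", ""),
        # ... more variations
    ]

    # Special type handling: read the element's own "type" attribute
    # (the bare name `type` would refer to the Python builtin)
    element_type = str(element.get("type", "")).lower()
    if target in ["email", "mail"] and element_type == "email":
        return True
    # ... more type-specific logic

    # Generic check (assumed completion of the excerpt): any variation
    # appearing in any of the inspected attributes counts as a match
    for attr in attributes_to_check:
        attr_value = str(element.get(attr, "")).lower()
        if any(v and v in attr_value for v in variations):
            return True
    return False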

Usage Examples

Basic Voice Commands

User: "fill email with john@example.com"
Agent: ✓ Filled 'email' field using dynamic discovery

User: "enter password secret123"
Agent: ✓ Filled 'password' field using cached data

User: "type hello world in search box"
Agent: ✓ Filled 'search' field using enhanced detection

Programmatic Usage

# Direct field filling
result = await client.fill_field_by_name("email", "user@example.com")

# Voice command processing
result = await client.execute_voice_command("fill search with python")

# Pure dynamic discovery (no cache)
result = await client._discover_form_fields_dynamically("username", "john_doe")

API Reference

Main Methods

fill_field_by_name(field_name: str, value: str) -> str

Main method for filling form fields with dynamic discovery.

_discover_form_fields_dynamically(field_name: str, value: str) -> dict

Pure dynamic discovery using MCP tools without cache.

_enhanced_field_detection_with_retry(field_name: str, value: str, max_retries: int) -> dict

Enhanced detection with configurable retry mechanism.

_analyze_page_content_for_field(field_name: str, value: str) -> dict

Content analysis fallback method.

Helper Methods

_is_field_match(element: dict, field_name: str) -> bool

Determines if an element matches the requested field name.

_extract_best_selector(element: dict) -> str

Extracts the most reliable CSS selector for an element.
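
One plausible way to rank selectors is sketched below: prefer `id`, then `name`, then descriptive attributes, and fall back to the tag alone. The element is assumed to be a flat dict of attributes with an optional `tag` key. That shape, the ordering, and the name `extract_best_selector` are assumptions, not the documented behavior of the helper.

```python
def _escape_attr_value(value: str) -> str:
    """Escape backslashes and double quotes for use inside a quoted CSS attribute value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def extract_best_selector(element: dict) -> str:
    """Pick a CSS selector for an element dict, most stable attribute first."""
    tag = str(element.get("tag") or "input").lower()

    # An id is normally unique on a page; the attribute form avoids the
    # escaping rules that "#id" needs for ids starting with a digit
    element_id = element.get("id")
    if element_id:
        return f'[id="{_escape_attr_value(element_id)}"]'

    for attr in ("name", "aria-label", "placeholder"):
        attr_value = element.get(attr)
        if attr_value:
            return f'{tag}[{attr}="{_escape_attr_value(attr_value)}"]'

    input_type = element.get("type")
    if input_type:
        return f'{tag}[type="{_escape_attr_value(input_type)}"]'
    return tag
```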

_is_flexible_field_match(element: dict, field_name: str, attempt: int) -> bool

Flexible matching that becomes more permissive with each retry.
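
The idea of loosening the match on each attempt could look roughly like the sketch below: exact match on `name`/`id` first, then substring match over more attributes, then fuzzy similarity. The three tiers, the 0.75 similarity threshold, and the function name are illustrative assumptions. The fuzzy tier uses `difflib.SequenceMatcher` from the standard library.

```python
import re
from difflib import SequenceMatcher


def _normalize(text) -> str:
    """Lowercase and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", str(text).lower())


def is_flexible_field_match(element: dict, field_name: str, attempt: int) -> bool:
    """Match an element against a field name; higher attempts are more permissive."""
    target = _normalize(field_name)
    if not target:
        return False

    strict = [_normalize(element.get(a, "")) for a in ("name", "id")]
    loose = strict + [
        _normalize(element.get(a, ""))
        for a in ("placeholder", "aria-label", "class", "type")
    ]

    # Attempt 1 and later: exact match on name or id
    if target in strict:
        return True
    # Attempt 2 and later: substring match across more attributes
    if attempt >= 2 and any(target in value for value in loose if value):
        return True
    # Attempt 3 and later: fuzzy match to tolerate typos and abbreviations
    if attempt >= 3:
        return any(
            SequenceMatcher(None, target, value).ratio() >= 0.75
            for value in loose
            if value
        )
    return False
```

A looser final tier raises the risk of filling the wrong field, so a real implementation would likely want to log which tier produced the match.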

Configuration

MCP Tools Required

  • chrome_get_interactive_elements
  • chrome_get_content_web_form
  • chrome_get_web_content
  • chrome_fill_or_select
  • chrome_click_element

Retry Settings

max_retries = 3  # Number of retry attempts
retry_delay = 1  # Seconds between retries
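
As a sketch of how these two settings might drive a retry loop, the helper below calls an async operation with the current attempt number and sleeps between failures. `retry_with_attempts` and the `{"success": ...}` result shape are assumptions for illustration. Only `max_retries` and `retry_delay` come from this document.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def retry_with_attempts(
    operation: Callable[[int], Awaitable[Dict]],
    max_retries: int = 3,
    retry_delay: float = 1.0,
) -> Dict:
    """Call operation(attempt) up to max_retries times, pausing between failures."""
    for attempt in range(1, max_retries + 1):
        result = await operation(attempt)
        if result.get("success"):
            return result
        # No need to wait after the final attempt
        if attempt < max_retries:
            await asyncio.sleep(retry_delay)
    return {"success": False, "attempts": max_retries}
```

Passing the attempt number to the operation lets each retry choose a more permissive matching strategy.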

Error Handling

The system provides comprehensive error handling:

  1. Graceful degradation - falls back to simpler methods if advanced ones fail
  2. Detailed logging - logs all discovery attempts for debugging
  3. User feedback - provides clear messages about what was attempted
  4. Exception safety - catches and handles all exceptions gracefully
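
The four points above can be illustrated with a small fallback chain. This is a sketch under stated assumptions rather than the agent's actual control flow: `fill_with_fallbacks`, the `(label, strategy)` pairs, and the `{"success": ...}` result shape are all hypothetical. The returned message format mirrors the agent responses shown under Usage Examples.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, List, Tuple

logger = logging.getLogger("form_filling_sketch")

# A strategy takes (field_name, value) and resolves to a result dict
Strategy = Callable[[str, str], Awaitable[Dict]]


async def fill_with_fallbacks(
    field_name: str,
    value: str,
    strategies: List[Tuple[str, Strategy]],
) -> str:
    """Try each strategy in order and return a user-facing status message."""
    attempted: List[str] = []
    for label, strategy in strategies:
        attempted.append(label)
        try:
            result = await strategy(field_name, value)
        except Exception as exc:
            # One failing strategy must not abort the whole chain
            logger.warning("%s failed for %r: %s", label, field_name, exc)
            continue
        if result.get("success"):
            logger.info("%s filled %r", label, field_name)
            return f"✓ Filled '{field_name}' field using {label}"
        logger.info("%s found no match for %r", label, field_name)
    return f"✗ Could not fill '{field_name}' (tried: {', '.join(attempted)})"
```

Ordering the list from cheapest to most permissive (cache, dynamic discovery, enhanced detection, content analysis) would reproduce the discovery sequence described earlier.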

Testing

Run the test suite to verify functionality:

python test_dynamic_form_filling.py

This will test:

  • Dynamic field discovery
  • Retry mechanisms
  • Voice command processing
  • Field matching algorithms
  • Cross-website compatibility

Benefits

For Users

  • Natural interaction - speak naturally about form fields
  • Reliable filling - works across different websites
  • No setup required - automatically adapts to new sites

For Developers

  • No hardcoded selectors - eliminates brittle selector maintenance
  • Robust error handling - graceful failure and recovery
  • Extensible design - easy to add new discovery strategies

Future Enhancements

  • Machine learning field recognition
  • Visual element detection using screenshots
  • Form structure analysis for better field relationships
  • User preference learning for improved matching accuracy