
nasir@endelospay.com d97cad1736 first commit

2025-08-12 02:54:17 +05:00

6.1 KiB


Dynamic Form Filling System

Overview

The LiveKit agent now includes a dynamic form filling system that discovers and fills web form fields in response to user voice commands. It is designed to be robust and to adapt to different sites, and it never relies on hardcoded selectors.

Key Features

🔄 Dynamic Discovery

Real-time element discovery using MCP tools (chrome_get_interactive_elements, chrome_get_content_web_form)
No hardcoded selectors - all form elements are discovered dynamically
Adaptive to different websites - works across various web platforms

🔁 Retry Mechanism

Automatic retry when fields are not found on first attempt
Multiple discovery strategies with increasing flexibility
Fallback methods for challenging form structures

🗣️ Natural Language Processing

Intelligent field mapping from natural language to form elements
Voice command processing for hands-free form filling
Flexible matching that understands field variations

How It Works

1. Voice Command Processing

When a user says something like:

"fill email with john@example.com"
"enter password secret123"
"type hello in search box"

Each command is first parsed into a field name and a value, and the pair is then passed to dynamic discovery:

# Voice command is parsed to extract field name and value
field_name = "email"
value = "john@example.com"

# Dynamic discovery is triggered
result = await client.fill_field_by_name(field_name, value)
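A minimal sketch of the parsing stage, assuming the three command shapes quoted above are the ones to support. The function `parse_voice_command` and its regular expressions are hypothetical, not taken from the agent's code.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for the three command shapes quoted above.
# Each pattern captures a "field" group and a "value" group.
_COMMAND_PATTERNS = [
    # "fill email with john@example.com"
    re.compile(r"^fill\s+(?P<field>.+?)\s+with\s+(?P<value>.+)$", re.IGNORECASE),
    # "type hello in search box" (also accepts "into")
    re.compile(r"^type\s+(?P<value>.+?)\s+in(?:to)?\s+(?P<field>.+)$", re.IGNORECASE),
    # "enter password secret123" (field is one word, value is the rest)
    re.compile(r"^enter\s+(?P<field>\S+)\s+(?P<value>.+)$", re.IGNORECASE),
]


def parse_voice_command(command: str) -> Optional[Tuple[str, str]]:
    """Return (field_name, value) for a recognized command, else None."""
    text = command.strip()
    for pattern in _COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            field = match.group("field").strip().lower()
            # Drop a trailing generic noun such as "box" or "field"
            field = re.sub(r"\s+(box|field)$", "", field)
            return field, match.group("value").strip()
    return None


print(parse_voice_command("fill email with john@example.com"))  # ('email', 'john@example.com')
print(parse_voice_command("enter password secret123"))  # ('password', 'secret123')
print(parse_voice_command("type hello in search box"))  # ('search', 'hello')
```

Unrecognized phrases return `None`, which lets the caller ask the user to rephrase instead of guessing a field.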

2. Dynamic Discovery Process

The system follows a multi-step discovery process:

Step 1: Cached Fields Check

First checks if the field is already in the cache
Uses previously discovered selectors for speed
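A sketch of the cache check, assuming selectors are cached per page URL and normalized field name so that a selector learned on one page is not reused on another. The class `FieldSelectorCache` is hypothetical.

```python
from typing import Dict, Optional, Tuple


class FieldSelectorCache:
    """Hypothetical cache of previously discovered selectors."""

    def __init__(self) -> None:
        # Key: (page URL, normalized field name) -> CSS selector
        self._selectors: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def _normalize(field_name: str) -> str:
        # "E-mail Address" and "e_mail address" map to the same key
        return "".join(ch for ch in field_name.lower() if ch.isalnum())

    def get(self, url: str, field_name: str) -> Optional[str]:
        return self._selectors.get((url, self._normalize(field_name)))

    def put(self, url: str, field_name: str, selector: str) -> None:
        self._selectors[(url, self._normalize(field_name))] = selector

    def invalidate(self, url: str, field_name: str) -> None:
        # Drop a stale entry, e.g. after the cached selector stopped matching
        self._selectors.pop((url, self._normalize(field_name)), None)


cache = FieldSelectorCache()
cache.put("https://example.com/login", "Email", "#email")
print(cache.get("https://example.com/login", "email"))   # #email
print(cache.get("https://example.com/signup", "email"))  # None
```

Invalidating an entry when its selector fails lets the next lookup fall through to dynamic discovery rather than repeating a stale selector.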

Step 2: Dynamic MCP Discovery

Uses chrome_get_interactive_elements to get fresh form elements
Analyzes element attributes (name, id, placeholder, aria-label, etc.)
Matches field descriptions to actual form elements
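Choosing among several discovered elements could be done with a weighted score, as in this sketch. The element shape (a dict with optional `id`, `name`, `aria-label`, and `placeholder` keys) is an assumption, since the exact output of `chrome_get_interactive_elements` is not described here; `best_matching_element` and its weights are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical weights: id/name matches count more than free-text labels
WEIGHTS = {"id": 3, "name": 3, "aria-label": 2, "placeholder": 1}


def _compact(text: str) -> str:
    return text.lower().replace(" ", "").replace("_", "").replace("-", "")


def best_matching_element(
    elements: List[Dict[str, str]], field_name: str
) -> Optional[Dict[str, str]]:
    """Return the element whose attributes best describe field_name, or None."""
    wanted = _compact(field_name)
    best: Optional[Dict[str, str]] = None
    best_score = 0
    for element in elements:
        score = 0
        for attr, weight in WEIGHTS.items():
            value = _compact(element.get(attr) or "")
            if value == wanted:
                score += 2 * weight  # exact match counts double
            elif value and wanted in value:
                score += weight  # partial match, e.g. "user_email" for "email"
        if score > best_score:  # ties keep the earlier element
            best, best_score = element, score
    return best


elements = [
    {"id": "search", "placeholder": "Search"},
    {"name": "user_email", "placeholder": "Email address"},
    {"id": "email", "type": "email"},
]
print(best_matching_element(elements, "email"))  # {'id': 'email', 'type': 'email'}
```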

Step 3: Enhanced Detection with Retry

If initial discovery fails, retries with more flexible matching
Each retry attempt becomes more permissive in matching criteria
Up to 3 retry attempts with different strategies
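One way to make each attempt more permissive is sketched below, assuming an element is a dict of attribute values. The three tiers (exact, substring, shared word) are illustrative choices, not necessarily the strategies the agent uses, and `flexible_field_match` is a hypothetical stand-in for `_is_flexible_field_match`.

```python
import re
from typing import Dict, Set

ATTRIBUTES = ("name", "id", "placeholder", "aria-label")


def _compact(text: str) -> str:
    # Lowercase and drop everything except letters and digits
    return re.sub(r"[^a-z0-9]", "", text.lower())


def _tokens(text: str) -> Set[str]:
    # Split "user_name", "userEmail", "first-name" into lowercase words
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return {t for t in re.split(r"[^a-z0-9]+", spaced.lower()) if t}


def flexible_field_match(element: Dict[str, str], field_name: str, attempt: int) -> bool:
    """Hypothetical matcher whose criteria loosen as attempt goes from 1 to 3."""
    wanted = _compact(field_name)
    wanted_tokens = _tokens(field_name)
    for attr in ATTRIBUTES:
        raw = element.get(attr) or ""
        value = _compact(raw)
        if not value:
            continue
        if value == wanted:
            return True  # tier 1 (every attempt): exact match after normalization
        if attempt >= 2 and wanted in value:
            return True  # tier 2: requested name appears inside the attribute
        if attempt >= 3 and wanted_tokens & _tokens(raw):
            return True  # tier 3: at least one shared word
    return False
```

Looser tiers raise the risk of false positives (a shared word such as "name" links "first name" and "last name"), which is why they are tried only after the stricter ones fail.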

Step 4: Content Analysis

As a final fallback, analyzes page content
Generates candidate selectors from common field-name patterns
Tests generated selectors for validity
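A sketch of the selector-generation step, assuming candidates are built from common naming conventions (snake_case, kebab-case, camelCase) and ordered from most to least specific. The helper `generate_candidate_selectors` is hypothetical, and testing each candidate against the live page is left out.

```python
from typing import List


def generate_candidate_selectors(field_name: str) -> List[str]:
    """Build CSS selectors a field called field_name plausibly matches."""
    words = field_name.lower().split()
    if not words:
        return []
    phrase = " ".join(words)
    # Identifier spellings commonly used in name/id attributes (deduplicated, ordered)
    variants = dict.fromkeys([
        "_".join(words),                                        # first_name
        "-".join(words),                                        # first-name
        "".join(words),                                         # firstname
        words[0] + "".join(w.capitalize() for w in words[1:]),  # firstName
    ])
    candidates: List[str] = []
    # Input types that identify a field on their own come first
    if phrase in ("email", "password", "search"):
        candidates.append(f'input[type="{phrase}"]')
    for v in variants:
        candidates += [f'[name="{v}"]', f'[id="{v}"]']
    # Looser, human-readable attributes last ("i" = case-insensitive match).
    # Note: quotes inside field_name are not escaped in this sketch.
    candidates += [f'[placeholder*="{phrase}" i]', f'[aria-label*="{phrase}" i]']
    return candidates


print(generate_candidate_selectors("email"))
```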

3. Field Matching Algorithm

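The matcher compares the requested field name against several element attributes. In this sketch, element is assumed to be a dict mapping attribute names to values:

def _is_field_match(element: dict, field_name: str) -> bool:
    # Attributes that commonly describe what a form field is for
    attributes_to_check = [
        "name", "id", "placeholder",
        "aria-label", "class", "type",
    ]

    # Normalized variations of the requested field name
    field_name = field_name.lower().strip()
    variations = {
        field_name,
        field_name.replace(" ", ""),
        field_name.replace("_", ""),
        # ... more variations
    }

    # Special type handling: an input with type="email" matches "email" or "mail"
    element_type = str(element.get("type") or "").lower()
    if field_name in ("email", "mail") and element_type == "email":
        return True
    # ... more type-specific logic

    # Otherwise, match if any variation appears in any checked attribute
    for attr in attributes_to_check:
        value = str(element.get(attr) or "").lower()
        if value and any(v in value for v in variations if v):
            return True
    return False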

Usage Examples

Basic Voice Commands

User: "fill email with john@example.com"
Agent: ✓ Filled 'email' field using dynamic discovery

User: "enter password secret123"
Agent: ✓ Filled 'password' field using cached data

User: "type hello world in search box"
Agent: ✓ Filled 'search' field using enhanced detection

Programmatic Usage

# Direct field filling
result = await client.fill_field_by_name("email", "user@example.com")

# Voice command processing
result = await client.execute_voice_command("fill search with python")

# Pure dynamic discovery (no cache)
result = await client._discover_form_fields_dynamically("username", "john_doe")

API Reference

Main Methods

`fill_field_by_name(field_name: str, value: str) -> str`

Main method for filling a form field: checks the cache first, then falls back to dynamic discovery.

`_discover_form_fields_dynamically(field_name: str, value: str) -> dict`

Runs dynamic discovery with MCP tools only, bypassing the cache.

`_enhanced_field_detection_with_retry(field_name: str, value: str, max_retries: int) -> dict`

Retries field detection up to max_retries times, using more permissive matching on each attempt.

`_analyze_page_content_for_field(field_name: str, value: str) -> dict`

Last-resort fallback: analyzes page content and tests selectors generated from field-name patterns.

Helper Methods

`_is_field_match(element: dict, field_name: str) -> bool`

Determines if an element matches the requested field name.

`_extract_best_selector(element: dict) -> str`

Extracts the most reliable CSS selector for an element.
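A minimal sketch of selector extraction, assuming an element dict with optional `tag`, `id`, `name`, `placeholder`, and `aria-label` keys. The priority order (id, then name, then label text) is a common heuristic rather than something the source confirms, and `extract_best_selector` is a hypothetical stand-in for `_extract_best_selector`.

```python
from typing import Dict, Optional


def _attr_selector(tag: str, attr: str, value: str) -> str:
    # Escape backslashes and double quotes so the value is a valid CSS string
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{tag}[{attr}="{escaped}"]'


def extract_best_selector(element: Dict[str, str]) -> Optional[str]:
    """Hypothetical: pick the most specific selector available for an element."""
    tag = (element.get("tag") or "").lower()
    # 1. id values should be unique within an HTML document
    if element.get("id"):
        return _attr_selector(tag, "id", element["id"])
    # 2. name attribute, commonly set on form inputs
    if element.get("name"):
        return _attr_selector(tag, "name", element["name"])
    # 3. visible or accessible label text as a last resort
    for attr in ("placeholder", "aria-label"):
        if element.get(attr):
            return _attr_selector(tag, attr, element[attr])
    return None


print(extract_best_selector({"tag": "INPUT", "id": "email", "name": "e"}))  # input[id="email"]
```

Using `[id="..."]` instead of `#...` avoids having to escape ids that are not valid CSS identifiers, such as ids starting with a digit.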

`_is_flexible_field_match(element: dict, field_name: str, attempt: int) -> bool`

Flexible matching that becomes more permissive with each retry.

Configuration

MCP Tools Required

chrome_get_interactive_elements
chrome_get_content_web_form
chrome_get_web_content
chrome_fill_or_select
chrome_click_element

Retry Settings

max_retries = 3  # Number of retry attempts
retry_delay = 1  # Seconds between retries
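These two settings could drive a retry loop like the following sketch. `retry_with_delay` is hypothetical; its `attempt_fn` callback stands in for one detection pass (for example, one round of flexible matching with the current attempt number) and returns a result or `None`.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def retry_with_delay(
    attempt_fn: Callable[[int], Awaitable[Optional[T]]],
    max_retries: int = 3,       # mirrors max_retries above
    retry_delay: float = 1.0,   # mirrors retry_delay above (seconds)
) -> Optional[T]:
    """Call attempt_fn(1), attempt_fn(2), ... until one returns a non-None result."""
    for attempt in range(1, max_retries + 1):
        result = await attempt_fn(attempt)
        if result is not None:
            return result
        if attempt < max_retries:
            # Wait before the next attempt, e.g. so late-loading fields can render
            await asyncio.sleep(retry_delay)
    return None


async def _demo() -> Optional[str]:
    async def find(attempt: int) -> Optional[str]:
        return "#email" if attempt == 2 else None  # succeeds on the second pass
    return await retry_with_delay(find, retry_delay=0)


print(asyncio.run(_demo()))  # #email
```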

Error Handling

Error handling follows four principles:

Graceful degradation - falls back to simpler methods if advanced ones fail
Detailed logging - logs all discovery attempts for debugging
User feedback - provides clear messages about what was attempted
Exception safety - catches and handles all exceptions gracefully
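The four principles can be seen together in a small fallback chain, sketched under the assumption that each strategy is an async callable that takes a field name and returns a selector or `None`. `locate_with_fallbacks` and the strategy labels are hypothetical; the agent's real methods are listed in the API Reference.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Optional, Tuple

logger = logging.getLogger("form_filling")  # hypothetical logger name

# A strategy takes a field name and returns a selector, or None if not found
Strategy = Callable[[str], Awaitable[Optional[str]]]


async def locate_with_fallbacks(
    field_name: str, strategies: List[Tuple[str, Strategy]]
) -> Tuple[Optional[str], str]:
    """Hypothetical chain: try each strategy in order, return (selector, message)."""
    tried: List[str] = []
    for label, strategy in strategies:
        tried.append(label)
        try:
            selector = await strategy(field_name)
        except Exception:
            # Exception safety: a failing strategy is logged, then the next one runs
            logger.exception("Strategy %r failed for field %r", label, field_name)
            continue
        # Detailed logging: record every attempt for debugging
        logger.debug("Strategy %r for field %r returned %r", label, field_name, selector)
        if selector:
            return selector, f"Found '{field_name}' field using {label}"
    # User feedback: report what was attempted when every strategy fails
    return None, f"Could not find '{field_name}' field; tried {', '.join(tried)}"


async def _main() -> None:
    async def cache_lookup(name: str) -> Optional[str]:
        return None  # cache miss
    async def dynamic_discovery(name: str) -> Optional[str]:
        return '[name="email"]' if name == "email" else None
    print(await locate_with_fallbacks(
        "email", [("cached data", cache_lookup), ("dynamic discovery", dynamic_discovery)]
    ))


asyncio.run(_main())
```

Ordering strategies from cheapest to most expensive gives graceful degradation for free: the chain only reaches slower methods such as content analysis when faster ones return nothing or fail.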

Testing

Run the test suite to verify functionality:

python test_dynamic_form_filling.py

This will test:

Dynamic field discovery
Retry mechanisms
Voice command processing
Field matching algorithms
Cross-website compatibility

Benefits

For Users

Natural interaction - speak naturally about form fields
Reliable filling - works across different websites
No setup required - automatically adapts to new sites

For Developers

No hardcoded selectors - eliminates brittle selector maintenance
Robust error handling - graceful failure and recovery
Extensible design - easy to add new discovery strategies

Future Enhancements

Machine learning field recognition
Visual element detection using screenshots
Form structure analysis for better field relationships
User preference learning for improved matching accuracy
