Intelligent Selector Discovery
Overview
The LiveKit agent includes intelligent selector discovery, which adapts automatically to changing web page structures, particularly Google search results. When the standard CSS selectors fail (for example with the common "No valid content found for selector: .r" error), the system discovers alternative selectors instead of giving up.
Problem Solved
Google and other search engines frequently change their HTML structure, causing hardcoded CSS selectors to break. The old system would fail with errors like:
- "No valid content found for selector: .r"
- "No search results found on this page"
How It Works
1. Multi-Layer Fallback System
The intelligent discovery system uses a multi-layer approach:
- Standard Selectors: Try known working selectors first
- Intelligent Discovery: Generate smart selectors based on common patterns
- DOM Analysis: Analyze page structure using heuristics
- Final Fallback: Extract any meaningful content
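The layered approach above can be sketched as an ordered chain of strategies where the first layer to return results wins. This is a minimal illustration, not the code in mcp_chrome_client.py: the function name `run_fallback_chain`, the `Strategy` type, and the four stand-in layer functions are all hypothetical.

```python
from typing import Callable, Iterable, Optional

# Hypothetical type: a strategy takes the page HTML and returns extracted
# results, or an empty list when it finds nothing usable.
Strategy = Callable[[str], list]


def run_fallback_chain(
    page_html: str, strategies: Iterable[tuple[str, Strategy]]
) -> tuple[Optional[str], list]:
    """Try each named strategy in order and return the first non-empty result."""
    for name, strategy in strategies:
        try:
            results = strategy(page_html)
        except Exception:
            # A failing layer should not abort the chain; move to the next one.
            continue
        if results:
            return name, results
    return None, []


# Illustrative stand-ins for the four layers described above.
def standard_selectors(html: str) -> list:
    return []  # pretend the known selectors no longer match


def intelligent_discovery(html: str) -> list:
    return ["Example result"] if "<h3" in html else []


def dom_analysis(html: str) -> list:
    return []


def final_fallback(html: str) -> list:
    return [html.strip()] if html.strip() else []


LAYERS = [
    ("standard", standard_selectors),
    ("intelligent", intelligent_discovery),
    ("dom_analysis", dom_analysis),
    ("final_fallback", final_fallback),
]

layer, results = run_fallback_chain("<div><h3>Title</h3></div>", LAYERS)
print(layer, results)
```

Returning the name of the winning layer alongside the results makes it straightforward to log which method succeeded.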
2. Intelligent Selector Generation
The system generates selectors based on modern web patterns:
// Modern Google patterns (2024+)
'[data-ved] h3',
'[data-ved]:has(h3)',
'[jscontroller]:has(h3)',
// Generic search result patterns
'div[class*="result"]:has(h3)',
'article:has(h3)',
'[role="main"] div:has(h3)',
// Link-based patterns
'a[href*="http"]:has(h3)',
'div:has(h3):has(a[href*="http"])'
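One way to turn the patterns above into an ordered candidate list is sketched below. The group names, the `generate_candidate_selectors` function, and its `extra` parameter are assumptions made for illustration; only the selector strings come from the list above.

```python
# Pattern groups copied from the list above, ordered from most to least specific.
# The group names are hypothetical labels.
PATTERN_GROUPS = {
    "google": ['[data-ved] h3', '[data-ved]:has(h3)', '[jscontroller]:has(h3)'],
    "generic": [
        'div[class*="result"]:has(h3)',
        'article:has(h3)',
        '[role="main"] div:has(h3)',
    ],
    "link_based": ['a[href*="http"]:has(h3)', 'div:has(h3):has(a[href*="http"])'],
}


def generate_candidate_selectors(groups=PATTERN_GROUPS, extra=()):
    """Return selectors in priority order, skipping duplicates.

    Selectors passed in `extra` are tried before the built-in groups.
    """
    seen = set()
    ordered = []
    built_in = (selector for group in groups.values() for selector in group)
    for selector in [*extra, *built_in]:
        if selector not in seen:
            seen.add(selector)
            ordered.append(selector)
    return ordered


for selector in generate_candidate_selectors():
    print(selector)
```

Keeping the groups in a dictionary preserves their order while leaving room to prepend custom selectors.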
3. Content Validation
Each discovered selector is validated to ensure it contains actual search results:
- Must have headings (h1-h6) and links
- Must contain substantial text content (>50 characters)
- Must have search result indicators (URLs, titles, snippets)
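The checks above could look roughly like the following, using only the Python standard library. This is a sketch under assumptions: the name `looks_like_search_result` is hypothetical, and the "search result indicators" rule is approximated here by requiring a heading (title) plus an absolute link (URL).

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
MIN_TEXT_LENGTH = 50  # threshold taken from the rule above


class _FragmentStats(HTMLParser):
    """Collects the facts needed to judge one candidate element."""

    def __init__(self):
        super().__init__()
        self.has_heading = False
        self.has_link = False
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.has_heading = True
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(("http://", "https://")):
                self.has_link = True

    def handle_data(self, data):
        self.text_parts.append(data)


def looks_like_search_result(fragment_html: str) -> bool:
    """Return True if the fragment has a heading, an absolute link, and enough text."""
    stats = _FragmentStats()
    stats.feed(fragment_html)
    stats.close()
    # Collapse whitespace so indentation does not inflate the length.
    text = " ".join(" ".join(stats.text_parts).split())
    return stats.has_heading and stats.has_link and len(text) > MIN_TEXT_LENGTH
```

A fragment with a heading but no link, or with a link and only a few characters of text, is rejected.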
4. DOM Structure Analysis
If intelligent selectors fail, the system analyzes the DOM structure:
- Looks for containers with multiple links
- Identifies repeated structures
- Finds main content areas
- Uses semantic HTML patterns
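As an illustration of the "repeated structures" and "containers with links" heuristics above, the sketch below counts how often each tag-and-class signature appears as a container holding at least one link. It is a simplified stand-in, not the project's analysis code; `find_repeated_link_containers` and the `min_repeats` threshold are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _LinkContainerCounter(HTMLParser):
    """Counts, per element signature, how many elements contain a link."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open elements as [tag, signature, contains_link]
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Mark every currently open ancestor as containing a link.
            for entry in self.stack:
                entry[2] = True
        classes = ".".join(sorted((attrs.get("class") or "").split()))
        signature = f"{tag}.{classes}" if classes else tag
        self.stack.append([tag, signature, False])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not any(entry[0] == tag for entry in self.stack):
            return  # ignore void elements and stray closing tags
        # Pop back to the matching open tag, tolerating unclosed children.
        while self.stack:
            open_tag, signature, contains_link = self.stack.pop()
            if contains_link:
                self.counts[signature] += 1
            if open_tag == tag:
                break


def find_repeated_link_containers(page_html, min_repeats=3):
    """Return (signature, count) pairs for link containers seen at least min_repeats times."""
    parser = _LinkContainerCounter()
    parser.feed(page_html)
    parser.close()
    return [(sig, n) for sig, n in parser.counts.most_common() if n >= min_repeats]
```

Signatures that repeat several times are likely to be individual result items, while one-off ancestors such as the page wrapper fall below the threshold.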
Implementation Details
LiveKit Agent (Python)
The main implementation is in agent-livekit/mcp_chrome_client.py:
- _discover_search_result_selectors(): Main discovery function
- _generate_intelligent_search_selectors(): Generate smart selectors
- _validate_search_results_content(): Validate content quality
- _analyze_dom_for_search_results(): DOM structure analysis
- _final_intelligent_discovery(): Last resort broad patterns
Chrome Extension (JavaScript)
Enhanced functionality in app/chrome-extension/inject-scripts/enhanced-search-helper.js:
- discoverSearchResultElements(): Client-side intelligent discovery
- validateSearchResultElement(): Element validation
- analyzeDOMForSearchResults(): DOM analysis
- extractResultFromElement(): Flexible data extraction
Usage
Intelligent discovery is triggered automatically whenever the standard selectors fail. No additional configuration is required.
Voice Commands
"Search for intelligent selector discovery"
The system will:
- Navigate to Google
- Perform the search
- Try standard selectors
- Fall back to intelligent discovery if needed
- Return formatted results
Logging
The system provides detailed logging to track which method was successful:
🔍 Starting intelligent selector discovery for search results...
✅ Found valid search results with intelligent selector: [data-ved]:has(h3)
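Log lines like the two above could be emitted with Python's standard logging module, as in the sketch below. The logger name, the `log_discovery` function, and the debug and warning messages are hypothetical; only the two info messages are taken from the output shown above.

```python
import logging

logger = logging.getLogger("selector_discovery")  # hypothetical logger name


def log_discovery(selectors, is_valid):
    """Return the first selector accepted by is_valid, logging the outcome."""
    logger.info("🔍 Starting intelligent selector discovery for search results...")
    for selector in selectors:
        if is_valid(selector):
            logger.info(
                "✅ Found valid search results with intelligent selector: %s", selector
            )
            return selector
        # Illustrative message, only visible at DEBUG level.
        logger.debug("Selector produced no valid results: %s", selector)
    logger.warning("No intelligent selector produced valid results")
    return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
    log_discovery([".r", "[data-ved]:has(h3)"], lambda s: s != ".r")
```

Logging rejected selectors at DEBUG level keeps normal output short while still showing the full fallback path when debugging.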
Benefits
- Resilience: Adapts to changing website structures
- Broad Compatibility: Works across different search engines
- Automatic: No manual intervention required
- Detailed Logging: Easy to debug and monitor
- Performance: Efficient fallback hierarchy
Testing
Run the test suite to verify functionality:
node test-intelligent-search-selectors.js
This will test:
- Google search result extraction
- DuckDuckGo compatibility
- Selector validation functions
- Content extraction accuracy
Supported Patterns
Search Engines
- Google (all modern layouts)
- DuckDuckGo
- Bing
- Yahoo
- Generic search result pages
Element Patterns
- Modern data attributes (data-ved, jscontroller)
- Semantic HTML (role="main", article)
- Class-based patterns (class*="result")
- Link and heading combinations
- Container structures
Future Enhancements
- Machine Learning: Train models on successful selector patterns
- Site-Specific Rules: Custom rules for specific websites
- Performance Optimization: Cache successful selectors
- User Feedback: Learn from user corrections
- Visual Recognition: Use computer vision for element detection
Troubleshooting
Common Issues
- No results found: Check if the page has loaded completely
- Incorrect extraction: Verify the page structure hasn't changed dramatically
- Performance issues: Reduce the number of fallback selectors
Debug Mode
Enable detailed logging by setting the log level to DEBUG in the LiveKit agent configuration.
Manual Override
If needed, you can specify custom selectors in the MCP client configuration.
Contributing
When adding new selector patterns:
- Test across multiple search engines
- Validate content quality
- Add appropriate logging
- Update test cases
- Document new patterns
Related Files
- agent-livekit/mcp_chrome_client.py: Main Python implementation
- app/chrome-extension/inject-scripts/enhanced-search-helper.js: JavaScript client
- test-intelligent-search-selectors.js: Test suite
- agent-livekit/livekit_agent.py: Integration with voice commands