# Intelligent Selector Discovery

## Overview

The LiveKit agent now includes intelligent selector discovery functionality that automatically adapts to changing web page structures, particularly for Google search results. When standard CSS selectors fail (like the common "No valid content found for selector: .r" error), the system intelligently discovers alternative selectors.

## Problem Solved

Google and other search engines frequently change their HTML structure, causing hardcoded CSS selectors to break. The old system would fail with errors like:

- "No valid content found for selector: .r"
- "No search results found on this page"

## How It Works

### 1. Multi-Layer Fallback System

The intelligent discovery system uses a multi-layer approach:

1. **Standard Selectors**: Try known working selectors first
2. **Intelligent Discovery**: Generate smart selectors based on common patterns
3. **DOM Analysis**: Analyze page structure using heuristics
4. **Final Fallback**: Extract any meaningful content

### 2. Intelligent Selector Generation

The system generates selectors based on modern web patterns:

```javascript
// Modern Google patterns (2024+)
'[data-ved] h3',
'[data-ved]:has(h3)',
'[jscontroller]:has(h3)',

// Generic search result patterns
'div[class*="result"]:has(h3)',
'article:has(h3)',
'[role="main"] div:has(h3)',

// Link-based patterns
'a[href*="http"]:has(h3)',
'div:has(h3):has(a[href*="http"])'
```

### 3. Content Validation

Each discovered selector is validated to ensure it contains actual search results:

- Must have headings (h1-h6) and links
- Must contain substantial text content (>50 characters)
- Must have search result indicators (URLs, titles, snippets)

### 4. DOM Structure Analysis

If intelligent selectors fail, the system analyzes the DOM structure:

- Looks for containers with multiple links
- Identifies repeated structures
- Finds main content areas
- Uses semantic HTML patterns

## Implementation Details

### LiveKit Agent (Python)

The main implementation is in `agent-livekit/mcp_chrome_client.py`:

- `_discover_search_result_selectors()`: Main discovery function
- `_generate_intelligent_search_selectors()`: Generate smart selectors
- `_validate_search_results_content()`: Validate content quality
- `_analyze_dom_for_search_results()`: DOM structure analysis
- `_final_intelligent_discovery()`: Last resort broad patterns

### Chrome Extension (JavaScript)

Enhanced functionality in `app/chrome-extension/inject-scripts/enhanced-search-helper.js`:

- `discoverSearchResultElements()`: Client-side intelligent discovery
- `validateSearchResultElement()`: Element validation
- `analyzeDOMForSearchResults()`: DOM analysis
- `extractResultFromElement()`: Flexible data extraction

## Usage

The intelligent discovery is automatically triggered when standard selectors fail. No additional configuration is required.

### Voice Commands

```
"Search for intelligent selector discovery"
```

The system will:

1. Navigate to Google
2. Perform the search
3. Try standard selectors
4. Fall back to intelligent discovery if needed
5. Return formatted results

### Logging

The system provides detailed logging to track which method was successful:

```
🔍 Starting intelligent selector discovery for search results...
✅ Found valid search results with intelligent selector: [data-ved]:has(h3)
```

## Benefits

1. **Resilience**: Adapts to changing website structures
2. **Broad Compatibility**: Works across different search engines
3. **Automatic**: No manual intervention required
4. **Detailed Logging**: Easy to debug and monitor
5. **Performance**: Efficient fallback hierarchy

## Testing

Run the test suite to verify functionality:

```bash
node test-intelligent-search-selectors.js
```

This will test:

- Google search result extraction
- DuckDuckGo compatibility
- Selector validation functions
- Content extraction accuracy

## Supported Patterns

### Search Engines

- Google (all modern layouts)
- DuckDuckGo
- Bing
- Yahoo
- Generic search result pages

### Element Patterns

- Modern data attributes (`data-ved`, `jscontroller`)
- Semantic HTML (`role="main"`, `article`)
- Class-based patterns (`class*="result"`)
- Link and heading combinations
- Container structures

## Future Enhancements

1. **Machine Learning**: Train models on successful selector patterns
2. **Site-Specific Rules**: Custom rules for specific websites
3. **Performance Optimization**: Cache successful selectors
4. **User Feedback**: Learn from user corrections
5. **Visual Recognition**: Use computer vision for element detection

## Troubleshooting

### Common Issues

1. **No results found**: Check if the page has loaded completely
2. **Incorrect extraction**: Verify the page structure hasn't changed dramatically
3. **Performance issues**: Reduce the number of fallback selectors

### Debug Mode

Enable detailed logging by setting the log level to DEBUG in the LiveKit agent configuration.

### Manual Override

If needed, you can specify custom selectors in the MCP client configuration.

## Contributing

When adding new selector patterns:

1. Test across multiple search engines
2. Validate content quality
3. Add appropriate logging
4. Update test cases
5. Document new patterns

## Related Files

- `agent-livekit/mcp_chrome_client.py` - Main Python implementation
- `app/chrome-extension/inject-scripts/enhanced-search-helper.js` - JavaScript client
- `test-intelligent-search-selectors.js` - Test suite
- `agent-livekit/livekit_agent.py` - Integration with voice commands
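
## Example Sketch

For orientation, the layered fallback from "How It Works" and the checks from "Content Validation" can be sketched in a few lines of Python. This is a minimal illustration, not the code in `agent-livekit/mcp_chrome_client.py`: the names `looks_like_search_result`, `first_valid_selector`, `discover`, and the `heading`/`url`/`text` item shape are all assumptions made for the example, and the `extract` callable stands in for whatever actually queries the page.

```python
from typing import Callable, Iterable, Optional

# Hypothetical shape of one extracted item: {"heading": ..., "url": ..., "text": ...}.
# The real client may represent results differently.
Item = dict
Extractor = Callable[[str], Iterable[Item]]

MIN_TEXT_LENGTH = 50  # "substantial text" threshold from Content Validation


def looks_like_search_result(item: Item) -> bool:
    """Simplified version of the three Content Validation checks."""
    has_heading = bool(item.get("heading"))
    has_link = str(item.get("url", "")).startswith("http")
    has_text = len(item.get("text", "")) > MIN_TEXT_LENGTH
    return has_heading and has_link and has_text


def first_valid_selector(selectors: Iterable[str], extract: Extractor):
    """Return (selector, results) for the first selector with valid results."""
    for selector in selectors:
        results = [r for r in extract(selector) if looks_like_search_result(r)]
        if results:
            return selector, results
    return None


def discover(layers, extract: Extractor) -> Optional[tuple]:
    """Walk the layers in order and stop at the first one that succeeds."""
    for layer_name, selectors in layers:
        found = first_valid_selector(selectors, extract)
        if found:
            selector, results = found
            return layer_name, selector, results
    return None  # every layer failed; the caller decides what to report


if __name__ == "__main__":
    # A fake "page" that only answers one of the intelligent selectors.
    fake_page = {
        "[data-ved]:has(h3)": [{
            "heading": "Example Domain",
            "url": "https://example.com/",
            "text": "This domain is for use in illustrative examples "
                    "in documents and tests.",
        }],
    }
    layers = [
        ("standard", [".r"]),
        ("intelligent", ["[data-ved] h3", "[data-ved]:has(h3)"]),
    ]
    print(discover(layers, lambda s: fake_page.get(s, [])))
```

Keeping the layers as an ordered list of `(name, selectors)` pairs means the cheap, known selectors are always tried first, and the layer name that comes back can be logged to show which method succeeded.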