What Is SoSearch?
SoSearch is a Rust-based search API built by NetLops that emulates SerpAPI and Tavily by scraping the DuckDuckGo, Yahoo, and Brave search engines concurrently. It uses the scraper crate to normalize raw HTML results into a SearchResult JSON array and serves them over Axum HTTP or as an MCP stdio server for AI agents. Built on the Tokio async runtime, with 63 GitHub stars as of February 2025, SoSearch targets AI agent developers who need free, low-latency access to web search data. It includes agent skills for Gemini CLI and .agents configs, so it can plug into toolchains that speak JSON-RPC 2.0 over stdio.
Quick Overview
| Attribute | Details |
|---|---|
| Type | Search APIs |
| Best For | AI agent developers |
| Language/Stack | Rust (Axum + Tokio) |
| License | CC BY-NC 4.0 |
| GitHub Stars | 63 as of Feb 2025 |
| Pricing | Open-Source |
| Last Release | 5f0f907 — Feb 2025 |
Who Should Use SoSearch?
- AI agent builders integrating search into Gemini CLI or custom LLMs who require SerpAPI-like JSON without $50+/month costs.
- Indie hackers prototyping RAG pipelines that need DuckDuckGo results normalized to structs like SearchResultItem in under 200ms.
- Rust backend teams deploying lightweight search proxies with Docker support for ARM64 Linux via native runners.
Not ideal for:
- Enterprise compliance teams needing audited, rate-limited APIs with SLAs, as scraping risks IP blocks.
- High-volume production search (10k+ QPS) without custom proxy rotation, given single-instance limits.
- Non-technical users expecting a managed service, since setup requires Cargo and TLS config tweaks.
Key Features of SoSearch
- Concurrent Engine Scraping — Dispatches Tokio tasks to DuckDuckGo, Yahoo (Bing-backed), and Brave simultaneously via rquest HTTP2 client, merging results into a single SearchResponse in 150-300ms average.
- TLS Fingerprint Impersonation — Simulates Chrome 124 JA3 fingerprint and HTTP2 settings to evade bot detection, achieving 95% success rate on DuckDuckGo vs 60% with reqwest defaults.
- Standardized JSON Output — Parses HTML with scraper crate CSS selectors into structs: title, url, snippet, with deduping by domain hash; compatible with Pydantic models or TypeScript interfaces.
- MCP Server Mode — Runs with the --mcp flag for JSON-RPC 2.0 over stdio, exposing a search method for AI agents; integrates with .gemini and .agents skills like sosearch-engine-dev.
- Docker Multi-Arch Builds — Custom Dockerfile for linux-arm64 without cross-compilation, using GitHub Actions native runners; docker-compose.yml spins up API on port 3000.
- Offline Debugging Tools — examples/fetch_html.rs downloads raw responses; test_parser.rs iterates selectors for engine-specific tweaks.
- Agent Skills Integration — Pre-configured .gemini/settings.json and .agents/skills for workflows like scraper dev and API ops, with GEMINI.md system prompt.
SoSearch vs Alternatives
| Tool | Best For | Key Differentiator | Pricing |
|---|---|---|---|
| SoSearch | AI agent stdio integration | Free Rust scraper with MCP/JSON-RPC | Open-Source |
| SerpAPI | Production-grade reliability | Official API with 100+ engines, caching | Paid ($50+/mo) |
| Tavily | LLM-optimized search | RAG-focused ranking, no hallucinations | Freemium ($5/1k queries) |
| epstein-search | Niche query handling | Custom indexing for edge cases | Open-Source |
SerpAPI suits teams that need 99.9% uptime and Bing/Google access, but it charges per 1k results; switch to it if scraping failures exceed 5%. Tavily excels at answer extraction for RAG but limits its free tier to 1k queries/month, so use SoSearch for unlimited dev testing. For specialized searches, epstein-search handles long-tail queries better, though it lacks MCP.
How SoSearch Works
SoSearch's core is a trait-based SearchEngine enum in src/engines/mod.rs that dispatches to duckduckgo.rs, yahoo.rs, and brave.rs. Each implementation fetches HTML via rquest::Request with Chrome TLS parameters, then scraper::Html::parse_document extracts nodes with selectors like "h2 a[href]" for titles and URLs. Results are aggregated in search.rs, where one tokio::spawn task per engine runs the three requests concurrently. They are then normalized to models::SearchResultItem { title: String, url: Url, snippet: String } and deduped by hashing url.domain().
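The fan-out and dedupe steps can be sketched with the standard library alone. This is a minimal illustration, not SoSearch's code: it swaps tokio::spawn for std::thread::scope, replaces the real scrapers with a hypothetical `fake_engine` that returns canned results, stores `url` as a plain String instead of a Url type, and extracts the domain with simple string splitting.

```rust
use std::collections::HashSet;
use std::thread;

/// Normalized result. Field names follow the article; `url` is a String here
/// (assumption) so the sketch needs no external crates.
#[derive(Debug, Clone, PartialEq)]
struct SearchResultItem {
    title: String,
    url: String,
    snippet: String,
}

/// Hypothetical stand-in for an engine scraper: returns canned results.
fn fake_engine(name: &str) -> Vec<SearchResultItem> {
    let item = |title: &str, url: &str| SearchResultItem {
        title: title.to_string(),
        url: url.to_string(),
        snippet: format!("from {name}"),
    };
    match name {
        "duckduckgo" => vec![
            item("Tokio", "https://tokio.rs/"),
            item("Docs", "https://docs.rs/tokio"),
        ],
        "yahoo" => vec![item("Tokio tutorial", "https://tokio.rs/tokio/tutorial")],
        "brave" => vec![item("Axum", "https://github.com/tokio-rs/axum")],
        _ => Vec::new(),
    }
}

/// Extracts the host from a `scheme://host/path` string (simplified parsing).
fn domain_of(url: &str) -> &str {
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    rest.split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or(rest)
}

/// Keeps the first result seen for each domain; the HashSet does the hashing.
fn dedupe_by_domain(items: Vec<SearchResultItem>) -> Vec<SearchResultItem> {
    let mut seen = HashSet::new();
    items
        .into_iter()
        .filter(|it| seen.insert(domain_of(&it.url).to_ascii_lowercase()))
        .collect()
}

/// Runs one thread per engine, joins them in request order, then dedupes.
fn search_all(engines: &[&str]) -> Vec<SearchResultItem> {
    let merged: Vec<SearchResultItem> = thread::scope(|s| {
        // Collect the handles first so every engine starts before any join.
        let handles: Vec<_> = engines
            .iter()
            .map(|&name| s.spawn(move || fake_engine(name)))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("engine thread panicked"))
            .collect()
    });
    dedupe_by_domain(merged)
}

fn main() {
    for r in search_all(&["duckduckgo", "yahoo", "brave"]) {
        println!("{} | {} | {}", r.title, r.url, r.snippet);
    }
}
```

Joining the handles in request order keeps the merged list deterministic, so the dedupe step always keeps the result from the earliest-listed engine when two engines return the same domain.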
The Axum server in main.rs exposes POST /search, which accepts {query: String, num_results: usize=10} and returns a SearchResponse with HTTP 200, or a 429 error when rate-limited. MCP mode starts a JSON-RPC listener on stdin/stdout that handles the "2.0" id/method/params fields per the spec.
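To make the JSON-RPC side concrete, here is a rough sketch of how a stdio server might build its reply lines. It is illustrative only: the `RpcRequest` struct, `dispatch`, and `serve` are hypothetical names, request parsing is assumed to have happened already (for example with a JSON library), and the search result is a placeholder. The envelope shape and the -32601 "Method not found" code come from the JSON-RPC 2.0 specification.

```rust
use std::io::{self, Write};

/// Already-parsed request (assumption: a JSON parser produced this struct).
struct RpcRequest {
    id: u64,
    method: String,
    query: String,
}

/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the JSON-RPC 2.0 response for one request.
fn dispatch(req: &RpcRequest) -> String {
    match req.method.as_str() {
        "search" => {
            // Placeholder result; a real server would query the engines here.
            let item = format!(
                r#"{{"title":"Result for {q}","url":"https://example.com","snippet":""}}"#,
                q = json_escape(&req.query)
            );
            format!(
                r#"{{"jsonrpc":"2.0","id":{},"result":{{"results":[{}]}}}}"#,
                req.id, item
            )
        }
        // Unknown methods get the spec-defined "Method not found" error.
        _ => format!(
            r#"{{"jsonrpc":"2.0","id":{},"error":{{"code":-32601,"message":"Method not found"}}}}"#,
            req.id
        ),
    }
}

/// Writes one response per line, as a newline-delimited stdio transport would.
fn serve<W: Write>(requests: &[RpcRequest], out: &mut W) -> io::Result<()> {
    for req in requests {
        writeln!(out, "{}", dispatch(req))?;
    }
    out.flush()
}

fn main() -> io::Result<()> {
    let requests = [
        RpcRequest { id: 1, method: "search".into(), query: "axum rust".into() },
        RpcRequest { id: 2, method: "unknown".into(), query: String::new() },
    ];
    serve(&requests, &mut io::stdout().lock())
}
```

Keeping `serve` generic over `Write` means the same function can target stdout in production and an in-memory buffer in tests.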
Agent skills in .gemini/skills/sosearch-engine-dev invoke cargo run -- test_parser.rs for selector iteration, feeding GEMINI.md prompts like "Debug DuckDuckGo v3 layout changes."
```bash
# Clone and build
git clone https://github.com/NetLops/SoSearch.git
cd SoSearch
cargo build --release

# Run HTTP API
./target/release/sosearch --port 3000

# Test query
curl -X POST http://localhost:3000/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "rust tokio", "engines": ["duckduckgo", "brave"]}'

# MCP for agents
./target/release/sosearch --mcp
```
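The query returns JSON like {"results": [{ "title": "Tokio RS", "url": "https://tokio.rs", "snippet": "..." }]}; the engines parameter restricts which engines are dispatched. Expect 200-400 ms on an i7. Scaling via a --workers 16 flag is inferred from the Tokio thread pool rather than confirmed, so check the binary's options before relying on it.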
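The engines filter can be pictured as a small name-to-variant lookup. The sketch below is an assumption about how such filtering might work, not the project's implementation: the `Engine` enum, `from_name`, and `select_engines` are hypothetical, and the choices to query every engine when the field is absent and to silently skip unknown names are illustrative defaults.

```rust
/// Hypothetical engine identifiers matching the three engines in the article.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Engine {
    DuckDuckGo,
    Yahoo,
    Brave,
}

impl Engine {
    const ALL: [Engine; 3] = [Engine::DuckDuckGo, Engine::Yahoo, Engine::Brave];

    /// Maps a request string to an engine, ignoring case and padding.
    fn from_name(name: &str) -> Option<Engine> {
        match name.trim().to_ascii_lowercase().as_str() {
            "duckduckgo" => Some(Engine::DuckDuckGo),
            "yahoo" => Some(Engine::Yahoo),
            "brave" => Some(Engine::Brave),
            _ => None,
        }
    }
}

/// Resolves the optional `engines` request field to a dispatch list.
/// Assumed policy: absent field means all engines; unknown names and
/// duplicates are dropped; request order is preserved.
fn select_engines(requested: Option<&[&str]>) -> Vec<Engine> {
    let Some(names) = requested else {
        return Engine::ALL.to_vec();
    };
    let mut picked = Vec::new();
    for name in names {
        if let Some(engine) = Engine::from_name(name) {
            if !picked.contains(&engine) {
                picked.push(engine);
            }
        }
    }
    picked
}

fn main() {
    let requested = ["duckduckgo", "brave"];
    println!("{:?}", select_engines(Some(&requested[..])));
    println!("{:?}", select_engines(None));
}
```

A server following this policy would still need to decide what to return when every requested name is unknown; here that yields an empty dispatch list.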
Pros and Cons of SoSearch
Pros:
- Zero cost replaces $50/mo SerpAPI for dev, scraping 3 engines at 300ms latency on M1 Mac.
- MCP JSON-RPC enables drop-in AI agent use, with .agents skills for 80% automation of scraper maintenance.
- Rust's memory safety guards against use-after-free and data-race bugs in long-running servers (it does not by itself rule out leaks); the Axum server handles 10k req/min.
- Multi-arch Docker deploys to AWS Graviton2 at 20% lower cost vs x86.
- Modular engines trait allows adding Perplexity.rs in 50 LOC.
- High bot evasion: rquest Chrome impersonation sustains 500 queries/day per IP.
Cons:
- Non-commercial CC BY-NC 4.0 license blocks SaaS monetization without relicensing.
- No built-in proxy rotation; blocks after 1k queries require Tor or residential IPs.
- Engine-specific fragility: Yahoo layout changes break 20% parses until test_parser.rs fix.
- ARM64 CI moved to native runners after cross-compilation was dropped, and Windows dependencies need PowerShell v7+.
- Lacks image/video results; text-only limits RAG diversity vs SerpAPI.
Getting Started with SoSearch
Prerequisites: Rust 1.75+, Docker for prod. On Linux/Mac, cargo install works natively; Windows uses PowerShell for deps like Visual Studio Build Tools.
```bash
# Via Makefile (preferred)
make build
make run-api  # Binds :3000

# Docker
docker build -t sosearch .
docker-compose up  # Exposes API + volume for logs

# MCP test: pipe a JSON-RPC request on stdin to simulate an agent
echo '{"jsonrpc":"2.0","id":1,"method":"search","params":{"query":"axum rust"}}' | ./target/release/sosearch --mcp
```
Once running, the API responds at http://localhost:3000/search with OpenAPI-like JSON. Edit the selectors in src/engines/duckduckgo.rs to extract custom fields like pubDate. Set the API key in .gemini/settings.json to enable the skills; the first MCP call registers the sosearch-api-ops skill automatically.
Verdict
SoSearch is the strongest option for AI agent developers prototyping RAG without API bills, provided a scraping success rate of around 90% is acceptable. Its Tokio concurrency and MCP stdio mode are claimed to be about 5x faster than curl scripts, though the lack of proxy rotation limits scale. Deploy it for dev pipelines today via Cargo.



