SIRA: Best Retrieval Pipelines for Search Engineers in 2026

SIRA turns plain BM25 into a multi-stage retrieval system by using LLM-generated corpus and query expansions plus reranking, so you get stronger BEIR-level retrieval without training a model.

Pricing: Open-Source

Tech Stack: Python 3.12, CUDA, Rust bm25x extension, LLM inference

Target: search engineers and ML infra teams

Category: Retrieval Pipelines

What Is SIRA?

SIRA is a multi-stage retrieval pipeline built by Meta's Facebook Research team for search engineers and ML infra teams. It enriches documents and queries with LLM-generated text, retrieves candidates with BM25, and reorders them with a pointwise LLM reranker. The 2026 paper reports state-of-the-art results on BEIR benchmarks with this approach, without training a model. For teams that already understand lexical retrieval, SIRA is a practical way to push BM25 further before moving to a full neural search stack.

Quick Overview

Attribute | Details
Type | Retrieval Pipelines
Best For | search engineers and ML infra teams
Language/Stack | Python 3.12, CUDA, Rust bm25x extension, LLM inference
License | MIT
GitHub Stars | N/A as of Feb 2026
Pricing | Open-Source
Last Release | N/A

Who Should Use SIRA?

  • Search engineers replacing weak BM25-only retrieval with a config-driven pipeline that adds corpus and query expansion before reranking.
  • ML infra teams that can serve an LLM and want measurable retrieval gains without fine-tuning or checkpoint management.
  • IR researchers benchmarking against BEIR-style tasks and needing reproducible, inference-only experiments.
  • Platform teams that can compile a Rust extension and run CUDA-backed jobs as part of their search stack.

Not ideal for:

  • Small apps that need a turnkey search API and do not want to manage datasets, prompts, and GPU serving.
  • Environments without a CUDA GPU or without permission to install Rust toolchains.
  • Teams looking for semantic vector search first; SIRA is still grounded in lexical BM25 and LLM-assisted reranking.

Key Features of SIRA

  • Five-stage retrieval pipeline — SIRA splits work into data preparation, BM25 indexing, corpus enrichment, query expansion, and LLM-based pointwise reranking. That separation makes it easier to swap stages or measure which step actually improved recall, MRR, or nDCG.
  • LLM-generated corpus enrichment — Documents are augmented with indexing phrases produced by an LLM, which improves lexical match quality when the original text is sparse or jargon-heavy. This helps with short passages, acronyms, and domain-specific terms that BM25 would otherwise miss.
  • LLM query expansion — SIRA expands a user query into additional search terms before retrieval, then feeds the candidate set into a reranker. That gives you query-time recall gains without retraining an embedding model or rebuilding your embedding index.
  • BM25x native extension — The repository includes src/sira/bm25x/, a Rust-backed extension derived from bm25x. In practice that means faster indexing and scoring than a pure-Python baseline while keeping the classic inverted-index model.
  • Inference-only design — The paper emphasizes no training loop, no contrastive fine-tuning, and no dataset-specific model fitting. That keeps experiments closer to standard search infrastructure and reduces the operational burden of maintaining checkpoints.
  • Configurable stage execution — The CLI can run the full pipeline or only selected stages such as enrich_query and rerank. That is useful for ablation studies, cost control, and isolating where the retrieval lift actually comes from.
  • BEIR-oriented evaluation — SIRA reports state-of-the-art results on BEIR benchmarks in the 2026 paper. If your team already has a benchmark harness, SIRA plugs into that workflow cleanly because it is designed around offline retrieval evaluation rather than an opaque hosted API.
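
The features above name recall, MRR, and nDCG as the yardsticks for stage-level ablations. The following is a minimal sketch of those three metrics, not SIRA's own evaluation code; the function names are illustrative, and nDCG here assumes linear gain with a log2 rank discount.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant doc ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, gains, k):
    """nDCG@k with linear gain; gains maps doc id -> graded relevance."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical rankings for one query, before and after adding a stage.
gains = {"d1": 1}
before = ["d3", "d1", "d7"]
after = ["d1", "d3", "d7"]
for name, ranked in (("before", before), ("after", after)):
    print(name, reciprocal_rank(ranked, set(gains)), ndcg_at_k(ranked, gains, 10))
```

Averaging these per-query values over a query set, once per stage combination, is one way to see which stage moved the numbers.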

SIRA vs Alternatives

Tool | Best For | Key Differentiator | Pricing
SIRA | Offline retrieval experiments and BEIR-style evaluation | LLM-enriched BM25 with pointwise reranking and no training | Open-Source
Elasticsearch | Production search clusters and operational search APIs | Mature query DSL, filtering, aggregations, and cluster ops | Freemium
Vespa | Large-scale hybrid search and serving-time ranking | Built-in ranking expressions and feature-rich serving | Open-Source
PanSou | Search discovery and productized search experiences | More of a user-facing search workflow than a retrieval research pipeline | Open-Source

Pick Elasticsearch when you need durable indexing, filters, aggregations, and an API surface that ops teams already know. SIRA is the better fit when the question is, "How much retrieval quality can I squeeze out of BM25 with LLM-assisted enrichment before I retrain anything?"

Pick Vespa when you need serving-time ranking, feature plumbing, and large-scale hybrid search in one runtime. SIRA is narrower and easier to reason about for experiments, but Vespa is the stronger production serving layer if your team already runs ranking features in the request path.

Pick PanSou when you need a search experience or discovery layer rather than a retrieval research pipeline. If you want observability around each stage of a SIRA run, pair it with OpenTrace so you can trace query expansion, candidate generation, and reranking decisions across runs.

If SIRA sits inside a larger agent loop, OpenSwarm is the orchestration layer, while SIRA handles the retrieval mechanics. That split matters when a product needs agents to decide what to search for, but the actual ranking quality still needs a deterministic, benchmarkable pipeline.

How SIRA Works

SIRA uses a pipeline-first architecture instead of a monolithic neural retriever. The core data model is simple: documents, queries, derived enrichment strings, candidate lists, and rerank scores. That is a good design choice because it keeps each stage observable, and it lets search teams inspect failures instead of guessing why a dense model missed a relevant passage.

The first stage prepares data and builds a BM25 index through the bm25x extension. The corpus enrichment stage asks an LLM to generate indexing phrases for each document, then the query expansion stage generates additional search terms for the incoming query. SIRA then retrieves a candidate set with BM25 and sends that list to an LLM-based pointwise reranker, which scores each candidate independently rather than relying on a single global similarity vector.
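
To make that flow concrete, here is a toy sketch of enrich, expand, retrieve, and rerank in plain Python. It is not SIRA's code: TinyBM25 is a simplified Okapi BM25 standing in for the bm25x extension, and the three dictionaries are hypothetical canned outputs standing in for real LLM calls.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TinyBM25:
    """Minimal Okapi BM25 over whitespace tokens; a toy stand-in for bm25x."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = {d: Counter(tokenize(t)) for d, t in docs.items()}
        self.lens = {d: sum(tf.values()) for d, tf in self.tfs.items()}
        self.avg_len = sum(self.lens.values()) / len(self.lens)
        # Document frequency: number of docs containing each term.
        self.df = Counter(term for tf in self.tfs.values() for term in tf)

    def score(self, query, doc_id):
        tf, n = self.tfs[doc_id], len(self.tfs)
        total = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n - self.df[term] + 0.5) / (self.df[term] + 0.5))
            length_norm = 1 - self.b + self.b * self.lens[doc_id] / self.avg_len
            total += idf * tf[term] * (self.k1 + 1) / (tf[term] + self.k1 * length_norm)
        return total

    def search(self, query, k):
        scored = sorted(((self.score(query, d), d) for d in self.tfs),
                        key=lambda pair: (-pair[0], pair[1]))
        return [d for s, d in scored if s > 0][:k]


# Hypothetical canned outputs standing in for real LLM calls.
DOC_PHRASES = {"d1": "myocardial infarction heart attack stent"}
QUERY_TERMS = {"heart attack treatment": "myocardial infarction stent"}
LLM_JUDGMENTS = {"d1": 0.9, "d2": 0.2, "d3": 0.0}


def enrich_corpus(docs):
    # Append generated indexing phrases to each document before indexing.
    return {d: (t + " " + DOC_PHRASES.get(d, "")).strip() for d, t in docs.items()}


def expand_query(query):
    # Append generated search terms to the query before retrieval.
    return (query + " " + QUERY_TERMS.get(query, "")).strip()


def rerank_pointwise(candidates, score_fn):
    # Each candidate is scored on its own; no candidate sees the others.
    return sorted(candidates, key=lambda d: -score_fn(d))


docs = {
    "d1": "MI outcomes after PCI",
    "d2": "treatment of panic attack symptoms",
    "d3": "garden soil treatment guide",
}
query = "heart attack treatment"

plain = TinyBM25(docs).search(query, k=3)
candidates = TinyBM25(enrich_corpus(docs)).search(expand_query(query), k=3)
final = rerank_pointwise(candidates, LLM_JUDGMENTS.get)
print(plain, candidates, final)
```

On this toy corpus the plain index cannot reach d1 because the document shares no tokens with the query, while the enriched index and expanded query can. The reranker then orders the short list using only per-document judgments.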

python scripts/run_pipeline.py data=scifact server.auto_start=true

That command runs the full pipeline on the scifact dataset and auto-starts the LLM server if one is not already running. If you want to isolate only query enrichment and reranking, SIRA also supports stage selection with commands such as python scripts/run_pipeline.py data=scifact stages='[enrich_query,rerank]'.

The architecture is deliberately inference-heavy and training-light. SIRA spends compute at runtime to expand the corpus, expand the query, and rerank the short list, which keeps the system aligned with classic search evaluation and makes it easier to compare against BM25, hybrid search, or a baseline reranker. If you want to instrument those stages, pairing the run with OpenTrace makes the quality jumps easier to attribute.

Pros and Cons of SIRA

Pros:

  • No training loop required — SIRA improves retrieval using inference-time compute, so you avoid fine-tuning, checkpoint storage, and retraining pipelines.
  • Strong lexical baseline upgrade — It keeps BM25 at the center, which makes the behavior easier to debug than a black-box dense retriever.
  • Modular stages — You can run enrichment, expansion, and reranking independently for ablations or cost control.
  • Rust-backed indexing path — The bm25x extension gives you a native implementation path instead of relying on pure Python for the scoring core.
  • Benchmark-friendly — The design aligns with BEIR-style evaluation, so results are easier to compare against standard IR baselines.
  • MIT licensed — Teams can ship internal forks and adapt the pipeline without license friction.

Cons:

  • GPU requirement — The README expects CUDA-capable hardware, and the project was tested on NVIDIA H100-class systems.
  • Operational complexity — You still need to manage datasets, an LLM server, and multiple retrieval stages.
  • Not a turnkey search service — SIRA is a research and evaluation pipeline, not a full production search API with ACLs, analytics, and admin tooling.
  • Native build dependency — The Rust toolchain is required for the bm25x extension, which adds setup friction on locked-down machines.
  • Inference cost can climb fast — No training does not mean low cost; corpus enrichment and reranking can still be expensive at scale.
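
A back-of-the-envelope call count shows why that last point matters. This sketch assumes one LLM call per document for enrichment, one per query for expansion, and one per reranked candidate; SIRA's actual batching and call pattern may differ, and the workload numbers are hypothetical.

```python
def estimate_llm_calls(num_docs, num_queries, rerank_depth):
    """Count LLM calls under a one-call-per-item assumption (no batching)."""
    calls = {
        "corpus_enrichment": num_docs,                # once per document, at index time
        "query_expansion": num_queries,               # once per query
        "reranking": num_queries * rerank_depth,      # once per candidate per query
    }
    calls["total"] = sum(calls.values())
    return calls


# Hypothetical workload: 10,000 documents, 500 queries, top-50 reranking.
print(estimate_llm_calls(num_docs=10_000, num_queries=500, rerank_depth=50))
```

Under these assumptions corpus enrichment is a one-time cost that scales with corpus size, while reranking scales with query volume times rerank depth, so the rerank depth is usually the first knob to examine.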

Getting Started with SIRA

conda create -n sira312 python=3.12 -y
conda activate sira312
pip install -e .
source sandbox.sh
python scripts/run_pipeline.py data=scifact server.auto_start=true

The editable install installs the package from the local checkout and builds the Rust-backed extension during setup. After that, the pipeline command starts the full retrieval flow, and server.auto_start=true spins up the LLM server automatically so you can run a single dataset end to end.

If you are validating your own corpus, start with one dataset and one stage combination before scaling out to datasets='[scifact,arguana,fiqa]'. That approach keeps failure modes visible, especially when the bottleneck is prompt quality, GPU memory, or the reranker rather than BM25 itself.
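
One way to keep that progression repeatable is to generate the commands from a small helper. The sketch below only builds argument lists from the overrides quoted in this article (data, stages, datasets, server.auto_start); the helper name is hypothetical, and whether the multi-dataset form needs additional overrides is an assumption left to the README.

```python
import shlex

SCRIPT = "scripts/run_pipeline.py"


def pipeline_command(overrides):
    """Turn a dict of key=value overrides into an argv list for the pipeline script."""
    args = ["python", SCRIPT]
    for key, value in overrides.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # List values use the bracketed form, e.g. stages=[enrich_query,rerank].
            value = "[" + ",".join(value) + "]"
        args.append(f"{key}={value}")
    return args


steps = [
    {"data": "scifact", "server.auto_start": True},           # one dataset, full pipeline
    {"data": "scifact", "stages": ["enrich_query", "rerank"]},  # one stage combination
    {"datasets": ["scifact", "arguana", "fiqa"]},             # scale out last
]
for overrides in steps:
    # shlex.join quotes the bracketed values so the line is safe to paste into a shell.
    print(shlex.join(pipeline_command(overrides)))
```

Each argv list can also be passed directly to subprocess.run(args, check=True) from the repository root, which avoids shell quoting altogether.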

Verdict

SIRA is a strong option for search teams that want BEIR-style retrieval gains and can afford GPU-backed inference and a Rust build step. Its main strength is the modular BM25-plus-LLM pipeline; its biggest caveat is setup and runtime complexity. Use SIRA when retrieval quality matters more than a turnkey search API.
