Why use two models in ARIS?

ARIS uses two models for adversarial review—Claude Code executes, external LLM probes blind spots—avoiding single-model local minima. This 2-player Nash convergence cuts coordination overhead vs multi-agent. Biggest gains from 1-to-2 models per design rationale.

ARIS: Best AI Research Agents for ML Researchers in 2026

ARIS orchestrates Claude Code for overnight ML paper experiments with cross-model LLM critique via Codex MCP, producing scored analyses and refined narratives using open model alternatives like GLM or DeepSeek.

What Is ARIS?

ARIS, short for Auto-claude-code-research-in-sleep, is an open-source AI research agent built by wanshuiyin that automates ML workflows by running Claude Code experiments overnight. It scores papers, identifies weaknesses, executes experiments, and rewrites narratives autonomously using custom skills. ARIS is one of the best AI research agents for ML researchers, with 1.6k GitHub stars as of March 2026 and 151 forks. Developed in Python with MCP servers, it supports cross-model setups like Claude Code execution paired with external LLMs such as Codex (GPT-5.4), GLM-5, or DeepSeek—no Claude or OpenAI API keys required for alternatives.

Quick Overview

Attribute	Details
Type	AI Research Agents
Best For	ML researchers
Language/Stack	Claude Code + MCP Servers (Python)
License	MIT
GitHub Stars	1.6k as of March 2026
Pricing	Open-Source
Last Release	N/A — active commits to main branch

Who Should Use ARIS?

ML researchers drafting papers who need automated experiment runs and critique to validate claims overnight, handling arXiv paper downloads and git-synced code execution.
Solo academics or indie ML hackers iterating on vague ideas into claim-driven proposals via research-refine skills, without managing multi-model API quotas.
Teams prototyping ML workflows requiring adversarial review to escape local minima in self-play agent setups.

Not ideal for:

Production ML pipelines needing real-time inference, as ARIS focuses on batch overnight research cycles.
Developers avoiding Python MCP server setups or lacking SSH access for git sync.
Users locked into single-model ecosystems without interest in cross-LLM collaboration.

Key Features of ARIS

Cross-Model Collaboration — Claude Code executes fast research steps while Codex MCP or alternatives (GLM-5, MiniMax-M2.5) provide rigorous critique; supports OpenAI-compatible endpoints for 2000+ free daily calls via ModelScope.
Autonomous Workflow Orchestration — /idea-discovery integrates research-refine and experiment-plan to transform ideas into roadmapped proposals with git push/pull sync for code execution.
Git Code Sync — /run-experiment uses code_sync: git for push-to-SSH-pull, enabling seamless repo updates during experiments without rsync dependencies (added March 17, 2026).
arXiv Skill Integration — Downloads and processes papers directly; pairs with narrative_report for Workflow 3 outputs including scored weaknesses and rewritten sections.
Bring Your Own Model — Alibaba Coding Plan supports Kimi-K2.5, Qwen3.5+, GLM-5, MiniMax-M2.5 via single API key and dual endpoints (March 16, 2026).
Parameter Pass-Through — -- key: value flags propagate settings downstream across workflows for flexible experimentation.
No-API Alternatives — ModelScope (Alt E) offers free access without automation limits, bypassing paid LLM restrictions.

ARIS vs Alternatives

Tool	Best For	Key Differentiator	Pricing
ARIS	ML researchers automating overnight paper experiments	Cross-model adversarial review via MCP without Claude API	Open-Source
Claude Code Canvas	Visual code generation in Claude	Canvas UI for interactive editing, lacks autonomous research loops	Freemium
Brainstorm MCP	MCP server brainstorming	Pure MCP focus, no built-in ML experiment orchestration	Open-Source
OpenSwarm	Multi-agent swarms	N-player agent coordination, higher overhead than 2-model Nash equilibrium	Open-Source

Claude Code Canvas suits interactive sessions but misses ARIS's overnight autonomy and cross-LLM critique. Brainstorm MCP provides server primitives yet requires custom skills for full ML research pipelines—ARIS bundles them ready-to-run. OpenSwarm handles complex swarms but incurs coordination costs unsuitable for 2-player adversarial review efficiency.

How ARIS Works

ARIS leverages Claude Code skills as the executor agent, paired with an external Codex MCP server for critique in a 2-player adversarial setup. The core abstraction is workflow orchestration: /idea-discovery feeds into research-refine for problem-anchored proposals, then experiment-plan generates claim-driven roadmaps. Code execution happens via git-synced repos over SSH, storing outputs in narrative_report format with scores and refinements. This design exploits Claude's execution speed against deliberate reviewers like GPT-5.4, avoiding single-model blind spots akin to stochastic bandit predictability.

Architecture centers on Python MCP servers in mcp-servers/ handling LLM routing (OpenAI-compatible APIs). Skills in skills/ define autonomous loops: arXiv downloads trigger weakness probing, experiment runs validate claims. Parameter hierarchies allow top-level overrides, e.g., model selection cascades to sub-workflows.

Realistic start with ModelScope:

# Clone and setup
 git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
 cd Auto-claude-code-research-in-sleep
 pip install -r requirements.txt  # Assumes Python deps for MCP

# Run idea-discovery with ModelScope (Alt E guide)
 ./run.sh --workflow idea-discovery --model modelscope --api-key YOUR_KEY "vague ML idea prompt"

This command spins up Claude Code (or alt) to refine the idea into a proposal, plans experiments, and critiques via external LLM. Expect a narrative_report.md with scored sections, git-pulled code changes, and logged API calls—full cycle completes in hours depending on compute.

Pros and Cons of ARIS

Pros:

Achieves Nash equilibrium-like convergence in 2-model review, outperforming self-play by probing unanticipated weaknesses (e.g., 20-30% better claim validation per community tests).
Zero-cost alternatives like ModelScope (2000 calls/day) or Alibaba plans eliminate API barriers for 4+ models.
Git sync enables persistent code evolution across runs, with 248 commits showing active maintenance as of March 2026.
Modular skills (arXiv, research-refine) integrate into custom workflows without full rewrites.
Offline-capable critique via local MCP servers reduces latency to sub-5s per review cycle.

Cons:

Requires SSH setup for git sync, adding 2-5min initial config for non-local execution.
Python MCP dependencies (e.g., server.py LLM configs) demand Node/Python hybrid env, not pure Docker.
Limited to 2-player dynamics; scaling to multi-reviewer needs custom extensions.
arXiv skill downloads cap at paper metadata—full PDF parsing needs tools/ helpers tweaks.
No built-in visualization; outputs text reports requiring manual plotting for experiment results.

Getting Started with ARIS

Start by cloning the repo and installing Python dependencies for MCP servers.

# Install
 git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
 cd Auto-claude-code-research-in-sleep
 pip install -r requirements.txt  # Includes MCP libs, assumes Python 3.11+

# Quick test: Workflow 1 with free ModelScope
 mcp-server start --model modelscope --api-key sk-your-key  # Alt E guide
 ./run.sh --workflow idea-discovery --code-sync git --ssh-host your-server "Test hypothesis on transformer efficiency"

Post-run, ARIS generates experiment-plan.md with claims, pulls code via git, executes via Claude Code, and delivers narrative_report.md scored 1-10 on weaknesses. Configure ssh-agent for pull auth; initial sync takes 10-30s. Tweak --key:value for models like deepseek. Community CONTRIBUTING.md guides extensions.

Verdict

ARIS delivers the strongest overnight ML research automation for solo researchers when cross-model critique breaks self-play limits. Its git-synced execution and free model support (1.6k stars, March 2026) excel for paper iteration. Caveat: SSH setup adds friction—pair with Claude Context Mode for hybrid interactive use. Deploy for any arXiv-driven workflow today.

ARIS: Best AI Research Agents for ML Researchers in 2026

What Is ARIS?

Quick Overview

Who Should Use ARIS?

Key Features of ARIS

ARIS vs Alternatives

How ARIS Works

Pros and Cons of ARIS

Getting Started with ARIS

Verdict

Frequently Asked Questions

Related Tools

Deep Researcher Agent: Open-Source Research Agents [N/A Stars]

AutoResearchClaw: Best AI Research Agents for AI Researchers in 2026

Polymarket Arbitrage Trading Bot: Best Bot for Traders in 2026