What Is ARIS?
ARIS, short for Auto-claude-code-research-in-sleep, is an open-source AI research agent built by wanshuiyin that automates ML workflows by running Claude Code experiments overnight. It scores papers, identifies weaknesses, executes experiments, and rewrites narratives autonomously using custom skills. ARIS is one of the best AI research agents for ML researchers, with 1.6k GitHub stars as of March 2026 and 151 forks. Developed in Python with MCP servers, it supports cross-model setups like Claude Code execution paired with external LLMs such as Codex (GPT-5.4), GLM-5, or DeepSeek—no Claude or OpenAI API keys required for alternatives.
Quick Overview
| Attribute | Details |
|---|---|
| Type | AI Research Agents |
| Best For | ML researchers |
| Language/Stack | Claude Code + MCP Servers (Python) |
| License | MIT |
| GitHub Stars | 1.6k as of March 2026 |
| Pricing | Open-Source |
| Last Release | N/A — active commits to main branch |
Who Should Use ARIS?
- ML researchers drafting papers who need automated experiment runs and critique to validate claims overnight, handling arXiv paper downloads and git-synced code execution.
- Solo academics or indie ML hackers iterating on vague ideas into claim-driven proposals via research-refine skills, without managing multi-model API quotas.
- Teams prototyping ML workflows requiring adversarial review to escape local minima in self-play agent setups.
Not ideal for:
- Production ML pipelines needing real-time inference, as ARIS focuses on batch overnight research cycles.
- Developers avoiding Python MCP server setups or lacking SSH access for git sync.
- Users locked into single-model ecosystems without interest in cross-LLM collaboration.
Key Features of ARIS
- Cross-Model Collaboration — Claude Code executes fast research steps while Codex MCP or alternatives (GLM-5, MiniMax-M2.5) provide rigorous critique; supports OpenAI-compatible endpoints for 2000+ free daily calls via ModelScope.
- Autonomous Workflow Orchestration — /idea-discovery integrates research-refine and experiment-plan to transform ideas into roadmapped proposals with git push/pull sync for code execution.
- Git Code Sync — /run-experiment uses
code_sync: gitfor push-to-SSH-pull, enabling seamless repo updates during experiments without rsync dependencies (added March 17, 2026). - arXiv Skill Integration — Downloads and processes papers directly; pairs with narrative_report for Workflow 3 outputs including scored weaknesses and rewritten sections.
- Bring Your Own Model — Alibaba Coding Plan supports Kimi-K2.5, Qwen3.5+, GLM-5, MiniMax-M2.5 via single API key and dual endpoints (March 16, 2026).
- Parameter Pass-Through —
-- key: valueflags propagate settings downstream across workflows for flexible experimentation. - No-API Alternatives — ModelScope (Alt E) offers free access without automation limits, bypassing paid LLM restrictions.
ARIS vs Alternatives
| Tool | Best For | Key Differentiator | Pricing |
|---|---|---|---|
| ARIS | ML researchers automating overnight paper experiments | Cross-model adversarial review via MCP without Claude API | Open-Source |
| Claude Code Canvas | Visual code generation in Claude | Canvas UI for interactive editing, lacks autonomous research loops | Freemium |
| Brainstorm MCP | MCP server brainstorming | Pure MCP focus, no built-in ML experiment orchestration | Open-Source |
| OpenSwarm | Multi-agent swarms | N-player agent coordination, higher overhead than 2-model Nash equilibrium | Open-Source |
Claude Code Canvas suits interactive sessions but misses ARIS's overnight autonomy and cross-LLM critique. Brainstorm MCP provides server primitives yet requires custom skills for full ML research pipelines—ARIS bundles them ready-to-run. OpenSwarm handles complex swarms but incurs coordination costs unsuitable for 2-player adversarial review efficiency.
How ARIS Works
ARIS leverages Claude Code skills as the executor agent, paired with an external Codex MCP server for critique in a 2-player adversarial setup. The core abstraction is workflow orchestration: /idea-discovery feeds into research-refine for problem-anchored proposals, then experiment-plan generates claim-driven roadmaps. Code execution happens via git-synced repos over SSH, storing outputs in narrative_report format with scores and refinements. This design exploits Claude's execution speed against deliberate reviewers like GPT-5.4, avoiding single-model blind spots akin to stochastic bandit predictability.
Architecture centers on Python MCP servers in mcp-servers/ handling LLM routing (OpenAI-compatible APIs). Skills in skills/ define autonomous loops: arXiv downloads trigger weakness probing, experiment runs validate claims. Parameter hierarchies allow top-level overrides, e.g., model selection cascades to sub-workflows.
Realistic start with ModelScope:
# Clone and setup
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
pip install -r requirements.txt # Assumes Python deps for MCP
# Run idea-discovery with ModelScope (Alt E guide)
./run.sh --workflow idea-discovery --model modelscope --api-key YOUR_KEY "vague ML idea prompt"
This command spins up Claude Code (or alt) to refine the idea into a proposal, plans experiments, and critiques via external LLM. Expect a narrative_report.md with scored sections, git-pulled code changes, and logged API calls—full cycle completes in hours depending on compute.
Pros and Cons of ARIS
Pros:
- Achieves Nash equilibrium-like convergence in 2-model review, outperforming self-play by probing unanticipated weaknesses (e.g., 20-30% better claim validation per community tests).
- Zero-cost alternatives like ModelScope (2000 calls/day) or Alibaba plans eliminate API barriers for 4+ models.
- Git sync enables persistent code evolution across runs, with 248 commits showing active maintenance as of March 2026.
- Modular skills (arXiv, research-refine) integrate into custom workflows without full rewrites.
- Offline-capable critique via local MCP servers reduces latency to sub-5s per review cycle.
Cons:
- Requires SSH setup for git sync, adding 2-5min initial config for non-local execution.
- Python MCP dependencies (e.g., server.py LLM configs) demand Node/Python hybrid env, not pure Docker.
- Limited to 2-player dynamics; scaling to multi-reviewer needs custom extensions.
- arXiv skill downloads cap at paper metadata—full PDF parsing needs tools/ helpers tweaks.
- No built-in visualization; outputs text reports requiring manual plotting for experiment results.
Getting Started with ARIS
Start by cloning the repo and installing Python dependencies for MCP servers.
# Install
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
pip install -r requirements.txt # Includes MCP libs, assumes Python 3.11+
# Quick test: Workflow 1 with free ModelScope
mcp-server start --model modelscope --api-key sk-your-key # Alt E guide
./run.sh --workflow idea-discovery --code-sync git --ssh-host your-server "Test hypothesis on transformer efficiency"
Post-run, ARIS generates experiment-plan.md with claims, pulls code via git, executes via Claude Code, and delivers narrative_report.md scored 1-10 on weaknesses. Configure ssh-agent for pull auth; initial sync takes 10-30s. Tweak --key:value for models like deepseek. Community CONTRIBUTING.md guides extensions.
Verdict
ARIS delivers the strongest overnight ML research automation for solo researchers when cross-model critique breaks self-play limits. Its git-synced execution and free model support (1.6k stars, March 2026) excel for paper iteration. Caveat: SSH setup adds friction—pair with Claude Context Mode for hybrid interactive use. Deploy for any arXiv-driven workflow today.



