Agent Research Aggregator

Author: Ar9av

Repository: Ar9av/PaperOrchestra

Extract experiments from AI agent logs and prepare them for academic paper writing.

Installation

Make sure Claude Code is installed and available in your terminal.
Skills load from ~/.claude/skills/ when Claude Code starts, so it must be installed on your machine first. If you don't have it yet, install it once with the command below, then run `claude` in any terminal to verify.
One-time setup
```
npm i -g @anthropic-ai/claude-code
```
Already have it? Skip ahead.

Paste into Claude Code or into your terminal.

Install

git clone ht••••••••••••••••••••••••••••••••••••••••• •••••••••••••••••••••••••• •• ••••• •• •••••••••••••••••••••••••••••••••••••••••••••••• •• •• •• ••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• •••••••••••••••••••••••••••••••••••••••••••••••••

This copies the whole skill folder into ~/.claude/skills/agent-research-aggregator-ar9av/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.

Faster alternative (instruction-only skills)

Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdowns, or asset templates — they won't be downloaded and the skill will fail when it tries to load them.

Quick install (SKILL.md only)

mkdir -p ~/.••••••••••••••••••••••••••••••••••••••••••••• •• •••• ••••• ••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• •• •••••••••••••••••••••••••••••••••••••••••••••••••••••••••

Restart Claude Code.
Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.
Just ask Claude.
Skills auto-activate when your request matches the skill's description, so no slash command is needed. Trigger phrases live in the skill's own frontmatter; you can read them in the "When Claude uses it" section below.


When Claude uses it

Pre-pipeline aggregator that scans AI agent cache directories (.claude, .cursor, .antigravity, .openclaw) or any user-specified directory for experimentation logs, extracts insights and numeric results, and formats them as PaperOrchestra-ready inputs (idea.md + experimental_log.md). TRIGGER when the user says "aggregate my agent logs for paper writing", "extract experiments from my coding agent history", "prepare PaperOrchestra inputs from my cache", "turn my agent logs into a paper", mentions a folder or directory they want to use as the basis for a paper, or wants to run PaperOrchestra but only has scattered agent experiment histories rather than structured inputs. Run this BEFORE paper-orchestra. Also called automatically by paper-orchestra when workspace/inputs/idea.md or workspace/inputs/experimental_log.md are missing.

What this skill does

agent-research-aggregator

Should I run? (decision gate)

Before starting Phase 1, check whether aggregation is actually needed:

| Situation | Action |
| --- | --- |
| `workspace/inputs/idea.md` and `workspace/inputs/experimental_log.md` both exist and are non-empty | Skip this skill entirely. Proceed directly to `paper-orchestra`. |
| Either file is missing or empty, and the user provided a directory path | Run this skill with that directory as `--search-roots`. |
| Either file is missing or empty, and no directory was provided | Scan cwd and `~` by default; show the discovery summary to the user before continuing. |
| The inputs exist but look thin (e.g. idea.md has < 5 lines, no numeric data in experimental_log.md) | Ask the user whether to supplement with aggregation or proceed as-is. |

The skill is intentionally a pre-pass — it is cheap to skip and should only run when the structured inputs don't already exist.
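The decision table above can be sketched as a small function. This is an illustrative helper, not part of the shipped scripts: the name `aggregation_decision`, the returned labels, and the "thin input" heuristics (fewer than 5 non-blank lines in idea.md, no digits inside table rows of experimental_log.md) are assumptions chosen to mirror the table.

```python
import re
from pathlib import Path
from typing import Optional


def _read(path: Path) -> str:
    """Return the file's text, or an empty string if it does not exist."""
    return path.read_text(encoding="utf-8", errors="replace") if path.is_file() else ""


def aggregation_decision(workspace: Path, user_dir: Optional[str] = None) -> str:
    """Hypothetical gate: returns 'skip', 'run', 'scan-defaults', or 'ask'."""
    idea = _read(workspace / "inputs" / "idea.md")
    log = _read(workspace / "inputs" / "experimental_log.md")

    # Either file missing or empty: aggregation is needed.
    if not idea.strip() or not log.strip():
        return "run" if user_dir else "scan-defaults"

    # Assumed heuristics for "thin" inputs; tune these to taste.
    idea_lines = [ln for ln in idea.splitlines() if ln.strip()]
    has_numbers = any(
        ln.lstrip().startswith("|") and re.search(r"\d", ln)
        for ln in log.splitlines()
    )
    if len(idea_lines) < 5 or not has_numbers:
        return "ask"
    return "skip"
```

A caller would act on the returned label: `skip` hands off to paper-orchestra, `ask` prompts the user, and the other two start Phase 1 with the given or default search roots.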

A pre-processing skill for PaperOrchestra (arXiv:2604.05018). Reads scattered experimentation artifacts from AI coding-agent cache directories and synthesizes them into the structured (I, E) input pair the PaperOrchestra pipeline expects.

[.claude/]  [.cursor/]  [.antigravity/]  [.openclaw/]
      │            │              │               │
      └────────────┴──────────────┴───────────────┘
                          │
                    Phase 1: Discovery
                  (discover_logs.py)
                          │
                    discovered_logs.json
                          │
                    Phase 2: Extraction
                  (LLM call per log batch)
                          │
                    raw_experiments.json
                          │
                    Phase 3: Synthesis
                  (LLM call — consolidate)
                          │
                    synthesis.json
                          │
                    Phase 4: Formatting
                  (format_po_inputs.py)
                          │
             ┌────────────┴────────────┐
      workspace/inputs/         workspace/ara/
        idea.md                   aggregation_report.md
        experimental_log.md       discovered_logs.json
                                  raw_experiments.json
                                  synthesis.json

The output drops directly into workspace/inputs/ so the user can immediately run paper-orchestra on the same workspace.

Inputs

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| `--search-roots` | no | cwd, `~` | Comma-separated directories to scan for agent caches |
| `--agents` | no | all | Comma-separated subset: `claude,cursor,antigravity,openclaw` |
| `--workspace` | no | `./workspace` | PaperOrchestra workspace root |
| `--depth` | no | 4 | Max directory scan depth (prevents runaway scans on large home dirs) |
| `--since` | no | none | Only include logs modified after this date (ISO 8601: `2025-01-01`) |

The user specifies these when invoking the skill, or you may ask them for --search-roots if the current directory has no detectable agent caches.
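To make the parameter semantics concrete, here is a minimal sketch of how the values in the table could be parsed and normalized. It is not the parser used by discover_logs.py, whose internals are not shown here; `parse_args` and `KNOWN_AGENTS` are hypothetical names, and only the flag names and defaults are taken from the table.

```python
import argparse
from datetime import date
from pathlib import Path

# Agent names as listed in the parameter table.
KNOWN_AGENTS = ("claude", "cursor", "antigravity", "openclaw")


def parse_args(argv):
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="illustrative only")
    parser.add_argument("--search-roots", default=None)
    parser.add_argument("--agents", default="all")
    parser.add_argument("--workspace", default="./workspace")
    parser.add_argument("--depth", type=int, default=4)
    # ISO 8601 dates such as 2025-01-01; invalid input triggers an argparse error.
    parser.add_argument("--since", type=date.fromisoformat, default=None)
    ns = parser.parse_args(argv)

    # Comma-separated roots; default is cwd plus the home directory.
    roots = ns.search_roots.split(",") if ns.search_roots else [".", "~"]
    ns.search_roots = [Path(r).expanduser() for r in roots if r]

    ns.agents = list(KNOWN_AGENTS) if ns.agents == "all" else ns.agents.split(",")
    unknown = sorted(set(ns.agents) - set(KNOWN_AGENTS))
    if unknown:
        parser.error(f"unknown agent(s): {', '.join(unknown)}")
    return ns
```

Expanding `~` inside the parser rather than relying on the shell means a quoted or comma-joined value still resolves to the home directory.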

Phase 1 — Discovery (deterministic)

Run the discovery script to catalog every relevant log file:

python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots <roots> \
    --agents <agents> \
    --depth <depth> \
    --since <since> \
    --out workspace/ara/discovered_logs.json

The script exits with code 2 when no --project filter is set (this is expected on the first run). It prints a "Projects found" list to stdout — show it to the user immediately.

If no logs are found at all: stop and ask the user to specify --search-roots or point you at a directory that contains agent cache folders.
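Because exit code 2 is an expected outcome on the first run, a wrapper should not treat it as a failure. The sketch below shows one way to classify the result; `run_discovery` and its status labels are hypothetical, and treating every other non-zero code as an error is an assumption.

```python
import subprocess


def run_discovery(cmd):
    """Run a discovery command and classify its exit code.

    Returns (status, stdout), where status is one of:
      'ok'            - exit 0, manifest written
      'needs-project' - exit 2, show the "Projects found" list to the user
      'error'         - any other exit code (assumed to be a real failure)
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        status = "ok"
    elif proc.returncode == 2:
        status = "needs-project"
    else:
        status = "error"
    return status, proc.stdout
```

On `needs-project`, the captured stdout is what gets shown to the user before Phase 1.5.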

Phase 1.5 — Project Selection (mandatory)

A paper can only be written from a single project. You must ask the user which project to use before any LLM processing begins.

Display the numbered project list from the discovery summary, e.g.:

Projects found:
  [1] /home/alice/projects/my-rl-experiment  (42 files)
  [2] /home/alice/projects/llm-eval-suite    (17 files)
  [3] /home/alice/projects/old-demo          (3 files)

Ask: "Which project should this paper be based on? Please choose a number or paste the project path."
Do not proceed to Phase 2 until the user has answered.
Re-run discovery with the chosen project to filter the manifest:

python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots <roots> \
    --agents <agents> \
    --depth <depth> \
    --since <since> \
    --project "<chosen project path>" \
    --out workspace/ara/discovered_logs.json

This overwrites discovered_logs.json so only the selected project's files remain. The script exits 0 on success.

If the discovery finds only one project: skip the question and inform the user: "Only one project found: <path>. Using it for the paper." — then re-run with --project automatically.

If the discovery summary shows irrelevant files after filtering: ask the user whether to include or exclude them before continuing to Phase 2. Err on the side of inclusion — the extraction prompt is conservative.
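The selection rules in this phase (accept a number or a path, skip the question when only one project exists, never continue without an answer) can be sketched as follows. Both function names are hypothetical, and `ask` stands in for however the agent actually prompts the user.

```python
from typing import Callable, List, Optional

QUESTION = (
    "Which project should this paper be based on? "
    "Please choose a number or paste the project path."
)


def resolve_project(answer: str, projects: List[str]) -> Optional[str]:
    """Map a user answer (list number or path) to a project path, else None."""
    answer = answer.strip().strip("[]")
    if answer.isdigit():
        index = int(answer)
        return projects[index - 1] if 1 <= index <= len(projects) else None
    cleaned = answer.strip("\"'").rstrip("/")
    for project in projects:
        if project.rstrip("/") == cleaned:
            return project
    return None


def choose_project(projects: List[str], ask: Callable[[str], str]) -> str:
    """Return the single project to use; keeps asking until the answer is valid."""
    if not projects:
        raise ValueError("no projects discovered")
    if len(projects) == 1:
        # Only one candidate: no question needed, just inform the user.
        return projects[0]
    while True:
        choice = resolve_project(ask(QUESTION), projects)
        if choice is not None:
            return choice
```

The returned path is what would be passed as `--project` when discovery is re-run.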

Phase 2 — Extraction (LLM-assisted)

Process discovered logs in batches (group by agent type; keep batches under ~50 KB of raw text to stay within context limits):

For each batch:

1. Read the log files in the batch (the script's --list output tells you which file paths to read).
2. Apply the extraction prompt from references/extraction-prompt.md as your system message.
3. Pass the raw log text as the user message.
4. Collect the structured JSON the LLM returns (see schema in the prompt).
5. Append to workspace/ara/raw_experiments.json.
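The batching rule stated at the start of this phase (group by agent type, stay under roughly 50 KB per batch) could look like the sketch below. The input shape, a list of `(agent, path, size_bytes)` tuples, is an assumption and not the documented layout of discovered_logs.json.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MAX_BATCH_BYTES = 50 * 1024  # the ~50 KB budget mentioned above


def make_batches(
    files: Iterable[Tuple[str, str, int]],
    max_bytes: int = MAX_BATCH_BYTES,
) -> List[Dict]:
    """Greedily pack files into per-agent batches that stay under max_bytes."""
    by_agent = defaultdict(list)
    for agent, path, size in files:
        by_agent[agent].append((path, size))

    batches = []
    for agent in sorted(by_agent):
        current, current_size = [], 0
        for path, size in by_agent[agent]:
            # Close the current batch if adding this file would exceed the budget.
            if current and current_size + size > max_bytes:
                batches.append({"agent": agent, "paths": current, "bytes": current_size})
                current, current_size = [], 0
            current.append(path)
            current_size += size
        if current:
            batches.append({"agent": agent, "paths": current, "bytes": current_size})
    return batches
```

A single file larger than the budget ends up alone in its own oversized batch; splitting such a file is left out of this sketch.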

After all batches:

python skills/agent-research-aggregator/scripts/extract_experiments.py \
    --discovered workspace/ara/discovered_logs.json \
    --out workspace/ara/raw_experiments.json \
    --validate-only

The --validate-only flag checks that the combined JSON is well-formed and meets the minimum schema: the experiments array is non-empty, and each entry has at least one of hypothesis, method, or results. Fix any malformed entries before Phase 3.
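For readers who want to see what the minimum schema amounts to, here is a sketch of an equivalent check. It is not the code in extract_experiments.py; `validate_raw_experiments` is a hypothetical name, and a top-level object with an `experiments` key is an assumption about the file layout.

```python
import json

# At least one of these fields must be present and non-empty in each entry.
REQUIRED_ANY = ("hypothesis", "method", "results")


def validate_raw_experiments(text: str) -> list:
    """Return a list of problems; an empty list means the minimum schema is met."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    experiments = data.get("experiments") if isinstance(data, dict) else None
    if not isinstance(experiments, list) or not experiments:
        return ["'experiments' must be a non-empty array"]

    problems = []
    for index, entry in enumerate(experiments):
        if not isinstance(entry, dict) or not any(entry.get(key) for key in REQUIRED_ANY):
            problems.append(
                f"experiments[{index}] needs at least one of: {', '.join(REQUIRED_ANY)}"
            )
    return problems
```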

Phase 3 — Synthesis (LLM-assisted)

Consolidate possibly-redundant experiment records from multiple agent caches into a single coherent research narrative. This is ONE LLM call.

System message: Use references/synthesis-prompt.md verbatim.

User message:

<raw_experiments>
{contents of workspace/ara/raw_experiments.json}
</raw_experiments>

The LLM must return a synthesis.json with keys:

research_question — the overarching question being investigated
hypothesis — the core proposed solution / claim
method_summary — how the approach works (concise, no data leakage)
key_contributions — 2–5 bullet strings
experimental_setup — datasets, metrics, baselines, implementation notes
results_tables — array of {title, headers[], rows[]} markdown-table objects
qualitative_observations — free-form text blocks (what worked, what didn't, failure modes, ablation insights)
iteration_history — ordered list of {iteration_id, change_description, outcome} entries if multiple iterations are detected
open_questions — questions that remain unanswered in the logs

Save to workspace/ara/synthesis.json.
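Before saving, it can help to confirm that the LLM's reply actually carries the keys listed above. The sketch below builds the user message and checks the parsed reply; both function names are hypothetical, and treating `iteration_history` as optional is an assumption based on its "if multiple iterations are detected" wording.

```python
# Keys taken from the list above; iteration_history is treated as optional here.
REQUIRED_KEYS = (
    "research_question",
    "hypothesis",
    "method_summary",
    "key_contributions",
    "experimental_setup",
    "results_tables",
    "qualitative_observations",
    "open_questions",
)


def build_user_message(raw_experiments_json: str) -> str:
    """Wrap the raw experiments JSON in the tags shown above."""
    return f"<raw_experiments>\n{raw_experiments_json}\n</raw_experiments>"


def check_synthesis(synthesis: dict) -> list:
    """Return a list of problems found in a parsed synthesis reply."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in synthesis]
    contributions = synthesis.get("key_contributions")
    if not isinstance(contributions, list) or not 2 <= len(contributions) <= 5:
        problems.append("key_contributions should be a list of 2 to 5 strings")
    return problems
```

An empty result means the reply is structurally usable; it says nothing about whether the content is accurate.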

Note: By this point, the user has already selected a single project in Phase 1.5. The synthesis should represent one coherent research thread. If the LLM still surfaces multiple disconnected research questions, flag this as a data quality warning in the audit report (Phase 5) but do not re-ask for project selection — that decision was made earlier.

Phase 4 — Formatting (deterministic)

Convert synthesis.json into PaperOrchestra input files:

python skills/agent-research-aggregator/scripts/format_po_inputs.py \
    --synthesis workspace/ara/synthesis.json \
    --out workspace/inputs/

This generates two files:

`workspace/inputs/idea.md` (Sparse variant)

Follows the PaperOrchestra Sparse Idea format (arXiv:2604.05018, §3.1):

# [Synthesized Research Title]

## Problem
<2–4 sentence problem statement derived from research_question>

## Hypothesis
<hypothesis from synthesis>

## Method
<method_summary from synthesis>

## Key Contributions
<key_contributions as bullet list>

## Open Questions
<open_questions, if any>

`workspace/inputs/experimental_log.md`

Follows the PaperOrchestra Experimental Log format (App. D.3):

## 1. Experimental Setup
<experimental_setup from synthesis, formatted as prose + sub-bullets>

## 2. Raw Numeric Data
<results_tables converted to GitHub-Flavored Markdown tables>

## 3. Qualitative Observations
<qualitative_observations from synthesis>

### Iteration History
<iteration_history as an ordered narrative, if present>

After running the script, review both files with the user:

Show workspace/inputs/idea.md to the user and ask: "Does this accurately capture your research question and method?"
Read the table headers from workspace/inputs/experimental_log.md and ask: "Are these the correct metrics and baselines?"

Revise based on feedback before proceeding to PaperOrchestra.

Phase 5 — Audit Report (deterministic)

python skills/agent-research-aggregator/scripts/format_po_inputs.py \
    --synthesis workspace/ara/synthesis.json \
    --out workspace/inputs/ \
    --report workspace/ara/aggregation_report.md

The --report flag makes the script also write aggregation_report.md, which contains:

Number of agent caches scanned, files read, batches processed
Per-agent breakdown (files found per agent type)
Experiment records extracted (count, date range)
Iterations detected (count, convergence direction)
Data quality warnings (gaps, low-confidence extractions, conflicting numbers)
Files written and their sizes

Show the report to the user. If the data quality section lists warnings, discuss them before running paper-orchestra — garbage in, garbage out.

Handoff to PaperOrchestra

Once the user has confirmed idea.md and experimental_log.md, the workspace is ready for the paper-orchestra pipeline. You still need:

| File | Status | Action |
| --- | --- | --- |
| `workspace/inputs/idea.md` | ✓ generated | user review recommended |
| `workspace/inputs/experimental_log.md` | ✓ generated | user review recommended |
| `workspace/inputs/template.tex` | MISSING | ask user to provide their conference LaTeX template |
| `workspace/inputs/conference_guidelines.md` | MISSING | ask user to provide (page limit, deadline, formatting rules) |

Tell the user exactly which two files are still needed, then offer to run paper-orchestra once they supply them.
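A quick way to produce the list of files still needed is to check the four inputs named in the table. `missing_inputs` is a hypothetical helper, and counting zero-byte files as missing is an assumption.

```python
from pathlib import Path

# The four input files named in the handoff table.
REQUIRED_INPUTS = (
    "idea.md",
    "experimental_log.md",
    "template.tex",
    "conference_guidelines.md",
)


def missing_inputs(workspace) -> list:
    """Return the required input files that are absent or empty."""
    inputs = Path(workspace) / "inputs"
    missing = []
    for name in REQUIRED_INPUTS:
        path = inputs / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```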

Error handling

| Situation | Action |
| --- | --- |
| Cache directory does not exist | Skip silently; note in report |
| File is binary or non-text | Skip; note in report |
| File > 200 KB | Truncate at 200 KB; note in report with path |
| LLM extraction returns malformed JSON | Re-prompt once with the parse error appended; if still malformed, log the batch as `status: failed` and continue |
| Synthesis returns > 1 `research_question` | Log as data quality warning in audit report; do not re-ask for project (was selected in Phase 1.5) |
| `results_tables` is empty after synthesis | Warn the user: PaperOrchestra's section-writing agent needs numeric data |
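Two of the rows above, file handling and the single re-prompt on malformed JSON, are sketched below. The names are hypothetical, `call_llm` stands in for whatever function sends a prompt and returns the reply text, and detecting binary files by a NUL byte is an assumed heuristic.

```python
import json
from pathlib import Path
from typing import Callable, Optional

MAX_LOG_BYTES = 200 * 1024  # truncation limit from the table above


def read_log(path: Path, notes: list) -> Optional[str]:
    """Read a log as text; skip binaries and truncate large files, noting both."""
    with open(path, "rb") as handle:
        raw = handle.read(MAX_LOG_BYTES + 1)  # one extra byte reveals truncation
    if b"\x00" in raw:  # assumed heuristic for "binary or non-text"
        notes.append(f"skipped binary file: {path}")
        return None
    if len(raw) > MAX_LOG_BYTES:
        notes.append(f"truncated at 200 KB: {path}")
        raw = raw[:MAX_LOG_BYTES]
    return raw.decode("utf-8", errors="replace")


def extract_with_retry(call_llm: Callable[[str], str], log_text: str) -> dict:
    """Call the LLM, re-prompting once with the parse error if JSON is malformed."""
    prompt = log_text
    for _ in range(2):  # the first attempt plus exactly one re-prompt
        reply = call_llm(prompt)
        try:
            return {"status": "ok", "data": json.loads(reply)}
        except json.JSONDecodeError as exc:
            prompt = f"{log_text}\n\nYour previous reply was not valid JSON: {exc}"
    return {"status": "failed", "data": None}
```

The `notes` list collects the entries that would later be surfaced in the audit report, and a `failed` result lets the caller move on to the next batch.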

Hard rules (never violate)

Never write to agent cache directories. This skill is read-only on .claude/, .cursor/, .antigravity/, .openclaw/.
Never include personal information (emails, names, credentials, API keys) in generated idea.md or experimental_log.md. The extraction prompt instructs the LLM to strip PII; double-check before handoff.
Never fabricate results. If a metric appears in only one log with low confidence, mark it [UNVERIFIED] in the table rather than silently including it.
Never proceed past Phase 1 without user confirmation of the discovered file list if the scan found > 50 files.
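The PII double-check and the [UNVERIFIED] marker lend themselves to a small pre-handoff pass. The sketch below is a heuristic aid, not a guarantee: the function names are hypothetical, and the regular expressions (email-like strings and a few well-known token prefixes) are illustrative assumptions that will miss names and many credential formats.

```python
import re

# Illustrative patterns only; a real check needs a broader set.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET_RE = re.compile(r"\b(?:sk|ghp|xox[abp])[-_][A-Za-z0-9_-]{16,}\b")


def find_pii(text: str) -> list:
    """Return email-like and token-like strings that should be reviewed."""
    return EMAIL_RE.findall(text) + SECRET_RE.findall(text)


def format_metric(value, source_count: int, low_confidence: bool) -> str:
    """Render a table cell, marking single-source low-confidence values."""
    cell = str(value)
    if source_count <= 1 and low_confidence:
        cell += " [UNVERIFIED]"
    return cell
```

Running `find_pii` over the generated idea.md and experimental_log.md gives a list to review by hand; an empty result does not prove the files are clean.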

Quick reference

# Phase 1: discover all projects (exits with code 2 — project selection required)
python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots . ~ --out workspace/ara/discovered_logs.json

# Phase 1.5: re-run with chosen project (exits 0)
python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots . ~ \
    --project "/home/user/projects/my-chosen-project" \
    --out workspace/ara/discovered_logs.json

# ... (Phase 2: LLM extraction calls, see above) ...

python skills/agent-research-aggregator/scripts/extract_experiments.py \
    --discovered workspace/ara/discovered_logs.json \
    --out workspace/ara/raw_experiments.json --validate-only

# ... (Phase 3: LLM synthesis call, see above) ...

python skills/agent-research-aggregator/scripts/format_po_inputs.py \
    --synthesis workspace/ara/synthesis.json \
    --out workspace/inputs/ \
    --report workspace/ara/aggregation_report.md
