Harness Kit

Reference

Generated catalog of local skills, agents, gates, and bootstrap behavior.

This section is rebuilt from source files so the public documentation cannot drift silently from the harness.

skill

a11y

/a11y

Accessibility audit, remediation, and verification. WCAG 2.2 AA compliance. Three-agent protocol: audit (find issues) → remediate (fix them) → critique (verify fixes). Use when: "accessibility audit", "a11y", "WCAG", "screen reader", "keyboard navigation", "contrast check", "aria fix", "accessibility sprint", "audit accessibility", "fix accessibility", "a11y issues", "a11y check". Trigger: /a11y

Reference
skill

agent-readiness

/agent-readiness

Assess and improve codebase readiness for AI coding agents across style, tests, docs, architecture, CI, observability, security, and dev setup. Produces a scored report, prioritizes remediation, then executes the highest-impact fixes. Use when: "agent readiness", "is this codebase agent-ready", "readiness report", "make this codebase agent-friendly", "agent-ready assessment", "readiness audit", "prepare for agents". Trigger: /agent-readiness, /readiness.

Reference
skill

agent-transcript

/agent-transcript

Redact and package local agent-session excerpts for PRs, issues, receipts, or review evidence. Use when: "add agent transcript", "attach session proof", "show agent provenance", "redact transcript", "PR transcript". Trigger: /agent-transcript.

Reference
skill

browser

/browser

Pick browser automation for web/Electron: CI E2E, scripted flows, scraping, visual regression, exploratory QA, persona walks, monitoring, or browser agents. Deterministic Playwright first; harden exploratory findings into repeatable tests. Use for "automate the browser", "test this web app", "test this electron app", "Playwright or Stagehand", "scrape this site", "browser agent", "visual regression", "E2E tests". Trigger: /browser.

Reference
skill

ci

/ci

Audit CI gates, strengthen weak coverage, then drive green. Dagger owns the canonical pipeline; missing Dagger is auto-scaffolded. Acts directly on mechanical fixes and never returns red without structured diagnosis. Use when: "run ci", "check ci", "fix ci", "audit ci", "is ci passing", "run the gates", "dagger check", "why is ci failing", "strengthen ci", "tighten ci", "ci is red", "gates failing". Trigger: /ci, /gates.

Reference
skill

code-review

/code-review

Parallel multi-agent code review. Launch reviewer team, synthesize findings, auto-fix blocking issues, loop until clean. Use when: "review this", "code review", "is this ready to ship", "check this code", "review my changes", "autoreview", "Codex review", "Claude review", "second-model review". Trigger: /code-review, /review, /autoreview.

Reference
skill

create-repo-skill

/create-repo-skill

Generate a repository-local skill from live repo discovery and user intent. Use when: "create QA skill", "generate repo skill", "make a local skill", "persona acceptance skill", "value proposition QA", "bespoke repo QA", "scaffold local skill". Creates concrete `.agents/skills/<name>/` guidance with harness-specific bridges when useful. Trigger: /create-repo-skill, /repo-skill, /create-qa-skill.

Reference
skill

critique

/critique

Run one targeted, read-only architecture or quality critique through a named lens from the shared rubric. Use when: "critique this module", "run an Ousterhout pass", "lens critique", "architecture critique". Trigger: /critique.

Reference
skill

debrief

/debrief

Lightweight evidence-backed retro and catch-up reports for a current repo, branch, PR, backlog slice, or recent agent session. Use when the user asks for a debrief, catch me up, what changed, why it matters, product implications, end-user implications, developer experience implications, current app state, backlog state, workspace state, alternatives considered, or context rebuild after losing the thread. Trigger: /debrief.

Reference
skill

deliver

/deliver

Inner-loop composer for one backlog item to merge-ready code. Composes /shape, /implement, /code-review, /ci, /refactor, and /qa; stops before push, merge, or deploy. Emits receipt.json plus operator brief + /reflect. Use for "deliver this", "make it merge-ready", shaped-ticket builds, and `--polish-only <branch|PR>` for existing branch/PR cleanup. Trigger: /deliver.

Reference
skill

demo

/demo

Capture evidence for what changed: blurb, paste, screenshot, GIF, video, launch note, or repo-local demo skill. Pick by change shape, audience, and budget. Use when: "make a demo", "record walkthrough", "PR evidence", "upload screenshots", "show what changed", "release notes blurb", "scaffold demo", "generate demo skill". Trigger: /demo.

Reference
skill

deploy

/deploy

Ship merged code to one deploy target. Thin router: detect target, run the platform recipe, capture receipt (sha, version, URL, rollback handle), stop when healthy. Does not monitor, triage, or decide whether to deploy. Use when: "deploy", "deploy to prod", "release", "push to staging", "deploy this branch", "release cut". Trigger: /deploy, /release.

Reference
skill

deps

/deps

Analyze, test, and upgrade dependencies. One curated PR, not 47 version bumps. Reachability analysis, behavioral diffs, risk assessment. Package-manager agnostic. Use when: "upgrade deps", "dependency audit", "check for updates", "outdated packages", "security audit deps", "update dependencies", "vulnerable dependencies", "deps". Trigger: /deps.

Reference
skill

design

/design

Artifact-backed interface critique and polish for hierarchy, typography, layout, density, IA, interaction feel, content, brand fit, and taste. Requires screenshot, URL, rendered artifact, or explicit file plus intent. Use when: "make this look better", "improve the design", "polish the UI", "critique this screen", "design pass", "art direction", "scaffold design", docs layout, report polish, generated diagrams/images, screenshots, decks, dashboards, charts, or any product-facing visual artifact. Trigger: /design.

Reference
skill

diagnose

/diagnose

Investigate, audit, triage, and fix. Systematic debugging, incident lifecycle, domain auditing, and issue logging. Feedback-loop-first protocol: reproduce or replay before root cause, pattern analysis, hypothesis test, and fix. Use for: any bug, test failure, production incident, error spikes, audit, triage, postmortem, "diagnose", "why is this broken", "debug this", "production down", "is production ok", "audit stripe", "log issues". Trigger: /diagnose.

Reference
skill

flywheel

/flywheel

Outer-loop shipping orchestrator. Composes /shape, /implement, /yeet, /deliver --polish-only, /ship, and /monitor per backlog item. Closure (archive, reflect, harness routing) lives in /ship; flywheel does not invoke /reflect directly. Use when: "flywheel", "run the outer loop", "next N items", "overnight queue", "cycle". Trigger: /flywheel.

Reference
skill

groom

/groom

Always-on backlog grooming. Tidy, brainstorm, interrogate, investigate, research, and simplify in a single loop. Tidy is not a mode — it happens every time. Strategic-layer work fans out parallel interrogation, design-critique, technical-review, and research lanes. Use when: "groom", "what should we build", "rethink this", "biggest opportunity", "backlog", "prioritize", "backlog session", "audit skills", "skill quality audit". Trigger: /groom, /groom audit, /backlog, /rethink, /moonshot, /scaffold.

Reference
skill

hardening

/hardening

Agent-driven test hardening for functions, specs, and acceptance surfaces: property testing, mutation testing, acceptance mutation, CRAP/SCRAP risk, and structural DRY analysis. Use when: "property test this", "mutation test", "CRAP analysis", "DRY analysis", "harden tests", "find weak tests", "kill surviving mutants", "acceptance mutation", "uncle bob hardening". Trigger: /hardening.

Reference
skill

harness-engineering

/harness-engineering

Harness engineering for Harness Kit primitives: skills, shared doctrine, provider roster, harness configs, gates, evals, bootstrap, and sync logic. Use for "improve the harness", "harness engineering", "bootstrap is wrong", "AGENTS.md is stale", "skill health", "skill usage", "undertriggering skill", "description tax", "eval skill", "sync primitives", "roster defaults". Trigger: /harness-engineering, /harness, /skill.

Reference
skill

implement

/implement

Atomic TDD build skill. Takes a context packet (shaped ticket) and produces code + tests on a feature branch. Red → Green → Refactor. Does not shape, review, QA, or ship — single concern: spec to green tests. Use when: "implement this spec", "build this", "TDD this", "code this up", "write the code for this ticket", after /shape has produced a context packet. Trigger: /implement, /build (alias).

Reference
skill

karpathy-guidelines

/karpathy-guidelines

Four LLM-agent guardrails: surface assumptions, prefer simplicity, make surgical changes, and drive by verifiable goals. Reference for scope, simplicity, assumptions, or success-criteria judgment calls. Use when: "am I overcomplicating this", "what are my assumptions", "how do I verify success", "/karpathy", "/principles". Trigger: /karpathy, /principles.

Reference
skill

model-research

/model-research

Research, compare, and select LLM models for AI-powered apps and workflows. Finds latest models per family, verifies availability on target platform, compares pricing/benchmarks/tool-calling, produces ranked recommendations. Use when: "which model", "compare models", "find a model", "model research", "best model for", "cheapest model", "tool calling models", "model selection", "upgrade model", "swap model", "fallback models", "model chain". Trigger: /model-research.

Reference
skill

monitor

/monitor

Watch signals after deploy, release, local run, CI change, or repeated workflow. Uses telemetry when present; otherwise healthchecks, logs, CI, flaky tests, benchmark drift, daemon output, regressions, or agent traces. Emits events, escalates to /diagnose on trip, closes on green. Use when: "monitor signals", "watch the deploy", "watch production", "watch CI", "watch logs", "watch benchmark drift", "is it healthy". Trigger: /monitor.

Reference
skill

qa

/qa

Verify the running thing works. Browser walks for web, request replay for APIs, shell smoke for CLIs, consumer builds for libraries, tool-call replay for MCP. "Tests pass" is not QA. Use when: "run QA", "verify the feature", "test this", "check the app", "exploratory test", "QA this PR", "smoke test", "manual testing", "capture evidence". For generated repo QA skills, use /create-repo-skill qa. Trigger: /qa.

Reference
skill

refactor

/refactor

Branch-aware simplification. On feature branches, compare against base and simplify the diff before merge. On primary branch, scan for the highest impact deletion, consolidation, or boundary improvement. Use when: "refactor this", "simplify this diff", "clean this up", "reduce complexity", "pay down tech debt", "make this easier to maintain", "make this more elegant", "reduce the number of states", "clarify naming". Trigger: /refactor.

Reference
skill

reflect

/reflect

Session retrospective, operator coaching, harness postmortem, codification, and outer-loop cycle critique. Turns evidence into hooks, rules, skills, backlog mutations, or explicit non-actions. Use when: "done", "wrap up", "what did we learn", "retro", "calibrate", "prompt better", "teach me from this session", "reflect on cycle", post-/flywheel critique. Trigger: /reflect, /retro, /calibrate, /reflect checkpoint <topic>, /reflect cycle <cycle-ulid>.

Reference
skill

research

/research

Web research, multi-AI delegation, and multi-perspective validation. /research [query], /research delegate [task], /research thinktank [topic]. Use when: "search for", "look up", "research", "delegate", "get perspectives", "web search", "find out", "investigate", "introspect", "session analysis", "check readwise", "saved articles", "reading list", "highlights", "what are people saying", "X search", "social sentiment", "trending". Triggers: "search for", "look up", "research", "delegate", "get perspectives", "web search", "find out", "investigate", "introspect", "session analysis", "check readwise", "saved articles", "reading list", "highlights", "what are people saying", "X search", "social sentiment", "trending".

Reference
skill

settle

/settle

DEPRECATED redirect. /settle, /pr-fix, and /pr-polish now route to /deliver --polish-only <branch|PR> — the single owner of "existing branch -> merge-ready" (backlog 080 collapsed settle into deliver). Slated for deletion next release. Use when: muscle-memory "polish this", "address PR reviews", "get this merge-ready" — then run /deliver --polish-only. Trigger: /settle, /pr-fix, /pr-polish.

Reference
skill

shape

/shape

Shape a raw idea into something buildable. Product + technical exploration. Spec, design, critique, plan. Output is a context packet. Use when: "shape this", "write a spec", "design this feature", "plan this", "spec out", "context packet", "technical design". Trigger: /shape, /spec, /plan, /cp.

Reference
skill

ship

/ship

Final mile from merge-ready branch to shipped: squash-merge, archive backlog tickets with trailers, update touched docs, run /reflect, apply outputs. Assumes /deliver or /deliver --polish-only already made the branch ready. Use when: "ship it", "merge and close out", "final mile", "land and reflect", "finish this ticket". Trigger: /ship.

Reference
skill

skillify

/skillify

Turn proven agent-session patterns into first-party Harness Kit skills. Use when: "skillify this conversation", "make this into a skill", "generate a skill from current transcript", "extract reusable workflow". Trigger: /skillify.

Reference
skill

trace

/trace

Capture agent-session work records as local JSONL audit evidence. Links a backlog/spec, branch, commits, review verdicts, QA/demo evidence, transcript refs, and shipped ref without storing raw private transcripts. Use when: "trace this work", "write work record", "agent session trace", "journal this delivery", "link transcript evidence". Trigger: /trace, /journal.

Reference
skill

yeet

/yeet

End-to-end "ship it to the remote": read worktree, classify changes, tidy debris, split semantic commits, and push. Judgment layer over git, not a wrapper; decides what belongs and how reviewers should read the diff. Use when: "yeet", "yeet this", "commit and push", "ship local changes", "tidy and commit", "wrap this up and push", "get this off my machine". Trigger: /yeet, /ship-local (alias).

Reference
agent

a11y-auditor

a11y-auditor

Finds accessibility issues. Does NOT fix them. Read-only investigation.

Reference
agent

a11y-critic

a11y-critic

Verifies accessibility fixes. Skeptical by default. Tests keyboard, axe, screen reader semantics.

Reference
agent

a11y-fixer

a11y-fixer

Fixes accessibility issues from audit findings. Minimal surgical changes. Native HTML over ARIA.

Reference

CI gate map

32 gates enforced by Dagger.

Open

Bootstrap behavior

How Harness Kit installs skills, agents, and roster config.

Open

Agent-readable manifest

Compact generated context for coding agents.

Open