Governance
Operational agent work needs evals, traces, approvals, least-privilege tools, rollback, and escalation paths before it earns trust.
Evals: Define what good looks like before a model run becomes production behavior.
Traces: Record provider attempts, failures, lead verdicts, and accepted synthesis without exposing raw transcripts.
Approvals: Put humans at the points where policy, money, reputation, or irreversible actions enter the workflow.
Least-privilege tools: Keep tool permissions explicit and narrow enough for the workflow's actual needs.
Rollback: Know which commit, config, or harness change to revert when behavior regresses.
Escalation: Name the condition that moves work from agent automation back to human review.
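The six controls above could be expressed as one gate that every tool call passes through. What follows is a minimal Python sketch under stated assumptions, not the harness's actual implementation: `Policy`, `Trace`, `decide`, the field names, and the verdict strings are all hypothetical, and nothing here reflects the real schema of .harness-kit/agents.yaml.

```python
from dataclasses import dataclass, field


# All names below are hypothetical and chosen for illustration only.
@dataclass(frozen=True)
class Policy:
    revision: str                 # commit or config id in force, so a regression can be traced to it
    allowed_tools: frozenset      # least-privilege allowlist
    approval_required: frozenset  # tools whose effects need human sign-off
    min_eval_score: float         # the "good" bar, fixed before any run
    max_failures: int             # failure count that hands work back to a human


@dataclass
class Trace:
    events: list = field(default_factory=list)

    def record(self, revision: str, verdict: str, tool: str) -> None:
        # Keep a short structured summary; raw transcripts are never stored here.
        self.events.append((revision, verdict, tool))


def decide(policy, trace, tool, eval_score, failures, approved=False):
    """Return 'deny', 'escalate', 'needs_approval', or 'run' for one tool call."""
    if tool not in policy.allowed_tools:
        verdict = "deny"            # outside the explicit allowlist
    elif eval_score < policy.min_eval_score or failures >= policy.max_failures:
        verdict = "escalate"        # named condition: back to human review
    elif tool in policy.approval_required and not approved:
        verdict = "needs_approval"  # human checkpoint before a sensitive action
    else:
        verdict = "run"
    trace.record(policy.revision, verdict, tool)
    return verdict
```

With an assumed policy that allows `read_ticket` and `issue_refund` and requires approval for `issue_refund`, a call to an unlisted tool is denied, a refund waits for sign-off, and a low eval score or repeated failures route the work to a person. Each decision is logged with the policy revision, which gives a regression a concrete change to revert.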
Sources: .harness-kit/agents.yaml, docs/positioning.md