Most teams fail their first agent hire for the same reason most teams fail their first engineering hire: they pick the person (or the model) before they pick the problem. This guide is the sequence we use internally and recommend to every new customer. It takes about 30 minutes to work through, and it has a surprisingly high hit rate.
Why your first hire is different
Your first agent teaches your team how to work with agents. The work it produces matters, but the habits it builds matter more — the review cadence, the coaching loop, the muscle of handing things off. Pick poorly and you'll spend your energy arguing about agents in general instead of improving this specific one.
Three rules for the first hire:
- Scope narrow.One role, one workflow, one success metric. “Handle inbound sales email triage for inbounds under $10k ACV” beats “do sales.”
- Pick work you already understand.You need to be able to look at an output and know within 60 seconds whether it's good. If you can't, don't delegate it yet.
- Choose someone who has done this before as your reviewer.The reviewer's taste becomes the agent's taste.
Pick the role (not the tech)
A role is “what this teammate does.” A tech is “what model or tools it uses.” Only the role matters right now — the tech is our problem.
Walk through these questions, in order:
- What work, if done by a competent junior, would free up your most expensive person this week?
- Is that work repeatable — does it happen more than 5 times a week?
- Is it reviewable — can a human grade the output in under 5 minutes?
- Is it scoped — does it live in one or two tools, not eleven?
If you can answer yes to all four, you've found the role. The most common good answers: customer support triage, outbound email drafts, code review, weekly reporting, meeting notes and action items, social post drafts, first-pass design exploration.
Write the spec
The spec is your job description. It should be one page, max. Here's the template we use:
# Role: Inbound Sales Triage Agent ## Goal Respond to inbound leads within 10 minutes, qualify them against our ICP, and either book a meeting or route to humans. ## Success conditions - 95% of inbounds replied to within 10 min - ICP-fit leads book a meeting at >40% rate - Zero replies to obvious spam or wrong-audience ## Constraints - Never offer discounts or pricing commitments - Never mention our roadmap - Escalate to @sarah for any deal > $25k ACV ## Tools - Gmail (read + send) - HubSpot (read contacts, write meeting bookings) - Calendar (read availability, create events) ## Review cadence - All replies reviewed by @alex for first 2 weeks - After 2 weeks: sampled review, 1 in 10 - Weekly 15-min sync to coach
The “Success conditions” section is the one most teams skip. Don't. If you don't know what good looks like, the agent won't either, and you'll spend six weeks arguing about vibes.
Set guardrails before tools
The order matters. Most people grant tool access first, then try to write prompts that prevent misuse. That's backwards. Set hard guardrails at the tool layer first — these are rules the agent cannot break even if it wanted to — and only then grant the minimum tools needed.
Good guardrails for a first agent:
- Dollar ceiling: no actions that commit more than $X without human approval.
- Scope of action: read-only on everything except the specific write-APIs you listed in the spec.
- Rate limits: max N actions per hour, max M per day. Catches runaway loops.
- Audience rules: allowlist of customer/prospect segments it can contact.
Review the first 10 outputs
This is the single highest-leverage thing you will do. The first 10 outputs are how the agent learns your taste, your edge cases, your voice. Review them end-to-end, personally, with your full attention. Yes, even if you're the CEO.
For each output, do three things:
- Decide: ship, revise, or reject.
- If revise or reject: write one sentence explaining why, in plain language. “This is too formal for our brand.” “This commits us to a deadline we can't hit.” “This answers the wrong question.”
- Attach that sentence as a coaching note.
Coach, don't correct
The temptation after the first bad output is to rewrite the system prompt. Resist it. Every prompt edit is a global change that affects work you haven't seen yet. Instead, use coaching notes — they're scoped, searchable, and reversible.
Coach like you'd coach a new hire: specific, behavioral, and forward-looking. “This reply missed that the prospect asked twice about integration with Snowflake — always scan for repeated questions, they signal what matters most” is a coaching note. “Be more careful” is not.
When to hire #2
Don't hire a second agent until the first one has run unattended for two weeks with sampled review and no serious incidents. Resist the urge to parallelize. Your review capacity, not agent capacity, is the bottleneck — and you only have one review queue to train.
When the first one is stable, the second is easier: same spec format, same review habits, same coaching discipline. By your fifth hire you'll have built the muscle to onboard in a day. But not yet. Get the first one right.


