9 Questions That Reveal Whether a Developer Actually Does Agentic Engineering
A recruiter asked me how to interview agentic engineers. I wrote down everything I know.
March 2026 · 8 min read
A recruiter friend asked me for 3 questions to evaluate developers on AI skills. I gave him 9. His response: “this is worth gold.” Here's the full guide.
A practical scoring guide for hiring managers. No technical background required. I've been building software exclusively with AI agents since January 2026 — I haven't written a line of code manually since then. These questions are what I'd ask if I were on the other side of the table.
Only have 5 minutes? Ask these 3 questions
- Q1: “What AI coding tool do you use day-to-day?” — Tests tool awareness and configuration depth.
- Q2: “What percentage of your code do you write manually vs. generate?” — The most honest signal of actual adoption.
- Q8: “Walk me through how you start a new feature with AI.” — Reveals workflow maturity and process thinking.
Scoring Framework
Use this to calibrate responses before you start. Most candidates will land at Intermediate. Frontier is rare and worth paying for.
| Level | Description | Typical Signals |
|---|---|---|
| Beginner | Uses AI as autocomplete | Copilot only, writes most code manually, no structured workflow |
| Intermediate | Active AI user | Cursor/Claude Code, ~50% generated, some prompt engineering |
| Advanced | Agentic workflow | 80%+ generated, custom instructions, verification strategy, sub-agents |
| Frontier | Builds the harness | Agent harness engineering, framework usage, team process changes, context management |
Tier 1 — Basics (Any Interviewer Can Ask)
These questions require no technical background. A non-technical recruiter or HR manager can ask them and evaluate the responses with the flags below.
Q1: “What AI coding tool do you use day-to-day?”
Green flags
- Names a specific tool (Cursor, Claude Code, Windsurf, Copilot) and explains why they chose it — ideally with a comparison to alternatives they've tried.
- Describes how they've configured it: custom instructions, CLAUDE.md or rules files, memory features.
- Mentions using multiple tools for different tasks.
- Can speak to what the tool is good and bad at.
Red flags
- Says “ChatGPT” as their primary coding tool — signals copy-paste workflow, not agentic development.
- Gives vague answers like “I use AI tools” without naming anything specific.
- Has never configured the tool beyond default settings.
- Names Copilot but describes using it only for tab completion.
Q2: “What percentage of your code do you write manually vs. generate?”
Green flags
- Advanced candidates say 70–95% generated without hesitation.
- Can explain what the remaining percentage is (boilerplate they still type, config files, things where AI reliably hallucinates).
- Distinguishes between different types of work: business logic vs. scaffolding vs. tests.
- Can explain how the ratio has changed over time and why.
Red flags
- Says less than 30% generated — Beginner/Intermediate signal at best.
- Gives a number without being able to explain the breakdown.
- Says “it depends” and can't give a ballpark.
- Seems uncomfortable with the question — agentic engineers wear this number as a badge.
Q7: “How do you verify AI-generated code? What's your quality gate?”
Green flags
- Has a systematic approach — not just “I read it.”
- Describes specific gates: type checker passes, tests cover the behavior, they test edge cases the AI wouldn't think to test.
- Explains how they catch AI hallucinations (checking external API references, testing boundary conditions).
- Has a mental model for “this output looks right but feels wrong.”
- Uses CI as a safety net, not a primary gate.
Red flags
- Says “I just review the diff” without describing how they review it.
- Relies entirely on the AI to self-verify or self-correct.
- No mention of tests.
- Can't describe a time AI code looked correct but wasn't.
- Treats “it compiled” as a quality gate.
Tier 2 — Intermediate (Tech-Aware Interviewer)
These questions assume the interviewer has basic technical context. They probe workflow depth and team-level thinking — the difference between a good individual practitioner and someone who can uplift a team.
Q5: “Do you have custom instructions, custom skills, or use any kind of framework?”
Green flags
- Can describe their setup concretely: a CLAUDE.md or Cursor rules file with specific conventions.
- Custom agent skills for recurring tasks, a structured system prompt they maintain and iterate on.
- Uses a framework like GSD, CLAUDE.md conventions, or a home-grown approach.
- Has opinions on what goes in context vs. what doesn't.
- Treats context engineering as a craft.
Red flags
- Doesn't know what custom instructions are.
- Uses the tool with default settings only.
- Has never written a system prompt or rules file.
- Thinks prompting is just “telling the AI what you want” rather than a structured discipline.
- Can't explain what a CLAUDE.md or .cursorrules file does.
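For calibration, a minimal rules file might look like the sketch below. The conventions and the `docs/architecture.md` path are illustrative, not a recommended set; what matters is that the candidate can produce something this concrete and explain why each rule is there.

```markdown
# CLAUDE.md (illustrative)

## Conventions
- TypeScript strict mode; no `any` without a justifying comment.
- Every new endpoint gets an integration test before the PR opens.

## Context
- Read docs/architecture.md before making structural changes.

## Verification
- Run the type checker and tests after every change; do not report
  success until both pass.
```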
Q8: “Walk me through how you start a new feature with AI.”
Green flags
- Has a repeatable process.
- Starts with spec clarity — writing a precise problem definition before touching the AI.
- Describes how they load relevant context (which files, architecture docs, existing patterns).
- Breaks the feature into smaller tasks for individual agent runs.
- Runs verification at each step rather than at the end.
- Can describe how the process differs for greenfield vs. editing existing code.
Red flags
- Describes an ad-hoc process: “I just describe what I want and see what comes out.”
- No mention of spec clarity before prompting.
- Treats the entire feature as a single prompt rather than decomposing it.
- Can't describe what they do differently from how they wrote code without AI.
- Describes the AI as doing “most of the thinking” — agentic engineers do the thinking.
Q9: “Has your team taken any steps to restructure how you work for AI-assisted development?”
Green flags
- Has concrete examples: changed how tickets are written, moved from Scrum to Kanban, eliminated story points, introduced AI-PR scoring, created shared context files for the team.
- Can distinguish between what they changed personally vs. what the team changed.
- Has opinions on what's working and what isn't.
- Even if their current team hasn't changed — they have opinions on what should change.
Red flags
- Says “no, we just use AI for our individual work.”
- No awareness of how AI changes team dynamics, not just individual productivity.
- Hasn't thought about process beyond their personal workflow.
- Thinks the only question is “which tool?” rather than “how do we work differently?”
Tier 3 — Advanced (Technical Interviewer)
These questions are for technical interviewers who can evaluate the depth of the answers. They probe frontier-level thinking about agent architecture and context management.
Q3: “Do you know what an 'agent harness' is?”
Green flags
- Can define it without prompting: the constraints, CI gates, context files, and verification infrastructure that make agent output reliable — not the agent itself, but the environment around it.
- Has built one or contributed to one.
- Can speak to the difference between the agent harness (making individual agents effective) and the human harness (making the team effective with agents).
- Has opinions on what makes a good harness.
Red flags
- Doesn't know the term.
- Confuses the harness with the agent itself.
- Thinks “harness” just means prompt engineering.
- Has never thought about the infrastructure that makes AI output reliable at scale — only about what the AI does in isolation.
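If a candidate claims to have built a harness, a useful follow-up is to ask what the gates actually are. At its smallest, the verification side of a harness is just the pattern below, sketched here with a hypothetical `run_quality_gates` helper; the gate commands (type check, tests, lint) would be whatever the team's toolchain provides.

```python
import subprocess

def run_quality_gates(gates):
    """Run each named gate command; collect the labels that fail.

    `gates` maps a label to a command list. Returns (passed, failures)
    so the harness can reject agent output before it reaches human
    review, rather than relying on the agent to self-verify.
    """
    failures = []
    for label, cmd in gates.items():
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failures.append(label)
    return (not failures, failures)
```

A real harness wires gates like these into CI as a safety net while keeping them runnable locally as the primary gate, which mirrors the green flag in Q7.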
Q4: “Do you use sub-agents? How?”
Green flags
- Uses sub-agents (in Claude Code, Cursor 2.5 cloud agents, or similar) and can explain the task decomposition logic: what tasks get delegated vs. kept in the main context.
- Describes how they coordinate output between agents and handle failures in sub-agent runs.
- Has a mental model for parallelization — running multiple agents on independent workstreams simultaneously.
- Has an opinion on when sub-agents are worth the overhead.
Red flags
- Doesn't know what sub-agents are.
- Has only ever used a single-agent workflow.
- Thinks “sub-agents” just means asking the AI follow-up questions.
- Can't describe the coordination problem — what happens when agents work in parallel and produce conflicting outputs.
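One concrete way to probe the coordination answer is to ask how they would stop two parallel agents from editing the same file. A minimal sketch of that kind of planning, using a made-up `(name, files)` task format rather than any real tool's API, might look like:

```python
def plan_parallel_runs(tasks):
    """Group tasks so no two parallel agents touch the same file.

    Each task is a (name, files) pair. Tasks that share any file are
    merged into one workstream; disjoint workstreams can run as
    parallel sub-agents without producing conflicting edits.
    """
    streams = []  # each stream: {"tasks": [...], "files": set(...)}
    for name, files in tasks:
        files = set(files)
        overlapping = [s for s in streams if s["files"] & files]
        if overlapping:
            # merge every overlapping stream plus this task into one
            merged = {"tasks": [], "files": set()}
            for s in overlapping:
                merged["tasks"] += s["tasks"]
                merged["files"] |= s["files"]
                streams.remove(s)
            merged["tasks"].append(name)
            merged["files"] |= files
            streams.append(merged)
        else:
            streams.append({"tasks": [name], "files": files})
    return [s["tasks"] for s in streams]
```

A strong candidate will point out that file-level partitioning is only a first approximation: conflicts can also arise through shared interfaces and database schemas, which is exactly the coordination problem the question is probing.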
Q6: “How do you handle context window limitations?”
Green flags
- Has concrete strategies: decomposes large tasks into smaller agent runs that fit in context.
- Uses structured context files to front-load the most important information.
- Knows when to start a fresh context rather than trying to cram everything in.
- Knows what information belongs in the harness (persistent) vs. in the prompt (per-task).
- Has experience debugging hallucinations caused by context drift — where the model “forgets” earlier parts of a long conversation.
Red flags
- Doesn't know what a context window is.
- Treats it as a fundamental blocker rather than a manageable constraint.
- Has never designed around it — just hits the limit and starts over.
- Thinks bigger context windows make the problem go away (they don't — they just move the bottleneck).
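A strong answer to the decomposition strategy often amounts to something like the sketch below: estimate the token cost of each input file and split the work into agent runs that fit a budget. The 4-characters-per-token ratio is a rough assumption for illustration; real tooling would use the model's tokenizer.

```python
def chunk_for_context(files, budget_tokens=50_000):
    """Greedily group files into agent runs that fit a token budget.

    `files` is a list of (path, char_count) pairs. Uses the rough
    heuristic of ~4 characters per token; a single oversized file
    still gets its own run rather than being dropped.
    """
    runs, current, used = [], [], 0
    for path, chars in files:
        tokens = chars // 4
        if current and used + tokens > budget_tokens:
            runs.append(current)
            current, used = [], 0
        current.append(path)
        used += tokens
    if current:
        runs.append(current)
    return runs
```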
What Level Should You Expect?
Use this table to calibrate expectations before the interview. Frontier candidates are rare. Don't hire an Advanced person for a role that needs Frontier — and don't pass on an Intermediate because you were hoping for Advanced.
| Role | Minimum Level | Ideal Level | Key Signal |
|---|---|---|---|
| Junior | Beginner | Intermediate | Learning fast, asks good questions about AI tools |
| Mid | Intermediate | Advanced | Consistent AI workflow, some custom tooling |
| Senior / Lead | Advanced | Frontier | Verification strategy, harness thinking |
| Staff+ | Frontier | Frontier | Team process changes, opinions on restructuring |
Beyond the Questions: Practical Assessments
These three formats complement the interview questions and reveal things that verbal answers can't.
Live debugging session (45 min)
Provide a small, intentionally broken codebase. Watch how they prompt, verify, and handle AI hallucinations — the process matters more than the fix itself.
Portfolio walkthrough (30 min)
Ask the candidate to screenshare a recent project and walk through one feature from prompt to production. “Where did the AI get it wrong? What context did you provide?”
Agent design sketch (20 min)
Present a simple product requirement and ask them to decompose it into agents or sub-agents. No code — you're evaluating their mental model.
Skip LeetCode. It evaluates skills that agents now handle.
Red Flag Quick Reference
Dealbreakers
- Claims to use AI tools but cannot name specific tools, workflows, or failure modes
- Cannot describe any verification process for AI-generated code
- Philosophical resistance to AI-generated code in a role that explicitly requires it
Serious Concerns
- Only uses AI for autocomplete or boilerplate
- No awareness of context window limits or prompt degradation
- Has never encountered a case where AI-generated code failed
Minor / Contextual
- Unfamiliarity with specific agent frameworks by name (they change fast)
- Hasn't restructured a team around AI (may not have had the opportunity)
- Uses a different toolchain than your company uses (tool transfer is easy, mindset transfer isn't)
A Note on Fairness
Agentic engineering has only been real since late 2025. Nobody has even a year of experience. Questions about tool adoption can correlate with factors beyond skill — company policy, team culture, access. Focus on reasoning quality and learning trajectory, not specific tool names. Ask “even if you haven't had the authority to restructure a team, how would you approach it?” to open questions to candidates at any career stage.
For deeper context on what agentic engineering actually is — beyond tools and into the mental model shift — read What is Agentic Engineering? It covers the Bottleneck Flip, the two harnesses, and evidence from production teams doing this at scale.
One More Thing Before You Interview
The best candidates you'll meet will have strong opinions, real war stories, and probably a few experiments that failed badly and taught them something useful. Your job isn't to find someone with a perfect answer to every question. It's to find someone who's clearly been in the arena — someone who talks about agents with the specific frustration and appreciation of someone who's actually worked with them.
If you find that person, hire them. And if you need to find them first — that's what we built this board for. Post an agentic engineering role — it takes five minutes and reaches engineers who filter by the AI tools your team uses.
Browse current listings →