9 Questions That Reveal Whether a Developer Actually Does Agentic Engineering
A recruiter asked me how to interview agentic engineers. I wrote down everything I know.
March 2026 · 8 min read
A recruiter friend asked me for 3 questions to evaluate developers on AI skills. I gave him 9. His response: “this is worth gold.” Here's the full guide.
A practical scoring guide for hiring managers. No technical background required. I've been building software exclusively with AI agents since January 2026 — I haven't written a line of code manually since then. These questions are what I'd ask if I were on the other side of the table.
Only have 5 minutes? Ask these 3 questions
- Q1: “What AI coding tool do you use day-to-day?” — Tests tool awareness and configuration depth.
- Q2: “What percentage of your code do you write manually vs. generate?” — The most honest signal of actual adoption.
- Q8: “Walk me through how you start a new feature with AI.” — Reveals workflow maturity and process thinking.
Scoring Framework
Use this to calibrate responses before you start. Most candidates will land at Intermediate. Frontier is rare and worth paying for.
| Level | Description | Typical Signals |
|---|---|---|
| Beginner | Uses AI as autocomplete | Copilot only, writes most code manually, no structured workflow |
| Intermediate | Active AI user | Cursor/Claude Code, ~50% generated, some prompt engineering |
| Advanced | Agentic workflow | 80%+ generated, custom instructions, verification strategy, sub-agents |
| Frontier | Builds the harness | Agent harness engineering, framework usage, team process changes, context management |
Tier 1 — Basics (Any Interviewer Can Ask)
These questions require no technical background. A non-technical recruiter or HR manager can ask them and evaluate the responses with the flags below.
Q1: “What AI coding tool do you use day-to-day?”
Green flags
- Names a specific tool (Cursor, Claude Code, Windsurf, Copilot) and explains why they chose it — ideally with a comparison to alternatives they've tried.
- Describes how they've configured it: custom instructions, CLAUDE.md or rules files, memory features.
- Mentions using multiple tools for different tasks.
- Can speak to what the tool is good and bad at.
Red flags
- Says “ChatGPT” as their primary coding tool — signals copy-paste workflow, not agentic development.
- Gives vague answers like “I use AI tools” without naming anything specific.
- Has never configured the tool beyond default settings.
- Names Copilot but describes using it only for tab completion.
Q2: “What percentage of your code do you write manually vs. generate?”
Green flags
- Advanced candidates say 70–95% generated without hesitation.
- Can explain what the remaining percentage is (boilerplate they still type, config files, things where AI reliably hallucinates).
- Distinguishes between different types of work: business logic vs. scaffolding vs. tests.
- Can explain how the ratio has changed over time and why.
Red flags
- Says less than 30% generated — Beginner/Intermediate signal at best.
- Gives a number without being able to explain the breakdown.
- Says “it depends” and can't give a ballpark.
- Seems uncomfortable with the question — agentic engineers wear this number as a badge.
Q7: “How do you verify AI-generated code? What's your quality gate?”
Green flags
- Has a systematic approach — not just “I read it.”
- Describes specific gates: type checker passes, tests cover the behavior, they test edge cases the AI wouldn't think to test.
- Explains how they catch AI hallucinations (checking external API references, testing boundary conditions).
- Has a mental model for “this output looks right but feels wrong.”
- Uses CI as a safety net, not a primary gate.
Red flags
- Says “I just review the diff” without describing how they review it.
- Relies entirely on the AI to self-verify or self-correct.
- No mention of tests.
- Can't describe a time AI code looked correct but wasn't.
- Treats “it compiled” as a quality gate.
Tier 2 — Intermediate (Tech-Aware Interviewer)
These questions assume the interviewer has basic technical context. They probe workflow depth and team-level thinking — the difference between a good individual practitioner and someone who can uplift a team.
Q5: “Do you have custom instructions, custom skills, or use any kind of framework?”
Green flags
- Can describe their setup concretely: a CLAUDE.md or Cursor rules file with specific conventions.
- Custom agent skills for recurring tasks, a structured system prompt they maintain and iterate on.
- Uses a framework like GSD, CLAUDE.md conventions, or a home-grown approach.
- Has opinions on what goes in context vs. what doesn't.
- Treats context engineering as a craft.
Red flags
- Doesn't know what custom instructions are.
- Uses the tool with default settings only.
- Has never written a system prompt or rules file.
- Thinks prompting is just “telling the AI what you want” rather than a structured discipline.
- Can't explain what a CLAUDE.md or .cursorrules file does.
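For calibration, a minimal rules file might look like the sketch below. The conventions and the `docs/architecture.md` path are illustrative, not a recommended set; what matters is that the candidate can produce something this concrete and explain why each rule is there.

```markdown
# CLAUDE.md (illustrative)

## Conventions
- TypeScript strict mode; no `any` without a justifying comment.
- Every new endpoint gets an integration test before the PR opens.

## Context
- Read docs/architecture.md before making structural changes.

## Verification
- Run the type checker and tests after every change; do not report
  success until both pass.
```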
Q8: “Walk me through how you start a new feature with AI.”
Green flags
- Has a repeatable process.
- Starts with spec clarity — writing a precise problem definition before touching the AI.
- Describes how they load relevant context (which files, architecture docs, existing patterns).
- Breaks the feature into smaller tasks for individual agent runs.
- Runs verification at each step rather than at the end.
- Can describe how the process differs for greenfield vs. editing existing code.
Red flags
- Describes an ad-hoc process: “I just describe what I want and see what comes out.”
- No mention of spec clarity before prompting.
- Treats the entire feature as a single prompt rather than decomposing it.
- Can't describe what they do differently from how they wrote code without AI.
- Describes the AI as doing “most of the thinking” — agentic engineers do the thinking.
Q9: “Has your team taken any steps to restructure how you work for AI-assisted development?”
Green flags
- Has concrete examples: changed how tickets are written, moved from Scrum to Kanban, eliminated story points, introduced AI-PR scoring, created shared context files for the team.
- Can distinguish between what they changed personally vs. what the team changed.
- Has opinions on what's working and what isn't.
- Even if their current team hasn't changed — they have opinions on what should change.
Red flags
- Says “no, we just use AI for our individual work.”
- No awareness of how AI changes team dynamics, not just individual productivity.
- Hasn't thought about process beyond their personal workflow.
- Thinks the only question is “which tool?” rather than “how do we work differently?”
Tier 3 — Advanced (Technical Interviewer)
These questions are for technical interviewers who can evaluate the depth of the answers. They probe frontier-level thinking about agent architecture and context management.
Q3: “Do you know what an 'agent harness' is?”
Green flags
- Can define it without prompting: the constraints, CI gates, context files, and verification infrastructure that make agent output reliable — not the agent itself, but the environment around it.
- Has built one or contributed to one.
- Can speak to the difference between the agent harness (making individual agents effective) and the human harness (making the team effective with agents).
- Has opinions on what makes a good harness.
Red flags
- Doesn't know the term.
- Confuses the harness with the agent itself.
- Thinks “harness” just means prompt engineering.
- Has never thought about the infrastructure that makes AI output reliable at scale — only about what the AI does in isolation.
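If a candidate claims to have built a harness, a useful follow-up is to ask what the gates actually are. At its smallest, the verification side of a harness is just the pattern below, sketched here with a hypothetical `run_quality_gates` helper; the gate commands (type check, tests, lint) would be whatever the team's toolchain provides.

```python
import subprocess

def run_quality_gates(gates):
    """Run each named gate command; collect the labels that fail.

    `gates` maps a label to a command list. Returns (passed, failures)
    so the harness can reject agent output before it reaches human
    review, rather than relying on the agent to self-verify.
    """
    failures = []
    for label, cmd in gates.items():
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failures.append(label)
    return (not failures, failures)
```

A real harness wires gates like these into CI as a safety net while keeping them runnable locally as the primary gate, which mirrors the green flag in Q7.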
Q4: “Do you use sub-agents? How?”
Green flags
- Uses sub-agents (in Claude Code, Cursor 2.5 cloud agents, or similar) and can explain the task decomposition logic: what tasks get delegated vs. kept in the main context.
- Describes how they coordinate output between agents and handle failures in sub-agent runs.
- Has a mental model for parallelization — running multiple agents on independent workstreams simultaneously.
- Has an opinion on when sub-agents are worth the overhead.
Red flags
- Doesn't know what sub-agents are.
- Has only ever used a single-agent workflow.
- Thinks “sub-agents” just means asking the AI follow-up questions.
- Can't describe the coordination problem — what happens when agents work in parallel and produce conflicting outputs.
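One concrete way to probe the coordination answer is to ask how they would stop two parallel agents from editing the same file. A minimal sketch of that kind of planning, using a made-up `(name, files)` task format rather than any real tool's API, might look like:

```python
def plan_parallel_runs(tasks):
    """Group tasks so no two parallel agents touch the same file.

    Each task is a (name, files) pair. Tasks that share any file are
    merged into one workstream; disjoint workstreams can run as
    parallel sub-agents without producing conflicting edits.
    """
    streams = []  # each stream: {"tasks": [...], "files": set(...)}
    for name, files in tasks:
        files = set(files)
        overlapping = [s for s in streams if s["files"] & files]
        if overlapping:
            # merge every overlapping stream plus this task into one
            merged = {"tasks": [], "files": set()}
            for s in overlapping:
                merged["tasks"] += s["tasks"]
                merged["files"] |= s["files"]
                streams.remove(s)
            merged["tasks"].append(name)
            merged["files"] |= files
            streams.append(merged)
        else:
            streams.append({"tasks": [name], "files": files})
    return [s["tasks"] for s in streams]
```

A strong candidate will point out that file-level partitioning is only a first approximation: conflicts can also arise through shared interfaces and database schemas, which is exactly the coordination problem the question is probing.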
Q6: “How do you handle context window limitations?”
Green flags
- Has concrete strategies: decomposes large tasks into smaller agent runs that fit in context.
- Uses structured context files to front-load the most important information.
- Knows when to start a fresh context rather than trying to cram everything in.
- Knows what information belongs in the harness (persistent) vs. in the prompt (per-task).
- Has experience debugging hallucinations caused by context drift — where the model “forgets” earlier parts of a long conversation.
Red flags
- Doesn't know what a context window is.
- Treats it as a fundamental blocker rather than a manageable constraint.
- Has never designed around it — just hits the limit and starts over.
- Thinks bigger context windows make the problem go away (they don't — they just move the bottleneck).
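A strong answer to the decomposition strategy often amounts to something like the sketch below: estimate the token cost of each input file and split the work into agent runs that fit a budget. The 4-characters-per-token ratio is a rough assumption for illustration; real tooling would use the model's tokenizer.

```python
def chunk_for_context(files, budget_tokens=50_000):
    """Greedily group files into agent runs that fit a token budget.

    `files` is a list of (path, char_count) pairs. Uses the rough
    heuristic of ~4 characters per token; a single oversized file
    still gets its own run rather than being dropped.
    """
    runs, current, used = [], [], 0
    for path, chars in files:
        tokens = chars // 4
        if current and used + tokens > budget_tokens:
            runs.append(current)
            current, used = [], 0
        current.append(path)
        used += tokens
    if current:
        runs.append(current)
    return runs
```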
What Level Should You Expect?
Use this table to calibrate expectations before the interview. Frontier candidates are rare. Don't hire an Advanced person for a role that needs Frontier — and don't pass on an Intermediate because you were hoping for Advanced.
| Role | Minimum Level | Ideal Level | Key Signal |
|---|---|---|---|
| Junior | Beginner | Intermediate | Learning fast, asks good questions about AI tools |
| Mid | Intermediate | Advanced | Consistent AI workflow, some custom tooling |
| Senior / Lead | Advanced | Frontier | Verification strategy, harness thinking |
| Staff+ | Frontier | Frontier | Team process changes, opinions on restructuring |
Beyond the Questions: Practical Assessments
These three formats complement the interview questions and reveal things that verbal answers can't.
Live debugging session (45 min)
Provide a small, intentionally broken codebase. Watch how they prompt, verify, and handle AI hallucinations — the process matters more than the fix itself.
Portfolio walkthrough (30 min)
Ask the candidate to screenshare a recent project and walk through one feature from prompt to production. “Where did the AI get it wrong? What context did you provide?”
Agent design sketch (20 min)
Present a simple product requirement and ask them to decompose it into agents or sub-agents. No code — you're evaluating their mental model.
Skip LeetCode. It evaluates skills that agents now handle.
Red Flag Quick Reference
Dealbreakers
- Claims to use AI tools but cannot name specific tools, workflows, or failure modes
- Cannot describe any verification process for AI-generated code
- Philosophical resistance to AI-generated code in a role that explicitly requires it
Serious Concerns
- Only uses AI for autocomplete or boilerplate
- No awareness of context window limits or prompt degradation
- Has never encountered a case where AI-generated code failed
Minor / Contextual
- Unfamiliarity with specific agent frameworks by name (they change fast)
- Hasn't restructured a team around AI (may not have had the opportunity)
- Uses a different toolchain than your company uses (tool transfer is easy, mindset transfer isn't)
A Note on Fairness
Agentic engineering has only been real since late 2025. Nobody has even a year of experience. Questions about tool adoption can correlate with factors beyond skill — company policy, team culture, access. Focus on reasoning quality and learning trajectory, not specific tool names. Ask “even if you haven't had the authority to restructure a team, how would you approach it?” to open questions to candidates at any career stage.
For deeper context on what agentic engineering actually is — beyond tools and into the mental model shift — read What is Agentic Engineering? It covers the Bottleneck Flip, the two harnesses, and evidence from production teams doing this at scale.
One More Thing Before You Interview
The best candidates you'll meet will have strong opinions, real war stories, and probably a few experiments that failed badly and taught them something useful. Your job isn't to find someone with a perfect answer to every question. It's to find someone who's clearly been in the arena — someone who talks about agents with the specific frustration and appreciation of someone who's actually worked with them.
If you find that person, hire them. And if you need to find them first — that's what we built this board for. Post an agentic engineering role — it takes five minutes and reaches engineers who filter by the AI tools your team uses.
Browse current listings →