AI Engineer, Product

Mistral AI
Full-time · Mid

AI Tools

Evaluation Systems · LLM · Tool/Function Calling

Tech Stack

Python · TypeScript · LLM · Prompt Engineering · Observability

Agent Workflow

Own prompt and orchestration design for LLM-powered product features. Improve core behaviors including memory policies, intent classification, routing, tool-call reliability, and retrieval quality.

About the Role

About Mistral:

At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.

We democratize AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments. Our offerings include le Chat, the AI assistant for life and work.

We are a dynamic, collaborative team passionate about AI and its potential to transform society. Our diverse workforce thrives in competitive environments and is committed to driving innovation. Our teams are distributed across France, the USA, the UK, Germany, and Singapore.

Role Summary:

Embedded directly in a product team such as search, chat, documents, or audio, you'll improve AI-powered features through rigorous evaluation, prompt and orchestration design, and rapid experimentation. You'll own your domain's AI quality end-to-end: define what 'good' looks like, measure it, run experiments, and ship what works. You'll work with Science to deliver measurable improvements to quality, latency, safety, and reliability.

What you will do:

  • Design and run evaluations for your product area: reference tests, heuristics, model-graded checks tailored to search relevance, chat quality, document understanding, or audio performance.
  • Define and track metrics that matter: task success, helpfulness, hallucination proxies, safety flags, latency, cost.
  • Own prompt and orchestration design: write, test, and iterate on prompts and system prompts as a core part of your work.
  • Run A/B tests on prompts, models, and configurations; analyze results; make rollout or rollback decisions from data.
  • Set up observability for LLM calls: structured logging, tracing, dashboards, alerts.
  • Operate model releases: canary and shadow traffic, sign-offs, SLO-based rollback criteria, regression detection.
  • Improve core behaviors in your product area, whether that's memory policies, intent classification, routing, tool-call reliability, or retrieval quality.
  • Create templates and documentation so other teams can author evals and ship safely.
  • Partner with Science to diagnose regressions and lead post-mortems.
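To make the evaluation responsibilities above concrete, here is a minimal illustrative sketch of a reference-test eval harness with a heuristic check and a task-success metric. All names (`EvalCase`, `run_eval`, the stubbed model) are hypothetical and not part of any Mistral tooling; a real setup would call an actual model and add model-graded checks, latency, and cost tracking.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One reference test: a prompt plus the fact the output must contain."""
    prompt: str
    reference: str


def heuristic_check(output: str, case: EvalCase) -> bool:
    # Simplest possible reference check: substring match, case-insensitive.
    return case.reference.lower() in output.lower()


def run_eval(cases: List[EvalCase], model_fn: Callable[[str], str],
             check: Callable[[str, EvalCase], bool] = heuristic_check) -> dict:
    """Run every case through the model and report the task-success rate."""
    results = [check(model_fn(c.prompt), c) for c in cases]
    return {"task_success": sum(results) / len(results), "n": len(results)}


# Usage with a stubbed model (one correct answer, one wrong):
cases = [
    EvalCase("What is the capital of France?", "Paris"),
    EvalCase("What is 2 + 2?", "4"),
]
stub_model = lambda p: "Paris is the capital." if "France" in p else "The answer is 5."
print(run_eval(cases, stub_model))  # task_success = 0.5 on this stub
```

The same harness shape extends to model-graded checks by swapping `check` for a function that asks a grader model to score the output, and to A/B tests by running two `model_fn` variants over the same cases and comparing their metrics.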

About you:

  • 3-4 years of experience; backgrounds that fit well include ML engineers moving closer to product, or software engineers with real AI/ML production experience.
  • Strong TypeScript or Python skills — we have both tracks depending on team fit.
  • Production LLM experience: prompts, tool/function calling, system prompts.
  • Hands-on with evals and A/B testing; you can design metrics, not just run them.
  • Comfortable implementing directly in product code, not only notebooks.
  • Observability experience: logging, tracing, dashboards, alerting.
  • Product mindset: form hypotheses, run experiments, interpret results, ship.
  • Clear communication, autonomous, and oriented toward production impact over experimentation for its own sake.

Ideal additional experience:

  • Safety systems experience: moderation, PII handling/redaction, guardrails.
  • Release operations: canary/shadowing, automated rollbacks, experiment platforms.
  • Prior work on search ranking, chat systems, document AI, or audio ML features.

Location: Paris HQ (at least 3 days per week on-site)

What we offer:

  • Competitive salary and equity package
  • Health insurance
  • Transportation allowance
  • Sport allowance
  • Meal vouchers
  • Private pension plan
  • Generous parental leave policy

Apply Now
