Lead Software Engineer II, AI Operations
About the Role
Best Egg is a market-leading, tech-enabled financial platform helping people build financial confidence through a variety of installment lending solutions and financial health tools.
Best Egg is hiring a Lead Software Engineer II for AI Operations to design, ship, and operate production-grade LLM applications, agents, and automations across the business. You'll own the end-to-end path from prototype to stable deployment: building RAG pipelines, instituting evals and guardrails, and driving cost/performance optimization. Our stack includes Python, Metaflow on Outerbounds, AWS (including Bedrock), OpenAI/ChatGPT, and Cursor; Databricks is under evaluation and is available where it makes sense. Your work will accelerate delivery, reduce LLM unit costs, and improve output quality for use cases like agent assist, compliance automation, process automation, and QA, treating AI Ops as a force multiplier for the enterprise.
Key Responsibilities:
- Build and ship LLM apps & agents: Deliver internal copilots and customer/agent-facing automations with clear SLAs, rollbacks, and observability from day one.
- Own RAG pipelines: Design ingestion, chunking, embeddings, indexing, hybrid search/rerank, and retrieval evaluation; track retriever quality via offline golden sets and online metrics.
- AWS Infrastructure & Orchestration: Design and implement scalable AWS architectures, including AI services such as Bedrock and knowledge bases, along with IAM, secure secrets management, and policy enforcement.
- Observability & SRE for AI: Add tracing, prompt/agent version lineage, eval dashboards, and regression alerts; establish golden datasets and canary tests.
- Guardrails & governance: Enforce PII redaction, safety filters, role-based access, audit logs, and human-in-the-loop review paths.
- CI/CD for AI artifacts: Version and deploy prompts, tools, agents, and retrieval pipelines; support blue/green and shadow deploys.
- Cost & performance: Cut run-rate spend through caching, truncation, batching, autoscaling, and model routing.
- Developer enablement: Provide templates, SDKs, and high-quality abstractions that let product teams ship safely.
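To give a flavor of the RAG work described above (chunking, retrieval, ranking), here is a deliberately tiny sketch. It substitutes bag-of-words cosine similarity for real embeddings and hybrid search; all names and scoring choices are illustrative assumptions, not Best Egg's actual implementation.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-window chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def bow(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the top-k chunks ranked by similarity to the query."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]
```

In production the same shape holds, but the scoring function becomes embeddings plus hybrid keyword search and a reranker, and retriever quality is tracked against golden sets as the responsibilities above describe.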
What You'll Need to Succeed:
- Experience: 5-10 years of professional software engineering with 2+ years building AI/LLM applications; portfolio of shipped AI projects.
- LLM product engineering: Hands-on with OpenAI, Bedrock, and Hugging Face/Ollama/vLLM; MCP servers and function/tool calling, multi-turn orchestration, streaming, and prompt/version management.
- RAG expertise: Practical experience designing and tuning retrieval systems (chunking, embeddings, hybrid search, reranking).
- Full-stack or equivalent backend depth: Comfortable building APIs/services; strong fundamentals in Python.
- Platform & orchestration: Metaflow (Outerbounds) preferred; Databricks familiarity is a plus.
- Observability & testing for AI: Tracing and logging for AI systems; expertise in tools like Datadog, Dynatrace, or Grafana.
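The function/tool-calling skill listed above can be sketched with a minimal dispatcher that routes a model-emitted tool call to a registered Python function. The registry pattern, the `get_balance` tool, and the JSON shape are illustrative assumptions, not any provider's actual API.

```python
import json

TOOLS = {}

def tool(fn):
    """Register a function so a model's tool call can be dispatched to it."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_balance(account_id: str) -> float:
    # Placeholder lookup; a real tool would call an internal service.
    return {"acct-1": 1250.0}.get(account_id, 0.0)

def dispatch(tool_call_json: str) -> str:
    """Execute a tool call of the form {"name": ..., "arguments": {...}}
    and return a JSON result to feed back to the model."""
    call = json.loads(tool_call_json)
    result = TOOLS[call["name"]](**call["arguments"])
    return json.dumps({"result": result})
```

A real implementation would add schema validation, auth checks, and tracing on every dispatch, consistent with the guardrails and observability responsibilities above.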
Salary: $150,000 - $170,000 annually, plus 20% incentive bonus target.
Candidates should include links to their portfolio (GitHub, write-ups, or demos) with applications.