AI Engineer for LLM Ops & Evaluation (m/f/d)

Auxilius.ai, 2026-05-12

Full-time, Senior

AI Infrastructure

Anthropic, LangSmith, Langfuse, OpenAI API

Agentic Frameworks

DSPy

Tech Stack

Azure, Java, Kubernetes, Python, Spring Boot

About Auxilius.ai

Auxilius.ai is an early-stage AI startup focused on Governance, Risk and Compliance (GRC) solutions, serving enterprise customers including auditors and compliance teams. We have product-market fit and need an AI Engineer to own our LLM operations pipeline end-to-end.

About the Role

This is a production-focused position in which you own the complete LLMOps pipeline at an early-stage, AI-native startup.

Responsibilities

Manage end-to-end LLMOps infrastructure, prompt optimization, and production integration
Design evaluation strategy (deterministic vs. LLM-judge tradeoffs)
Drive prompt optimization across our LLM pipelines
Establish observability, monitoring, and human-in-the-loop workflows with review queues and feedback loops
Manage cost/latency tradeoffs in production
Mentor an AI & Analytics intern

Core Requirements

3+ years shipping production ML/AI systems
Experience building a shipped LLM evaluation or prompt optimization pipeline
Strong hands-on experience with LLM-as-judge, including its variance problems and techniques to control them
Classical NLP/ML ops foundation (embeddings, semantic similarity, entity matching, classification)
Production judgment on cost/latency tradeoffs and observability
Strong Python; excellent English communication

Nice-to-Have

Observability tools (Langfuse, LangSmith, Phoenix/Arize, Helicone, Braintrust, W&B)
Experience with DSPy or similar prompt optimization frameworks
Azure OpenAI or EU-sovereign LLM providers (Mistral, Aleph Alpha)
Guardrails/content safety/AI governance exposure
Enterprise software experience
Java/Spring Boot, Kubernetes
German language
GRC domain knowledge

Tech Stack

Python, OpenAI, Anthropic, embeddings, semantic similarity, entity matching, classification. The backend uses Java, Spring Boot, Angular, and Kubernetes on Azure.
