Senior or Staff ML Systems Engineer, LLMs

TRM Labs
Full-time, Staff
$200K - $275K/yr

AI Tools

Agentic Systems, LangChain, Langfuse, LlamaIndex, vLLM

Tech Stack

Python, LangChain, LlamaIndex, vLLM, MLflow, Docker, Kubernetes, Terraform, Datadog

Agent Workflow

Build modular AI infrastructure for deploying LLMs and agentic systems at scale. Integrate AI models and agents into real-time production applications, and build the evaluation infrastructure that supports them.

About the Role

TRM Labs' AI Engineering team focuses on LLMs and agentic systems, building robust pipelines and infrastructure for deploying AI systems at scale.

Key Responsibilities:

  • Develop CI/CD workflows for model training, evaluation, and deployment using tools like Langfuse and GitHub Actions
  • Automate model versioning, approval workflows, and compliance checks
  • Build modular AI infrastructure including vector databases, feature stores, and model registries
  • Integrate AI models and agents into real-time production applications
  • Deploy evaluation infrastructure for LLMs and agentic systems with regression testing and cost monitoring
  • Enable researcher productivity through sandboxes and reproducible environments

Required Qualifications:

  • Experience writing high-quality Python software
  • Scalable infrastructure experience (Docker, Kubernetes, Terraform, CI/CD)
  • Monitoring/logging expertise (Datadog, Prometheus, OpenTelemetry)
  • MLOps best practices including model versioning and drift detection
  • Production LLM/agentic workflow deployment and optimization
  • Strong ownership mentality

Tech Stack: LangChain, LlamaIndex, vLLM, MLflow, BentoML, Langfuse, GitHub Actions, Docker, Kubernetes, Terraform, Datadog, Prometheus, OpenTelemetry, Triton.
