Senior or Staff ML Systems Engineer, LLMs

Expired

This listing is older than 60 days and may no longer be accepting applications.

TRM Labs
$200K - $275K/yr

About the Role

TRM Labs' AI Engineering team focuses on LLMs and agentic systems, building robust pipelines and infrastructure for deploying AI systems at scale.

Key Responsibilities:

  • Develop CI/CD workflows for model training, evaluation, and deployment using tools such as Langfuse and GitHub Actions
  • Automate model versioning, approval workflows, and compliance checks
  • Build modular AI infrastructure including vector databases, feature stores, and model registries
  • Integrate AI models and agents into real-time production applications
  • Deploy evaluation infrastructure for LLMs and agentic systems, including regression testing and cost monitoring
  • Enable researcher productivity through sandboxes and reproducible environments

Required Qualifications:

  • High-quality Python software development
  • Scalable infrastructure experience (Docker, Kubernetes, Terraform, CI/CD)
  • Monitoring/logging expertise (Datadog, Prometheus, OpenTelemetry)
  • MLOps best practices including model versioning and drift detection
  • Experience deploying and optimizing LLM and agentic workflows in production
  • Strong ownership mentality

Tech Stack: LangChain, LlamaIndex, vLLM, MLflow, BentoML, Langfuse, GitHub Actions, Docker, Kubernetes, Terraform, Datadog, Prometheus, OpenTelemetry, Triton.
