Staff Infrastructure Software Engineer, Enterprise AI

Scale AI
Full-time, Staff
$216K - $311K/yr

Tech Stack

Python, TypeScript, Kubernetes, Terraform, AWS, GCP, Azure

Agent Workflow

Define deployment standards for agentic workflows at scale. Architect multi-cloud infrastructure for orchestrating and evaluating multi-agent systems. Build an agentic observability platform (logging, metrics, tracing) and knowledge retrieval and inference infrastructure.

About the Role

Scale GP is building the infrastructure that makes enterprise AI seamless. We are looking for a Senior or Staff Infrastructure Engineer to act as a primary technical lead, engineering the 'paved road' for our knowledge retrieval and inference engines. You won't just be managing resources; you'll be defining the deployment standards for Agentic workflows at scale. Your mission is to bridge the gap between complex AI orchestration and world-class infrastructure, ensuring our platform remains the most reliable destination for enterprise agents.

The ideal candidate thrives in a fast-paced environment, has a passion for both deep technical work and mentoring, and can set a long-term technical strategy for a critical domain while maintaining a strong, hands-on delivery focus. You will architect and implement solutions across multiple cloud providers (GCP, Azure, AWS) for customers in diverse, highly regulated industries such as healthcare, telecom, finance, and retail.

What You'll Do

  • Architect multi-cloud systems and abstractions that allow the SGP platform to run on top of existing cloud providers.
  • Use our own data and AI platform to analyze build and test logs and metrics to identify areas for improvement.
  • Define the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers.
  • Enhance engineering and infrastructure efficiency, reliability, accuracy, and response times, including CI/CD processes, test frameworks, data quality assurance, end-to-end reconciliation, and anomaly detection.
  • Collaborate with platform and product teams to develop and implement innovative infrastructure that scales to meet evolving needs.
  • Design and champion highly scalable, reliable, and low-latency infrastructure and frameworks for building, orchestrating, and evaluating multi-agent systems at enterprise scale.
  • Lead the infrastructure roadmap with a strong focus on compliance, privacy, and security standards.
  • Own the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics).
  • Drive developer efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) paradigms.

What We're Looking For

  • Proven experience in a senior role, with 5+ years of full-time software engineering experience.
  • Deep understanding of modern infrastructure practices, including CI/CD, IaC (e.g., Terraform, Helm charts), container orchestration (e.g., Kubernetes), and observability platforms (e.g., Datadog, Prometheus, Grafana).
  • Extensive experience with at least one major cloud provider (AWS, Azure, or GCP).
  • Strong knowledge of security and compliance in enterprise environments.
  • Proficiency in Python or JavaScript/TypeScript, and SQL.
  • Bonus: Hands-on experience with Agents, LLMs, vector databases, and emerging AI technologies.

Base salary range for San Francisco, New York, Seattle: $216,200 - $310,500 USD
