Senior AI SDET, Reltio
Agentic Frameworks
About the Role
Senior AI SDET (AI Engineering & Quality)
Location: Bengaluru, India - Hybrid
Role Overview
We are looking for a powerhouse engineer who bridges AI development and automated validation. This is a builder-tester role: you will own the full lifecycle of AI features, from writing core Python logic for agentic workflows and tuning prompts, to architecting robust evaluation frameworks that ensure those systems are enterprise-ready.
Core Responsibilities
AI Development & Engineering
- Agentic Implementation: Design and implement autonomous agents using frameworks such as LangGraph, CrewAI, or AutoGen.
- Prompt Engineering: Own the prompt lifecycle: design, version, and tune system prompts to minimize hallucinations and maximize intent recognition.
- RAG Pipeline Development: Build and optimize Retrieval-Augmented Generation (RAG) components, including document ingestion, chunking strategies, and vector database indexing.
- Feature Prototyping: Rapidly prototype AI-driven features in Python to validate feasibility before full-scale integration.
- Data Curation: Build golden datasets and synthetic data generation scripts to train and evaluate models.
Advanced Validation & Quality Architecture
- Automated Evaluation (Eval-as-Code): Build automated pipelines to measure LLM performance across metrics such as faithfulness, relevancy, and toxicity.
- Non-Deterministic Testing: Develop strategies to test 'fuzzy' outputs with LLM-assisted evaluation, in which one LLM grades the output of another.
- Hybrid Framework Development: Design and maintain a dual-stack automation framework covering both backend/infrastructure and AI/ML validation.
- End-to-End Orchestration: Integrate AI tests into the MLOps pipeline so that every model deployment or prompt change triggers a full regression of the agent's reasoning capabilities.
- Mentorship & Standards: Serve as the team's subject matter expert, defining standards for writing inherently testable AI code.
Required Skills
- 5+ years of experience as an SDET, QA Automation Engineer, or AI Engineer
- 2+ years of experience testing AI/ML products
- Ability to write production-quality Python code, plus a SaaS testing background
- LLMs: OpenAI, Anthropic, Mistral, Llama
- Vector Databases: Pinecone, Milvus, Weaviate
- Data Science Libraries: Pandas, NumPy
- Performance Testing: JMeter, Locust
- Message Queues: Kafka, RabbitMQ