Overview

The Staff Site Reliability Engineer (SRE) will play a critical role in building and scaling the infrastructure behind ServiceTitan’s new AI platform – an intelligent, always-on system that powers autonomous agents and real-time learning at scale.
You’ll own the reliability, performance, and deployment practices across multiple services and environments, driving innovation in automation, observability, and continuous delivery.
This role requires both technical depth and strategic thinking — someone who can architect solutions, mentor teams, and enable true operational excellence across engineering.

Responsibilities:
  • Lead the design, implementation, and optimization of scalable, resilient infrastructure for cloud-native AI services on Azure.
  • Establish true continuous delivery (CD) pipelines supporting blue-green deployments, automatic rollbacks, and progressive delivery patterns.
  • Champion observability excellence: define best practices for metrics, tracing, and logging, and help product teams design meaningful SLIs, SLOs, and error budgets.
  • Drive automation across the entire lifecycle: infrastructure provisioning, testing, deployment, and recovery.
  • Partner with the engineering team to design reliable, fault-tolerant services and perform resilience and capacity reviews.
  • Extend observability beyond service health to track the end-to-end success and failure of complex, automated agent workflows and their business impact, expressed as SLIs and SLOs.
  • Leverage Infrastructure as Code (IaC) using Terraform, Kubernetes, and Docker to standardize environments and reduce manual intervention.
  • Contribute to and maintain CI/CD pipelines using GitHub Actions, Azure DevOps, or TeamCity.
  • Implement and improve service health dashboards with Mimir, Grafana, Prometheus, or the ELK stack to ensure system visibility and reliability.
  • Mentor engineers and foster a reliability culture across teams — enabling others to build self-healing, observable systems.
Required Qualifications:
  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or related field.
  • Solid experience in SRE, DevOps, or infrastructure engineering, with strong hands-on expertise in Azure.
  • Proven experience designing and operating distributed systems at scale, with a strong understanding of reliability engineering principles (SLIs, SLOs, and SLAs).
  • Deep proficiency with Terraform, Kubernetes, Docker, and modern IaC and container orchestration best practices.
  • Expertise in CI/CD automation and release engineering, including the ability to implement blue-green, canary, and rollback mechanisms.
  • Advanced use of observability tools such as Mimir, Grafana, Prometheus, and the ELK stack.
  • Experience promoting GitOps workflows and tools such as Argo CD or Flux.
  • Excellent troubleshooting, systems thinking, and mentoring skills.
Benefits:
  • Flextime, recognition, and support for autonomous work: Flexible time off with ample learning and development opportunities to continue growing your career. We offer a comprehensive onboarding program, leadership training for Titans at all levels, and other programs and events. Great work is rewarded through Bonusly, peer-nominated awards, and more.
  • Holistic health and wellness benefits: Company-paid medical, dental, and vision coverage (available to employees and their dependents from day one), insurance for parents and siblings, a wellness benefit, office massage, and more.
  • Support for Titans at all stages of life: Parental leave and support, financial planning tools, Employee Assistance Program services, and more.
Nice To Have:
  • Knowledge of SQL Server and PostgreSQL performance tuning and management in cloud environments.