We are looking for a skilled Senior SRE Engineer to join our distributed team. In this role, you will be responsible for maintaining and improving the reliability, resiliency, scalability, and performance of our onboarded systems. You will provide on-call support, manage production incidents, and drive continuous improvements to our systems and processes while collaborating closely with engineering and operations teams.
Client:
Our client is a large online retailer with yearly revenue of £1 billion.
Project Overview:

Responsibilities:

Maintain and enhance reliability, resiliency, scalability, and performance of onboarded systems.
Provide on-call support: diagnose, mitigate, fix, and escalate production incidents in a timely manner.
Lead incident follow-ups, root cause analysis, and preventive actions to minimize recurrence.
Implement customer-centric approaches to align system reliability with user experience.
Ensure systems have appropriate SLIs, monitoring, and alerting to meet agreed SLOs.
Identify critical system components requiring enhanced availability in partnership with engineering and operations.
Design and roll out strategies, tooling, and processes to improve system stability and performance.
Develop and maintain CI/CD pipelines for seamless deployment and releases.
Automate repetitive and manual tasks to reduce toil and increase operational efficiency.
Participate in system architecture discussions focused on reliability and reducing maintenance complexity.

Nice To Have:

Experience with concurrency in Java
Python knowledge
Dependency conflict resolution experience
Terraform knowledge
Experience with CloudFormation
Knowledge of GCP
Knowledge of BigQuery
Understanding of core SRE concepts (SLIs, SLOs, etc.)
Knowledge of reliability patterns (Circuit breaker, Retry, etc.)

Required Qualifications:

Senior/expert-level Java with a strong background in Spring Boot
Experience with shell scripts
Working experience with Docker including creation/modification of Docker images
Maven and Gradle experience
Understanding of AWS ECS
Experience working with core AWS services (SNS, SQS, Kinesis, RDS, DynamoDB, S3, ElastiCache)
Experience with GitLab
Experience troubleshooting and fixing bugs in distributed cloud environments
Experience with OpenSearch/Kibana
Understanding of metrics and tracing
Knowledge of Prometheus and Grafana
Readiness to be part of a 24/7 rota

Senior SRE Engineer

Full Time
Armenia
Posted 3 weeks ago
DataArt
