Overview
OneMarketData is continuously searching for bright talent with the skills to make an impact. From developers to data scientists, at OneTick you will have the opportunity to develop and enhance your problem-solving skills using a combination of analytics, imagination, and talent.
Our DevOps team develops the infrastructure behind the hosted solutions and our software and data delivery lifecycle.
In the Cloud Project, we run a multi-account AWS infrastructure managed through AWS Organizations. Separate AWS accounts are necessary to host customer-facing environments, and we provide our customers with different setups of our application. We use most of the common AWS resources, such as EC2, EKS, S3, VPC, and ELB, along with a fairly comprehensive set of other AWS services. Most of our AWS infrastructure is covered by IaC, and CI/CD runs on GitLab.
We have more than 4 petabytes of data in S3 and EFS, and we expose part of the S3 data to the file system using Storage Gateways. Currently, we are migrating from a setup on EC2 instances to Kubernetes, integrating centralized logging and monitoring solutions, migrating data loading processes to Airflow, and optimizing infrastructure costs while planning to improve performance at the same time.
We are looking for an experienced Site Reliability Engineer (SRE) to join our team. Your primary responsibility will be to guarantee the reliability, scalability, and performance of our applications and systems. Working closely with both our software engineers and product teams, you will dive deep into troubleshooting production issues, ensuring seamless operation. Additionally, you will collaborate on designing and implementing solutions to enhance our monitoring and alerting systems, aiming to optimize our overall efficiency and reliability. Your expertise in automation will play a crucial role in reducing manual toil and streamlining processes, ultimately contributing to the success of our operations.
Responsibilities
- Monitor and maintain the health and reliability of our production systems
- Investigate and resolve production issues and outages
- Develop and maintain monitoring, alerting, and incident response systems
- Design and implement automation to reduce manual toil and improve system reliability
- Collaborate with software engineers to design and implement highly scalable and resilient systems
- Participate in on-call rotation and respond to incidents promptly
- Continuously improve our systems and processes to ensure the highest level of reliability and availability
- Document processes and procedures for maintaining and troubleshooting production systems
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field
- 3+ years of experience as a Site Reliability Engineer or related role
- Strong knowledge of Linux/Unix systems and administration
- Proficiency in at least one programming language (e.g., Python, Java, C++)
- Experience with automation and configuration management tools (e.g., Ansible, Terraform)
- Experience with AWS and Kubernetes
- English: Upper-Intermediate or higher
- Good communication skills and the ability to explain complicated things in simple words
- Eagerness to learn new technologies, including domain-specific ones
- Strong analytical and problem-solving skills
- Attentiveness, a hard-working and goal-oriented mindset (getting tasks done), and the ability to work both in a team and independently
- Readiness to explore the product in depth and gain a comprehensive understanding of its functionality, because the role is closely connected to how the product works
Benefits
We have no bureaucracy, no time tracking, and flexible hours. Our main goal is for employees to feel comfortable and express themselves, performing at their best because they like what they are working on. Ideas can be put into practice, and the work you do will be used by many large companies.