Overview

At BBG Group, we are committed to building and maintaining high-performance, reliable infrastructure that supports mission-critical systems 24/7. Our team ensures the stability, scalability, and security that underpin our technology operations.

We are looking for a Site Reliability Engineer (System Administrator) to join our team. This role requires expertise in virtualization, containerization, monitoring, and incident response. If you are passionate about system reliability and thrive in a fast-paced environment, we’d love to hear from you!

Responsibilities:
WHAT YOU WILL DO
Ensure 24/7 System Reliability
  • Monitor and maintain the stability of critical infrastructure during assigned shifts.
  • Analyze and optimize the performance of databases (MySQL, PostgreSQL, MongoDB).
  • Administer and manage virtual machines using KVM and Proxmox.
  • Deploy and support containerized applications using Docker and Kubernetes.
Monitor & Respond to Incidents
  • Track and analyze system health using monitoring tools (Nagios, Zabbix).
  • Process and evaluate logs with the ELK Stack (Elasticsearch, Logstash, Kibana).
  • Respond to and resolve infrastructure incidents, minimizing system downtime.
  • Maintain and update incident logs to enhance troubleshooting efficiency.
Optimize Infrastructure & Performance
  • Ensure continuous operation of critical services and applications.
  • Implement and maintain backup and recovery strategies for databases and systems.
  • Identify and address performance bottlenecks to improve efficiency.
Collaborate & Improve Processes
  • Work closely with engineering teams to refine monitoring and incident response workflows.
  • Document best practices and provide recommendations for automation and efficiency.
  • Share knowledge with team members, fostering continuous improvement.
Required Qualifications:
SKILLS TO DO YOUR JOB EFFICIENTLY
Technical Expertise & System Administration
  • Strong experience with virtualization (KVM, Proxmox) and containerization (Docker, Kubernetes).
  • Hands-on expertise with monitoring and logging tools (Nagios, Zabbix, ELK Stack).
  • Knowledge of database administration, including performance analysis and recovery.
  • Solid understanding of network technologies and Linux/Unix system administration.
Incident Response & Problem-Solving
  • Ability to diagnose and resolve system failures under high-pressure conditions.
  • Strong troubleshooting skills to handle real-time incidents and service outages.
  • Experience in developing and maintaining disaster recovery plans.
Collaboration & Work Environment
  • Strong attention to detail, a sense of responsibility, and a team-oriented mindset.
  • Ability to multitask, prioritize incidents, and work effectively in a fast-paced environment.
  • Willingness to work 12-hour shifts (day/night) to support 24/7 operations.
Additional Information:
Location: Yerevan (Hybrid)
Contact: +374 41 100029
Telegram: @achevardanian
Email: [email protected]

Please clearly mention that you heard about this job opportunity on https://ijob.am.