Overview
CloudLinux is transforming the Linux infrastructure market by ensuring security and stability for over 500,000 servers worldwide. Our products – CloudLinux OS, TuxCare, and Imunify360 – are the de facto standard in the hosting industry and Enterprise s…
Responsibilities:
- DBaaS Architecture: Design and implement a self-service platform based on Terraform and Ansible, enabling the deployment of HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) in a heterogeneous environment (Bare Metal + OpenNebula + Kubernetes + Public Clouds). You will turn infrastructure into a product (a hypothetical provisioning sketch follows this list).
- Scaling ClickHouse: Manage rapidly growing analytics clusters (12+ clusters, tens of terabytes of data). You will tackle sharding, table engine optimization (ReplicatedMergeTree – see the DDL sketch below), and building reliable S3 backup pipelines under high load.
- Data Platform & Analytics Support: Maintain and scale the infrastructure for Apache Airflow and Redash. You will ensure the reliability of ETL pipelines and visualization tools (a minimal DAG sketch appears below), bridging the gap between raw infrastructure and the data analytics team.
- Reliability as Code: Implement SRE practices in data management. Replace manual incident response with automated self-healing mechanisms. Define and implement SLOs and SLIs for all databases (the error-budget sketch below shows the basic arithmetic).
- Stack Modernization: Lead the migration from legacy solutions to modern cloud patterns. Participate in decisions on adopting Kubernetes operators for stateful workloads.
- Expertise & Mentorship: Serve as the technical authority for product teams, helping them optimize data schemas and SQL queries for high-load systems.
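
To make the self-service idea concrete, here is a minimal sketch of how a provisioning request might drive a Terraform run. This is not platform code from the role: the DBRequest fields, the module layout, and the variable names are hypothetical; only the terraform CLI flags (-chdir, -var, -auto-approve) are standard.

```python
# Hypothetical sketch: turning a self-service request into a Terraform run.
# The module layout and variable names are illustrative, not a real codebase.
import subprocess
from dataclasses import dataclass

SUPPORTED_ENGINES = {"postgresql", "clickhouse", "mongodb", "redis"}

@dataclass
class DBRequest:
    engine: str        # e.g. "postgresql"
    cluster_name: str  # e.g. "billing-pg"
    node_count: int    # HA clusters need at least 3 nodes

def provision(req: DBRequest) -> None:
    if req.engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {req.engine}")
    if req.node_count < 3:
        raise ValueError("HA cluster requires at least 3 nodes")
    # One Terraform root module per engine; -chdir, -var and
    # -auto-approve are standard Terraform CLI flags.
    subprocess.run(
        [
            "terraform", f"-chdir=modules/{req.engine}-ha",
            "apply", "-auto-approve",
            f"-var=cluster_name={req.cluster_name}",
            f"-var=node_count={req.node_count}",
        ],
        check=True,
    )

provision(DBRequest(engine="postgresql", cluster_name="billing-pg", node_count=3))
```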
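On the ClickHouse side, much of the table engine work centers on DDL like the following, shown here issued through the clickhouse-driver Python client. The database, table, and cluster names are placeholders; the ReplicatedMergeTree arguments and the {shard}/{replica} macros follow the usual ClickHouse conventions.

```python
# Sketch: creating a replicated, partitioned table across a cluster.
# Names ("main", analytics.events) are placeholders; the ZooKeeper path
# and {shard}/{replica} macros follow the common ClickHouse convention.
from clickhouse_driver import Client

ddl = """
CREATE TABLE IF NOT EXISTS analytics.events ON CLUSTER main
(
    event_date Date,
    event_type LowCardinality(String),
    payload    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/analytics.events', '{replica}')
PARTITION BY toYYYYMM(event_date)  -- monthly partitions keep merges manageable
ORDER BY (event_type, event_date)
"""

client = Client(host="localhost")
client.execute(ddl)
```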
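For the Airflow side, a deliberately minimal DAG sketch of the kind of scheduled ETL whose reliability this role owns. The DAG and task names are made up; the API shown assumes Airflow 2.4+ (the schedule parameter).

```python
# Sketch of a minimal Airflow DAG; dag_id and task names are made up.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_metrics() -> None:
    # Placeholder for a real extract step (e.g. pulling from ClickHouse).
    print("extracting...")

with DAG(
    dag_id="example_metrics_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ parameter name
    catchup=False,
) as dag:
    PythonOperator(task_id="extract", python_callable=extract_metrics)
```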
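And to illustrate the SLO/SLI point, a small stdlib-only sketch of the error-budget arithmetic behind an availability SLO. The target and traffic numbers are purely illustrative.

```python
# Sketch: the arithmetic behind an availability SLO and its error budget.
def error_budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of the error budget left for the window.

    slo   -- target availability, e.g. 0.999
    good  -- successful requests (or probe checks) in the window
    total -- all requests in the window
    """
    allowed_bad = (1.0 - slo) * total  # budget expressed in "bad events"
    actual_bad = total - good
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else 0.0
    return 1.0 - actual_bad / allowed_bad

# 99.9% SLO over 1,000,000 queries allows 1,000 failures;
# 600 failures leave 40% of the budget.
print(error_budget_remaining(0.999, good=999_400, total=1_000_000))  # 0.4
```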
Nice To Have:
- Experience building an Internal Developer Platform (IDP).
- Experience operating databases in Kubernetes (CloudNativePG, Altinity Operator) – a minimal manifest sketch follows this list.
- Experience working in Cloud and Hosting providers on similar services.
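
As a taste of the operator-based approach, a sketch of generating a minimal CloudNativePG Cluster manifest from Python (assuming PyYAML is installed). The cluster name and sizes are placeholders; the apiVersion, kind, instances, and storage fields follow the postgresql.cnpg.io/v1 CRD.

```python
# Sketch: a minimal CloudNativePG Cluster manifest built as a Python dict,
# e.g. inside a self-service platform. Names and sizes are placeholders.
import yaml  # PyYAML, assumed available

manifest = {
    "apiVersion": "postgresql.cnpg.io/v1",
    "kind": "Cluster",
    "metadata": {"name": "analytics-pg"},
    "spec": {
        "instances": 3,                # one primary + two streaming replicas
        "storage": {"size": "100Gi"},  # PVC size per instance
    },
}
print(yaml.safe_dump(manifest, sort_keys=False))
```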
Required Qualifications:
- Deep PostgreSQL Expertise (5+ years): You know MVCC internals, understand locking mechanics, can configure Patroni and PgBouncer "with your eyes closed" (a Patroni role-check sketch follows this list), and have experience with seamless major version upgrades under load.
- ClickHouse Mastery: Experience operating large clusters, a solid understanding of ZooKeeper/ClickHouse Keeper, sharding, and replication internals, and the ability to diagnose performance issues at the data-part level (see the system.parts sketch below).
- Engineering Mindset (SRE/DevOps): You hate doing the same task twice by hand. Experience writing complex Terraform modules and Ansible roles is mandatory. Programming skills in Python or Go for automation are a huge plus.
- Hybrid Environment Experience: You understand the differences between running DBs on Bare Metal vs. Kubernetes vs. Cloud and know how to optimize TCO and disk subsystem performance (NVMe, Network Storage).
- Systems Approach: You see the big picture – from the network packet to the application's business logic. You understand the importance of security (FIPS, audit logs) and Disaster Recovery.
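
For candidates who want a concrete picture, a stdlib-only sketch of checking node roles through Patroni's REST API: /leader answers 200 only on the current primary and a non-200 status elsewhere (default port 8008). The host list is hypothetical.

```python
# Sketch: checking node roles via Patroni's REST API (default port 8008).
# /leader returns 200 only on the current primary -- a standard endpoint.
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical cluster members

def role(host: str) -> str:
    try:
        urlopen(f"http://{host}:8008/leader", timeout=2)
        return "leader"
    except HTTPError:            # non-200: replica or unhealthy member
        return "not-leader"
    except URLError:             # node unreachable
        return "unreachable"

for host in NODES:
    print(host, role(host))
```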
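And data-part-level diagnosis typically starts with queries like this "too many active parts" check against system.parts, shown here via clickhouse-driver. The threshold is illustrative; the system.parts table and its columns are standard ClickHouse.

```python
# Sketch: flag partitions with an unhealthy number of active data parts.
# The threshold (100) is illustrative; system.parts is a standard table.
from clickhouse_driver import Client

query = """
SELECT database, table, partition, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table, partition
HAVING active_parts > 100
ORDER BY active_parts DESC
LIMIT 20
"""

client = Client(host="localhost")
for db, table, partition, parts in client.execute(query):
    print(f"{db}.{table} partition {partition}: {parts} active parts")
```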