
G42
Senior Engineer – Site Reliability
- Permanent
- Abu Dhabi, United Arab Emirates
- Experience: 5 - 10 yrs
Job expiry date: 14/12/2025
Job overview
Date posted
30/10/2025
Location
Abu Dhabi, United Arab Emirates
Salary
Undisclosed
Compensation
Comprehensive package
Experience
5 - 10 yrs
Seniority
Senior & Lead
Qualification
Bachelor's degree
Expiration date
14/12/2025
Job description
The Senior Engineer – Site Reliability at Presight will be responsible for enhancing the performance, availability, and scalability of Presight's advanced data and AI infrastructure. This hands-on role focuses on designing, deploying, and optimizing CI/CD pipelines, cloud infrastructure, and DevOps automation to ensure high system reliability across Presight's platforms. The position requires strong technical expertise in Azure, Kubernetes, Linux systems, and automation tools such as Terraform, Ansible, and Helm. The Senior Engineer will also lead incident response, implement service monitoring frameworks, and drive continuous improvement across deployment and operational processes. This role is ideal for a performance-driven professional passionate about AI-driven systems and large-scale distributed architectures.
Key responsibilities
- Design, deploy, and maintain highly reliable, scalable, and secure infrastructure supporting Presight's AI-driven analytics platforms
- Lead incident response efforts, troubleshoot high-priority production issues, and ensure minimal downtime across systems
- Develop and optimize CI/CD pipelines using GitLab, Jenkins, and ArgoCD to improve deployment frequency and service stability
- Automate infrastructure provisioning and configuration management using Terraform, Ansible, and Helm
- Enhance system observability through monitoring and logging tools such as ELK, Prometheus, Grafana, and Zabbix
- Collaborate with development and operations teams to improve service delivery, performance, and fault tolerance
- Architect and implement resilient, cloud-native solutions using Azure and private cloud environments (OpenStack, OpenShift)
- Manage big data environments including Elasticsearch, Hadoop, and OpenSearch clusters
- Monitor service-level objectives (SLOs) and implement resiliency patterns for mission-critical systems
- Ensure adherence to IT governance, information security, and compliance standards
- Perform database administration and optimization across MySQL, PostgreSQL, MongoDB, and MSSQL
- Implement secrets management practices using HashiCorp Vault or equivalent tools
- Drive continuous improvement initiatives to streamline processes and enhance infrastructure efficiency
- Collaborate with internal stakeholders and customers to align infrastructure improvements with business goals
- Document system architectures, procedures, and recovery protocols to ensure operational transparency
Experience & skills
- Bachelor's degree in Computer Science, Engineering, or a related discipline
- 5–7 years of hands-on experience in Site Reliability Engineering or DevOps within large-scale distributed systems
- Strong hands-on experience with Azure Cloud, Linux administration, and Kubernetes orchestration
- Proficiency with automation tools such as Ansible, Terraform, and Helm
- Experience building and maintaining CI/CD pipelines using GitLab, Jenkins, and ArgoCD
- Familiarity with big data technologies (Hadoop, Elasticsearch, OpenSearch) and large-scale data processing environments
- Hands-on experience with system monitoring and performance optimization tools (ELK, Prometheus, Grafana, Zabbix)
- Solid understanding of cloud networking, information security, and compliance frameworks
- Experience managing databases including MySQL, PostgreSQL, MongoDB, and MSSQL
- Proficiency in scripting languages (Python, Shell) for automation and problem-solving
- Excellent analytical, troubleshooting, and communication skills
- Ability to work independently and collaboratively in fast-paced, mission-critical environments
- Strong customer focus, problem-solving mindset, and passion for continuous improvement