G42

Senior Engineer – Site Reliability

Permanent
Abu Dhabi, United Arab Emirates
Experience 5 - 10 yrs

Job expiry date: 14/12/2025

Job overview

Date posted: 30/10/2025
Location: Abu Dhabi, United Arab Emirates
Salary: Undisclosed
Compensation: Comprehensive package
Experience: 5 - 10 yrs
Seniority: Senior & Lead
Qualification: Bachelor’s degree
Expiration date: 14/12/2025

Job description

The Senior Engineer – Site Reliability at Presight will be responsible for enhancing the performance, availability, and scalability of Presight’s advanced data and AI infrastructure. This hands-on role focuses on designing, deploying, and optimizing CI/CD pipelines, cloud infrastructure, and DevOps automation to ensure high system reliability across Presight’s platforms. The position requires strong technical expertise in Azure, Kubernetes, Linux systems, and automation tools such as Terraform, Ansible, and Helm. The Senior Engineer will also lead incident response, implement service monitoring frameworks, and drive continuous improvement across deployment and operational processes. This role is ideal for a performance-driven professional passionate about AI-driven systems and large-scale distributed architecture.

Required skills

site reliability engineering (SRE)
DevOps automation
cloud infrastructure (Azure, OpenStack, OpenShift)
Kubernetes
Docker
Terraform
Ansible
Helm
CI/CD pipelines (GitLab, Jenkins, ArgoCD)
Linux systems administration
monitoring tools (ELK, Prometheus, Grafana, Zabbix)
Python scripting
Shell scripting
distributed systems architecture
networking and cloud security
MySQL
PostgreSQL
MongoDB
MSSQL
HashiCorp Vault
incident management
performance optimization
data analytics (Elasticsearch, Hadoop, OpenSearch)
automation frameworks
service availability and scalability
communication and stakeholder management

Key responsibilities

Design, deploy, and maintain highly reliable, scalable, and secure infrastructure supporting Presight’s AI-driven analytics platforms
Lead incident response efforts, troubleshoot high-priority production issues, and ensure minimal downtime across systems
Develop and optimize CI/CD pipelines using GitLab, Jenkins, and ArgoCD to improve deployment frequency and service stability
Automate infrastructure provisioning and configuration management using Terraform, Ansible, and Helm
Enhance system observability through monitoring and logging tools such as ELK, Prometheus, Grafana, and Zabbix
Collaborate with development and operations teams to improve service delivery, performance, and fault tolerance
Architect and implement resilient, cloud-native solutions using Azure and private cloud environments (OpenStack, OpenShift)
Manage big data environments including Elasticsearch, Hadoop, and OpenSearch clusters
Monitor service-level objectives (SLOs) and implement resiliency patterns for mission-critical systems
Ensure adherence to IT governance, information security, and compliance standards
Perform database administration and optimization across MySQL, PostgreSQL, MongoDB, and MSSQL
Implement secrets management practices using HashiCorp Vault or equivalent tools
Drive continuous improvement initiatives to streamline processes and enhance infrastructure efficiency
Collaborate with internal stakeholders and customers to align infrastructure improvements with business goals
Document system architectures, procedures, and recovery protocols to ensure operational transparency

Experience & skills

Bachelor’s degree in Computer Science, Engineering, or a related discipline
5–7 years of hands-on experience in Site Reliability Engineering or DevOps within large-scale distributed systems
Strong hands-on experience with Azure Cloud, Linux administration, and Kubernetes orchestration
Proficiency with automation tools such as Ansible, Terraform, and Helm
Experience building and maintaining CI/CD pipelines using GitLab, Jenkins, and ArgoCD
Familiarity with big data technologies (Hadoop, Elasticsearch, OpenSearch) and large-scale data processing environments
Hands-on experience with system monitoring and performance optimization tools (ELK, Prometheus, Grafana, Zabbix)
Solid understanding of cloud networking, information security, and compliance frameworks
Experience managing databases including MySQL, PostgreSQL, MongoDB, and MSSQL
Proficiency in scripting languages (Python, Shell) for automation and problem-solving
Excellent analytical, troubleshooting, and communication skills
Ability to work independently and collaboratively in fast-paced, mission-critical environments
Strong customer focus, problem-solving mindset, and passion for continuous improvement
