
Trapeze
Senior DevOps Engineer (Cloud & On-Prem Infrastructure)
- Permanent
- Dubai, United Arab Emirates
- Experience 5 - 10 yrs
Job expiry date: 11/03/2026
Job overview
Date posted: 25/01/2026
Location: Dubai, United Arab Emirates
Salary: AED 20,000 - 30,000 per month
Job description
The Senior DevOps Engineer (Cloud & On-Prem Infrastructure) architects, implements, and operates highly reliable, scalable, and secure infrastructure across on-premises and cloud environments in Dubai. The role covers automation with Ansible, Foreman, Satellite, and PXE/Kickstart, and Infrastructure-as-Code with Terraform. The engineer builds HA clusters, load balancers, and resilient storage for low-latency, high-throughput workloads. Responsibilities include establishing hardened RHEL baselines (CIS, SELinux, firewalld), lifecycle management, deep OS troubleshooting, capacity planning, and observability with Prometheus, Grafana, and ELK.
The position requires optimizing Java runtime environments and application servers (JBoss/WildFly, Tomcat; WebLogic a plus), JVM tuning (G1/ZGC, heap sizing, thread pools), application profiling (JFR, Async-Profiler, APMs), load and performance testing (JMeter, Gatling, k6), and integrating automated penetration testing and vulnerability scanning (OWASP ZAP, Burp Suite, Snyk) into CI/CD pipelines. The role also covers containerization with Docker/Podman, deployment on Kubernetes/OpenShift, hybrid CI/CD pipelines (Jenkins/Azure DevOps), artifact management (Nexus/Artifactory), SBOMs, and image signing.
Knowledge of cloud architecture on Azure, AWS, and GCP is essential, including hybrid connectivity (ExpressRoute, Direct Connect, Cloud Interconnect, VPNs), identity, networking, security, logging, and cost governance. The engineer leads incident response, post-mortems, MTTR reduction, and reliability improvements, and implements secrets management, PKI rotation, and least-privilege access while ensuring compliance. Occasional travel within the Middle East and Europe is required.
Required skills
Key responsibilities
- Automate on-premises infrastructure at scale using Ansible, Foreman, Satellite, PXE/Kickstart, and Terraform
- Build high-availability clusters, load balancers, and resilient storage systems for low-latency, high-throughput workloads
- Establish hardened RHEL baselines including CIS, SELinux, and firewalld; manage lifecycle and patching including EUS and kernel upgrades
- Perform deep OS troubleshooting covering systemd, cgroups, I/O schedulers, and NIC offloads; conduct capacity planning
- Implement observability and monitoring using Prometheus, Grafana, ELK, and SLO/SLI-driven alerting with actionable runbooks
- Operate and optimize Java application servers (JBoss/WildFly, Tomcat; WebLogic a plus) and tune JVM settings with profiling tools (JFR, Async-Profiler, Dynatrace, New Relic, AppDynamics)
- Lead automated load testing and performance benchmarking using JMeter, Gatling, and k6
- Integrate automated penetration testing and vulnerability scanning (OWASP ZAP, Burp Suite, Snyk) into CI/CD pipelines
- Engineer zero-downtime deployments including blue/green and canary strategies with connection pool and GC tuning
- Containerize Java services with Docker/Podman and deploy them on on-prem Kubernetes/OpenShift with operators, quotas, network policies, and storage classes
- Build and maintain CI/CD pipelines using Jenkins and Azure DevOps with artifact management, SBOM generation, image signing, and policy enforcement
- Design, deploy, and operate workloads across Azure, AWS, and GCP with appropriate services and governance for identity, networking, security, logging, and cost
- Implement hybrid connectivity between on-prem and cloud estates using ExpressRoute, Direct Connect, Cloud Interconnect, and site-to-site VPNs
- Lead incident response, conduct post-mortems, improve reliability, and drive MTTR reduction
- Implement secrets management, PKI/certificate rotation, and least-privilege access controls; collaborate on audits and compliance
Experience & skills
- Minimum 8 years of hands-on DevOps or Site Reliability Engineering experience
- Strong Java runtime expertise including JVM and GC tuning, thread and heap diagnostics, and application server operations (JBoss/WildFly, Tomcat)
- Experience operating on-premises infrastructure including VMware
- DevOps certification in Azure, AWS, or GCP is an added advantage
- Proficiency in automation tools and Infrastructure-as-Code (Ansible, Terraform, Foreman, PXE/Kickstart)
- Experience with containerization and orchestration platforms (Docker, Podman, Kubernetes, OpenShift)
- Experience with CI/CD tools and artifact management systems (Jenkins, Azure DevOps, Nexus, Artifactory)
- Hands-on experience with monitoring, observability, and performance profiling tools (Prometheus, Grafana, ELK, Dynatrace, New Relic, AppDynamics, JFR, Async-Profiler)
- Experience implementing automated security testing (OWASP ZAP, Burp Suite, Snyk) within pipelines
- Ability to build hybrid cloud connectivity and manage Azure, AWS, and GCP workloads including networking, identity, security, logging, and cost governance
- Strong troubleshooting and problem-solving skills in complex, multi-layered environments
- Willingness to travel occasionally within the Middle East and Europe