
Whiteshield
IT Infrastructure Administrator / Reliability Engineer
- Permanent
- Dubai, United Arab Emirates
- Experience: 2 - 5 years
Job expiry date: 03/06/2026
Job overview
Date posted: 19/04/2026
Location: Dubai, United Arab Emirates
Salary: AED 15,000 - 20,000 per month
Compensation: Salary only
Job description
The IT Infrastructure Administrator / Reliability Engineer operates and secures an on-premise Kubernetes platform based on VMware Tanzu that supports critical public digital services within a UAE Federal Government environment. The role focuses on the reliability, availability, security hardening, and operational stability of containerized workloads running in production.

Responsibilities include administering Kubernetes clusters and maintaining the core supporting components: Harbor for container registry management, HashiCorp Vault for secrets management, Redis for caching and in-memory data services, and NGINX for load balancing and traffic routing. The engineer continuously monitors systems and logs using the ELK stack (Elasticsearch, Logstash, Kibana) for performance visibility, incident detection, and operational troubleshooting.

A key responsibility is leading infrastructure security hardening initiatives, including vulnerability remediation, configuration strengthening, and adherence to security best practices in on-premise environments. The role also supports CI/CD pipelines in Azure DevOps so that application delivery into Kubernetes is automated, secure, and reliable. In addition, the engineer ensures that high availability architecture, backup strategies, and disaster recovery processes are implemented and maintained for mission-critical public services.

The position requires strong expertise in maintaining secure, scalable, and resilient infrastructure in regulated government environments, with an emphasis on operational reliability and cybersecurity posture.
Key responsibilities
- Administer and operate on-premise Kubernetes (VMware Tanzu) clusters and manage containerized workloads supporting critical public digital services
- Manage and maintain core infrastructure components including Harbor container registry, HashiCorp Vault secrets management, Redis caching systems, and NGINX load balancing services
- Monitor system performance, logs, and infrastructure health using the ELK stack (Elasticsearch, Logstash, Kibana) to ensure operational stability and incident detection
- Lead infrastructure security hardening initiatives including vulnerability remediation, configuration strengthening, and enforcement of security best practices
- Support, maintain, and optimize CI/CD pipelines using Azure DevOps for secure and automated application deployments
- Ensure high availability architecture design and implementation across Kubernetes infrastructure and supporting systems
- Implement and maintain backup strategies and disaster recovery procedures to ensure system resilience and business continuity
- Support troubleshooting and resolution of infrastructure incidents across containerized environments and supporting services
- Maintain operational reliability of on-premise infrastructure supporting government digital services
- Collaborate with security and operations teams to ensure compliance with infrastructure security requirements
Experience & skills
- Strong hands-on experience in Kubernetes administration and container orchestration
- Experience operating VMware Tanzu and managing on-premise infrastructure environments
- Practical experience with Harbor, HashiCorp Vault, Redis, NGINX, and ELK stack components
- Proven experience in infrastructure security hardening and vulnerability remediation
- Experience working with CI/CD pipelines using Azure DevOps
- Strong knowledge of high availability systems, backup solutions, and disaster recovery planning
- Ability to operate and maintain secure and reliable on-premise Kubernetes environments supporting production workloads
- Ability to manage monitoring and logging systems for infrastructure visibility and incident response
- Ability to support secure deployment pipelines and infrastructure automation processes
- Ability to work in mission-critical environments with a focus on reliability, security, and operational continuity