
Tabby
Senior Observability and Monitoring Engineer
- Permanent
- Dubai, United Arab Emirates
- Experience 2 - 5 yrs
Job expiry date: 29/05/2026
Job overview
Date posted
14/04/2026
Location
Dubai, United Arab Emirates
Salary
Undisclosed
Compensation
Comprehensive package
Job description
The Senior Observability and Monitoring Engineer designs, operates, and evolves company-wide observability platforms in a high-scale fintech environment. The role supports a strategic migration from Datadog-based observability (logs, metrics, APM, RUM, dashboards, alerts) to a self-hosted stack centered on Elastic Enterprise, while meeting regulatory requirements for audit, access, application, and database logging.
The position involves managing Elastic Enterprise clusters, with a focus on index lifecycle management, scaling, retention, access control, and backups, and building and maintaining log ingestion pipelines using Fluentd, Fluent Bit, Logstash, and Beats. The engineer collaborates with SRE, DevOps, and Security teams so that observability systems integrate with SOC2 compliance frameworks and SIEM platforms.
The role also includes defining SLIs, SLOs, and error budgets to improve system reliability; implementing APM, metrics, logging, and tracing solutions using tools such as Prometheus, VictoriaMetrics, Mimir, Grafana Loki, and Grafana Tempo; and automating infrastructure with Terraform, Helm, and GitOps tools such as FluxCD and ArgoCD.
Candidates need strong experience in observability system design, monitoring at scale, and distributed system visibility, and will support the gradual decommissioning of legacy Datadog infrastructure. The engineer contributes to architecture documentation, system design, data flow mapping, and observability best practices across a global fintech organization that handles large-scale transaction systems and compliance-driven logging requirements.
Key responsibilities
- Operate and maintain Elastic Enterprise clusters including index lifecycle management, scaling, retention policies, access control, and backup operations
- Design and manage centralized log ingestion pipelines using Fluentd, Fluent Bit, Logstash, and Beats for applications, databases, and infrastructure systems
- Support migration and decommissioning of Datadog observability components including logs, metrics, APM, RUM, dashboards, and monitoring alerts
- Collaborate with SRE, DevOps, and Security teams to integrate observability systems with SOC2 compliance frameworks and SIEM solutions
- Define, implement, and maintain SLIs, SLOs, and error budgets to improve service reliability and system visibility
- Build and optimize metrics, APM, logging, and tracing platforms using Prometheus, VictoriaMetrics, and Mimir (metrics), Grafana Loki (logs), and Grafana Tempo (traces)
- Automate observability infrastructure using Terraform, Helm, and GitOps workflows including FluxCD and ArgoCD
- Document observability architecture, data flows, system design, and operational best practices across platforms
Experience & skills
- 4+ years of experience in DevOps, SRE, or Observability engineering roles
- Hands-on experience with Elastic Stack components including Elasticsearch, Logstash, Kibana, and Beats
- Experience building or operating APM and metrics systems using tools such as Prometheus, VictoriaMetrics, or Mimir
- Exposure to SIEM systems and security monitoring integrations, including work in SOC2 compliance environments
- Experience with Grafana ecosystem tools including Loki and Tempo
- Knowledge of infrastructure as code tools such as Terraform and Helm
- Experience with GitOps workflows and tools such as FluxCD or ArgoCD
- Familiarity with Kubernetes and cloud platforms such as AWS, GCP, or OCI