
Virtusa
AI/ML Engineer
- Permanent
- Dubai, United Arab Emirates
- Experience: 5-10 years
Job expiry date: 19/09/2025
Job overview
- Date posted: 05/08/2025
- Salary: AED 20,000-30,000 per month
- Compensation: Comprehensive package
- Seniority: Experienced
- Qualification: Bachelor's degree
Job description
We are looking for an AI/ML-focused Data Engineer with deep expertise in building intelligent data pipelines for unstructured content and integrating them with modern machine learning ecosystems. The ideal candidate is hands-on with PySpark and Python and experienced in document classification, data cleansing, and building AI-first applications using LLMs, vector databases, and retrieval-augmented generation (RAG) frameworks. This role bridges data engineering and machine learning across the enterprise.
Required skills
- PySpark
- Python
- Document classification
- Data cleansing
- Quality metrics
- LLMs
- LangChain
- Transformers
- Hugging Face
- FAISS
- Vector databases
- Redis
- RAG frameworks
- Document chunking
- Metadata tagging
- Semantic search
- OCR
- NLP
- CI/CD
- Agile methodologies
Key responsibilities
- Build robust, scalable data processing pipelines for unstructured documents using PySpark and Python.
- Implement document cleansing, classification, and enrichment to prepare data for AI/ML applications.
- Develop and integrate data workflows into LLM pipelines and support RAG architectures.
- Engineer vector embeddings, chunk documents, and apply metadata tagging for semantic search and QA systems.
- Collaborate with AI architects, data engineers, and platform teams to design end-to-end AI solutions.
- Communicate pipeline quality, data readiness, and model integration strategies to stakeholders.
- Apply Agile and CI/CD practices to continuously deliver AI capabilities.
Experience & skills
- 5+ years of commercial experience, with at least 2 years in a relevant AI/ML role.
- Strong proficiency in PySpark and distributed data frameworks.
- Solid experience in Python and ML/AI libraries (e.g., Transformers, LangChain, Hugging Face, FAISS).
- Expertise in processing unstructured data, including OCR, NLP, classification, and tagging.
- Familiarity with vector databases such as Redis and with embedding models for RAG pipelines.
- Understanding of the LLM lifecycle, including fine-tuning, inference, and prompt engineering.
- Experience in agile environments and working with cross-functional teams.
- Excellent communication skills, with the ability to engage both technical and business stakeholders.