PySpark Data Engineer
by Virtusa in Data Science & Analytics
The role focuses on designing and implementing large-scale, end-to-end data management and analytics solutions across complex organizations, with deep expertise in data architecture, data strategy, and the modernization of traditional data warehouses into Big Data platforms.

Key requirements and responsibilities:

- Hands-on experience with big-data processing frameworks such as Hadoop, Presto, Tez, Hive, and Spark, with strong proficiency in PySpark, Python, Linux, Git, and Jenkins.
- Apply data-warehouse dimensional modeling techniques, including star and snowflake schemas, slowly changing dimensions, role-playing dimensions, dimensional hierarchies, and data classification.
- Work with cloud-native principles and enhance CI/CD environments.
- Ensure robust data quality, profiling, governance, security, metadata management, and archival practices.
- Define workload migration strategies and drive delivery in a matrixed environment, managing risks, ensuring data security, and handling simultaneous tasks under tight deadlines.
- Bring a self-starter mindset with excellent problem-solving, communication, influencing, and presentation skills, along with the ability to work independently and produce strategic planning, estimation, and scheduling outputs.
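As a loose illustration of one of the named modeling skills, slowly changing dimensions: a Type 2 update preserves history by expiring the old dimension row and appending a new current version. Below is a minimal pure-Python sketch of that logic (the table shape, column names, and `scd2_upsert` helper are hypothetical; in a real PySpark pipeline this would be a DataFrame merge, e.g. against a lake table):

```python
from datetime import date

def scd2_upsert(dim_rows, incoming, key, tracked, today=None):
    """Type 2 slowly changing dimension: expire changed current rows,
    append new versions. Hypothetical sketch, not a production merge.

    dim_rows: list of dicts carrying 'valid_from', 'valid_to', 'is_current'.
    incoming: list of dicts with the natural key and tracked attributes.
    """
    today = today or date.today().isoformat()
    # Index the current version of each dimension member by its natural key.
    current = {r[key]: r for r in dim_rows if r["is_current"]}
    out = list(dim_rows)  # shallow copy; existing row dicts are mutated in place
    for row in incoming:
        old = current.get(row[key])
        # Act only on brand-new keys or rows whose tracked attributes changed.
        if old is None or any(old[c] != row[c] for c in tracked):
            if old is not None:
                old["is_current"] = False   # close out the superseded version
                old["valid_to"] = today
            out.append({**row, "valid_from": today,
                        "valid_to": None, "is_current": True})
    return out

dim = [{"cust_id": 1, "city": "Austin", "valid_from": "2023-01-01",
        "valid_to": None, "is_current": True}]
dim = scd2_upsert(dim, [{"cust_id": 1, "city": "Dallas"}],
                  key="cust_id", tracked=["city"], today="2024-06-01")
# The Austin row is expired; a new current Dallas row is appended.
```

The same expire-and-append pattern maps onto PySpark as a join between the dimension and the incoming batch followed by a union of expired, unchanged, and new rows.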