Data Engineer (Kubernetes, AWS)

  • Petaling Jaya, Selangor
  • RM 10,000-12,000 per month
  • Permanent
  • Full-time
  • 1 month ago
Get to Know the Team:
Grab Data Engineers are key contributors to Southeast Asia's extensive data ecosystem, enabling growth through accessible data. We maintain Grab's data infrastructure: a dependable and economical platform that supports internal data processes and company-wide data lake access. This includes compute systems such as Apache Spark, Trino, and StarRocks, scheduling via Airflow, and an AWS S3-based storage layer. Our storage solutions use modern open-source table formats such as Apache Iceberg and Delta Lake, alongside traditional Apache Hive Parquet tables. We simplify data operations for internal users by providing managed Spark, StarRocks, Trino, and Airflow services. Our team also collaborates with Grab's Data Catalog team to offer self-service data lake capabilities powered by DataHub, and partners closely with Grab's AI platforms to ensure a smooth experience for users in their AI development.

Get to Know the Role:
You will support the team's mission by maintaining and extending the platform's capabilities through new features and continuous improvements. You will also explore new developments in the space and continuously bring them to our platform, thereby helping the data community at Grab.

The Critical Tasks You Will Perform:
  • Maintain and extend the Python/Go/Scala backend for Grab's Airflow, Spark, Trino, and StarRocks platform.
  • Modify and extend Python/Scala Spark applications and Airflow pipelines for better performance, reliability, and cost.
  • Design and implement architectural improvements for new use cases or efficiency.
  • Build platforms that can scale to the three Vs of big data: volume, velocity, and variety.
  • Follow testing and SRE best practices to ensure system stability and reliability.

Qualifications:

What Essential Skills You Will Need:
  • Software Engineering, Computer Science, or related undergraduate degree.
  • Proficient in at least one of Python, Go, or Scala, with a strong appetite for learning other programming languages.
  • 3-5 years of relevant professional experience.
  • Good working knowledge of three or more of the following, and a passion for learning the others: Airflow, Spark, relational databases (ideally MySQL), Kubernetes, StarRocks, Trino, and backend API implementation.
  • Experience with AWS services (S3, EKS, IAM) and infrastructure-as-code tools such as Terraform.
  • Proficiency in CI/CD tools (Jenkins, GitLab, etc.).
  • Highly motivated to work smart and intelligently using the AI resources available at Grab.

MUST HAVES:
  • Proficient in Kubernetes, with hands-on experience building custom resources using frameworks like kubebuilder.
  • Proficient in Apache Spark, with good knowledge of resource managers such as YARN and Kubernetes, and of how Spark interacts with them.
  • Advanced understanding of Apache Airflow and how it works with the Celery and/or Kubernetes executor backends, with exposure to Python's SQLAlchemy framework.
  • Advanced knowledge of other query engines such as Trino and StarRocks.
  • Advanced knowledge of AWS Cloud.
  • Good understanding of lakehouse table formats such as Iceberg and Delta Lake, and how query engines work with them.
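For a flavor of the orchestration side of the role: Airflow runs tasks in dependency order over a DAG. The stdlib-only sketch below illustrates that core idea with Python's `graphlib`; the task names (`extract`, `transform`, `load`) are hypothetical examples, and this is not Airflow's actual API.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks; in Airflow these would be operators in a DAG.
def extract():
    return "raw rows"

def transform():
    return "cleaned rows"

def load():
    return "rows loaded"

tasks = {"extract": extract, "transform": transform, "load": load}

# Edges read "task depends on ...", mirroring Airflow's upstream/downstream model.
deps = {
    "transform": {"extract"},
    "load": {"transform"},
}

# Resolve a dependency-respecting execution order, then run each task in turn.
order = list(TopologicalSorter(deps).static_order())
results = {name: tasks[name]() for name in order}
print(order)  # ['extract', 'transform', 'load']
```

The DAG model is what lets a scheduler retry, backfill, or parallelize independent tasks safely, which is the property the role's Airflow and Spark work builds on.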

foundit