
Data Engineer
- Bayan Lepas, Pulau Pinang
- Permanent
- Full-time
We are looking for a highly skilled and results-driven Data Engineer to join our dynamic data engineering team.
In this role, you will be responsible for designing, building, and maintaining real-time data pipelines and high-performance ETL workflows across cloud platforms such as AWS and Azure.
You will play a key role in managing large-scale data integration, ensuring data quality and security, and enabling advanced analytics for the telecom and financial domains.
Key Responsibilities:
- Design and deploy scalable, real-time data streaming pipelines using AWS Glue, Kinesis, and Teradata VantageCloud Lake.
- Build and maintain robust ETL workflows using Azure Data Factory and Synapse Analytics to process telecom and financial data exceeding 10TB.
- Optimize SQL performance with advanced techniques like secondary indexing, partition pruning, and temporary table usage in Teradata and Synapse.
- Automate data validation using Python to enforce schema integrity, null handling, and transformation audits (a minimal validation sketch follows this list).
- Implement data security measures including masking and role-based access control (RBAC) to ensure compliance with GDPR and internal policies.
- Develop web scraping tools using Selenium and BeautifulSoup to ingest semi-structured and dynamic web data into ETL pipelines (a scraping sketch follows this list).
- Automate deployment and server-side operations using Bash scripting, reducing manual intervention and streamlining engineering workflows.
- Collaborate with cross-functional teams to implement and monitor data solutions using CI/CD pipelines and modern orchestration tools.
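For illustration, a minimal sketch of the kind of Python validation automation referenced above; the column names, expected dtypes, and required-field list are hypothetical stand-ins for whatever the real pipeline enforces:

```python
import pandas as pd

# Hypothetical expected schema; a real pipeline would load this from config.
EXPECTED_SCHEMA = {
    "account_id": "int64",
    "event_ts": "datetime64[ns]",
    "amount": "float64",
}
REQUIRED_NON_NULL = ["account_id", "event_ts"]

def validate(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable validation errors (empty list = pass)."""
    errors = []
    # Schema integrity: every expected column present with the expected dtype.
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in df.columns:
            errors.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            errors.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    # Null handling: critical columns must be fully populated.
    for col in REQUIRED_NON_NULL:
        if col in df.columns and df[col].isna().any():
            errors.append(f"{col}: contains {int(df[col].isna().sum())} nulls")
    return errors

if __name__ == "__main__":
    sample = pd.DataFrame({
        "account_id": [1, 2],
        "event_ts": pd.to_datetime(["2024-01-01", "2024-01-02"]),
        "amount": [10.0, None],
    })
    print(validate(sample) or "validation passed")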
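Likewise, a minimal sketch of Selenium-plus-BeautifulSoup ingestion for dynamic pages; the headless-Chrome setup is standard, while the URL and CSS selector are placeholders:

```python
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def scrape_titles(url: str) -> list[str]:
    """Render a JavaScript-heavy page with Selenium, then parse it with BeautifulSoup."""
    options = Options()
    options.add_argument("--headless=new")  # run without a visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        html = driver.page_source  # fully rendered DOM, including JS-generated content
        soup = BeautifulSoup(html, "html.parser")
        # Hypothetical selector; the real target depends on the site being scraped.
        return [el.get_text(strip=True) for el in soup.select("h2.listing-title")]
    finally:
        driver.quit()

if __name__ == "__main__":
    for title in scrape_titles("https://example.com/listings"):  # placeholder URL
        print(title)
```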
Key Projects:
- CDN Data Processing Pipeline: Ingested and processed over 500GB daily using AWS Lambda, Kinesis, and Spark SQL with real-time monitoring and automatic error recovery (a minimal consumer sketch follows this list).
- LLM-Powered QA Validator: Designed an AI-based quality assurance pipeline using LangChain, vector stores, and LLM evaluation to reduce manual QA effort by 60%.
- Smart Migration Assistant: Built an LLM-based code migration tool to convert legacy workflows (Informatica, SAS) into modern frameworks such as dbt and PySpark.
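As a rough illustration of the consumer side of such a pipeline, the sketch below shows a Lambda handler reading a standard Kinesis trigger payload; the processing step is a placeholder, and the partial-batch retry response assumes ReportBatchItemFailures is enabled on the trigger:

```python
import base64
import json

def handler(event, context):
    """AWS Lambda entry point for a Kinesis-triggered function."""
    failures = []
    for record in event["Records"]:
        try:
            # Kinesis data arrives base64-encoded inside the Lambda event.
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            # ... transform and forward `payload` to downstream storage here ...
            _ = payload
        except Exception:
            # With ReportBatchItemFailures enabled, returning failed sequence
            # numbers lets Kinesis retry only those records.
            failures.append({"itemIdentifier": record["kinesis"]["sequenceNumber"]})
    return {"batchItemFailures": failures}
```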
Qualifications:
- Bachelor’s degree in Computer Science or a related field from a reputable university.
- 2+ years of experience in data engineering or backend development with hands-on expertise in cloud-native ETL systems.
- Strong programming skills in Python and SQL with a deep understanding of data structures and algorithms.
- Proficiency in working with AWS (Glue, Kinesis, Lambda), Azure (ADF, Synapse), and Teradata VantageCloud.
- Experience with LLM tools like LangChain, vector databases, and building AI-powered automation systems is a plus.
- Knowledge of data orchestration and DevOps tools like Airflow, Docker, Git, Kubernetes, Terraform, and Jenkins.
- Exposure to the GCP ecosystem (BigQuery, Dataflow).
- Understanding of data governance, compliance standards, and secure data handling (PII, GDPR).
- Familiarity with REST API integration and event-driven data architectures.
What We Offer:
- Work with cutting-edge technologies in a cloud-first, AI-augmented environment.
- Contribute to high-impact projects for global telecom and financial clients.
- Flexible work arrangements and a strong culture of learning and innovation.