
Lead I - Data Engineering
- Bayan Lepas, Pulau Pinang
- Permanent
- Full-time
Responsibilities:
- Design, build, and maintain robust data pipelines using Azure Data Factory (ADF) to ingest data from diverse sources including APIs, Excel files, Azure Data Lake Storage Gen2 (ADLS Gen2), and SharePoint, ensuring secure access and efficient data flow.
- Configure and manage SharePoint integration within ADF pipelines, handling authentication and access permissions effectively.
- Develop complex data transformations using ADF Mapping Data Flows, applying best practices to create scalable, maintainable ETL/ELT workflows.
- Implement and champion metadata-driven architecture for enhanced scalability, modularity, and reusability of data pipelines.
- Perform rigorous data validation and testing to maintain high data quality and accuracy across ingestion and transformation stages.
- Build dimensional models and apply data warehousing concepts to structure data optimally for analytics and reporting.
- Leverage Azure Databricks for big data processing, data engineering, and advanced analytics use cases.
- Use REST APIs effectively to source data, handling pagination, authentication, and errors.
- Design and optimize data storage and analytical solutions using Azure Synapse Analytics, balancing performance and cost.
- Apply core data engineering concepts, including ETL/ELT design, pipeline orchestration, incremental loads, and monitoring.
- Understand and contribute to Azure Service Fabric architecture for scalable distributed systems (advantageous).
- Utilize Azure DevOps for CI/CD pipeline creation and management, enabling automated deployment and version control of data solutions (plus).
- Develop serverless event-driven solutions using Azure Functions to support data workflows and processing tasks (plus).
- Implement automated workflows and notifications using Azure Logic Apps integrated with data pipelines (plus).
- Collaborate with cross-functional teams to understand business requirements and translate them into scalable technical data solutions.
- Monitor and troubleshoot production pipelines to ensure data availability, reliability, and performance.
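The REST ingestion duties above (pagination, authentication, error handling) follow a common pattern that can be sketched generically. The snippet below is a minimal illustration only: `fetch_page` stands in for a hypothetical authenticated API call (e.g., an HTTP GET with a bearer token), and the page shape `{"items": [...], "next_offset": ...}` is an assumption, not any specific service's contract.

```python
import time
from typing import Callable, Iterator

def paginate(fetch_page: Callable[[int], dict], max_retries: int = 3) -> Iterator[dict]:
    """Yield every record from a paged endpoint.

    fetch_page(offset) is assumed to return a dict of the form
    {"items": [...], "next_offset": int or None}; None ends the walk.
    Transient ConnectionErrors are retried with exponential backoff.
    """
    offset = 0
    while offset is not None:
        page = None
        for attempt in range(max_retries):
            try:
                page = fetch_page(offset)
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise  # retries exhausted: surface the error to the caller
                time.sleep(2 ** attempt)  # back off before retrying
        yield from page["items"]
        offset = page.get("next_offset")

# Stub pages standing in for an authenticated REST endpoint.
pages = {
    0: {"items": [{"id": 1}, {"id": 2}], "next_offset": 2},
    2: {"items": [{"id": 3}], "next_offset": None},
}
records = list(paginate(lambda off: pages[off]))
```

Separating the paging/retry loop from the actual HTTP call keeps the walk testable offline and lets authentication be swapped without touching the pagination logic.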
Skills & Technologies:
- Azure Data Factory (ADF) — Pipeline orchestration, data ingestion, Mapping Data Flows
- Azure Synapse Analytics — Data warehousing, big data analytics, serverless SQL pools
- Azure Data Lake Storage Gen2 (ADLS Gen2) — Scalable data lake storage
- Azure Databricks — Apache Spark-based big data processing and analytics
- Azure Logic Apps — Workflow automation and integration (e.g., email notifications)
- Azure Functions — Serverless compute for event-driven processing
- Azure DevOps — CI/CD pipeline automation and source control
- REST APIs — Data ingestion from external APIs including pagination and authentication
- SharePoint Integration — Secure access and data ingestion from SharePoint sources
- Metadata-driven Architecture — Scalable and reusable pipeline design methodology
- Data Warehousing Concepts — Dimensional modeling, star schema, snowflake schema
- Data Engineering Concepts — ETL/ELT, pipeline design, incremental load, data validation
- Azure Service Fabric (plus) — Distributed systems and microservices platform
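The metadata-driven architecture listed above keeps pipeline logic generic and drives it from a control table: new sources are onboarded by adding a row rather than building a new pipeline. The sketch below illustrates the idea under stated assumptions; the table layout, field names, and loader functions are illustrative, not ADF's actual schema. In ADF this pattern is typically a Lookup activity over a control table feeding a parameterised ForEach.

```python
# Illustrative control table; in ADF this would usually be a SQL control
# table or JSON config read by a Lookup activity.
PIPELINE_METADATA = [
    {"source": "api/orders", "target": "raw/orders", "load": "incremental",
     "watermark_column": "modified_at"},
    {"source": "sharepoint/budget.xlsx", "target": "raw/budget", "load": "full"},
]

def run_pipelines(metadata, loaders):
    """Dispatch each metadata row to the registered loader for its load type."""
    results = {}
    for row in metadata:
        loader = loaders[row["load"]]  # KeyError surfaces an unknown load type
        results[row["target"]] = loader(row)
    return results

# Stub loaders standing in for full and watermark-based incremental copies.
loaders = {
    "full": lambda row: f"full copy of {row['source']}",
    "incremental": lambda row: f"rows after {row['watermark_column']} watermark",
}
summary = run_pipelines(PIPELINE_METADATA, loaders)
```

Keeping load behaviour in data rather than code is what makes the design scalable, modular, and reusable, as the responsibilities above require.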