
Data Engineer (Analytics)
- Cyberjaya, Selangor
- Permanent
- Full-time
- Experience in data warehouse technologies, data modeling and ETL development
- Experience in a high-performing, large-scale technology driven environment
- Experience creating and maintaining data pipelines (Apache Spark or similar) and workflow tools (Airflow or similar) for both real-time and batch use cases; cloud experience (Azure preferred)
- Basic understanding of machine learning, deep learning, wrappers, and APIs
- Experience with Big Data ML toolkits, such as Mahout, SparkML, or H2O (Mandatory)
- Strong coding skills in SQL and at least one of Python, Java, or Scala
- Exposure to database architecture with RDBMS or NoSQL systems: views/tables, disk usage, and relational diagrams
- Excellent communication, interpersonal skills, and proven project management skills.
- Experience managing a Hadoop cluster and all of its included services, with the ability to resolve ongoing operational issues
- Experience building stream-processing systems using solutions such as Storm or Spark Streaming (Good to have)
- Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala
- Experience integrating data from multiple sources
- Knowledge of various ETL techniques and frameworks, such as Flume
- Experience with various messaging systems, such as Kafka or RabbitMQ
- Good understanding of Lambda Architecture, along with its advantages and drawbacks
- Experience with Cloudera/MapR/Hortonworks
- The role sits within Decision Analytics, one of our four Global Business Lines.