JOB DESCRIPTION

We are seeking a dedicated professional with 4 to 6 years of experience in managing data centre operations and observability platforms. The role involves overseeing daily data centre activities, including batch run management, access and asset control, and process enhancement. The candidate will supervise shift-based operators and ensure compliance with risk, audit, and regulatory standards while working closely with stakeholders and the Data Centre Manager.

The role also covers end-to-end management of observability platforms across the IT stack. This includes configuring and optimising monitoring tools, collaborating with cross-domain teams to gather requirements, and delivering actionable insights. Flexibility to work outside regular office hours and strong attention to detail are essential.

JOB RESPONSIBILITIES

Manage and monitor daily batch run operations
Oversee scheduling, execution, and monitoring of batch jobs. Troubleshoot and resolve batch failures or delays. Coordinate with application teams to ensure timely job completion.

Implement and enforce access control and asset movement policies
Manage physical and logical access to the data centre and maintain access logs to ensure compliance with security protocols. Maintain an accurate inventory of hardware and equipment. Coordinate with security teams for access and asset movement audits and reviews.

Enhance and document operational processes and procedures
Develop and maintain SOPs (Standard Operating Procedures). Ensure procedures are aligned with industry best practices and internal policies. Ensure adherence to data centre compliance frameworks and standards.

Support and coordinate a team of Data Centre Operators working in shifts
Provide guidance, training, and support to shift-based operators. Ensure smooth handovers between shifts and maintain operational continuity. Monitor team performance and assist in scheduling and resource planning.

Maintain awareness and support of modern data centre operations
Stay updated on emerging technologies and operational trends. Support initiatives related to energy efficiency, sustainability, and automation. Participate in modernisation projects and capacity planning.

Understand and support Data Centre M&E (Mechanical & Electrical) infrastructure
Coordinate with facilities teams on power, cooling, and environmental systems. Respond to M&E alerts and escalate issues as needed. Participate in maintenance windows and infrastructure upgrades.

Manage branch communication rooms across the country
Conduct regular inspections and maintenance of branch communication rooms. Visit branch sites to perform audits, inspections, and support activities. Ensure all rooms comply with operational and security standards. Coordinate with branch IT teams for upgrades and issue resolution.

Manage, configure, and fine-tune observability platforms
Administer observability tools (e.g., Grafana, Prometheus, Splunk, Dynatrace). Optimise data collection, storage, and visualisation for performance. Implement custom metrics, logs, and traces to enhance observability coverage. Identify gaps in monitoring and propose solutions to improve coverage.

Collaborate with cross-domain stakeholders for requirements gathering
Engage with infrastructure, application, and security teams to understand observability needs. Translate business and technical requirements into platform configurations. Document and communicate observability solutions and outcomes.

Deliver observability outcomes that support operational excellence
Develop actionable insights from observability data. Support root cause analysis and post-incident reviews. Contribute to continuous improvement initiatives based on observability findings.

Perform administrative tasks, including procurement and tendering
Prepare technical specifications and requirements for procurement.
Manage procurement and tendering activities and documentation, and coordinate with procurement teams to ensure the timely delivery of equipment and services. Maintain accurate records of procurement activities and monitor vendor performance for compliance and quality.

JOB REQUIREMENTS

Possess a bachelor's degree in computer science or information technology, or an equivalent qualification recognised by the Government, from any local or overseas institution of higher learning, with a minimum CGPA of 3.00.
Minimum 4 years of experience in data centre operations or infrastructure support and in managing observability tools and platforms (e.g., Grafana, Prometheus, Splunk, Dynatrace).
Strong knowledge of batch job management tools and scheduling systems (e.g., WLA, Paymate).
Familiarity with access control systems and asset management tools.
Understanding of data centre M&E systems (e.g., UPS, HVAC, fire suppression).
Experience working in a shift-based operational environment.
Programming skills in languages such as Python, Bash, or PowerShell for automation and integration.
Awareness of compliance, risk, and audit frameworks (e.g., ISO 27001, ISO/IEC 20000, DCRA, PCI-DSS, Data Centre Standard TIA/Uptime).
Familiarity with ITIL-aligned service delivery practices.
Willingness to work during non-standard office hours (e.g., weekends, public holidays, early hours of the morning) and willingness and ability to travel across the country.
Strong documentation and communication skills.
Ability to work independently and as part of a team.
Ability to work effectively with all levels of the organisation.
Malaysian citizen.
Hold a pass in Bahasa Melayu, including the oral test, at Sijil Pelajaran Malaysia (SPM) level or an equivalent qualification recognised by the Government.

COMPETENCIES

Operational Excellence: Strong focus on reliability, uptime, and process discipline.
Technical Proficiency: Strong grasp of observability tools and IT stack components.
Service Delivery Awareness: Understanding of ITIL processes such as Incident, Problem, and Event Management.
Detail Orientation: High level of accuracy in monitoring, reporting, and configuration.
Automation Capability: Ability to develop scripts and tools to enhance observability and operational efficiency.
Leadership: Ability to guide and support shift-based teams effectively.
Problem Solving: Quick and effective troubleshooting of operational issues.
Attention to Detail: Precision in managing access, assets, and documentation.
Adaptability: Comfortable working in a dynamic, 24/7 environment.
Compliance Awareness: Understanding of regulatory and audit requirements.
Mobility & Flexibility: Ability to travel and adapt to different branch environments.
Communication: Clear and effective communication skills.

ADDITIONAL SKILLS

Experience with DCIM (Data Center Infrastructure Management) tools.
Experience with the IBM iSeries platform.
Experience with cloud-native observability tools (e.g., AWS CloudWatch, Azure Monitor).
Experience with REST APIs and data integration techniques.
Exposure to ITIL practices and service management frameworks.
Familiarity with cloud and hybrid infrastructure environments.

JOB PLACEMENT

Data Centre Management, Infrastructure Operations, Digital Infrastructure Department

JOB STATUS

Permanent

All applications are strictly CONFIDENTIAL and only shortlisted candidates will be called in for an interview. Applications are deemed UNSUCCESSFUL if there is no feedback from the EPF within 2 MONTHS after the closing date of the advertisement.