
Engineer, Central Observability Platform
- Kuala Lumpur
- Permanent
- Full-time
- Design and build an observability infrastructure for all engineering teams to consume.
- Develop and improve instrumentation for monitoring and logging the health and availability of services.
- Design and develop tools for metric collection, analysis, and reporting.
- Educate and lead efforts to improve observability among all engineering teams.
- Work with teams to enable an effective and pleasant on-call experience.
- Identify and collect the appropriate measurements, and synthesize the correct queries, to show intuitive and insightful visualizations which characterize the behavior of complex systems.
- Build a metrics pipeline with end-to-end latency under 5 minutes.
- Integrate logs with time series data for event correlation.
- Help us unlock the power of distributed tracing.
- Proactively monitor systems, networks, and applications to provide input in improving the stability, security, efficiency, and scalability of systems.
- 5 years of experience working in Monitoring / Observability / SRE / DevOps / Performance.
- Experience working with cloud infrastructures, particularly Kubernetes and AWS.
- In-depth experience designing at-scale monitoring and logging for corporate infrastructure services.
- Expert-level experience in monitoring and logging technologies, both open-source and vendor-supplied.
- Experience with ITRS Geneos, Zabbix, Prometheus, Grafana, ELK, and OpenTelemetry.
- Awareness and understanding of the TTO'25 business strategy and model appropriate to the role. Support and enable the Central Monitoring & Observability strategy, goals and objectives by developing prioritized features aligned to the Catalyst and Tech Simplification programmes.
- The Monitoring & Observability Platform team is a global team ensuring the design, development, delivery & support of the bank's central monitoring and observability services for all TTO teams (technology domains).
- The ideal candidate will possess a deep understanding of one or more of the platform technologies (ITRS Geneos, Zabbix, Elastic Observability and Grafana) and their supporting capabilities, such as Kafka messaging and database management. This understanding enables the design, development, implementation, and management of the central solution, the integration of advanced technological tools and techniques, and the oversight of large-scale enterprise-level implementations.
- As an Observability Engineer, you will play a crucial role in ensuring the stability, reliability, and performance of our applications and platform, thereby enabling our organization to deliver exceptional services to our internal stakeholders by adhering to the Enterprise SDLC (eSDLC) framework and guidelines.
- Actively engage in stakeholder conversations, providing timely, clear and actionable feedback to deliver solutions on schedule.
- The ability to interpret the Group's technical and security (ICS) control requirements, identify potential risks and key issues from them, and put in place appropriate controls and measures to mitigate or minimize risk to the delivery of the central monitoring & observability platform.
- Awareness and understanding of the eSDLC framework, in which the TTO software delivery operates, and the requirements and expectations relevant to the role.
- Responsible for the effectiveness of the central monitoring and observability platform's delivery governance, based on the oversight and controls of the eSDLC framework.
- Display exemplary conduct and live by the Group's Values and Code of Conduct.
- Take personal responsibility for embedding the highest standards of ethics, including regulatory and business conduct, across Standard Chartered Bank. This includes understanding and ensuring compliance with, in letter and spirit, all applicable laws, regulations, guidelines and the Group Code of Conduct.
- Effectively and collaboratively identify, escalate, mitigate and resolve risk, conduct and compliance matters.
- TTO CIO Development teams
- TTO Product Owners
- TTO SRE / PSS
- ET Foundation Service Owners
- Participate in solution architecture / design consulting, platform management, and capacity planning activities.
- Create sustainable solutions and services through automation and service uplifts within monitoring and observability disciplines.
- Daily tasks include providing Level 2 / Level 3 support to delivered solutions. This means solving incidents and problems and applying changes according to the bank's defined processes.
- Education: Degree
- Training: Agile Delivery, DevOps, SRE
- Certifications: Any Monitoring or Observability product certifications, such as ITRS Geneos, Zabbix, Elasticsearch and Grafana
- Languages: English
- Application Delivery Process
- Software Engineering
- Software Product Technical Knowledge
- Software Change Request Management
- Technical Troubleshooting
- Do the right thing and are assertive, challenge one another, and live with integrity, while putting the client at the heart of what we do
- Never settle, continuously striving to improve and innovate, keeping things simple and learning from doing well, and not so well
- Are better together, we can be ourselves, be inclusive, see more good in others, and work collectively to build for the long term
- Core bank funding for retirement savings, medical and life insurance, with flexible and voluntary benefits available in some locations.
- Time-off including annual leave, parental/maternity (20 weeks), sabbatical (12 months maximum) and volunteering leave (3 days), along with minimum global standards for annual and public holidays, which combine to a minimum of 30 days.
- Flexible working options based around home and office locations, with flexible working patterns.
- Proactive wellbeing support through Unmind, a market-leading digital wellbeing platform, development courses for resilience and other human skills, global Employee Assistance Programme, sick leave, mental health first-aiders and a range of self-help toolkits.
- A continuous learning culture to support your growth, with opportunities to reskill and upskill and access to physical, virtual and digital learning.
- Being part of an inclusive and values-driven organisation, one that embraces and celebrates our unique diversity across our teams, business functions and geographies, where everyone feels respected and can realise their full potential.