
T&T Senior Consultant - Innovation & Cloud Development Centre (Site Reliability Engineer) - MY
- Kuala Lumpur
- Permanent
- Full-time
- Monitor and maintain system performance to ensure the stability and reliability of applications and infrastructure.
- Design and implement resilient system architectures that support high availability and scalability.
- Develop automation tools and scripts to enhance operational efficiency and reduce manual effort.
- Define, track, and analyze SLOs and SLIs to ensure reliability and performance meet business needs.
- Conduct thorough post-mortem analyses following incidents, driving continuous improvement through root cause identification and solution implementation.
- Collaborate with development and operations teams to establish best practices in system reliability and incident management.
- Troubleshoot and resolve issues related to database performance, network connectivity, and deployment failures, including diagnosing problems at the underlying platform level (e.g., Kubernetes, virtual machines).
- Ensure that issues are resolved within the stipulated Service Level Agreements (SLAs), maintaining high standards of service delivery.
- Identify and troubleshoot performance bottlenecks in applications and infrastructure, providing actionable recommendations for enhancements.
- Maintain detailed documentation of processes and incident responses to support knowledge sharing and compliance.
- Improve monitoring solutions to proactively identify and mitigate issues before they impact services.
- Assist in the deployment and configuration of new applications and services, ensuring adherence to best practices.
- Participate in on-call rotations and respond to critical incidents as they arise.
- Analyze system logs and metrics to identify trends and potential areas for improvement.
- Actively seek out opportunities for professional development, act as a strong brand ambassador for the firm, and share knowledge and experience with others.
- Respect the needs of colleagues and build cooperative relationships.
- Understand the goals of internal and external stakeholders in order to set personal priorities and align the team's work with those objectives.
- Continually seek out new challenges, collaborate with others to deliver on tasks, and take accountability for the results.
- Build productive relationships and communicate effectively in order to positively influence teams and other stakeholders.
- Offer insights based on a solid understanding of what makes Deloitte successful.
- Project integrity and confidence while motivating others through team collaboration and by recognising individual strengths, differences, and contributions.
- Understand disruptive trends and identify potential opportunities for improvement.
- Ability to communicate effectively in Mandarin, as the role requires working with Mandarin-speaking clients and stakeholders based in regions where Mandarin is primarily used.
- Proficiency in programming languages such as Python, Golang, Java, or similar, with a focus on operational efficiency.
- Experience with Bash/shell scripting or automation of system administration tasks.
- Demonstrated experience in system architecture and design, prioritizing reliability and scalability.
- Strong understanding of SRE principles, including SLOs, SLIs, toil reduction, and incident post-mortems.
- Hands-on experience with cloud environments (e.g., AWS, Azure, Google Cloud) and their operational management.
- Strong expertise in Linux system administration.
- Proven experience in troubleshooting application support issues with a focus on performance and connectivity.
- Familiarity with networking concepts and effective troubleshooting techniques.
- Excellent problem-solving abilities and a proactive approach to operational challenges.
- Ability to work independently while effectively collaborating within a team environment.
- Willingness to work a rotational shift schedule across different time slots, with schedules shared reasonably in advance.
- Familiarity with monitoring tools and performance optimization techniques.
- Familiarity with DevOps practices and frameworks, including CI/CD, infrastructure as code, and containerization.