
Senior DevOps/SRE Software Engineer (CI/CD)
- Kuala Lumpur
- Permanent
- Full-time
- Contribute to system design and deployment phases with a focus on scalability, reliability, and operability. Ensure that production readiness is considered at every stage of the software lifecycle.
- Develop automation scripts, infrastructure as code, and tooling using industry best practices to improve system reliability, reduce manual effort, and enable self-service.
- Review system architectures, deployment strategies, observability setups, and operational documentation to ensure reliability and operational excellence.
- Analyze production issues, identify root causes, and implement long-term reliability improvements through automation, monitoring, and architectural enhancements.
- Work collaboratively with other team members and provide guidance to more junior team members.
- Organize an efficient handover through high-quality documentation and training.
- Automate the deployment and operation of multi-tenant infrastructure, handling tasks that ensure system resilience and availability.
- Develop and maintain monitoring tools, dashboards, and self-healing mechanisms.
- Participate in on-call rotations, conduct blameless postmortems, and drive continuous learning.
- Work closely with developers, product teams, and engineering stakeholders to troubleshoot issues, improve systems, and integrate reliability improvements.
- Minimum 6 years of experience in Site Reliability Engineering or software development within an international company.
- Hands-on experience with CI/CD and deployment tools such as Ansible, Jenkins, Maven, Nexus, Git, and Docker.
- Proficiency in Linux.
- Proficiency in scripting and automation (e.g. Python, PowerShell) and in configuration formats such as YAML, with the ability to develop tools and infrastructure as code.
- Familiarity with Java-based systems with the ability to understand code for root cause analysis.
- Understanding of distributed systems and microservices architectures, including REST and SOAP APIs.
- Experience with databases, including NoSQL platforms.
- Familiarity with performance, reliability, and API testing tools such as JMeter or Postman.
- Exposure to observability and analytics technologies; experience with Elasticsearch or reporting tools like Power BI is a plus.
- Practical experience working in Agile-driven teams.
- Strong interpersonal and communication skills, with a customer-centric mindset and the ability to work effectively across cultures.
- Demonstrated ability to collaborate with distributed teams across multiple time zones.