
Senior Site Reliability Engineer
- Kuala Lumpur
- Permanent
- Full-time
Specialization: IT OR COMPUTER NETWORK OR SYSTEM OR DATABASE ADMIN

Job description:

Lead OpenShift Cluster Management:
Design, deploy, and maintain Red Hat OpenShift clusters on Azure with a focus on scalability, availability, and security.

Automate Everything:
Build and maintain IaC and automation workflows using Terraform, Ansible, and ArgoCD for provisioning, upgrades, monitoring, and self-healing.

Ensure Platform Resilience:
Collaborate with architects and the cloud, security, and network teams to enforce best practices in disaster recovery, scaling, and compliance (PCI, SOX, SOC 2).

Monitor & Respond:
Use tools like Prometheus, Grafana, and Dynatrace to track performance, respond to incidents, and lead root cause analysis and remediation.

DevOps Leadership:
Maintain CI/CD pipelines, manage container security, support 24/7 operations, and mentor junior engineers to drive a culture of reliability and automation.