Site Reliability Engineer (SRE) - PD

Beyondsoft

  • Malaysia
  • Permanent
  • Full-time
  • 19 days ago
Job Description: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to ensure the reliability, availability, and performance of our systems. You will work closely with development and operations teams to design scalable, resilient architectures, implement automation, and manage Kubernetes and cloud infrastructure. This role involves proactive monitoring, incident response, and driving continuous improvements in system reliability.
Responsibilities:
1. System Reliability
  • Partner with development teams to integrate reliability into the software development lifecycle.
  • Design and implement highly available and fault-tolerant architectures for mission-critical applications.
2. Kubernetes Operations
  • Design, implement, manage, and optimize Kubernetes clusters for availability, scalability, and security.
  • Perform upgrades, patches, and security hardening for Kubernetes infrastructure.
3. Automation & Infrastructure as Code (IaC)
  • Automate application deployment, scaling, and infrastructure provisioning.
  • Implement CI/CD pipelines for deploying and updating Kubernetes applications.
  • Develop and maintain IaC scripts (e.g., Terraform, Ansible) for provisioning and managing cloud and container resources.
4. Cloud Integration
  • Utilize AWS, GCP, or Azure services for Kubernetes deployments and integrations.
  • Apply cloud-native best practices for scalability and performance.
5. Monitoring & Alerting
  • Implement monitoring, logging, and alerting solutions (Prometheus, Grafana, ELK, etc.).
  • Proactively identify and resolve performance bottlenecks and reliability issues.
6. Incident Response
  • Respond to and resolve production incidents with minimal downtime.
  • Conduct post-incident analysis and implement preventive measures.
7. Capacity Planning
  • Perform capacity planning to ensure the Kubernetes infrastructure can accommodate current and future workloads in the cloud.
8. Security
  • Collaborate with the security team to implement and enforce Kubernetes and cloud security best practices.
  • Perform regular vulnerability assessments and compliance checks.
9. Collaboration & Documentation
  • Work cross-functionally with DevOps, security, and development teams.
  • Maintain comprehensive documentation for processes and configurations.
Qualifications:
  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • Minimum 3 years of proven experience as a Site Reliability Engineer or similar functional role.
  • Strong programming or scripting skills, with proficiency in languages such as Bash, Python, Go, or Java.
  • Extensive experience with Kubernetes orchestration, including cluster setup, management, and troubleshooting.
  • Experience with infrastructure-as-code tools (e.g., Terraform, Ansible) and cloud platforms.
  • Solid understanding of virtualization and networking concepts and principles.
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills.
  • Knowledge of cloud security best practices.
  • Familiarity with microservices frameworks.
  • Advantage: Certified Kubernetes Administrator (CKA) or equivalent certification.
Beyondsoft (Malaysia) Sdn. Bhd. is committed to being an equal opportunity employer and provides equal employment opportunities to all employees and applicants. We strive to cultivate a workplace that celebrates diversity and inclusion, where individuals of all backgrounds, regardless of nationality, ethnicity, religion, age, gender identity, sexual orientation, or any other distinguishing trait, can succeed and thrive. We prohibit discrimination and harassment of any type with regard to race, color, religion, age, national origin, disability status, genetics, sexual orientation, gender identity, or gender expression. This policy applies to all terms and conditions of employment, including recruiting, hiring, and the entire employee lifecycle. We are focused on creating an environment where everyone can reach their full potential.

Employment offers from Beyondsoft (Malaysia) Sdn. Bhd. are contingent upon the successful completion of any required pre-employment processes, in line with applicable laws and regulations. Beyondsoft (Malaysia) Sdn. Bhd. does not charge recruitment fees, nor does it request any unauthorized payments from candidates during the hiring process.

About Us:
Beyondsoft (listed on the Shenzhen Stock Exchange, stock code 002649) is a global provider of IT consulting, product, and solution services. Drawing on strong R&D and innovation capabilities, the company widely adopts emerging technologies based on big data and the mobile internet, including a big data management platform, an enterprise risk warning and public opinion monitoring system, AI-based intelligent operation and maintenance services, and intelligent automated testing products. It also offers a wide range of products and solutions, including internationally recognized software testing qualification training, serving the high-technology, internet, finance, retail, logistics, energy, manufacturing, and medical sectors.

For more information, please visit www.beyondsoft.com

Similar Jobs

  • Senior Site Reliability Engineer

    • Malaysia
    Job Description Location: Kuala Lumpur About AirAsia MOVE AirAsia MOVE is a leading ASEAN-focused budget travel OTA, part of the Capital A Group. We deliver customer-centric travel…
    • 10 days ago
  • Senior Site Reliability Engineer

    AirAsia

    • Kuala Lumpur
    Job Description Location: Kuala Lumpur About AirAsia MOVE AirAsia MOVE is a leading ASEAN-focused budget travel OTA, part of the Capital A Group. We deliver customer-centric t…
    • 11 days ago