Responsibilities:
* Ensure the reliability of the product(s) you are responsible for in terms of uptime, SLA, RTO/RPO, and other key measurements.
* Develop and support continuous cost analysis and improvements in the cost of delivery.
* Govern and standardize root cause analysis and prevention/mitigation follow-up, ensuring adherence to a systematic failure investigation approach.
* Develop and enforce SRE best practices, processes, and tools to enhance system reliability and performance.
* Communicate with the software program manager, portfolio, and other key stakeholders on status updates, reliability reviews, and improvement feedback.
* Develop and build the competency of SRE engineers in the network and track their progress.

Educational Background:
* Bachelor's degree in Computer Science, Software Engineering, or a related field.

Experience:
* 5+ years in software engineering, systems administration, or site reliability engineering.
* 2+ years in a leadership or management role.

Technical Skills:
* Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).
* Strong understanding of virtualization, cloud computing, containerization, and orchestration technologies (e.g., VMware, Azure, Kubernetes).
* Proficiency in programming and scripting languages (e.g., Python, Bash).

Soft Skills:
* Excellent problem-solving abilities and a proactive approach to challenges.
* Strong communication and interpersonal skills, with the ability to work effectively in a team environment.