Site Reliability Manager - Omniverse Cloud

Company: NVIDIA
Location: Taiwan, Taipei
Commitment: Full time
Posted on: 2023-11-06 05:01
We are seeking a highly motivated Site Reliability Manager to join our Omniverse Infrastructure organization! We develop hardware and software systems to power Omniverse Cloud. NVIDIA Omniverse™ Cloud is a platform-as-a-service (PaaS) that provides developers and enterprises with a full-stack cloud environment to craft, develop, deploy, and run industrial Omniverse applications.

Site Reliability Engineering (SRE) focuses on production health to prevent outages. It does so by defining and developing deep software engineering solutions and practices that simplify the operating environment, making Omniverse Cloud not only more reliable but also making feature development faster and safer. As team manager, you will have the opportunity to build a great team that effectively balances the demands of reliability with the need for a healthy and motivated team. Automation is your motto, and you continuously measure and respond to what is actionable and what matters!

What you'll be doing:
Lead the local Omniverse SRE team and build an effective team that is empowered to set standard methodologies, contribute to the global SRE team, and have the knowledge and tools needed to succeed.
Establish good practices for incident management, post-mortems, and quality metrics-based reporting, integrated with the needs of the distributed team.
Own the end-to-end availability of our SaaS and PaaS products, contributing to automated tools, sophisticated detection, and self-healing mechanisms.
Lead by example, mentor the team, and establish credibility through quality technical execution, reduction of toil, and safeguarding the well-being of the on-call team.

What we need to see:
Master's degree in Computer Science or a related field, or equivalent experience.
8+ years of overall experience in system design, complexity analysis, troubleshooting distributed systems, or equivalent experience.
3+ years of experience managing people, leading projects, and working with partners across multiple global teams.
Deep hands-on experience with Kubernetes-based cloud environments.

Ways to Stand Out from the Crowd:
Extensive experience with Azure.
Experience with Prometheus, Grafana, and Azure Monitor.
Background with PaaS and SaaS offerings.