Staff Infrastructure SRE Engineer

Company: NVIDIA

Location: India, Bengaluru

Commitment: Full time

Posted on: 2023-10-28 18:35

For more than two decades, NVIDIA has pioneered visual computing, the art and science of computer graphics. With a rare focus on this field, we offer niche platforms for the gaming, professional visualization, data center, and automotive markets. Our work is at the center of the most consequential mega-trends in technology: virtual reality, artificial intelligence, and self-driving cars.

We’re on a mission to automate and build brand-new infrastructure-as-a-service inside IT, and we’re looking for senior engineers to help drive this effort. Because this position supports many users and systems in a worldwide production environment, good judgment and attention to detail are needed. We strive to automate any repetitive tasks, empower our users with self-service tools, and provide clear documentation. A friendly, outgoing personality and strong written and verbal interpersonal skills are essential.

What you’ll be doing:

- Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate monitoring and alerting, and to enable self-service consumption of resources.
- Write and review code, and develop documentation and capacity plans.
- Reduce toil through automation.
- Work with cross-functional business partners and customers.
- Share an on-call rotation and be an escalation contact for service incidents.
- Own core infrastructure services (e.g., DNS, LDAP) and thousands of Linux servers across multiple data centers and regions.

What we need to see:

- BS in Computer Science (or equivalent experience) with 8+ years of relevant experience, or MS with 5+ years of experience.
- Extensive experience building and owning large-scale, multi-threaded, distributed backend systems.
- Proven experience with UNIX and TCP/IP network fundamentals.
- 4+ years of software and API development experience (Python, Go).
- Shell scripting and automation of repetitive administration tasks.
- Proficiency in managing highly available and scalable IT infrastructure, with knowledge of Docker/virtualization, monitoring, etc.
- Configuration management (Ansible, Puppet, Chef) experience.
- Log analysis and performance skills.
- Troubleshooting and problem-solving skills.

Ways to stand out from the crowd:

- Experience with Kubernetes cluster management.
- Experience with cloud platforms (AWS, Azure, Google Cloud).
- Good understanding of the monitoring stack (Grafana, Prometheus).

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!
