NVIDIA is looking for an outstanding DevOps SRE engineer to join its Software Infrastructure and Operations team. The position will be part of a fast-paced crew that develops and maintains sophisticated build environments for a multitude of platforms, including Windows and Linux. NVIDIA is a one-of-a-kind company, crafting the future of computing and challenging existing conventions. With your help, we will forge the next generation of compute infrastructure, combining the power of the CPU, GPU and DPU.

What you'll be doing:
Support the scaling operation in our data centers.
Deploy and support an end-to-end container management solution with Kubernetes, Docker and other innovative technologies.
Set up and manage Jenkins instances end to end: tools, plugins, nodes, user management, backup, restore, monitoring, etc.
Craft and develop the tools needed to automate maintenance of 10,000+ hosts with only 10 support engineers.
Apply your depth in algorithms and your system software background.
Plan and implement critical metric tracking using various analytics methods and dashboards.
Use AI techniques to extract useful signals about machines and jobs from the data generated.
Take part in prototyping, crafting and developing cloud infrastructure for NVIDIA.

What we need to see:
5+ years of proven experience.
Bachelor's or Master's degree in CS, Software Engineering, or a related field, or equivalent experience.
Proven programming background in Python, Java and/or relevant scripting languages.
Experience in maintaining large-scale cloud infrastructure applications.
Excellent debugging and analytical skills.
Experience with databases, both SQL (MySQL) and NoSQL (Elasticsearch/MongoDB).
Proficiency with configuration management tools like Chef, Ansible and Puppet.
Confirmed experience with Jenkins and/or other CI systems.
Hands-on experience with VMs, Docker and Kubernetes clusters.
Practice with analytics/visualization tools like Kibana, Grafana, Splunk, etc.
Experience with monitoring systems such as Zabbix and/or Nagios is nice to have.

Ways to stand out from the crowd:
Organized and capable of detailing processes and procedures.
Outstanding teamwork skills across interpersonal boundaries.
Experience with computer algorithms and the ability to choose the best possible algorithms to meet the scaling challenge.
Ability to divide sophisticated problems into simple subproblems and then reuse available solutions to implement most of those.
Experience in the design, implementation and deployment of major infrastructure features across multiple servers in incremental rollout mode.

With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world's most desirable employers. We have some of the most resourceful and hardworking people in the world working for us and, due to outstanding growth, our elite engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.