NVIDIA is looking for a senior server software engineer to develop cloud infrastructure software that can provision many hundreds of bare-metal GPU and CPU servers across multiple datacenters around the globe with minimal manual effort. The goal is to build a reliable, scalable, and efficient server infrastructure that supports NVIDIA software development workflows and tools, including CI/CD pipelines, compute resource management, and developer productivity tools. This infrastructure serves NVIDIA's entire software stack, from graphics drivers to autonomous vehicles to deep learning frameworks. To achieve this goal, we are looking for an engineer with a deep understanding of Linux, outstanding infrastructure design skills, solid software development skills, and a track record of building and delivering large-scale server infrastructure.

What You Will Be Doing:
Design, develop, troubleshoot, improve, and debug software for enhancements to existing HPC hardware/software services, as well as for new HPC products and offerings.
Develop and deploy a network-bootable, in-memory Linux-based OS that can be used in all environments for commissioning tests such as hardware validation, cabling checks (LLDP), firmware updates, and DCIM updates.
Create PXE boot installers for OS deployment systems using debian-installer files, curtin, or user-data.
Build cloud image files using tools such as openstack-image-builder, mkosi, and dracut.
Start the deprecation and replacement of legacy server provisioning systems where applicable.
Deploy fleets of servers that run our ever-expanding cloud services, using code you have written or improved.
Lead server automation, including OS provisioning, software deployment, configuration, and maintenance.
Determine hardware and software compatibility requirements for supported services and influence both hardware and software design.
Automate the population and maintenance of inventory information in our DCIM system.
Create, track, and report system reliability metrics for all systems under management.
Develop a server roadmap aligned with business requirements, performance, capabilities, and technology trends.
Work with the cloud infrastructure team to implement correct hardware health monitoring and remediation states in our data center automation system.

What We Need to See:
8+ years of server software engineering experience supporting highly available, large-scale cloud service, ISP, telco, or HPC environments.
BS or graduate degree in Computer Science, Software Engineering, or a related field (or equivalent experience), with a proven record of delivering products.
Strong software development skills in Python.
Strong Linux systems administration experience with Ansible.
Knowledge of DPUs/SmartNICs.
General understanding of IT infrastructure systems: servers, networking, storage, data centers, and colocation.
Knowledge of industry data center standards, policies, and methodologies.
Good communication and interpersonal skills, with the ability to present to senior management clearly and persuasively.
Enjoy influencing and building relationships with other software and IT functional groups such as development, server, storage, and security teams.

Ways to Stand Out from the Crowd:
You have architected, built, and deployed server infrastructure at large scale (1000s of machines) that was used by 1000s of people.
Familiarity with server automation.
Passion for innovating and investing in groundbreaking technologies.

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. NVIDIA is looking for great people like you to help us accelerate the next wave of artificial intelligence.

The base salary range is $176,000 to $333,500. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.