Principal Engineer, Systems Software

Company: NVIDIA
Location: US, CA, Santa Clara
Commitment: Full time
Posted on: 2024-08-21 05:17
We are looking for a Principal Software Engineer with experience building highly scalable and reliable software to join us. We are building a powerful operational automation platform for GPU clusters that improves their performance and utilization while reducing operational toil.

What you’ll be doing:
- Architecting the product to discover cluster resources such as hosts, GPUs, and switches, and to automate debug and repair actions on these resources
- Designing the platform to support GPU clusters across different CSPs and platforms such as Kubernetes and Slurm
- Developing a distributed workflow execution runtime for parallel, fault-tolerant actions on a large number of resources
- Operating critical software services with high availability and reliability for customers
- Influencing the product roadmap in collaboration with teams across various departments, with the goal of reducing SRE toil and improving hardware utilization
- Optimizing system performance to increase scalability and improve the user experience
- Leading and delivering high-impact projects with high quality, performance, and stability at the lowest resource consumption
- Elevating the productivity and creativity of the technical staff by optimizing engineering practices, guiding junior engineers, and providing quality design and code reviews
- Programming in systems languages such as Go and Rust

What we need to see:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent experience)
- 15 years of equivalent experience
- Demonstrated ability to build scalable and robust distributed systems
- Proven record of product rollouts and of collaborating with early adopters
- Proficiency in Go, Rust, C/C++, or Java
- Technical stewardship of projects across the organization

Ways to stand out from the crowd:
- Deep understanding of concurrency and distributed systems concepts
- Experience handling large, complex systems
- Experience with SRE, DevOps, and platforms

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. NVIDIA is looking for great people like you to help us accelerate the next wave of artificial intelligence.

The base salary range is 272,000 USD to 419,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.