NVIDIA is looking for outstanding software engineers to help us expand our enterprise GPU management and monitoring tools. In this role, you will work closely with the broader NVIDIA team to design and build Linux-based management agents, Kubernetes integrations, and end-to-end integration solutions that combine GPUs with the rest of the datacenter software management ecosystem. We are focused on supporting NVIDIA products across HPC, cloud, and enterprise on both bare metal and virtualized platforms as the role of GPUs in all of these environments expands. Your contributions will span many aspects of GPU system integration, including telemetry and metrics, health checks, diagnostics, configuration, and system management. These tools fill roles of both passive background monitoring and active online management with a core emphasis on operational transparency and seamless integration in customer environments. Your code will support single-node developer systems through large clusters with thousands of nodes.

To succeed, you must have a strong Linux background, familiarity with modern distributed systems, and a proven work ethic. You will be expected to jump in quickly and provide valuable contributions from day one. This is a dynamic work environment with many exciting opportunities awaiting. NVIDIA GPUs are central to many hot enterprise, cloud, and datacenter trends.
Come join us as we craft the future of accelerated computing and AI.

What you'll be doing:

- Develop and maintain distributed, robust, and scalable Go programs deployed to Kubernetes environments that manage large datacenters.
- Develop and maintain user-space applications, containers, Go bindings, and CLI tools.
- Enable GPU management integration with the state-of-the-art open-source ecosystem, including Kubernetes and Docker.
- Support internal and external users through bug fixes, documentation, and feature improvements.
- Maintain high-quality products through robust test coverage.

What we need to see:

- BS or higher in Computer Science or equivalent experience.
- 5+ years of meaningful industry experience with a strong Go and Kubernetes development background.
- User-space development and debugging expertise in Linux environments.
- Business-level English.
- Experience with APIs and interface design.
- Outstanding written and verbal interpersonal skills.
- Strong motivation and commitment to learn new skills.
- Ability to execute all aspects of the software development lifecycle.
- Ability to manage time in a fast, heavily multitasked environment.

Ways to stand out from the crowd:

- Development experience with Rust, Python, and/or C/C++.
- Development experience with distributed systems and concurrent applications, especially in a Kubernetes environment.
- Experience developing and maintaining enterprise software.
- Experience deploying, managing, and debugging applications in a Kubernetes environment.
- Background with containers (e.g. Docker, OCI), orchestration frameworks, and logging/telemetry backends in Kubernetes monitoring stacks, using tools such as Prometheus, Loki, and Grafana.
- Experience developing Kubernetes operators or Helm charts.
- Experience with HPC job schedulers like Slurm.
- Familiarity with Kubernetes internals.
- Exposure to GPU programming with CUDA.
- Experience with Jenkins and GitHub/GitLab CI/CD pipelines.

NVIDIA is widely considered to be one of the technology world’s most desirable employers.
We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you!

The base salary range is 148,000 USD - 276,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.