Deep Learning Performance Architect, Infrastructure

Company: NVIDIA
Location: China, Shanghai
Commitment: Full time
Posted on: 2023-09-08 05:56

We are now looking for an Infrastructure Software Engineer for Deep Learning Libraries! NVIDIA's Deep Learning Libraries Group is seeking excellent software engineers to enable the next wave of NVIDIA’s highest performing deep learning libraries. The mission is to design and develop scalable, modular infrastructure that streamlines development, build, and test across NVIDIA’s diverse set of platforms, from Drive AGX for autonomous vehicles to DGX servers for datacenter. Join our technically diverse team of software engineers and infrastructure experts to design the systems that enable NVIDIA to stay ahead of the competition as we deliver the world's fastest deep learning platforms.

What you'll be doing:
- Designing and developing software for testing and analysis of our codebases
- Building scalable automation for build, test, integration, and release processes for publicly distributed deep learning libraries
- Developing throughout the software stack, from the user experience down to the cluster and database layers
- Configuring, maintaining, and building upon deployments of industry-standard tools (e.g. Kubernetes, Jenkins, Docker, CMake, Gitlab, Jira, etc)
- Advancing state of the art in those industry-standard tools and upstreaming contributions to the open source community

What we need to see:
- BS or equivalent experience or higher degree in Computer Science or Computer Engineering
- 3+ years of relevant experience.
- Strong programming skills in Python (or similar) and familiarity with C/C++ development
- Experience setting up, maintaining, and automating continuous integration systems
- Fluency in SCM (e.g. Git, Perforce) and build systems (e.g. Make, CMake, Bazel)
- A pragmatic approach to solving problems and collaboration
- Passion for “it just works” automation and enabling team members

Ways to stand out from the crowd:
- Experience designing and developing automation in Jenkins with Groovy (or similar)
- Background with distributed systems and cluster/cloud computing, especially with Kubernetes
- Experience designing and developing unit and integration test frameworks
- Hands-on experience with code coverage and static code analysis tools
- Knowledge of GPU computing systems