We are now looking for Compute/DL Architecture Performance Optimization Interns in our group! Are you passionate about exploring computer architectures for deep learning? Do you enjoy working at the intersection of hardware and software? NVIDIA is looking for world-class programmers and performance architects who like to continuously explore and extract the best possible performance from each operator (or fused operator) in deep learning networks, and to design and develop scalable, modular infrastructure that ships these highly optimized operators to NVIDIA software libraries for training and inference.

What you'll be doing:
- Analyze the performance of various machine learning/DL algorithms on existing and new architectures
- Identify bottlenecks and propose creative solutions to address them
- Develop high-performance operators on NVIDIA GPUs for the cuBLAS, TensorRT, cuDNN, cuSPARSE, and cuTENSOR libraries
- Design and develop software for shipping and testing the GPU operators
- Build scalable automation for testing, integration, and release processes for publicly distributed deep learning libraries
- Configure, maintain, and build upon deployments of industry-standard tools (e.g., Kubernetes, Jenkins, Docker, CMake, GitLab, Jira)

What we need to see:
- Pursuing a B.S., M.S., or Ph.D. in Computer Science or a related field
- Strong programming skills in C/C++
- Familiarity with the GPU programming model and CUDA
- Good understanding of AI compilation technologies and experience with MLIR or TVM development
- Excellent problem-solving, communication, and teamwork skills

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!