Senior DevOps Engineer, AI Services

Company: NVIDIA
Location: US, CA, Santa Clara
Commitment: Full time
Posted on: 2023-05-03 15:38
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

NVIDIA is hiring an excellent DevOps Engineer to work on our NeMo LLM Service and Riva teams. Our teams create building blocks that make Speech AI easy to develop, integrate, and deploy. Your role is multifaceted: streamlining development, builds, and releases with modern DevOps tools, and maintaining the cloud deployment infrastructure for our hosted services.

What you'll be doing:
- Automating and optimizing the build, test, integration, and release processes for optimized Riva skills
- Configuring, maintaining, and building upon deployments of industry-standard tools (e.g., GitLab, Docker, Bazel, Jira)
- Designing the cloud deployment strategy for Riva and NeMo LLM services: Helm charts, Kubernetes operators, etc.
- Maintaining multiple deployment environments, e.g., development, staging, and production
- Leading best practices for building, testing, and releasing software
- Identifying infrastructure needs and translating them into action

What we need to see:
- BS or higher degree in computer science (or equivalent experience)
- 6+ years of relevant experience
- Strong experience setting up, maintaining, and automating continuous integration systems
- Fluency in source control management (SCM) tools (e.g., Perforce, Git) and build systems (e.g., Make, CMake, Bazel)
- Experience architecting or developing distributed systems
- Strong programming skills in Python, Golang, or a similar language
- A pragmatic approach to problem solving and collaboration
- A real passion for “it just works” automation and for enabling team members

Ways to stand out from the crowd:
- Experience with the ELK stack (Elasticsearch, Logstash, Kibana) and related tools such as Grafana and Fluentd
- Deep knowledge of container, cluster, and monitoring technologies such as Docker, Slurm, Kubernetes, and Zabbix
- Experience with GPU computing systems
- A track record of identifying useful new technologies and incorporating them into software development flows
- Experience as an active contributor to a software project involving many developers

With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.

The base salary range is $176,000 to $333,500. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.

#deeplearning