Engineering Manager, LLM MLOps Platform

Company: NVIDIA

Location: US, CA, Santa Clara

Commitment: Full time

Posted on: 2023-09-08 06:01

Join the hard-working team at NVIDIA as a hands-on Engineering Manager, where innovation and excellence are at the heart of everything we do. We're on a mission to redefine the way Machine Learning (ML) operates across industries, and we want you to lead the charge! In this pivotal role, you will orchestrate the development of Large Language Model (LLM) and Speech Deep Learning technology through our cohesive LLM MLOps platform. By crafting innovative ML pipelines and constructing a versatile platform that accommodates an array of unique use cases, you will be at the forefront of AI technology. NVIDIA's distinctive position in the market as a trailblazer in AI-driven solutions across diverse domains puts us in the driver's seat to engineer the future. Our ambitious vision is to shape an MLOps platform specifically tailored for LLM, NLP, and speech applications. To realize this vision, we will collaborate closely with internal and external ML teams, focusing on a unified approach that identifies universal solutions and domain-specific necessities while developing pioneering MLOps tools and features. Joining our team means contributing to the MLOps strategy for NVIDIA and gaining deep insights into both MLOps production pipelines and infrastructure. Together, we'll strive to create what could be the most sophisticated and comprehensive MLOps platform in the world. Your leadership and expertise can make this a reality!

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s an outstanding legacy of innovation that’s fueled by extraordinary technology and outstanding people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent.
As an NVIDIAN, you’ll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world!

What you'll be doing:

- Lead the development and implementation of MLOps workflows and tools for large language model (LLM) and Speech projects. This includes Data Collection & Labeling, Data & Model workflows, and Model Evaluation tools.
- Lead a team of expert engineers and mentor new engineers in a sophisticated, technology-based environment. Provide both technical guidance and career mentorship.
- Design, code, and be responsible for the implementation of scalable and efficient MLOps infrastructure, tools, and workflows for LLM and speech workloads.
- Lead code reviews, design reviews, the investigation and resolution of production issues, and retrospectives. Set standard development practices across your team.
- Develop and handle significant, achievable engineering plans aligned with key objectives.
- Lead all automation activities, CI, and infrastructure for the LLM MLOps team.
- Work with internal teams to improve and automate the traceability and versioning of LLM data and models.
- Build high-performance, real-time Big Data pipelines for inferencing, training, and ETL.
- Follow the best MLOps practices of automation, monitoring, scale, and data privacy.
- Collaborate across teams to build the entire MLOps process: data acquisition and ingestion, curation, labeling, preparation, training, evaluation, and system validation workflows.

What we need to see:

- A Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
- 10+ overall years of software development experience and 5+ years of management experience.
- Experience with hands-on coding and design in 3 of the following 7 technology/language areas:
  - Backend technology: Python, Node.js, Golang, other scripting languages (such as Shell)
  - Frontend technology: JavaScript, React, CSS
  - Knowledge of RESTful APIs
  - Web stack backend: Postgres, SQL, MongoDB, Redis
  - Distributed Computing, Data Engineering, and Data Analytics
  - End-to-end MLOps platforms such as Kubeflow, MLflow, Airflow
  - Data pipelines/analysis/visualization tooling such as Elastic stack, Kibana, Kafka, Grafana, Splunk, Pandas, message brokers, data modeling, etc.

Ways to stand out from the crowd:

- You are highly motivated, passionate, and curious about new technologies. You take pride in your work and strive to achieve incredible results.
- You demonstrate behavior that builds trust: modesty, transparency, respect, intellectual integrity.
- Experience with Cloud Service Providers such as AWS, Azure, GCP, and OCI.
- Experience with LLM or Speech in the MLOps space.
- Fluency in a second language and/or a passion for languages.
- Excellent collaboration, planning, and verbal and written communication skills.

With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technology industry's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working with us, and our engineering teams are growing fast in some of the hottest pioneering fields: Deep Learning, Artificial Intelligence, and Large Language Models.

The base salary range is $176,000 - $333,500. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
