Senior Deep Learning Data Engineer, Large Language Model

Company: NVIDIA

Location: India, Pune

Commitment: Full time

Posted on: 2023-09-08 06:00

Widely considered to be one of the technology world’s most desirable employers, NVIDIA is an industry leader with groundbreaking developments in High-Performance Computing, Artificial Intelligence, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, autonomous cars, and conversational AI that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and build our teams with the smartest people in the world. Join us at the forefront of technological advancement.

NVIDIA is looking for Data Engineers to develop our high-impact, high-visibility Large Language Model product, "NeMo LLM Cloud Service", and to improve the experience of millions of customers. If you're creative and passionate about solving real-world conversational AI problems, come join our NeMo LLM MLOps engineering team.
For more details on the NeMo LLM Service, see https://www.nvidia.com/en-us/gpu-cloud/nemo-llm-service/

What you’ll be doing:
- Perform data cleaning, formatting, segmentation, inference, and filtering for LLM development and evaluation
- Curate training datasets, analyze datasets, and publish reports
- Analyze and evaluate vendor and field data inflows
- Update the data lake and run data ingestion and processing scripts
- Improve processes for large language model data processing, augmentation, filtering, and training set preparation
- Characterize performance and quality metrics across platforms for various LLM MLOps components
- Collaborate with various teams on new product features and improvements to existing products
- Participate in developing and reviewing code, design documents, use case reviews, and test plan reviews
- Help innovate, identify problems, recommend solutions, and perform triage in a collaborative team environment

What we need to see:
- 5+ years of experience with a Master’s degree (or equivalent experience) or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, or Applied Math
- Native or near-native fluency in a non-English language: Spanish / Mandarin / German / Japanese / Russian / French / UK English / Arabic / Korean / Italian / Portuguese
- Excellent programming skills in Python
- Hands-on experience with LLM data processing, augmentation, filtering, and training set preparation
- Strong fundamentals in programming and software design
- Good analytical skills
- Working knowledge of data platforms such as SwiftStack and data catalogs, and of MLOps platforms such as Kubeflow, MLflow, and Airflow
- Experience with MLOps workflows and with traceability and versioning of datasets
- Understanding of the MLOps life cycle
- Working knowledge of database management and queries (SQL, etc.)
- General background in version control and code review tools such as Git and Gerrit
- Strong collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic matrix environment

Ways to stand out from the crowd:
- Strong C++ programming skills
- Background with Docker and Kubernetes
- Background with deploying machine learning models on data center, cloud, and embedded systems
- Experience with the PyTorch deep learning framework
- Hands-on data engineering experience with LLM technologies

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.
