NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars.

We are looking for a highly motivated Hardware Product Engineer with experience in product development, OEM/ODM/supplier enablement, system manufacturing and validation, and deploying GPU hardware and systems at scale on production clusters. In this role, you will act as a subject-matter expert (SME), collaborating closely with Design Architects, Commodity Managers, and Customer Application teams to develop product specifications and the validation, NPI manufacturing assembly and test, quality, and reliability requirements that support our GPU systems and clusters.

The Architecture organization works with the most sophisticated computing hardware and software that drives the latest breakthroughs in deep learning and machine learning, in partnership with NVIDIA’s enterprise and datacenter customers. This role offers an excellent opportunity to build a successful career in the rapidly growing field of deep learning while enabling the world's most successful technology companies. Primary responsibilities will be to enable efficient design and manufacturing of systems and clusters, and to define quality metrics around NVIDIA products and technologies.
Join us in this groundbreaking endeavor!

What You’ll Be Doing:

- Collaborate with platform architecture, design, validation, FW/SW, and manufacturing operations to build and deploy data center infrastructure hardware at scale.
- Help develop product requirements, system validation, debug strategies, and hardware health KPIs for next-gen HPC hardware, working with customers and internal partners.
- Lead the effort in identifying test coverage and tools for platform validation via engagement with commodity vendors, OEMs/ODMs, and internal ME, EE, Thermal, Power, FW/SW, Customers, and Hyperscale Data Center partners.
- Drive technical requirements and ensure the solution is flexible and scalable across the HW/FW/SW stack from L10 (Systems) to L12 (Racks, Clusters), covering targeted workloads.
- Define system behavior and operations for the platform to ensure compatibility with datacenter software, serviceability, telemetry, and CSP customer expectations.
- Lead multi-functional teams in driving server, rack, and cluster quality across design, manufacturing, and data center deployment, from NPI to Sustaining.
- Complete NTI and risk assessment methodologies to support next-gen GPU and specialized HW platforms requiring sophisticated power and liquid cooling solutions.
- Support bring-up of new suppliers and OEMs/ODMs in conjunction with manufacturing and test requirement definitions, assembly fixtures, and system integration qualification.
- Influence product roadmap discussions and enable deep dives into FA methodologies, system/component debug, and fault isolation strategies.
- Develop system assembly, rack and roll, product yield, quality metrics, and reliability/availability (shock and vibe, thermal cycling, etc.) guidelines for compute, storage, and AI/ML hardware.

What We Need To See:

- Master’s degree in Electrical Engineering, Mechanical Engineering, or Computer Engineering (or equivalent experience) and 10+ years of relevant technical engineering experience.
- Understanding of Product Lifecycle Management and CM/ODM working models across high-volume manufacturing and testing associated with compute, storage, networking, and AI/ML hardware.
- Validated experience in identifying customer needs, driving technical reviews, platform design tradeoffs, and risk assessment, and in working with multi-functional HW, FW, and Datacenter teams.
- Deep technical knowledge of manufacturing assembly and test, system validation, product quality, and yield target guidelines.
- Prior experience working with datacenter design and deployment teams in setting up hardware assembly, deployment, rack delivery, and cabling practices.
- Understanding of OCP/EIA infrastructure hardware design (EE, ME, Thermal, FW, etc.), PCIe devices (GPUs, Accelerators), and rack, power, and cooling solutions.
- Knowledge of OS, Linux, networking, and system debug concepts.
- Strong verbal and written communication skills.

NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most brilliant and talented people in the world working for us and, due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to hear from you!

The base salary range is $152,000 - $287,500. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.
As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.