Senior System Reliability Engineer

Company: NVIDIA

Location: Taiwan, Hsinchu

Commitment: Full time

Posted on: 2023-09-08 05:59

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing, with the GPU acting as the brains of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and build our teams with the most thoughtful people in the world. Join us at the forefront of technological advancement.

GPU servers are one of the fastest-growing segments for NVIDIA and the artificial intelligence industry. As computational power increases with every GPU generation, developing efficient and reliable systems is imperative. We are looking for a System Reliability Engineer to join NVIDIA's existing Reliability Engineering team, which supports NVIDIA's diverse system product range, specifically graphics and high-performance computing printed circuit boards and data center servers.

What you will be doing:

This position is not for silicon or chip reliability, but for printed circuit board assemblies (PCBAs) and server products, ranging from graphics cards to HGX/DGX AI servers.
You will be located in Taiwan and report to the U.S.
Work closely with CMs/ODMs.
Interface and interact with all pertinent engineering groups and suppliers to ensure the desired reliability is achieved using Design for Reliability (DfR) approaches, including FMEA and DoE.
Establish, deliver, and maintain product reliability standards and metrics for NVIDIA's new system technologies, using existing tools and processes or developing new ones as required.
Provide reliability predictions along with test plan definitions and methods to assess and drive product reliability to the desired levels.
Perform and lead appropriate testing, with associated failure analysis and recommendations for improving designs and manufacturing.
Develop and present methods of correlating reliability test results with actual field performance.

What we need to see:

BS/MS in EE/ME/Computer Engineering, or equivalent experience (graduate degree preferred). 10+ years in a hardware validation/reliability environment related to printed circuit boards and servers.
Hands-on experience with reliability demonstration and testing, along with accelerated life methods such as thermal cycling, shock and vibration, ALT/HALT/HASS, burn-in, and ORT for components, subassemblies, and complete products.
Understanding of power supplies, memory, high-speed I/O, PCI Express, Ethernet, and I2C.
Strong command and understanding of statistical concepts, models, and analysis, and how they relate to product reliability and life analysis.
Fluent in Chinese and English. Good verbal and writing skills, as well as the ability to communicate at a high level.

Ways to stand out from the crowd:

Self-motivated, independent, and committed to getting things done.
Good project management skills and the ability to balance multiple simultaneous projects during development and production stages.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/.
