We are looking for a first-class DL Performance Architect to drive the performance analysis and optimization of state-of-the-art inference networks on our GPUs: identify the hardware and software performance limiters of DL networks, prototype the key primitives, and guide the design of next-generation architecture and DL software optimizations.

What you’ll be doing:
Establish deep learning applications and use cases for performance analysis, modeling, and projections.
Analyze and propose both SW and HW optimizations for deep learning applications.
Specify hardware/software configurations and metrics to analyze performance, power, accuracy, and resiliency in existing and future uniprocessor and multiprocessor configurations.
Collaborate across the company to guide the direction of next-generation deep learning HW/SW by working with architecture, library, and compiler teams.

What we need to see:
MS or PhD in a relevant discipline (CS, EE, Math) or equivalent experience, with 2+ years of experience.
Track record of designing architectures to accelerate computationally demanding algorithms and applications.
Strong background in computer architecture.
Expert mathematical foundation in machine learning and deep learning.
Strong programming skills in C, C++, Perl, or Python.

Ways to stand out from the crowd:
Prior experience working on assembly-level performance optimization.
Experience working with deep learning frameworks like Caffe, TensorFlow, and Torch.
Familiarity with GPU computing (CUDA, OpenCL) and HPC (MPI, OpenMP).
Background with systems-level performance modeling, profiling, and analysis.
Experience in characterizing and modeling system-level performance, executing comparison studies, and documenting and publishing results.