Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.
If you'd like to build the world's best deep learning cloud, join us.
What You’ll Do
Design and implement scalable, secure, and highly available Kubernetes clusters to support our growing application portfolio
Bootstrap new on-prem and managed Kubernetes environments from the ground up, including networking, storage, and security configurations
Extend our existing Kubernetes platforms with advanced features such as service mesh, serverless frameworks, and custom resource definitions (CRDs)
Develop and maintain infrastructure-as-code (IaC) templates using Cluster API (CAPI) for automated cluster provisioning and configuration management
Implement robust monitoring, logging, and alerting solutions using OpenTelemetry to ensure platform health and performance
Optimize resource utilization and cost-effectiveness of Kubernetes deployments across multiple cloud providers
Collaborate with teams to design and implement CI/CD pipelines for containerized applications
Troubleshoot complex issues in production Kubernetes environments and lead incident response efforts
Stay up-to-date with the latest Kubernetes ecosystem developments and evaluate new technologies for potential adoption
Mentor junior engineers and contribute to the development of platform engineering best practices
You
Have 5+ years of experience bootstrapping, extending, and operating K8s at scale (1,500+ nodes)
Have 5+ years of experience automating the provisioning, configuration management, and deployment of production systems
Have 5+ years of experience building resilient, scalable systems with Python/Go
Have 5+ years of experience managing and securing infrastructure at scale (2,000+ hosts)
Possess sound experience with Infrastructure as Code (Terraform, Ansible, etc.)
Possess sound knowledge of DevOps, infrastructure, and platform concepts
Possess strong development skills in Python or Golang
Possess strong proficiency with the Linux command line and debugging tools
Nice to Have
Experience building complex hybrid environments (AWS and on-premises preferred)
Experience with service mesh technologies (e.g., Istio, Linkerd) and serverless frameworks (e.g., Knative)
Experience with multi-cluster or multi-cloud Kubernetes deployments
Experience in the machine learning or computer hardware industry
Certified Kubernetes Administrator (CKA) and/or Certified Kubernetes Application Developer (CKAD) certification
Contributions to open-source Kubernetes projects or tools
Familiarity with GitOps principles and tools like ArgoCD or Flux
Salary Range Information
Based on market data and other factors, the salary range for this position is $153,000-$240,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.
About Lambda
We offer generous cash & equity compensation
Investors include Gradient Ventures, Google’s AI-focused venture fund
We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability
Our research papers have been accepted into top machine learning and graphics conferences and journals, including NeurIPS, ICCV, SIGGRAPH, and TOG
We have a wildly talented team of 300, and we're growing fast
Health, dental, and vision coverage for you and your dependents
Commuter/Work from home stipends for select roles
401(k) plan with 2% company match
Flexible Paid Time Off Plan that we all actually use
A Final Note:
You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.
Equal Opportunity Employer
Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.