Sr. Site Reliability Engineer, Enterprise Systems

Company: Apple
Location: Austin, Texas, United States
Department: Software and Services
Posted on: 2023-10-30 00:31
Summary
Posted: Oct 9, 2023
Role Number: 200500591

Are you seeking an environment where you can drive innovation? Does the prospect of working with top engineering talent get you charged up? Apple is a place where extraordinary people gather to do their best work. Together we create products and experiences people once couldn’t have imagined, and now can’t imagine living without.

Think platform-as-product! Our team delivers great developer experiences to our Program, Project and Development teams through a curated set of tools, capabilities and processes offered through our Internal Developer Platform. We automate infrastructure operations, support complex service abstractions, build flexible workflows and curate a frictionless ecosystem that enables end-to-end collaboration to help drive productivity and engineering velocity.

Key Qualifications
- Experience building and leading Cloud Native SRE and operational functions.
- Experience supporting customer-facing distributed systems in a 24x7 uptime environment.
- Ability to implement and coordinate telemetry using monitoring and observability tools.
- Expertise handling production incidents, with experience driving them to resolution and communicating with stakeholders during incidents.
- An automation focus for operational efficiency: crafting and implementing automation processes for repeatable and consistent service deployment.
- A strong sense of ownership.
- Good critical thinking and interpersonal skills to work effectively across diverse business, technical and cross-functional teams.
- Solid understanding of on-prem and cloud-based hybrid architectures and infrastructure concepts such as zones, regions and VPCs.
- Understanding of common authentication schemes, certificates, secrets and protocols.
- Scripting and/or coding skills for automation, triaging and troubleshooting.

Description
As a Lead Site Reliability Engineer, you will build up, lead and improve existing processes to provide 24x7 operational response for applications on public cloud platforms. You will review go-live readiness through activities such as system design consulting, reviewing all observability and monitoring, capacity planning and launch reviews. You will understand existing processes in order to improve incident coordination among Apple teams. Keep up to date with the latest technologies and tools and evangelize their value to the development teams. You will partner with architects and engineers to design and implement automation, operations and support solutions, and drive monitoring strategy across diverse workloads. Once services are live, you will maintain them by setting up monitoring and alerting and by measuring availability, latency and overall system health. Strive for top-quality results and continuously look for ways to improve platform reliability, performance and security.

Education & Experience
Master’s or Bachelor’s degree in Computer Science, Software Engineering or a related field, with a minimum of 5 years of technical experience in relevant areas.

Additional Requirements