Senior Site Reliability Engineer (SRE), Multi-cloud Platform

Company: Workday
Location: New Zealand, Auckland
Commitment: Full Time
Posted on: 2024-01-19 05:11
Your work days are brighter here.

At Workday, it all began with a conversation over breakfast. When our founders met at a sunny California diner, they came up with an idea to revolutionize the enterprise software market. And when we began to rise, one thing that really set us apart was our culture. A culture which was driven by our value of putting our people first. And ever since, the happiness, development, and contribution of every Workmate is central to who we are. Our Workmates believe a healthy employee-centric, collaborative culture is the essential mix of ingredients for success in business. That’s why we look after our people, communities and the planet while still being profitable. Feel encouraged to shine, however that manifests: you don’t need to hide who you are. You can feel the energy and the passion; it's what makes us unique. Inspired to make a brighter work day for all and transform with us to the next stage of our growth journey? Bring your brightest version of you and have a brighter work day here.

About the Team

Are you a Senior Site Reliability Engineer who loves the challenge of automating, operating and improving pioneering cloud native service platforms? Do you love digging into a production problem and seeing it through to resolution and follow-through?

We’re the team that deploys, operates and supports our cloud native technology platform, which was designed from scratch for the cloud. We lead the reliability for the complete stack and the tools that deliver and support Workday products across public clouds (e.g. AWS, GCP, Azure).

The platform is built using Cloud Native technologies (CNCF), on a foundation of Kubernetes in Public Cloud environments.
This provides a secure platform on which Workday service teams and Platform development teams can build and test their pre-release code, through deployment to production on a continuous basis.

Engineers from this team have shared their experiences at Cloud Native conferences, including KubeCon.

About the Role

The primary function of the SRE team is to ensure the reliability and availability of the platform to meet the desired SLAs by investing in mean time to detect (MTTD), mean time to repair (MTTR) and mean time between failures (MTBF), as well as reducing operational load (toil) to scale sustainably in alignment with business growth.

- Be a key member of a team of dedicated SREs responsible for software engineering and operations, with an emphasis on reducing operational toil.
- Automation and improvement work is planned following scrum practices, with two-week sprints.
- The scrum team is autonomous; the on-call function is follow-the-sun (New Zealand, Ireland, US).
- Tech stack is Cloud Native (Kubernetes, Istio, OPA, GoLang, Prometheus, Grafana, etc.).
- Responsible for the safe change and reliability of customer environments, with SLO-gated multi-stage deployment automation. Mission is to improve platform reliability, observability and overall customer satisfaction.
- Develop and launch effective SLIs to ensure that SLOs are achieved through building an extendable Observability architecture, runbook automation, and establishing new processes.
- Partner with platform service teams to craft and implement a range of SRE standards for their respective services to meet. Define benchmarks and automation to qualify services to move to production environments.

About You

You have a passion for identifying and solving problems in distributed environments, spanning configuration, the Linux operating system and networking. You have hands-on experience handling distributed environments (Kubernetes experience is a big plus).
You have a keen interest in improving operational efficiency, and believe that automation is the key to operating large-scale systems. You are driven to ensure customer success.

Basic Qualifications:

- BS in Computer Science or related field, or equivalent years of experience
- 4+ years handling and solving problems in distributed systems in a public cloud
- 3+ years of SRE experience in a distributed systems environment
- Experience with AWS, GCP, or Azure
- Strong experience with Kubernetes
- Experience with Linux
- Proficiency with a programming language such as GoLang, Python, or Ruby (preferably GoLang)
- Experienced with software development standard methodologies such as code management, CI/CD, and testing

Other Qualifications:

- Passionate about automation, with a track record of referenceable examples.
- Can work independently and with the mindset that everything can be automated.
- Skills to operate, maintain, support and sustain the platform.
- Energised by working in a fast-paced environment.
- Experience collaborating with multi-functional global and remote teams with a diverse set of backgrounds.
- Excellent documentation skills, with experience developing detailed runbooks and processes.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

Our Approach to Flexible Work

With Flex Work, we’re combining the best of both worlds: in-person time and remote. Our approach enables our teams to deepen connections, maintain a strong community, and do their best work.
We know that flexibility can take shape in many ways, so rather than a number of required days in-office each week, we simply spend at least half (50%) of our time each quarter in the office or in the field with our customers, prospects, and partners (depending on role). This means you'll have the freedom to create a flexible schedule that caters to your business, team, and personal needs, while being intentional to make the most of time spent together. Those in our remote "home office" roles also have the opportunity to come together in our offices for important moments that matter.

Are you being referred to one of our roles? If so, ask your connection at Workday about our Employee Referral process!