(Sr) Site Reliability Engineer - Observability and Data Platform

Company: Workday
Location: USA, CA, Pleasanton
Commitment: Full Time
Posted on: 2023-05-03 16:58
Your work days are brighter here.

At Workday, it all began with a conversation over breakfast. When our founders met at a sunny California diner, they came up with an idea to revolutionize the enterprise software market. And when we began to rise, one thing that really set us apart was our culture. A culture which was driven by our value of putting our people first. And ever since, the happiness, development, and contribution of every Workmate is central to who we are. Our Workmates believe a healthy employee-centric, collaborative culture is the essential mix of ingredients for success in business. That’s why we look after our people, communities and the planet while still being profitable. Feel encouraged to shine, however that manifests: you don’t need to hide who you are. You can feel the energy and the passion, it's what makes us unique. Inspired to make a brighter work day for all and transform with us to the next stage of our growth journey? Bring your brightest version of you and have a brighter work day here.

About the Team

The Data Platform and Observability team is based in Pleasanton, CA; Boston, MA and Dublin, Ireland. We enable real time insights across Workday’s platforms, infrastructure and applications. Our focus is on the development of a large scale distributed data platform to support critical Workday applications. The team provides software for collection, ingestion, storage & visualization of critical data assets. We handle 100s of terabytes of data in the form of billions of messages produced daily by Workday applications and underlying services. If you enjoy writing efficient software or tuning and scaling large distributed systems you will enjoy working with us. Do you want to tackle exciting challenges at massive scale across private and public clouds for our 4000+ global customers?
Do you want to work with world-class engineers and facilitate the development of the next generation Observability & Data Platforms? If so, we should chat.

About the Role

As an SRE on the Data Platform & Observability engineering team, you will be responsible for ensuring that the systems that observe and monitor all Workday platforms and services on private and public cloud are always up and running. You will own and drive the overall reliability and stability of all Observability and Data Systems, and have the opportunity to work with product teams to define SLOs/SLIs and build appropriate dashboards and alerts that will help with triaging and solving issues. You'll be building tools that help with detecting, diagnosing and resolving issues with the platforms. This role will participate in an on-call rotation.

About You

We are open to hiring either a Senior or Mid-Level Site Reliability Engineer. See below for qualifications for both.

Are you a hardworking, creative and driven team member who can support us in our mission to gracefully support our site reliability teams? If yes, we would love to hear from you!
If you like trying new techniques and approaches to sophisticated problems, love to learn new technologies, are a natural collaborator and an excellent teammate who brings out the best in everyone around you, then give us a shout!

Basic Qualifications - Site Reliability Engineer

3-5+ years of professional experience
3-5 years programming and scripting languages like Java, Go, Python
2+ years utilizing Infrastructure as Code tools like Terraform and Automation/Config management tools such as Chef, Ansible
Experience in a public cloud environment (AWS/GCP)
Kubernetes/container orchestration frameworks experience
Experience with Distributed Data systems (Elasticsearch, Hadoop, Kafka etc) & Query engines (Hive, Spark etc)
Understanding of Time series data, monitoring systems, dashboarding with Grafana, Prometheus
Bachelor’s Degree or higher, Computer Science/Engineering or equivalent

Basic Qualifications - Senior Site Reliability Engineer

5-7+ years of professional experience
5-7+ years programming and scripting languages like Java, Go, Python
5+ years utilizing Infrastructure as Code tools like Terraform and Automation/Config management tools such as Chef, Ansible
Experience in a public cloud environment (AWS/GCP)
Kubernetes/container orchestration frameworks experience
Experience with Distributed Data systems (Elasticsearch, Hadoop, Kafka etc) & Query engines (Hive, Spark etc)
Understanding of Time series data, monitoring systems, dashboarding with Grafana, Prometheus
Bachelor’s Degree or higher, Computer Science/Engineering or equivalent

Other Qualifications

Systems and networking experience to help with debugging
Strong organizational skills facilitating complex software development, cross team coordination, managing dependencies and helping with tough prioritization decisions
Excellent communication skills, both written and verbal

As a federal contractor, Workday is requiring all new hires to verify that they are fully-vaccinated against COVID-19 within 72 hours of beginning employment with Workday, consistent with applicable law. Workday is an equal opportunity employer. Candidates who are not vaccinated due to a sincerely held religious belief, medical reasons, or other legally-protected reason should contact accommodations@workday.com to explore what, if any, reasonable accommodations or exemptions Workday is able to offer.

Pursuant to applicable Fair Chance law, Workday will consider for employment qualified applicants with arrest and conviction records.

Workday is an Equal Opportunity Employer including individuals with disabilities and protected veterans.

Are you being referred to one of our roles? If so, ask your connection at Workday about our Employee Referral process!