Site Reliability Engineer

Company: Alteryx

Location: Bangalore, India

Commitment: Full time

Posted on: 2024-05-17 05:25

We’re looking for problem solvers, innovators, and dreamers who are searching for anything but business as usual. Like us, you’re a high performer who’s an expert at your craft, constantly challenging the status quo. You value inclusivity and want to join a culture that empowers you to show up as your authentic self. You know that success hinges on commitment, that our differences make us stronger, and that the finish line is always sweeter when the whole team crosses together.

Site Reliability Engineering is a new discipline at Alteryx where the team deploys, maintains, and operates Alteryx’s Cloud SaaS products. As a Site Reliability Engineer - Observability, you will play a critical role in deploying, maintaining, and optimizing our cloud services while driving observability best practices. The team works with Product Engineering, Infrastructure Engineering, SRE, and Customer Service teams to ensure SaaS services are available and performant. This team will originate customer software and service fixes, contribute to various code bases, and ensure product availability, scalability, and resiliency.
In addition, team members will automate responses to alerts and program alert remediation.

What you’ll do:
- Implement and maintain observability tools and practices to monitor the performance and health of our cloud services.
- Collaborate with cross-functional teams to establish and enhance non-functional requirements related to service resiliency, security, and availability.
- Configure and automate alerts and remediation workflows using observability tools such as DataDog, Prometheus, Grafana, and Kibana.
- Act as an escalation point for diagnosing and resolving performance and availability issues in our cloud infrastructure and applications.
- Collaborate with development teams to improve code quality, performance, and reliability through observability insights.
- Participate in on-call rotations to provide 24/7 support for critical incidents and system emergencies.

About you:
- 3+ years of experience as a Site Reliability Engineer or in a similar role, with a strong focus on observability and monitoring.
- 2-4 years of hands-on experience with modern observability tools and technologies (e.g., DataDog, Prometheus, Grafana, ELK stack).
- 2-4 years of experience designing, programming, and/or operating distributed systems software.
- 2-4 years of experience programming in Python, Go, JavaScript, Java, .NET, or another modern programming language.
- 1-2 years of experience with Kubernetes, OpenShift, k3s, or another container orchestration technology.
- Troubleshooting and problem-solving skills related to containers or distributed systems.
- Experience with CI/CD technologies such as ArgoCD, Jenkins, or another CI/CD tool.
- Experience with AWS, GCP, or Azure is a plus.
- Experience debugging software issues and performing root cause analyses (RCAs).
- Proficiency in Helm, Docker, and GitLab.
- Ability to break down and discuss technical issues and solutions with non-technical team members.

Find yourself checking a lot of these boxes but doubting whether you should apply? At Alteryx, we support a growth mindset for our associates through all stages of their careers.
If you meet some of the requirements and you share our values, we encourage you to apply. As part of our ongoing commitment to a diverse, equitable, and inclusive workplace, we’re invested in building teams with a wide variety of backgrounds, identities, and experiences.
