Senior Staff Incident Manager

Company: SailPoint
Location: Headquarters (Austin, Texas, USA)
Commitment: Full time
Posted on: 2023-11-07 06:09
At SailPoint, we do things differently. We understand that a fun-loving work environment can be highly motivating and productive. When smart people work on intriguing problems, and they enjoy coming to work each day, they accomplish great things together. With that philosophy, we have assembled the best identity team in the world that is passionate about the power of identity.
As the fastest-growing, independent Identity Security provider, SailPoint helps hundreds of global organizations securely and effectively deliver and manage user access from any device to data and applications residing in the data center, on mobile devices, and in the cloud. The company’s innovative product portfolio offers customers an integrated set of core services including identity governance, provisioning, access management, data security, and contractor management; all delivered as SaaS services.
SailPoint is forming a new Incident Management team. This newly formed team will focus on handling the lifecycle of incidents, incident analysis, and systemic risk identification. Teams at SailPoint follow the service ownership model, where they operate what they build. Things still occasionally go sideways, and incidents happen that impact the customer experience. This is where the Incident Management team will step in to engage with service owners to drive mitigation of active incidents. Once active incidents are mitigated, our Incident Management team will lead detailed post-incident analysis to identify clear remediation actions to prevent future incidents.
Ideal candidates for this role may have backgrounds in Incident Management, Site Reliability Engineering, Senior Technical Support (Level 3 or higher), Project Management, DevOps Engineering, or Software Engineering.

Responsibilities:
- Lead resolution of critical incidents in a timely manner, using effective communication and problem-solving skills to minimize impact and risk.
- Develop and maintain incident response plans, including communication plans, escalation procedures, and crisis management protocols.
- Work closely with Engineering, Product, and customer-facing leaders and teams to facilitate an Incident and Problem Management program.
- Participate in the team on-call rotation to run point as an Incident Commander for any incident that arises during your shift.
- Oversee blameless post-mortem analysis of incidents, capturing action items to prevent future issues.
- Review incident trends and analyze patterns for review by senior engineering leadership.
- Produce and conduct training exercises for engineering teams to learn incident management protocols.

Requirements:
- 8+ years of experience in 24/7 production operations, preferably supporting a highly available environment for a SaaS or cloud service provider.
- Strong mastery of incident management best practices and systems.
- Experience with cloud infrastructure environments, preferably AWS.
- Experience with containerization technology, preferably Docker and Kubernetes.
- Experience with Java, .NET, Golang, or Node application development and troubleshooting.
- Strong understanding of system and networking concepts and troubleshooting techniques.
- Strong verbal and written communication skills, including the ability to set and enforce process and to influence engineers who are not direct reports.
- Ability to make quick, confident decisions.
- Ability to remain calm and focused during a crisis.
- Ability to lead in high-stress situations.
- Experience managing or leading a team is a plus.

Education:
- Bachelor’s degree in Computer Science or other technical discipline, or equivalent experience.

SailPoint is an equal opportunity employer and we welcome everyone to our team. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.