Senior / Lead Site Reliability Engineer, Alibaba Cloud

Company: Salesforce
Location: Singapore - Singapore
Commitment: Full time
Posted on: 2023-05-03 16:36
To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.

Job Category: Products and Technology

Job Details

About the job

Salesforce is partnering with Alibaba Cloud to offer its core products in China. This helps support existing multinational customers that want to retain and expand their presence in the Greater China Region (GCR) and need Salesforce to run in region to meet China regulatory guidance.

The Salesforce on Alibaba Cloud Site Reliability Engineering (SRE) team is a brand-new organization within Security Engineering, with an exciting mission to bootstrap adoption of the industry’s leading-edge SRE principles and best practices at Salesforce. We are looking for experienced Software Engineers/DevOps Engineers to join this new team. Working closely with counterparts in the Infrastructure and Engineering organizations, this Salesforce on Alibaba Cloud SRE group owns the reliable delivery of service to Salesforce engineering teams and customers running on Alibaba Cloud infrastructure. This organization provides round-the-clock, follow-the-sun situational awareness and leadership in the swift resolution of any service-impacting issues, driving customer success.

As a member of the team, you will be responsible for detecting and resolving system failures and complex outages, including creating the observability tooling necessary for your success. This objective is met by monitoring the services, reacting to problems, proactively addressing issues before they affect performance or availability, and working with Engineering teams to define service level objectives and to improve service design and implementation, increasing reliability through closed-loop feedback.

Salesforce on Alibaba SRE balances proactive automation with reactive operations, and targets 50%+ of time spent on improving service design for reliability, extending monitoring and operational automation, and driving self-healing and resiliency initiatives and game day exercises. The incumbent in this role will demonstrate a strong focus on tactical operations, as well as large-scale production engineering and orchestration.

Minimum qualifications:

Experience will be evaluated based on alignment to the core competencies for the role (e.g. extracurricular leadership roles, military experience, volunteer work, etc.).

- 5+ years infrastructure and applications systems engineering experience in enterprise-scale Internet services
- Experience in analyzing and troubleshooting systems using logging, distributed tracing, stack traces, and debuggers
- 5+ years experience configuring and managing any of the public clouds using CLI/SDKs and automation (Alibaba or AWS preferred)
- 5+ years experience in at least one of the following languages: Java, Python, Go. Ability to pick up new languages
- Experience in Unix/Linux environments with good understanding of operating system internals (e.g., filesystems, system calls)
- Working knowledge of the TCP/IP stack, routing, and load balancing technologies
- Working knowledge of design principles of monitoring and alerting systems
- Ability to operate in a high-pressure environment, troubleshoot complex issues quickly, and successfully handle multiple priorities
- Systematic problem-solving approach, coupled with a strong sense of ownership and drive
- Incident management: act in key support roles during major incidents (e.g. Sev0, Sev1), and participate in the technical review of the incident for problem management

Preferred qualifications:

- A good understanding of, and practice in, large-scale distributed systems
- Experience in designing and deploying high-performance production services with extensive monitoring and logging practices
- CI/CD automation experience, including understanding of key open source technologies like Jenkins, Spinnaker, and Docker
- Experience defining immutable infrastructure via Terraform/CloudFormation or other approaches across large footprints and distributed teams
- Experience with on-call rotation, leading incident response, and no-blame postmortem analysis
- Ability to debug, optimize code, and automate routine tasks
- Mandarin speaking preferred, to work with customers from the Greater China Region
- Customer/partner-facing experience

Accommodations

If you require assistance due to a disability applying for open positions please submit a request via this Accommodations Request Form.

Posting Statement

At Salesforce we believe that the business of business is to improve the state of our world. Each of us has a responsibility to drive Equality in our communities and workplaces. We are committed to creating a workforce that reflects society through inclusive programs and initiatives such as equal pay, employee resource groups, inclusive benefits, and more. Learn more about Equality at Salesforce and explore our benefits.

Salesforce, Inc. and Salesforce.org are Equal Employment Opportunity and Affirmative Action Employers. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Salesforce, Inc. and Salesforce.org do not accept unsolicited headhunter and agency resumes. Salesforce, Inc. and Salesforce.org will not pay any third-party agency or company that does not have a signed agreement with Salesforce, Inc. or Salesforce.org.

Salesforce welcomes all.