Job Description

The Elevator Pitch: Why will you enjoy this new opportunity?

Behind many of the world-class systems built by VMware is the telemetry collected by our products, which needs to be ingested, stored, and analysed in a reliable, easy-to-use manner. On the SuperCollider SRE team, you’ll have the opportunity to handle the complex challenges of scale that come with the massive amounts of data that we process. As someone who enjoys tough but cool challenges, you’ll be encouraged to use your expertise in coding, infrastructure, complexity analysis and system design. Keys to success in the role are intellectual curiosity, problem solving and openness. These are core values of our group, and we aim to create an environment that provides the support and mentorship needed to continuously learn and grow.

About the Role: In a Nutshell

Site Reliability Engineering is an exciting role that combines systems and software engineering to provide a “full-stack” challenge: you’ll run large-scale, distributed, fault-tolerant systems. SREs ensure our critical services, both their internal and external components, have reliability and uptime appropriate to our objectives. Additionally, while keeping an eye on our systems’ performance, SREs will be part of the larger SuperCollider R&D team and collaborate with our engineers to create more resilient systems.

Success in the Role: What are the performance goals over the first 6-12 months you will work toward completing?

As a Senior SRE, you will be one of our deep experts in a new geography for the group. You will become familiar with the ins and outs of our data flows and augment them by adding incremental services, tools and procedures to increase their availability, performance and serviceability.

What type of work will you be doing? What assignments, requirements, or skills will you be performing on a regular basis?

Like traditional operations groups, we keep critical systems up and running despite hurricanes, bandwidth outages, and configuration problems. Unlike traditional operations groups, we also have full access and authority to fix, extend, and scale the code to keep it working and harden it against hazards. We are looking for talent with systems and software backgrounds; strong candidates will have experience with both.

You will be expected to:

- Create software and automation to improve the availability, scalability, performance, and efficiency of our services
- Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions
- Understand third-party products and components in data management and analytics
- Design and implement tools and a framework for software updates and upgrades, and develop or leverage automation tools for continuous integration/continuous delivery and continuous troubleshooting
- Monitor resource allocation, consumption, and performance
- Engage in service capacity planning and demand forecasting, software performance analysis and system tuning
- Continue improving your expertise and growing the skillset of your team

To be successful, you will probably have demonstrated some or all of the following:

- Excellent troubleshooting skills across many layers (storage, networking, hypervisor, OS)
- Experience building and operating highly available and scalable infrastructure solutions: you’d probably have worked with Kafka, Zookeeper or similar tech
- Experience with infrastructure-as-code tools: Terraform, Packer and their kin
- Hands-on experience with configuration management tools such as Puppet, Chef, Ansible
- Experience with containerisation and orchestration systems: Docker, Kubernetes, Helm, and the like
- Experience with at least one of the following languages: Go, Python, Bash (others may be OK as well)
- Experience with DB administration and SQL (even better if it is Impala DB)
- Experience with source code management systems (e.g., Git, Perforce)

What is the leadership like for this role? What is the structure and culture of the team like?

The SuperCollider team is a critical service organisation within the Office of the CTO and is distributed across several of the geographies VMware operates in. You will be part of our SRE and infrastructure team, which provides follow-the-sun monitoring of our systems, and will collaborate with the teams in Bulgaria and the US, reporting to a Director of Engineering.

Where is this role located?

This role is available in Costa Rica.

What are the benefits and perks of working at VMware?

TBD

VMware is an Equal Opportunity Employer and Prohibits Discrimination and Harassment of Any Kind:

VMware is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment.
All employment decisions at VMware are based on business needs, job requirements and individual qualifications, without regard to race, color, religion or belief, national, social or ethnic origin, sex (including pregnancy), age, physical, mental or sensory disability, HIV Status, sexual orientation, gender identity and/or expression, marital, civil union or domestic partnership status, past or present military service, family medical history or genetic information, family or parental status, or any other status protected by the laws or regulations in the locations where we operate. VMware will not tolerate discrimination or harassment based on any of these characteristics. VMware encourages applicants of all ages. VMware will provide reasonable accommodation to employees who have protected disabilities consistent with local law.