Staff Software Engineer - Platform Engineering & SRE

Company: Equinix
Location: Toronto Office TRO
Commitment: Full time
Posted on: 2025-05-25 22:53

Who are we?
Equinix is the world’s digital infrastructure company®, operating over 260 data centers across the globe. Digital leaders harness Equinix's trusted platform to bring together and interconnect foundational infrastructure at software speed. Equinix enables organizations to access all the right places, partners and possibilities to scale with agility, speed the launch of digital services, deliver world-class experiences and multiply their value, while supporting their sustainability goals.

Our culture is based on collaboration and the growth and development of our teams. We hire hardworking people who thrive on solving challenging problems and give them opportunities to hone new skills and try new approaches, as we grow our product portfolio with new software and network architecture solutions. We embrace diversity in thought and contribution and are committed to providing an equitable work environment that is foundational to our core values as a company and is vital to our success.

Job Description
We are looking for a highly skilled and motivated Platform Engineering & SRE Staff Engineer to join our team. As a Platform Engineering SRE, you will play a critical role in developing, maintaining and improving the reliability, scalability, and performance of our systems, ensuring seamless user experiences.
This position blends software engineering and systems engineering expertise to create automated solutions for operational challenges.

Key Responsibilities

Reliability and Performance
- Ensure the high availability, reliability, and performance of production systems and services
- Implement and maintain disaster recovery plans and procedures
- Monitor and manage system health using metrics, logs, and tracing to proactively identify and resolve issues

Automation and Infrastructure
- Automate repetitive tasks, including deployment, scaling, monitoring, and remediation of systems
- Build and maintain infrastructure as code (IaC) using tools like Terraform, CloudFormation, or similar

Incident Management
- Participate in incident response and troubleshooting efforts to minimize downtime and resolve issues quickly
- Conduct root cause analysis for system failures and implement preventive measures to avoid future incidents
- Respond to incidents, perform root cause analysis, and implement solutions to prevent recurrence
- Maintain incident response playbooks and ensure efficient on-call rotations

Observability and Monitoring
- Design and implement monitoring solutions using tools like Prometheus, Grafana, Datadog, or similar

Collaboration
- Work closely with development, QA, and operations teams to ensure smooth delivery of applications
- Act as a bridge between software engineering and operations, advocating for DevOps best practices
- Document system configurations, processes, and procedures to ensure knowledge sharing and maintain system integrity

Capacity and Scalability
- Conduct capacity planning and optimize system scalability to meet future demands
- Implement strategies for horizontal and vertical scaling of applications

Security and Compliance
- Ensure infrastructure security by implementing best practices and addressing vulnerabilities
- Collaborate with the security team to meet compliance standards and audits

Data Engineering & Automation
- Develop and maintain scalable and efficient data pipelines
- Automate data workflows for ETL/ELT processes, integrating data from various sources into data warehouses and other storage solutions
- Develop and maintain solutions for data transformation and data modelling, and automate the orchestration of data processing

Data Warehouse Management
- Implement and maintain modern data warehouse architectures, ensuring effective data storage, retrieval, and accessibility
- Work with cloud-based data warehouses (e.g., BigQuery, Snowflake, Redshift) and optimize data models for analytics and reporting
- Develop and manage dimensional models, star/snowflake schemas, and data marts for operational and analytical use cases

Real-time and Batch Data Processing
- Build and manage real-time and batch data pipelines for high-volume data ingestion, processing, and analytics
- Leverage technologies such as Apache Kafka, Apache Beam, Apache Spark, and Google Cloud Dataflow for streaming and batch processing

Qualifications

Experience
- 5+ years of experience in a Data Platform role, including Site Reliability Engineering, DevOps, or Systems Engineering

Technical Skills
- Strong programming skills in languages such as Python, Java, or similar
- Experience in developing data ingestion pipelines, governance, quality, and automation
- Experience with cloud platforms such as Google Cloud, AWS, or Azure
- Hands-on experience with CI/CD pipelines using tools like GitHub Actions or Jenkins
- Exposure to containerization and orchestration technologies like Docker and Kubernetes
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK Stack)

Methodologies
- Knowledge of Software Engineering, Data Modelling and SDLC
- Understanding of SRE principles, including SLIs, SLOs, and error budgets
- Knowledge of incident management frameworks and root cause analysis techniques

Soft Skills
- Strong analytical and problem-solving skills
- Excellent communication and collaboration abilities

Preferred Qualifications
- Familiarity with configuration management tools (e.g., Ansible, Puppet, Chef)
- Background in performance testing and load testing

Equinix is committed to ensuring that our employment process is open to all individuals, including those with a disability. If you are a qualified candidate and need assistance or an accommodation, please let us know by completing this form.

Equinix is an Equal Employment Opportunity and, in the U.S., an Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to unlawful consideration of race, color, religion, creed, national or ethnic origin, ancestry, place of birth, citizenship, sex, pregnancy / childbirth or related medical conditions, sexual orientation, gender identity or expression, marital or domestic partnership status, age, veteran or military status, physical or mental disability, medical condition, genetic information, political / organizational affiliation, status as a victim or family member of a victim of crime or abuse, or any other status protected by applicable law.