Site Reliability Engineer

    Top Employer Germany 2026, Top Employer Europe 2026, Fair Company 2026

Key Facts

  • Professionals
  • Information Technology
  • Full time
  • Pune, Maharashtra, India

Job Description

Role Overview 

We are seeking a Senior Site Reliability Engineer with strong experience in building and maintaining scalable, resilient systems. The ideal candidate will have hands-on expertise in cloud-native technologies, infrastructure as code, observability, and automation, with a focus on Google Cloud Platform (GCP). 
 

Key Responsibilities 

  • Ensure the stability and reliability of cloud-native applications deployed on GCP, containerized with Docker and orchestrated via Kubernetes. 

  • Define, implement, and monitor SLOs, SLAs, and SLIs to measure system performance and user experience. 

  • Automate infrastructure provisioning using Terraform and manage Kubernetes configurations with Kustomize and Helm. 

  • Develop and maintain monitoring and alerting systems using Datadog and GCP-native tools. 

  • Conduct incident analysis and postmortems to drive continuous improvement. 

  • Collaborate with development teams to integrate reliability practices into CI/CD pipelines using GitHub Actions. 

  • Manage and troubleshoot database systems, particularly PostgreSQL and Cassandra. 

  • Apply networking knowledge and Linux system administration skills to troubleshoot and optimize system connectivity and performance. 

 

 

Qualifications

Education 

Bachelor’s or Master’s degree in Computer Science or Software Engineering, or equivalent practical experience.

Work Experience & Skills 

  • 10 to 12 years of experience in Site Reliability Engineering.

  • Proven experience designing and operating elastic, resilient systems in cloud environments. 

  • Strong understanding of GCP, Kubernetes, and container orchestration. 

  • Proficiency in infrastructure as code and configuration management tools (Terraform, Helm, Kustomize). 

  • Experience with monitoring and observability tools (Datadog, GCP Monitoring). 

  • Solid scripting skills in Bash and familiarity with automation frameworks.

  • Experience with CI/CD pipelines, especially using GitHub Actions. 

  • Familiarity with networking fundamentals and troubleshooting. 

  • Strong coding skills and ability to develop reliability-focused tooling. 

  • Excellent communication skills in English (written and spoken). 

 

Other Requirements 

  • Strong problem-solving skills and a process-oriented mindset. 

  • Ability to work independently and collaboratively in a fast-paced environment. 

  • Passion for clean code, automation, and continuous improvement. 

 

Nice-to-Have 

  • Familiarity with monitoring tools (e.g., Datadog, Prometheus, GCP Monitoring).

  • Experience working in Agile/Scrum teams. 

Benefits

Compensation & Recognition of Contribution

Work Flexibility & Support Life Balance

Health & Wellbeing

Global Exposure & Cross‑Border Collaboration

Learning, Skills & Career Progression

Leadership & Talent Development

Innovation, Ideas & Recognition

Culture, Community & Inclusion

Engagement & Shared Experiences

Contact

METRO
People & Culture
METRO Global Solution Center India
