Senior Site Reliability Engineer (SRE) – Azure Certified
Role Summary:
We are looking for a Senior Site Reliability Engineer (SRE) with 8 years of experience in DevOps, cloud infrastructure, automation, and software development. The ideal candidate should be Azure-certified, proficient in GitHub Actions, Ansible scripting, and Infrastructure as Code (IaC), with a strong background in automation and development. This role requires deep expertise in building scalable, resilient, and automated cloud environments, ensuring high system reliability, and streamlining CI/CD processes.

Key Responsibilities:
Design, implement, and maintain highly available, scalable, and fault-tolerant systems.
Architect, deploy, and optimize cloud infrastructure on Microsoft Azure, ensuring best practices for security, cost management, and performance.
Build and enhance GitHub Actions workflows to support automated software delivery.
Develop and maintain Ansible playbooks and Terraform scripts to manage cloud infrastructure and configuration.
Design and implement automation solutions to reduce manual effort, improve incident response, and enhance system reliability.
Proactively monitor, troubleshoot, and resolve complex production issues, ensuring minimal downtime.
Implement and manage logging, monitoring, and alerting using tools like Prometheus, Grafana, Azure Monitor, or Datadog.
Enforce security best practices across cloud and DevOps workflows, ensuring compliance with industry standards.
Work closely with development, operations, and security teams while mentoring junior engineers on DevOps and SRE best practices.

Required Qualifications:
7 years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering.
Azure Certification (e.g., AZ-104, AZ-400) is mandatory.
Strong experience with GitHub Actions for CI/CD automation.
Expertise in Ansible scripting and Infrastructure as Code (IaC).
Proficiency in Python, Go, or Bash for automation and development.
Deep understanding of containerization (Docker, Kubernetes) and microservices architectures.
Experience with monitoring and observability tools such as Prometheus, Grafana, or Azure Monitor.
Strong problem-solving and troubleshooting skills with a focus on automation and scalability.

Preferred Qualifications:
Experience with Terraform for cloud provisioning and configuration.
Knowledge of API management and integrations.
Experience with incident management and SLO/SLI definitions.
Understanding of zero-downtime deployment strategies and blue-green deployments.