Site Reliability Engineer (Observability)
London - Hybrid / 3 Days
Contract Inside IR35 - 6 Months initially

We’re looking for a Site Reliability Engineer (SRE) to join our client to build and maintain observability systems and to ensure their core services remain reliable, scalable, and high-performing.

Responsibilities:
- Deploy and manage observability tools using a Prometheus-like metrics store and Grafana Enterprise.
- Automate monitoring, alerting, and incident response.
- Build Grafana dashboards for system insights.
- Apply Infrastructure as Code (IaC) principles.
- Develop tooling in Golang (preferred) or Python.
- Advocate for SRE principles such as SLOs, SLIs, and error budgets.
- Integrate monitoring with incident management workflows.

Requirements:
- Expertise in SRE principles and reliability engineering.
- Solid familiarity with Linux.
- Strong experience building and deploying containers using Podman or Docker.
- Golang (preferred) or Python for automation and API integration.
- Experience with Grafana, VictoriaMetrics, and PromQL.
- Experience deploying and managing centralized logging solutions.
- Strong Infrastructure as Code (IaC) knowledge.

Nice to Have:
- OpenTelemetry experience.
- Terraform, Ansible, or CI/CD knowledge.
- Background in datacentre and compute hardware services.
- AWS infrastructure configuration and deployment.
- Familiarity with Kubernetes and cloud-native systems.
- Incident response automation expertise.