Iris’s client, one of the top five banks in Canada, is looking to hire a Database Reliability Engineer Lead for a long-term contract opportunity. Our client is a multinational financial services company and the largest bank in Canada by market capitalization. The bank serves over 17 million clients and has more than 89,000 employees worldwide. It serves individual consumers, small and mid-market businesses, and large corporations with a full range of banking, investing, asset management, and other financial and risk-management products and services.

Position: Database Reliability Engineer Lead
Location: Toronto, ON (Hybrid)
Duration: Long Term

Responsibilities:
Expertise in AWS services (Redshift, RDS, Aurora).
Hands-on experience with CDK automation and observability for data lakes and pipelines.
Knowledge of Apache Airflow for pipeline management.
Cloud Experience: In-depth knowledge of cloud-native tools and services: CDK, CloudWatch, EKS, EC2, ELB, S3, Lambda, and SSM.
SRE: In-depth knowledge of and experience in observability, toil management, monitoring tools (Dynatrace, CloudWatch, Azure Monitor), resilient architecture, IaC, CaC, JSON, TypeScript, and API and webhook development using Python, Node.js, Ruby, PowerShell, and shell scripting languages. In-depth understanding of Dynatrace advanced features (DT Guardian, RUM, synthetic testing and monitoring, AI event correlation).
Automation: Leverage Ansible Tower, AWS SSM, and Bitbucket/GitHub to build automated workflows that eliminate toil, improve response time, and streamline the deployment pipeline.
Cloud orchestration tools (AWS Step Functions, containers, Apache Airflow), with a special focus on data batch processing and pipelines.
Deep knowledge of data management, data warehouses, data lakes, and database reliability (Redshift, RDS, Aurora).
Experience in log ingestion (AWS Firehose, DT OpenPipeline), reporting and dashboard tools, and operational metrics and analytics.
Exceptional problem-solving and knowledge management skills; an effective communicator who can speak the language of people, process, and technology.
Decisive, energetic, focused team player who builds and leads high-performing teams and CoPs and fosters a culture of diversity, inclusion, recognition, and growth.
Work in collaboration with Application Development, Quality, Product, and Data Engineering teams to champion SRE/DevOps culture and practices.
Strategic approach with clear objectives to improve service and product availability, performance optimization, incident MTTR, and change success rate, and to ensure a feedback loop to Dev.
Build and maintain reliable systems and platforms using SRE and DevSecOps principles, with a special focus on observability, resiliency (proactive impact prevention), self-healing, and reliability testing.
Work with App and Business teams to establish SLOs/SLIs and SRE dashboards that provide multiple views (by LOB, business process, or app) to track value and enable effective decision-making.
Innovative approach to reliability, from the architecture and feasibility phase through operation and continuous improvement, following the product model and Agile methodologies.
Focus on the latest technology trends in observability, automation, and platform technology and tools, including AIOps and MLOps reliability and resiliency.
Ensure toil is addressed from inception and in operations (self-healing, self-configuration, self-provisioning, and optimization) by leveraging sense-and-respond capabilities and advanced monitoring (synthetic and RUM).
Lead or participate in the Community of Practice (CoP) to connect and collaborate with like-minded teams and to set objectives, roadmaps, and implementation.
SRE office hours and CoP leadership and participation.

Thanks and Regards,
Raghav Ranjan
Iris Software
Royal Bank Plaza – North Tower
200 Bay Street
Toronto, ON, M5J 2J2
Email: raghav.ranjan@irissoftware.com
www.irissoftware.com