As a Senior Manager of Site Reliability Engineering (SRE) at CVS Health, you will lead a team of SREs responsible for ensuring the reliability, availability, and performance of critical systems and services. The team supports a high-performing integration platform that processes about 6 billion transactions every month. You will collaborate with cross-functional teams to design, implement, and maintain scalable, resilient infrastructure solutions that support business objectives. Your leadership will drive the adoption of best practices in site reliability, incident management, and continuous improvement.
Job Responsibilities:
lead and mentor a team of Site Reliability Engineers, fostering a culture of collaboration, innovation, and continuous learning
ensure the availability, reliability, and performance of critical services through proactive monitoring, capacity planning, and performance tuning
design, implement, and maintain observability solutions using tools such as AppDynamics, Splunk, Prometheus, Grafana, or OpenTelemetry
collaborate with software engineering, operations, and product teams to design and deploy scalable and resilient systems
oversee incident management processes, ensuring timely resolution of incidents and minimizing downtime
establish and monitor key performance indicators (KPIs) to measure system reliability and performance
conduct post-incident reviews and implement lessons learned to prevent future occurrences
stay current with industry trends and emerging technologies to continuously improve SRE practices
manage budgets and resources effectively to support SRE initiatives and projects
incident management: lead incident response efforts, perform root cause analysis (RCA), and drive post-mortem processes to improve system reliability
automation & infrastructure as code (IaC): develop automation to reduce manual operational tasks using Terraform, Ansible, or Kubernetes
CI/CD & deployment pipelines: work closely with development teams to enhance deployment strategies and improve continuous integration/continuous deployment (CI/CD) workflows
cloud & Kubernetes operations: manage and optimize cloud infrastructure (AWS, Azure, or GCP) and container orchestration platforms (Kubernetes, Docker)
implement best practices for security, compliance, and cost optimization in cloud environments
Requirements:
7+ years of experience in site reliability engineering, DevOps, or a related field
5+ years of experience with cloud computing platforms (e.g., AWS, Azure, GCP) and container orchestration (e.g., Kubernetes, Docker)
3+ years of experience in a leadership or management role, with a proven track record of managing high-performing teams
3+ years of experience with scripting and programming languages (e.g., Python, Go, Java)
3+ years of experience with monitoring and observability tools (e.g., Prometheus, Grafana, Splunk)
familiarity with CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI)
excellent communication and interpersonal skills, with the ability to collaborate effectively across teams
strong problem-solving skills and a proactive approach to identifying and addressing issues
ability to thrive in a fast-paced, dynamic environment and manage multiple priorities
experience with Agile methodologies and DevOps practices
Nice to have:
experience with large-scale distributed systems using message queues, TPMs, or other related technologies in a mobile/portal environment
Agile/PM certifications
ITCAM/Splunk experience
healthcare or big-box retail experience