Senior Manager Site Reliability Engineering

CVS Health

Location:
United States, Richardson

Category:
IT - Software Development

Contract Type:
Employment contract

Salary:

118,450.00 - 260,590.00 USD / Year

Job Description:

As a Senior Manager of Site Reliability Engineering (SRE) at CVS Health, you will lead a team of SREs responsible for ensuring the reliability, availability, and performance of critical systems and services. The team supports a high-performing integration platform that processes about 6 billion transactions each month. You will collaborate with cross-functional teams to design, implement, and maintain scalable, resilient infrastructure that supports business objectives. Your leadership will drive the adoption of best practices in site reliability, incident management, and continuous improvement.

Job Responsibility:

  • lead and mentor a team of Site Reliability Engineers, fostering a culture of collaboration, innovation, and continuous learning
  • ensure the availability, reliability, and performance of critical services through proactive monitoring, capacity planning, and performance tuning
  • design, implement, and maintain observability solutions using tools such as AppDynamics, Splunk, Prometheus, Grafana, or OpenTelemetry
  • collaborate with software engineering, operations, and product teams to design and deploy scalable and resilient systems
  • oversee incident management processes, ensuring timely resolution of incidents and minimizing downtime
  • establish and monitor key performance indicators (KPIs) to measure system reliability and performance
  • conduct post-incident reviews and implement lessons learned to prevent future occurrences
  • stay current with industry trends and emerging technologies to continuously improve SRE practices
  • manage budgets and resources effectively to support SRE initiatives and projects
  • incident management: lead incident response efforts, perform root cause analysis (RCA), and drive post-mortem processes to improve system reliability
  • automation & infrastructure as code (IaC): develop automation to reduce manual operational tasks using Terraform, Ansible, or Kubernetes
  • CI/CD & deployment pipelines: work closely with development teams to enhance deployment strategies and improve continuous integration/continuous deployment (CI/CD) workflows
  • cloud & Kubernetes operations: manage and optimize cloud infrastructure (AWS, Azure, or GCP) and container orchestration platforms (Kubernetes, Docker)
  • implement best practices for security, compliance, and cost optimization in cloud environments

Requirements:

  • 7+ years of experience in site reliability engineering, DevOps, or a related field
  • 5+ years of experience with cloud computing platforms (e.g., AWS, Azure, GCP) and container orchestration (e.g., Kubernetes, Docker)
  • 3+ years of experience in a leadership or management role, with a proven track record of managing high-performing teams
  • 3+ years of experience with scripting and programming languages (e.g., Python, Go, Java)
  • 3+ years of experience with monitoring and observability tools (e.g., Prometheus, Grafana, Splunk)
  • familiarity with CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI)
  • excellent communication and interpersonal skills, with the ability to collaborate effectively across teams
  • strong problem-solving skills and a proactive approach to identifying and addressing issues
  • ability to thrive in a fast-paced, dynamic environment and manage multiple priorities
  • experience with Agile methodologies and DevOps practices

Nice to have:

  • experience with large-scale distributed systems using message queues, TPMs, or other related technologies in a mobile/portal environment
  • Agile/PM certifications
  • ITCAM/Splunk experience
  • healthcare or big-box retail experience

What we offer:

  • medical, dental, and vision benefits
  • 401(k) retirement savings plan
  • employee stock purchase plan
  • fully-paid term life insurance plan
  • short-term and long-term disability benefits
  • well-being programs
  • education assistance
  • free development courses
  • CVS store discount
  • discount programs with participating partners
  • paid time off (PTO)
  • paid holidays

Additional Information:

Job Posted:
March 19, 2025

Expiration:
May 05, 2025

Employment Type:
Full-time

Work Type:
Hybrid work