The SRE Service Availability Manager plays a key role in ensuring the peak performance and availability of our Enterprise IT infrastructure and services. This position combines proactive site reliability engineering with adept incident command to lead our efforts in minimizing service disruptions and enhancing our technology landscape. With a focus on automation, cloud technologies, and continuous process improvement, the ideal candidate brings a mix of technical expertise and leadership skills, aimed at delivering exceptional service reliability. This role demands a proactive problem-solver with extensive experience in IT operations and a passion for innovation, ready to tackle challenges in a dynamic, 24x7x365 environment.
Job Responsibilities:
Serve as Incident Commander during major incidents, leading response efforts to restore services and minimize impact on business and consumer operations
Design and implement automation tools to reduce manual intervention, improve system performance, and prevent incidents
Assess application architectures to identify key monitoring points and performance indicators
Develop and maintain comprehensive monitoring and alerting frameworks to detect and address anomalies before they escalate to incidents
Collaborate closely with development, operations, and support teams for continuous improvement of service reliability and incident response processes
Conduct thorough post-mortems to analyze incidents, identify root causes, and implement preventative measures to avoid recurrence
Effectively communicate incident status, impact, and post-incident reports to stakeholders at all levels of the organization
Stay informed on the latest industry trends, technologies, and practices in site reliability engineering and incident management.
Requirements:
5+ years of experience in an information technology environment
3 years of experience in information technology focused on IT Operations, including troubleshooting complex network, server, storage, and/or application issues
Minimum of 2 years of operations experience involving incident, problem, change, and release management, including leading calls and documenting outcomes
Undergraduate degree or equivalent experience/certification
Ability to cover shifts and on-call responsibilities in a 24x7x365 environment
Proficiency in scripting languages (Python, Shell) and familiarity with automation tools (such as Ansible, Jenkins)
Experience with cloud platforms (AWS, Azure, GCP), infrastructure as code, and containerization technologies
Experience in incident command or incident management in a technology environment
Strong problem-solving, organizational, and analytical skills.
Nice to have:
ITIL Foundations v3+ Certification
Demonstrated experience with ITSM suites, e.g., ServiceNow
Demonstrated experience with various monitoring, performance, or capacity tools
Experience with continuous integration/continuous deployment (CI/CD) pipelines and DevOps practices
Familiarity with Site Reliability Engineering principles and concepts
Strong leadership qualities, including decisiveness, the ability to motivate teams, and the ability to manage stressful situations calmly and effectively
Ability to create constructive relationships, influence, and communicate with varying levels of associates and management
Ability to solve complex, cross-functional issues
Strong knowledge of Server, Storage, Network, Middleware, Application and Cloud technologies
A high degree of curiosity and a drive to seek more efficient ways of delivering service.