Director, Service Reliability Engineering

Marriott Bonvoy

Location:
United States, Bethesda

Category:
IT - Administration

Contract Type:
Employment contract

Salary:

125,600.00 - 203,700.00 USD / Year

Job Description:

As Director of SRE, you will lead the team responsible for accelerating and automating the flow of operational activities, ensuring the reliability, performance and scalability of Marriott's critical digital platforms. This position involves establishing reliability-focused engineering practices, mentoring engineers, improving scalability, and collaborating with cross-functional teams.

Job Responsibilities:

  • Define and execute Marriott’s SRE vision, aligning with business objectives and technology roadmaps
  • Build, mentor and lead a high-performing SRE team, fostering a culture of collaboration and innovation
  • Establish reliability, observability and automation goals to improve system uptime, performance and scalability
  • Partner with engineering, operations and security teams to drive best practices and continuous improvement
  • Implement reliability-focused engineering practices, including SLAs, SLOs/SLIs and error budgets
  • Design and maintain resilient, scalable and fault-tolerant architectures across cloud and hybrid environments
  • Develop strategies to proactively identify and mitigate risks to system performance and availability
  • Drive root cause analysis (RCA) and post-mortem processes to prevent recurring incidents
  • Champion automation in monitoring, deployment and incident resolution to reduce toil and enhance efficiency
  • Lead and optimize incident response processes, ensuring rapid detection, diagnosis, and resolution of system failures
  • Enhance observability by leveraging monitoring, logging and tracing solutions to provide real-time insights
  • Partner with DevOps teams to improve CI/CD pipelines and reduce deployment risk
  • Champion leaders’ vision for product and service delivery
  • Make and execute the decisions needed to keep moving toward achievement of goals
  • Provide direction and assistance to other teams regarding projects
  • Determine priorities, schedules, plans and necessary resources so that projects are completed on schedule
  • Analyze information and evaluate results to choose the best solution and solve problems
  • Review vendor proposals and select the appropriate vendor for services, technologies and hardware
  • Think creatively and practically to develop and implement new project plans
  • Generate and provide accurate and timely results in the form of reports, presentations, etc.
  • Plan, develop, implement, and evaluate the quality of operations

Requirements:

  • Undergraduate degree in computer science, software engineering, or a related field (or equivalent experience)
  • 10+ years of experience in SRE, DevSecOps or IT operations
  • At least 5 years of experience in a previous leadership role within SRE, DevSecOps or IT operations
  • At least 5 years of experience with the following technologies: Presentation Management (HTML, CSS, JS, Backbone, Node.js, Android, iOS); Application Platforms (NGINX, Java, Akana, Play Framework, Tomcat, Docker, OpenShift); Application Data (PostgreSQL, Couchbase, Cassandra); Integration Services (Apache Kafka, Apache Spark, Akana); Analytics Platforms (Hadoop, dashDB, Cognos, Tableau); Security (ForgeRock, OpenID, OAuth, Ping Identity); Public Cloud (Azure, Google Cloud, AliCloud, Amazon Web Services); CI/CD (Harness)
  • Experience with test automation
  • Working knowledge of, and a proven track record of implementing, disaster-indifferent architecture
  • Experience with CDN and Akamai tools
  • Linux/Unix system administration experience
  • Proficiency in scripting and programming languages (such as Python, Go, Bash, Shell)
  • Hands-on experience with infrastructure as code (such as Terraform), container orchestration (such as Kubernetes), and reliability automation
  • Working knowledge of networking, databases, distributed systems
  • Deep knowledge of monitoring, logging and incident response tools (such as Dynatrace, Splunk, OpsGenie, BigPanda, Prometheus)
  • Experience implementing and maintaining CI/CD pipelines for large-scale applications
  • Experience creating system architectures for disaster recovery implementation and failover during disasters
  • Familiarity with AI/ML-driven observability and predictive maintenance techniques
  • Exceptional problem solving, communication and stakeholder management skills
  • Experience leading, mentoring and developing high-performing SRE teams
  • Experience managing large, cross-functional vendor teams
  • Experience defining SLOs/SLIs, error budgets, and KPIs to drive accountability and performance
  • Ability to foster a culture of continuous improvement
  • Proven record of staying ahead of industry trends and informed about emerging technologies to enhance system reliability and efficiency
  • Experience in hospitality is preferred

Nice to have:

  • Experience in hospitality

What we offer:
  • Bonus program
  • Comprehensive health care benefits
  • 401(k) plan with up to 5% company match
  • Employee stock purchase plan at 15% discount
  • Accrued paid time off (including sick leave where applicable)
  • Life insurance
  • Group disability insurance
  • Travel discounts
  • Adoption assistance
  • Paid parental leave
  • Health savings account (except for positions based out of or performed in Hawaii)
  • Flexible spending accounts
  • Tuition assistance
  • Pre-tax commuter benefits
  • Other life and work wellness benefits
  • Stock awards
  • Deferred compensation plans

Additional Information:

Job Posted:
March 21, 2025

Employment Type:
Full-time
Work Type:
On-site work