Microsoft

Senior Reliability Engineer

Job Description

Posted on: 2025-12-04

Responsibilities

  • Build and apply specialized knowledge across multiple aspects of production, including monitoring, release engineering, and performance optimization.
  • Analyze large-scale telemetry and operational data to drive data-informed decisions.
  • Implement principles such as safe deployment, testing for reliability, and disaster recovery.
  • Respond to alerts and incidents to maintain system reliability.
  • Create and follow playbooks for root cause analysis and reviews.
  • Collaborate with hardware and firmware teams to understand system behavior and improve predictive analytics.
  • Participate in an on-call rotation and be available during non-standard business hours.

Job Requirements

  • Master's or Bachelor's Degree in Computer Science or related field with relevant technical experience.
  • 3+ years of experience in software engineering or operations for large-scale distributed systems.
  • Ability to support a 24x7 data center environment.
  • Proficiency in programming languages such as C#, Python, or Go.
  • Understanding of cloud infrastructure, networking, and system design.
  • Familiarity with monitoring tools and DevOps practices.
  • Ability to meet security screening requirements.