About this role

Our client is seeking a skilled Site Reliability Engineer 2 to join their team. In this role, you will ensure the reliability, availability, and performance of critical systems and services.

Key Responsibilities:

Design, implement, and maintain scalable and reliable systems.
Monitor system performance and troubleshoot issues proactively.
Collaborate with development teams to improve system architecture and deployment processes.
Automate operational tasks and processes to enhance efficiency.
Participate in on-call rotations to provide support for production systems.
Conduct post-incident reviews and implement improvements based on findings.

Required Skills & Qualifications:

Strong experience in cloud platforms such as AWS, Azure, or Google Cloud.
Proficiency in scripting languages such as Python, Bash, or Ruby.
Familiarity with containerization technologies like Docker and orchestration tools like Kubernetes.
Experience with monitoring tools such as Prometheus, Grafana, or the ELK stack.
Solid understanding of networking concepts and protocols.
Excellent problem-solving skills and the ability to work under pressure.

Experience:

5-8 years of experience in Site Reliability Engineering or related fields.

What we offer:

An opportunity to work in a fast-paced and innovative environment.
A collaborative team culture that values your input and ideas.
Opportunities for professional growth and development.

This role is managed by AI-First Talent on behalf of our client. Your application is reviewed directly by our talent team.
