About this role
Our client is seeking a skilled Site Reliability Engineer 2 to join their dynamic team. In this role, you will be responsible for ensuring the reliability, availability, and performance of critical systems and services.
Key Responsibilities:
- Design, implement, and maintain scalable and reliable systems.
- Proactively monitor system performance and troubleshoot issues as they arise.
- Collaborate with development teams to improve system architecture and deployment processes.
- Automate operational tasks and processes to enhance efficiency.
- Participate in on-call rotations to provide support for production systems.
- Conduct post-incident reviews and implement improvements based on findings.
Required Skills & Qualifications:
- Strong experience in cloud platforms such as AWS, Azure, or Google Cloud.
- Proficiency in scripting languages such as Python, Bash, or Ruby.
- Familiarity with containerization technologies like Docker and orchestration tools like Kubernetes.
- Experience with monitoring tools such as Prometheus, Grafana, or the ELK Stack.
- Solid understanding of networking concepts and protocols.
- Excellent problem-solving skills and the ability to work under pressure.
Experience:
- 5 to 8 years of experience in Site Reliability Engineering or a related field.
What we offer:
- An opportunity to work in a fast-paced and innovative environment.
- A collaborative team culture that values your input and ideas.
- Opportunities for professional growth and development.
This role is managed by AI-First Talent on behalf of our client. Your application is reviewed directly by our talent team.