About this role:
Our client is seeking a Site Reliability Engineer (SRE) with a focus on Big Data. In this role, you will be responsible for the reliability, availability, and performance of large-scale data systems. You will work closely with development and operations teams to establish best practices for system reliability and performance.
Key Responsibilities:
- Design and implement scalable and reliable data infrastructure solutions.
- Monitor system performance and troubleshoot issues to ensure high availability.
- Collaborate with development teams to improve application performance and reliability.
- Automate operational processes and improve system efficiency.
- Conduct capacity planning and performance tuning for data systems.
- Implement and maintain monitoring and alerting systems for proactive issue resolution.
- Participate in on-call rotations and respond to incidents as needed.
Required Skills & Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Strong experience with Big Data technologies such as Hadoop, Spark, or Kafka.
- Proficiency in scripting languages like Python or Bash.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Solid understanding of Linux/Unix systems and networking concepts.
- Familiarity with containerization technologies like Docker and orchestration tools like Kubernetes.
- Excellent problem-solving skills and the ability to work under pressure.
Experience:
- 7 to 11 years of relevant experience in Site Reliability Engineering or a similar role, with a focus on Big Data technologies.
What we offer:
- Opportunity to work with cutting-edge technologies in a fast-paced environment.
- Collaborative and innovative team culture.
- Professional development and growth opportunities.
This role is managed by AI-First Talent on behalf of our client. Your application is reviewed directly by our talent team.