About this role
Our client is seeking a skilled Site Reliability Engineer 2 specializing in Big Data to join their dynamic team. This role is crucial for ensuring the reliability, performance, and scalability of their large-scale data processing systems. The ideal candidate will have a strong background in site reliability engineering, with a focus on big data technologies.
Key Responsibilities:
- Design, implement, and maintain systems for monitoring and alerting on the performance of big data applications.
- Collaborate with development teams to ensure that reliability and scalability are considered during the design and implementation phases.
- Troubleshoot and resolve production issues in a timely manner to minimize downtime.
- Automate operational tasks to improve efficiency and reliability of data processing workflows.
- Participate in on-call rotations and incident response efforts to maintain service uptime.
- Analyze system performance metrics and implement changes that improve reliability.
- Document processes, systems, and procedures to ensure knowledge sharing within the team.
Required Skills & Qualifications:
- Proficiency in big data technologies such as Hadoop, Spark, or Kafka.
- Strong experience with cloud platforms, preferably AWS or Azure.
- Familiarity with containerization and orchestration tools like Docker and Kubernetes.
- Solid understanding of a scripting language such as Python or Bash.
- Experience with monitoring tools such as Prometheus, Grafana, or the ELK stack.
- Knowledge of database systems, both SQL and NoSQL, is a plus.
- Excellent problem-solving skills and the ability to work under pressure.
Experience Level:
- 3-5 years of experience in site reliability engineering or a related field, with a focus on big data systems.
- Proven track record of managing large-scale production environments.
What we offer:
- An opportunity to work in a fast-paced and innovative environment.
- The chance to be part of a team that is at the forefront of digital payments technology.
- Professional development opportunities to enhance your skills and career growth.
- A collaborative and inclusive workplace culture that values diversity and teamwork.
This role is managed by AI-First Talent on behalf of our client. Your application is reviewed directly by our talent team.