About this role
Senior Platform & Site Reliability Engineer
Location: Remote
Employment Type: Contract
The Role
Our client is seeking a Senior Platform & Site Reliability Engineer to take full architectural and operational ownership of the platform layer across a growing SaaS portfolio. This role is central to the reliability, performance, and scalability of that platform. You will own the AWS infrastructure standards, including VPCs, account structure, networking, and compute design. You will also own the CI/CD platform, the observability and reliability stacks, the event streaming infrastructure, and the deployment pipelines.
Key Responsibilities
- Design, implement, and manage AWS infrastructure to support scalable and reliable applications.
- Own and maintain the CI/CD platform to streamline deployment processes.
- Develop and oversee observability and reliability stacks to ensure system health and performance.
- Manage event streaming infrastructure to facilitate efficient data processing and integration.
- Create and maintain deployment pipelines for seamless application updates.
- Collaborate with development teams to establish and uphold reliability and performance best practices.
- Troubleshoot and resolve issues related to system performance and reliability.
- Continuously evaluate and improve platform architecture and operational processes.
Required Skills & Qualifications
- Strong experience with AWS services and architecture, including VPCs, EC2, S3, and IAM.
- Proficiency in CI/CD tools and practices, with experience in Jenkins, GitLab CI, or similar.
- Familiarity with observability tools such as Prometheus, Grafana, or the ELK stack.
- Experience with event streaming technologies, such as Kafka or AWS Kinesis.
- Strong scripting skills in Python, Bash, or similar languages.
- Knowledge of containerization technologies like Docker and orchestration tools like Kubernetes.
- Excellent problem-solving skills and the ability to work independently in a remote environment.
Experience Level
- Minimum of 5 years of experience in platform engineering or site reliability engineering roles.
- Proven track record of managing complex cloud infrastructures and ensuring high availability.
What We Offer
- Opportunity to work in a dynamic and innovative environment.
- Flexibility of remote work with a collaborative team.
- Chance to contribute to the architectural direction of a growing SaaS portfolio.
- Professional development opportunities to enhance your skills and career growth.
This role is managed by AI-First Talent on behalf of our client. Your application is reviewed directly by our talent team.