About this role

Senior Platform & Site Reliability Engineer

Location: Remote

Employment Type: Contract

The Role

Our client is seeking a Senior Platform & Site Reliability Engineer to take full architectural and operational ownership of the platform layer across a growing SaaS portfolio. The role is central to the platform's reliability, performance, and scalability. You will own the AWS infrastructure standards (VPCs, account structures, networking, and compute design) as well as the CI/CD platform, the observability and reliability stacks, the event streaming infrastructure, and the deployment pipelines.

Key Responsibilities

Design, implement, and manage AWS infrastructure to support scalable and reliable applications.
Own and maintain the CI/CD platform to streamline deployment processes.
Develop and oversee observability and reliability stacks to ensure system health and performance.
Manage event streaming infrastructure to facilitate efficient data processing and integration.
Create and maintain deployment pipelines for seamless application updates.
Collaborate with development teams to ensure best practices in reliability and performance are followed.
Troubleshoot and resolve issues related to system performance and reliability.
Continuously evaluate and improve platform architecture and operational processes.

Required Skills & Qualifications

Strong experience with AWS services and architecture, including VPCs, EC2, S3, and IAM.
Proficiency in CI/CD tools and practices, with experience in Jenkins, GitLab CI, or similar.
Familiarity with observability tools such as Prometheus, Grafana, or the ELK stack.
Experience with event streaming technologies, such as Kafka or AWS Kinesis.
Strong scripting skills in Python, Bash, or similar languages.
Knowledge of containerization technologies like Docker and orchestration tools like Kubernetes.
Excellent problem-solving skills and the ability to work independently in a remote environment.

Experience Level

Minimum of 5 years of experience in platform engineering or site reliability engineering roles.
Proven track record of managing complex cloud infrastructures and ensuring high availability.

What We Offer

Opportunity to work in a dynamic and innovative environment.
Flexibility of remote work with a collaborative team.
Chance to contribute to the architectural direction of a growing SaaS portfolio.
Professional development opportunities to enhance your skills and career growth.

This role is managed by AI-First Talent on behalf of our client. Your application is reviewed directly by our talent team.
