Principal Site Reliability Engineer

Sigma Software

Full-time

Remote

Brazil

Technology & Development

Company Description

Are you ready to lead infrastructure strategy for a cutting‑edge AI‑driven SaaS platform? We are looking for a Principal Site Reliability Engineer with a proven track record in scaling, optimizing, and securing cloud‑based systems. This senior role offers the opportunity to shape the reliability and performance of a platform used by finance teams worldwide.

In this role, you will be part of a dynamic engineering environment where your expertise will directly influence product stability and growth. You will work with advanced cloud technologies, automation tools, and AI-driven solutions, contributing to projects that push the boundaries of innovation.

If you are ready to take on strategic responsibility and make a tangible impact, apply now and join us in building the future of reliable, scalable systems.

CUSTOMER

Sigma Software is partnering with a fast‑growing AI‑driven SaaS platform serving finance and accounting teams in high‑growth businesses. The platform automates critical workflows, from billing and collections to revenue recognition and reporting, which helps ensure compliance and accelerate cash flow. Leveraging advanced AI, it reduces manual work, increases operational efficiency, and supports scalability for customers worldwide.

PROJECT

The project focuses on building and scaling an AI-powered SaaS solution for finance automation. It integrates advanced machine learning models with robust cloud infrastructure to deliver secure, compliant, and high‑performance services. The engineering culture emphasizes automation, resilience, and operational excellence.

Job Description

Define and lead infrastructure and reliability strategy across the platform
Design scalable, resilient systems in collaboration with engineering teams
Optimize build, testing, and deployment processes for speed and stability
Establish and uphold best practices for CI/CD, monitoring, and observability
Lead incident response and drive continuous improvement post‑incident
Automate workflows to reduce operational toil and risk
Mentor engineers and foster a culture of operational excellence
Make strategic build‑vs‑buy decisions balancing speed, quality, and sustainability

Qualifications

At least 8 years of experience in Site Reliability Engineering or DevOps roles, including 2+ years in a Principal or Lead position
Proven experience in infrastructure modernization and scaling initiatives for high‑growth environments
Strong proficiency in Python
Deep expertise in cloud platforms and container orchestration tools such as AWS ECS and EKS
Solid experience in CI/CD pipeline design and optimization using tools like GitHub Actions and Buildkite
Proficiency in infrastructure‑as‑code tools such as Terraform
Strong knowledge of monitoring, observability, and performance optimization practices
Upper-Intermediate level of spoken and written English

WOULD BE A PLUS

Experience with monorepos (Turborepo, pnpm)
Familiarity with modern TypeScript tools (swc, biome, oxc)
Knowledge of NestJS, Next.js, and testing frameworks (Jest, Vitest)

Additional Information

PERSONAL PROFILE

Excellent leadership, communication, and decision‑making abilities
Ability to work independently and make pragmatic build‑vs‑buy decisions in fast‑paced environments
