CircleCI is seeking a Site Reliability Engineer to work closely with our Software Engineers to deliver and manage the high-performance, scalable infrastructure underlying our multi-tenant Cloud offering. You will not only have the chance to automate and optimize infrastructure by building appropriate tooling, but you will also help software engineers through the design phase to optimize their services for scale in our production environment.
You'll join a highly distributed team within our Platform Engineering group to improve site integrity. You will design and implement infrastructure solutions and processes.
About this role:
- Design and deliver solutions to improve the availability, scalability, latency, and efficiency of CircleCI’s services
- Foster a culture of observability and monitoring, helping your team use operational data to improve the stability and performance of our systems
- Diagnose and resolve production issues in conjunction with software engineering teams
- Implement shared infrastructure used by all services and teams within the CircleCI platform
- Support and advise software engineering teams in the design of scalable services
- Build and maintain tools for deployment, monitoring, and debugging
- Execute disaster recovery drills
- Participate in rotating on-call duties, including incident management
About you:
- Proficiency in one or more of: Go, Java, Python, C or C++, Clojure
- Experience working with Docker, Kubernetes, Terraform, Helm, AWS, and modern distributed SaaS infrastructure
- Knowledge of virtualization technologies, such as VMware or KVM
- Understanding of standard networking protocols and components such as: TCP/IP, HTTP, DNS, ICMP, VLANs, the OSI Model, IP Subnetting, and Load Balancing
- Knowledge of operating systems (processes, threads, IPC, concurrency, locks, mutexes, semaphores, etc.)
- Understanding of good monitoring and alerting practices, using tools like Datadog and PagerDuty
- Knowledge of the internal workings of at least one of: PostgreSQL, MongoDB, Redis
- Focus on security in the delivery of all levels of a system
- Passion for modern software development and operation, including agile, CI/CD, and infrastructure-as-code
- Desire to learn and grow your career as a Site Reliability Engineer
- 2 or more years of experience
How to apply
Submit your application online via the Apply Now button. Please include a cover letter that describes why you're interested in working for CircleCI and summarizes how your experience and career goals fit the qualifications for the position.
We know there’s no such thing as a “perfect” candidate - we’re all a work in progress and are growing new skills and capabilities all the time. CircleCI welcomes those who are enthusiastic about learning and evolving, so however you identify and whatever your background, if this looks like a role where you could do work that excites you, we hope you’ll apply.
CircleCI is the world’s largest shared continuous integration and continuous delivery (CI/CD) platform, and the central hub where code moves from idea to delivery. As one of the most-used DevOps tools, processing more than 1 million builds a day, CircleCI has unique access to data on how engineering teams work and how their code runs. Companies like Spotify, Coinbase, Stitch Fix, and BuzzFeed use us to improve engineering team productivity, release better products, and get to market faster.
CircleCI is proud to be an Equal Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law.