Upcoming role

This role is not yet open for application. We will soon be accepting applications for a GS-15 DevOps/Site Reliability Engineer. The target date for when this position will officially open for application has not yet been determined. If you would like to learn more or be notified when the application opens, please sign up for our mailing list.

Applications will be open for submission on a date that is still to be determined (TBD). Check out Join TTS Hiring Process to learn more about the application process.


Salary Range: The base salary range for this position is GS-15 Step 1 ($106,595) to GS-15 Step 10 ($138,572).

Please note the maximum salary available for the GS pay system is $166,500.

The base salary range does not include any adjustment for locality. Your locality is most likely going to be determined by where you live since most of our positions are remote. If the position isn’t remote, then your locality will be determined by the location of the office where the position is based.

You can find more information about this in the compensation and benefits section on our site.

For specific details on locality pay, please visit OPM’s Salaries & Wages page. For a salary calculator, see OPM’s 2019 General Schedule (GS) Salary Calculator.
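As a rough illustration of how the base range, a locality adjustment, and the GS maximum interact, here is a minimal Python sketch. The base figures and the $166,500 maximum come from this posting. The 25% locality rate is a hypothetical placeholder, not a real locality figure, and the helper name `locality_adjusted_salary` and its rounding to the nearest dollar are assumptions made for the example. Use OPM's tables for actual numbers.

```python
# Figures taken from this posting.
GS15_STEP1_BASE = 106_595
GS15_STEP10_BASE = 138_572
GS_PAY_CAP = 166_500

# Hypothetical locality rate for illustration only; look up the real
# percentage for your locality in OPM's tables.
EXAMPLE_LOCALITY_RATE = 0.25


def locality_adjusted_salary(base: int, locality_rate: float,
                             cap: int = GS_PAY_CAP) -> int:
    """Apply a locality percentage to a base salary, limited by the GS cap."""
    if locality_rate < 0:
        raise ValueError("locality_rate must be non-negative")
    # Rounding to the nearest dollar is an assumption of this sketch.
    adjusted = round(base * (1 + locality_rate))
    return min(adjusted, cap)


low = locality_adjusted_salary(GS15_STEP1_BASE, EXAMPLE_LOCALITY_RATE)
high = locality_adjusted_salary(GS15_STEP10_BASE, EXAMPLE_LOCALITY_RATE)
print(f"Estimated range at a {EXAMPLE_LOCALITY_RATE:.0%} locality rate: "
      f"${low:,} to ${high:,}")
```

With a high enough locality rate, the top of the range is limited by the GS maximum rather than by the Step 10 base salary.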

Who May Apply: All United States citizens and nationals (residents of American Samoa and Swains Island). Applicants must not be GSA employees or contractors.

Role Summary:

DevOps/Site Reliability Engineer - GS-15

Our product gives the public simple, secure access to multiple government services through one verified account. It has over 10 million users, and we are growing the team as we scale quickly. We are looking for qualified Site Reliability / DevOps engineers to join our infrastructure team. We care deeply about providing the best possible experience to anyone using government digital services, and we are committed to making the process of accessing government services easy while combating fraud and abuse.

A qualified candidate is ready to quickly jump in and help in a number of areas: using site reliability engineering best practices to build and operate the infrastructure at scale, responding to incidents and leading incident response and postmortem review, creating automation in areas such as security compliance and code deployment, and meeting with engineers and executives from prospective government agency customers to determine how the product can adapt to meet their user identity needs.

The team operates like a startup within the government, working in the open as a distributed, agile team. The core product is open source, hosted in modern cloud infrastructure, and built for scale. Tens of millions of people have accounts, and we aim to be the preferred entry point for all government digital services. Our users today include people accessing benefits, applying for government jobs, serving in the military, and collecting funds awarded through grant programs.

As part of the infrastructure engineering team, you will play a key role in making government services more secure and accessible to the public.

Key Objectives

Objective #1: Operate with high standards of performance and reliability:

  • Define key success metrics for infrastructure and drive improvement toward those measures
  • Create and improve monitoring systems to collect data about the application, notify on any errors, and improve visibility/observability into application behavior
  • Assist application teams in deploying code to the application regularly and as automatically as possible
  • Lead incident response and mitigate site errors as they occur
  • Lead postmortem discussions and drive continuous improvement to prevent similar outages
  • Participate in on-call shifts, serving as first-line support for incidents. Drive page frequency as low as possible (we currently page about 1-2 times per month)

Objective #2: Build our infrastructure using modern cloud infrastructure techniques:

  • Use infrastructure-as-code (currently Terraform) and configuration management (currently Chef) to automate our AWS infrastructure
  • Review code and consult with other engineers on new features and their implications for the performance, reliability, and security of Ruby on Rails services
  • Conduct load tests to ensure the application is ready to handle projected user traffic
  • Improve automation and fault tolerance of the deployment process
  • Drive long-term improvement in system availability by removing single points of failure

Objective #3: Collaborate with the team and outside partners:

  • Handle site issues from partner agencies, dealing both with engineers and non-engineers
  • Oversee the procurement process for tools and services used by the team
  • Work alongside talent specialists to continue hiring new engineers and project managers into the team
  • Advocate for modern information security principles throughout the system
  • Balance agile development with mandatory government security compliance policies
  • Support a safe, inclusive workplace and a positive team culture where all team members value diversity and individual differences

Application Evaluation

The information in this section outlines the criteria that your application will be evaluated against to determine if you meet the Qualifications for the position. There are two very important things to note about this step in the process:

  1. Only applications found “minimally qualified” are shared with the hiring manager, and only those candidates are eligible to be interviewed.
  2. The Minimum Qualification determination can only be made using the information that’s directly within your resume and directly associated with your listed work experience. Examples of stuff that can’t be used:
    • Links to portfolios or other external materials (yes, the links themselves may be “directly” on the resume, but the information is not)
    • Information you include in cover letters, responses to questions, etc., as these are not directly associated with your work experience
    • Lists of tools, technologies, programming languages, etc. that are listed separately from your work experience

The Qualification process is a bureaucratic requirement that we are stuck with. It’s best to think about it as the most intense and rigorous resume review you’ve ever heard of. To get through this process, you need to make sure your resume directly reflects the Qualifications listed below. We also have more guidance on creating a federal-style resume on Join TTS Hiring Process.


All applications will be reviewed by a panel of subject matter experts against a scoring rubric created for this role. So that we can properly evaluate your previous experience, we recommend being as detailed as possible in your resume and following our general guidance on creating a federal-style resume.

To qualify for this role, you must have one year of specialized experience equivalent to the GS-14 level in the Federal service. Specialized experience is:

  1. Experience being a part of a team to deliver digital products or services. This experience must include ALL of the following:
    • Providing technical support or product development for clients
    • Delivering tools or products with high uptime or availability requirements (e.g., SLAs of 99.9%+)
    • Experience using Site Reliability Engineering or DevOps practices in a production environment
  2. Experience providing technical expertise on projects or initiatives to deliver digital products or services. This experience must include ONE of the following:
    • Conducting technology evaluations
    • Making architectural decisions
    • Developing new software features by writing code
    • Reducing technical debt
    • Leading incident response
  3. Experience deploying, operating, maintaining, or running a cloud infrastructure or platform. This experience must include TWO of the following:
    • Using a cloud computing platform
    • Using cloud computing infrastructure
    • Using continuous integration or continuous deployment tools
    • Using infrastructure automation tooling
    • Using infrastructure monitoring tooling
    • Developing and using software in the Ruby language ecosystem
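For a sense of scale on the availability requirement mentioned above (SLAs of 99.9%+), the following Python sketch converts an availability target into a downtime budget. The function name `downtime_budget_minutes` and the 30-day period are assumptions chosen for illustration. They are not part of the qualification criteria.

```python
def downtime_budget_minutes(availability: float, period_days: int = 30) -> float:
    """Return the minutes of downtime allowed in a period at a given availability.

    `availability` is a fraction, for example 0.999 for "three nines".
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability)


# Compare a few common availability targets over a 30-day period.
for target in (0.99, 0.999, 0.9999):
    budget = downtime_budget_minutes(target)
    print(f"{target:.2%} availability allows about {budget:.1f} minutes "
          f"of downtime per 30 days")
```

At 99.9%, the budget works out to roughly 43 minutes per 30 days, which is why experience with incident response and automation matters for services held to that standard.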

Qualification determinations cannot be made when resumes do not include the required information, so failure to provide this information may result in disqualification.

For each job on your resume, provide:

  • the exact dates you held each job (from month/year to month/year or “present”)
  • number of hours per week you worked (if part time)

How To Apply

If you would like to learn more or if you’d like to be notified when the application is open, please sign up for our mailing list.