Site Reliability Engineer

Thomas Jefferson National Accelerator Facility (Jefferson Lab)
sick time, tuition assistance, 401(k)
United States, Virginia, Newport News
12000 Jefferson Ave
Mar 31, 2026

At Jefferson Lab, you'll champion cutting-edge science and operational excellence while shaping the future of discovery. Join us and make your mark, where excellence meets purpose and great minds truly matter.

Salary Range: $115,900 - $205,100 (SCS-III)

What your job will be like:

You will embed within the High Performance Data Facility (HPDF) architecture team to make reliability, resilience, and observability first-class features of the facility's scientific data lifecycle systems rather than afterthoughts. You will define the initial Service Level Objectives (SLOs) and Service Level Indicators (SLIs), establish monitoring and alerting foundations, influence technology selections across compute, storage, and networking, and build the automation tooling that eliminates manual operations risk. When the facility transitions to operations, you will lead the HPDF SRE team, owning availability metrics, incident response, and the continuous improvement processes that keep the facility performing to its design parameters.

In this job you will:
  • Work closely with the rest of the architecture team to review and influence technology choices and to establish reliability and resilience parameters (e.g., expected availability, failure domain isolation, disaster recovery).
  • Ensure that the selected software and hardware systems meet those parameters while also meeting performance expectations and security requirements.
  • Evaluate vendor and open-source solutions against established reliability and resilience parameters, develop comparative assessments, and provide technically grounded recommendations to inform architecture decisions and support acquisitions.
  • Metrics & Observability: Establish the foundation for system observability, defining initial SLOs/SLIs, architecting, prototyping and then implementing comprehensive monitoring, logging, and alerting solutions.
  • Lead the design, prototyping, and implementation of these solutions, including custom automation that eliminates manual operations and further improves facility resilience.
  • Performance Engineering: Participate in testing and performance analysis to validate reliability and resilience design decisions and to identify bottlenecks and alternative approaches.
  • Establish SRE Team Framework: Define the operational framework, on-call structure, incident response and other operational processes, and staffing plans for the future SRE team, bridging the design-to-operations transition.
Experience
  • Required: 10 or more years in Site Reliability Engineering (SRE), DevOps, or Systems Engineering roles
Education
  • Required: Bachelor's Degree in Computer Science or a related field
  • Preferred: Master's Degree in Computer Science or a related field
Experience and Education Exchange

Education above the minimum may be substituted for experience.

Knowledge, Skills, and Abilities
  • High: Deep experience and understanding of distributed systems principles, failure modes, consensus protocols and self-healing architectures.
  • High: Expertise in defining and implementing SLOs, SLIs, and comprehensive monitoring stacks, and experience architecting observability frameworks in greenfield environments (e.g., Prometheus, ELK, OpenTelemetry).
  • High: Strong scripting and automation skills (Go, Python, Shell).
  • Medium: Deep experience with public cloud environments (AWS, Azure, GCP) and container orchestration (Kubernetes).
  • Medium: Experience with configuration management and IaC tools (e.g., Terraform, Puppet, Ansible).
  • Medium: Experience with IPv4 and IPv6 networking, high-speed interconnects, and data transfer protocols; familiarity with network reliability patterns and software-defined networking (pref)
  • Low: Experience with HPC infrastructure and environments (pref)
  • Low: Experience leading or mentoring small teams (pref)

About Jefferson Lab

Join a community with a common purpose of solving the most challenging scientific and engineering problems of our time. The Jefferson Lab campus is located in southeastern Virginia, amidst a vibrant and growing technology community.

A career at Jefferson Lab is more than a job. You will be part of "big science" and work alongside top scientists and engineers from around the world unlocking the secrets of our visible universe. Managed by Jefferson Science Associates, LLC, Thomas Jefferson National Accelerator Facility is entering an exciting period of mission growth and is seeking new team members ready to apply their skills and passion to have an impact. You could call it work, or you could call it a mission. We call it a challenge. We do things that will change the world.

Total Rewards at Jefferson Lab

At Jefferson Lab, we believe that a comprehensive employee benefits program is an important and meaningful part of the compensation employees receive. Our benefits program includes, but is not limited to:

* Medical, Dental, and Vision Care Plans
* Flexible Spending Accounts
* Paid Time Off and Leave Programs (paid parental leave, vacation, holidays, and sick leave)
* 401(k) Plan: 9% Lab contribution, 100% vested
* Flexible Work Arrangements (remote and alternate work schedules available)
* Tuition Assistance, Training, and Professional Development Programs
* Live near the waterways of the Chesapeake Bay region, with access to nearby beaches, mountains, and all major metropolitan centers on the East Coast

Jefferson Science Associates, LLC (JSA) manages and operates the Thomas Jefferson National Accelerator Facility (Jefferson Lab). JSA is an Equal Opportunity Employer.

JSA is committed to providing reasonable accommodation for people with disabilities (unless doing so will result in an undue hardship). If you need a reasonable accommodation for any part of the employment process, please send an e-mail to recruiting@jlab.org or contact Human Resources by calling (757) 269-7100 and selecting option 1 between 8 am and 5 pm EST to describe the nature of your request.


Employment with JSA is conditional upon DOE approval if at any time during your employment you are participating in a Foreign Government Talent Recruitment Program or Affiliated activity. Generally, such programs/activities include any foreign-state-sponsored attempt to acquire U.S.-funded scientific research through programs run or funded by the government that target scientists, engineers, students, academics, researchers, and entrepreneurs of all nationalities working or educated in the United States. This includes positions or appointments, both domestic and foreign, titled academic, professional, or institutional appointments, whether or not remuneration is received and whether full-time, part-time, or voluntary.