Site Reliability Engineer (SRE) - Infrastructure and Observability

Worldpay
United States, Ohio, Cincinnati
Jul 18, 2025

Job Description

Are you ready to write your next chapter?

Make your mark at one of the biggest names in payments. We're looking for a Site Reliability Engineer - Infrastructure and Observability to join our Technology Services Operations team and help us unleash the potential of every business.

What you'll own as the Site Reliability Engineer - Infrastructure and Observability:

As a Site Reliability Engineer (SRE) within the Worldpay Technology Services Operations (TSO) team, you will play a critical role in enhancing the reliability, stability, and performance of our platforms and services in support of innovative fintech products that change the way the world pays, banks, and invests. This role blends software engineering with systems engineering to proactively prevent incidents, automate operations, and improve observability across complex environments. You'll collaborate closely with infrastructure, development, and incident management teams to reduce service disruptions, implement scalable solutions, and drive continuous improvement. This is a unique opportunity to help shape a high-performing SRE function from the ground up, with a clear roadmap and strong executive support.

  • Analyze incident data from platforms like ServiceNow, IBM Netcool, Everbridge xMatters, OpsGenie, and PagerDuty to identify recurring issues and service instability trends.

  • Collaborate with cross-functional teams to improve platform availability, stability, and performance.

  • Identify and close observability gaps in logging, monitoring, and alerting; recommend and implement new tools as needed.

  • Integrate pre- and post-change validation testing into CI/CD pipelines and manual deployments.

  • Develop and pilot automated runbooks for common incident types to improve incident response and reduce MTTR.

  • Participate in Change Advisory Boards (CABs), major incident triage, and root cause analysis processes.

  • Evaluate and implement tools for incident remediation, change validation, and performance benchmarking.

  • Contribute to monthly retrospectives, publish quarterly SRE health reports, and drive continuous improvement initiatives.

What you'll bring

  • 3+ years of experience in Site Reliability Engineering, DevOps, or a related technical role.

  • Strong understanding of incident management, root cause analysis, and service reliability principles.

  • Experience in IT Operations, with a focus on observability and log management.

  • Solid understanding of observability concepts, including metrics, log aggregation, log management, OpenTelemetry (OTEL) concepts and best practices, traces, event management and alerting.

  • Hands-on experience with observability and monitoring tools (e.g., Splunk Enterprise, Splunk Cloud, Splunk Observability, OTEL agents, collectors and gateways, Prometheus, Grafana, Zabbix).

  • Experience developing Splunk queries and dashboards using Splunk Search Processing Language (SPL).

  • Proficiency in scripting languages (e.g., Python, Bash) and infrastructure-as-code tools.

  • Familiarity with CI/CD pipelines and automated testing frameworks.

  • Excellent problem-solving skills and a proactive, collaborative mindset.

  • Strong communication skills and the ability to work effectively across teams.

It's a bonus if you have

  • Experience working in high-availability or financial services environments.

  • Experience with Software Development Life Cycle (SDLC) concepts.

  • Experience working within an Agile environment.

  • Knowledge of ITIL processes and prior participation in CABs.

  • Familiarity with cloud platforms such as AWS, Azure, or GCP.

  • Exposure to performance benchmarking, capacity planning, and service-level objective (SLO) management.

  • Experience in container monitoring (e.g., Kubernetes, Docker) and cloud-native architectures.

  • Experience with one or more of the following application development or scripting languages: Java, Python, C#, .NET, JavaScript, SQL, C++, Go (Golang), Rust, Scala, Kotlin, Ruby, Unix scripting (e.g., Bash, Korn Shell).

  • Certifications:

    • Cloud: AWS, Azure

    • Observability: Splunk, Datadog, Dynatrace

    • Infrastructure: Red Hat, VMware, MCSE

About the team

Our Tech and Security teams keep us moving each day, no matter where we are in the world. From the hardware to the networks and everything between, they humbly make it all happen.

To learn more about our winning teams, check out our world-class teams that own it every day.

What makes a Worldpayer

What makes a Worldpayer? It's simple: Think, Act, Win. We stay curious, always asking the right questions and finding creative solutions to simplify the complex. We're dynamic: every Worldpayer is empowered to make the right decisions for their customers. And we're determined, always staying open and winning and failing as one.

Does this sound like you? Then you sound like a Worldpayer. Apply now to write the next chapter in your career.

Privacy Statement

Worldpay is committed to protecting the privacy and security of all personal information that we process in order to provide services to our clients. For specific information on how Worldpay protects personal information online, please see the Online Privacy Notice.

EEOC Statement

Worldpay is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, marital status, genetic information, national origin, disability, veteran status, and other protected characteristics. The "EEO is the Law" poster is available here.

If you are made a conditional offer of employment and will be working in the United States, you will be required to undergo a drug test. In developing this job description, care was taken to include all competencies and requirements needed to successfully perform the position. Reasonable accommodations will be provided for individuals with qualified disabilities, both during the hiring process and, if hired, to allow the individual to perform the essential functions of the job.

Sourcing Model

Recruitment at Worldpay works primarily on a direct sourcing model; a relatively small portion of our hiring is through recruitment agencies. Worldpay does not accept resumes from recruitment agencies which are not on the preferred supplier list and is not responsible for any related fees for resumes submitted to job postings, our employees, or any other part of our company.
