Associate Principal, Site Reliability Engineering
United States, Texas, Dallas
Duties: Support the availability and performance of the next generation of OCC's Cloud Platform. Enhance system reliability and developer productivity through automation, and provide guidance to development teams in the areas of cloud technologies, application profiling and monitoring, logging, and metrics collection and analysis. Architect, develop and maintain shared services and tools to improve reliability and reduce toil across the organization. Develop automation for incident response and to prevent problem recurrence. Implement secure infrastructure automation using Ansible, HashiCorp Vault and CyberArk for managing secrets and IAM role-based access control, automate the creation and management of access tokens, and ensure the encryption of sensitive data across environments. Collaborate with development, operations and infrastructure teams to ensure availability of services and to work through implementation issues. Create and enhance runbooks to respond to service outages or degradations. Assess the production readiness of services. Define and track operational metrics for production performance, reliability, scalability and availability. Contribute to the team's continuous improvement through research, retrospectives, discussion groups and code reviews. Guide and mentor junior members, and prepare stories for the sprint backlog. Deploy containerized applications using Docker on Amazon EKS (Elastic Kubernetes Service) clusters. Automate deployments to Apache Tomcat application servers. Automate the setup, scaling, and management of cloud infrastructure using Terraform, AWS, and Rancher Kubernetes, enabling seamless provisioning and management of clusters and services across cloud environments, and automate containerized deployments on Rancher Kubernetes.
Build and manage robust CI/CD pipelines using GitHub and Jenkins along with Harness deployments to ensure consistent application delivery across environments; track and visualize performance metrics and application health using OpenTelemetry and Splunk Observability; and aggregate logs and monitor application performance across multiple systems using Splunk Cloud. Implement observability strategies using Splunk and set up custom dashboards to monitor system health, service level objectives (SLOs), and key performance indicators (KPIs) for rapid detection and troubleshooting of issues and seamless rollouts of new features and updates in Kubernetes-managed environments. Up to 40% telecommuting permitted.

*This position qualifies for The Options Clearing Corporation's Employee Referral Program.*

Education & Experience Required: Bachelor's degree in computer science, information systems, engineering, or a related field, and three (3) years of experience as a Software Engineer, DevOps Engineer, or in a related occupation.

Special Skills Required: Must have work experience with each of the following:
1) Automating the setup, scaling, and management of cloud infrastructure using Terraform, AWS, and Rancher Kubernetes, enabling seamless provisioning and management of clusters and services across cloud environments, and automating containerized deployments on Rancher Kubernetes;
2) Building and managing robust CI/CD pipelines using GitHub and Jenkins along with Harness deployments to ensure consistent application delivery across environments, tracking and visualizing performance metrics and application health using OpenTelemetry and Splunk Observability, and aggregating logs and monitoring application performance across multiple systems using Splunk Cloud;
3) Implementing observability strategies using Splunk and setting up custom dashboards to monitor system health, service level objectives (SLOs), and key performance indicators (KPIs) for rapid detection and troubleshooting of issues and seamless rollouts of new features and updates in Kubernetes-managed environments; and
4) Implementing secure infrastructure automation using Ansible, HashiCorp Vault and CyberArk for managing secrets and IAM role-based access control, automating the creation and management of access tokens, and ensuring the encryption of sensitive data across environments.

Salary: $112,923-$148,700

Apply: Apply online at www.theocc.com. No calls. EOE.

About Us
The Options Clearing Corporation (OCC) is the world's largest equity derivatives clearing organization. Founded in 1973, OCC is dedicated to promoting stability and market integrity by delivering clearing and settlement services for options, futures and securities lending transactions. As a Systemically Important Financial Market Utility (SIFMU), OCC operates under the jurisdiction of the U.S. Securities and Exchange Commission (SEC), the U.S. Commodity Futures Trading Commission (CFTC), and the Board of Governors of the Federal Reserve System. OCC has more than 100 clearing members and provides central counterparty (CCP) clearing and settlement services to 19 exchanges and trading platforms. More information about OCC is available at www.theocc.com.

Benefits
A highly collaborative and supportive environment developed to encourage work-life balance and employee wellness. Some of these components include:
Visit https://www.theocc.com/careers/thriving-together for more information.
OCC is an Equal Opportunity Employer.