

Senior Site Reliability Engineer
TEKsystems
Posted Tuesday, October 28, 2025
Posting ID: JP-005635268
Duration: 3-month W-2 contract-to-hire
Location: 4 days onsite and 1 day remote - Raleigh or Charlotte, NC, or Atlanta, GA
Top Skills Details
1. 7+ years of SRE experience. The hiring manager is more focused on SRE practice (being able to bring production knowledge to meet SLOs) and less focused on DevOps (more of a nice-to-have).
2. Solid scripting knowledge and experience in any of the following: Python, Go, Bash, JavaScript, and Shell (does not need to be proficient in all of these, just very knowledgeable in at least one of them).
3. Main tech stack: Dynatrace, Datadog, ELK. Ansible experience is a nice-to-have, as they are starting to use it.
4. SRE certifications are a requirement as well as a bachelor's degree (see education requirement in job description)
*** Fintech experience is a very nice to have ***
Description
This customer is currently establishing an SRE practice/platform. The goal is to build solid observability for the platform and evaluate the tool stack they will look to implement. They have a code team in place now and are looking to augment that team with a staff engineer. They need someone with industry knowledge in SRE: working with vendors, hands-on technical experience, providing technical guidance to more junior members of the team, and helping lead migrations from one tool to another.
Key Functions/Duties of Position:
• Define and track reliability and observability OKRs. This includes defining and tracking Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
• Implement robust monitoring and alerting systems to proactively monitor health, identify potential issues, analyze system performance, and facilitate quick response to incidents.
• Implement AIOps functionality to enable auto-response, self-healing, and anomaly trend analysis.
• Drive the development and implementation of automation solutions to remove “toil”, streamline processes, reduce manual interventions, and enhance the overall efficiency of the product engineering and SRE teams.
• Identify and address performance bottlenecks in applications and infrastructure to improve efficiency and user experience.
• Work closely with incident management to quickly address and resolve system outages or performance issues to minimize downtime and impact on users.
• Collaborate actively with development and operations teams to implement observability and resiliency requirements in order to ensure smooth deployment and operation of software systems.
• Lead the coordination with product, development, infrastructure, and architecture teams to conduct capacity planning, ensuring that systems can handle current and future demand; anticipate growth and scalability requirements.
• Improve reliability by identifying and addressing gaps in our architecture, services, and tooling.
• Modernize the disaster recovery program for both on-premises and cloud-based Berkley solutions.
• Provide technical leadership and mentorship to other engineers, fostering a culture of learning and continuous improvement.
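For candidates preparing to discuss the SLO and SLI duties above, the relationship between an SLI, an SLO target, and the resulting error budget can be sketched in a few lines of Python. This is a minimal illustration only: the names `SLO`, `availability_sli`, and `error_budget_remaining`, the "checkout-availability" service, and the 99.9% target are all hypothetical and are not taken from this posting or the customer's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A Service Level Objective: a target for an SLI over some window."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of events must be good


def availability_sli(good_events: int, total_events: int) -> float:
    """SLI as the ratio of good events to total events."""
    if total_events == 0:
        return 1.0  # no traffic: treat the service as meeting its objective
    return good_events / total_events


def error_budget_remaining(slo: SLO, good_events: int, total_events: int) -> float:
    """Fraction of the error budget left: 1.0 is untouched, below 0 is overspent."""
    if total_events == 0:
        return 1.0
    actual_bad = total_events - good_events
    allowed_bad = (1.0 - slo.target) * total_events
    if allowed_bad == 0:
        # A 100% target leaves no budget at all.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    slo = SLO(name="checkout-availability", target=0.999)
    good, total = 999_750, 1_000_000
    print(f"SLI: {availability_sli(good, total):.5f}")
    print(f"Error budget remaining: {error_budget_remaining(slo, good, total):.0%}")
```

In practice these numbers would come from an observability platform such as Datadog or Dynatrace rather than hard-coded counts; the sketch only shows the arithmetic behind an error-budget conversation.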
Education Requirement
• Bachelor's degree in Computer Science, Information Technology, or a related field (or a combination of education and equivalent experience).
- Technical ability
- Long history of triaging complex issues on bridge calls, not just recent experience
- Experience profiling Java/.NET code
- Ability to understand basic networking, including the ability to decipher packet captures
- Linux/AIX exposure
- Exposure to enterprise-level technology stacks
- Experience training others in some capacity
- Experience using observability platforms/tools: Datadog, Splunk, Dynatrace, etc.
- The individuals we are seeking will make an impact within their company and will be irreplaceable.
- We are looking for enterprise leaders, not just cogs in the wheel.
 
 
- Enterprise Technology Exposure: Brush up on AIX and RHEL/Linux environments. Even basic familiarity can demonstrate initiative.
- Java Proficiency: Since Java is central to the role, consider reviewing Java fundamentals and how they relate to application performance and troubleshooting.
- SQL and OpenShift: Prepare to speak to your experience or learning efforts with SQL queries and OpenShift container orchestration.
- Networking Fundamentals: Review core networking concepts such as TCP/IP, DNS, firewalls, and packet analysis. Be ready to interpret packet captures.
- Domain Expertise: Identify and articulate a technical area where you have depth, whether it's observability, infrastructure automation, or container orchestration.
- Troubleshooting Autonomy: Practice walking through complex issues independently. Be ready to demonstrate how you would lead or contribute meaningfully on bridge calls.
 
Additional Recommendations:
- Application Performance Monitoring: Be ready to discuss your experience with tools like AppDynamics, Datadog, or similar platforms.
- Bridge Call Experience: Share examples of how you've handled high-pressure troubleshooting scenarios, especially in production environments.
- Code Profiling: If applicable, mention any experience profiling Java or .NET applications to identify performance bottlenecks.
- Training and Mentorship: Highlight any experience training others or contributing to team knowledge-sharing.
- Enterprise Impact: Position yourself as someone who drives change and adds unique value: not just a contributor, but a leader.
- Career Stability: If applicable, emphasize your consistent employment history and resilience in dynamic environments.
 
Experience Level
Expert Level