

Incident Commander
TEKsystems
Posted Tuesday, April 22, 2025
Posting ID: JP-005226839
Description
The Senior Technical Analyst and Incident Commander position will report to the Operational Excellence team. This role performs deep systems monitoring and analysis to identify or prevent service impacts across the entire end-to-end customer experience. The role also serves as lead facilitator on major incident calls, guiding cross-functional teams to restore service as quickly as possible, which saves revenue, reduces cost, and improves customer satisfaction.
Responsibilities
• Use tools such as BigPanda, Grafana, and others to monitor systems and identify potential impacts.
• Provide guidance to other team members, onshore and offshore, to help them improve monitoring and analysis.
• Identify gaps in monitoring capabilities and provide requirements for new instrumentation and automation for SREs to execute.
• Identify opportunities to shift left and opportunities for self-healing.
• Drive improvement in monitoring capabilities.
• Respond immediately to major incidents and act as lead facilitator.
• Initiate a bridge call and chat channel, determine which teams need to be engaged in troubleshooting, initiate the escalation process, and assess impact.
• Partner with Helpdesk teams to gather input and understand the customer experience.
• Ask questions and gather data to help rule out possible factors and narrow down the root cause.
• Decide what actions will be taken to restore service, including assessing risk and approving any emergency changes.
• Oversee restoration and incident closure.
• Communicate formally and in a structured manner with Technology and Business leaders about major incident impact and status.
• Participate in Post Incident Review (PIR) meetings to identify how the incident could have been avoided, diagnosed faster, and resolved faster, and help the Problem Management team identify action items.
• Ensure accurate details are captured in the associated incident tickets.
• Engage with application development teams to understand planned features and provide requirements on behalf of the Enterprise Monitoring Center (EMC).
• Learn about new technologies and features, and ensure the EMC is prepared to support them.
• Review high-risk planned changes and releases.
• Ask questions and raise any concerns to protect system availability.
• Provide input toward continuous improvement of monitoring and incident management maturity, including automation, communications, incident processes, value measurement, and tooling.
Skills
Documentation process, Incident management, Incident response, Monitoring tools, Problem management, Technical support, Grafana, Support, Distributed systems, Remediation, SRE
Top Skills Details
Documentation process, Incident management, Incident response, Monitoring tools, Problem management, Technical support
Additional Skills & Qualifications
Desired Skills
• Bachelor’s degree in Computer Science, Information Technology, or a related field
• Minimum of 5 years of experience working in a technology organization
• Excellent verbal and written communication skills
• Outstanding problem solver
• Analytical mindset and ability to grasp complex topics
• Understanding of application and infrastructure stacks and technologies
• Extensive experience using monitoring tools to identify potential incidents, and analyzing system logs and dashboards
• Extensive experience solving production incidents, restoring service quickly, and identifying underlying root causes
• Ability to multitask across multiple priorities
• Proven ability to establish and maintain positive relationships with customers and team members
Experience Level
Intermediate Level
Contact Information
Email: hoferguson@teksystems.com