ITSM frameworks: Problem Management Glossary of Key Terms

Glossary of Key Terms

Problem management is the process of identifying and managing the causes of incidents on an IT service. It is a core component of ITSM frameworks.

Below are essential terms for problem management, incident response, and related processes.

A3 Framework: A problem-solving and continuous improvement approach, originating at Toyota and traditionally documented on a single A3-size sheet, that involves planning, testing, reviewing, and acting on improvement opportunities
Apollo Root Cause Analysis: A visual mapping technique for cause and effect that focuses on finding multiple solutions to break the causal chain
Change Management: A structured approach for implementing changes to the IT environment, managing risks and ensuring smooth transitions
Clarify and Verify: A first-step technique in problem-solving where details are confirmed to prevent misunderstandings and unnecessary escalations
Continuous Improvement: Ongoing efforts to enhance processes and services based on insights from problem management
Corrective and Preventive Actions (CAPA): A structured process for eliminating the causes of existing issues so they do not recur (corrective action) and the causes of potential issues so they do not occur (preventive action)
Decision-Making Framework: A structured approach involving identifying issues, assessing risks, analyzing options, selecting a strategy, implementing it, and monitoring results
Eisenhower Matrix: A prioritization tool that categorizes tasks based on their urgency and importance to help focus on what truly matters
Error Control: Managing known errors by documenting workarounds and implementing permanent solutions where feasible
Event Management: Monitoring systems and services to detect potential issues before they escalate
Failure Mode and Effects Analysis (FMEA): A proactive method for identifying potential failure points in processes and assessing their impact
Fishbone Diagram (Ishikawa): A visualization tool that organizes causes into categories to help in brainstorming and identifying potential causes of a problem
Five Whys: A technique in root cause analysis involving repeated “why” questions to uncover deeper causes of a problem
Five Ws, Two Hs (5W2H): A structured approach to gathering data using “What, Where, When, Why, Who, How, and How much” to define an issue comprehensively
Incident: An unexpected event disrupting normal service, requiring immediate response to restore operations
Kepner-Tregoe Analysis: A structured methodology for situation appraisal, problem analysis, decision analysis, and preventive action
Key Performance Indicator (KPI): A measurable value indicating progress towards goals; common KPIs include incident response time, resolution rate, and customer satisfaction
Knowledge Management: The process of documenting, organizing, and sharing information about incidents, known errors, and workarounds
Known Error: A documented problem with a known root cause and workaround, maintained to minimize service disruptions
Known Error Database (KEDB): A repository of known errors and workarounds for quicker response times in recurring situations
Occam’s Razor: A principle that favors the simplest explanation, the one requiring the fewest assumptions, when evaluating possible causes of a problem
Pareto Principle: Also known as the 80/20 rule, it suggests that roughly 80% of issues stem from 20% of causes, guiding teams to target the few causes with the greatest impact
Priority Levels: Ranking incidents and problems based on urgency and impact to guide resource allocation and response time
Problem: The underlying cause of one or more incidents; problem management seeks to identify and eliminate these root causes
Problem Manager: The individual overseeing problem investigations, documentation, and coordination to resolve issues
Problem Statement: A concise description of an issue using a specific object and deviation; effective problem statements focus teams and improve resolution speed
Problem Task: A defined work unit within problem management, aimed at resolving issues and preventing recurrence; problem tasks can be tracked, prioritized, and assigned
Return on Investment (ROI): Assessing the value gained from problem management efforts relative to their costs
Root Cause Analysis (RCA): A method to uncover the primary cause of a problem, guiding preventive actions
Service Management: A framework for managing IT services that integrates incident, problem, change, and knowledge management to enhance service quality
Six Sigma: A data-driven methodology that uses the DMAIC framework (Define, Measure, Analyze, Improve, Control) for continuous improvement
Six Thinking Hats: A brainstorming technique that encourages team members to look at a problem from different perspectives (for example: data, emotion, and optimism)
Subject Matter Expert (SME): An individual with specialized knowledge, often consulted during problem investigations
TapRooT: A root cause analysis framework focusing on mapping causes and actions in safety, quality, and accident investigations
Trend Analysis: Analyzing historical data to identify recurring issues, predict future problems, and address risks proactively
Workaround: A temporary solution mitigating an issue’s impact without resolving the root cause
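
Two of the terms above, Priority Levels and the Pareto Principle, describe simple calculations, so a small Python sketch may help make them concrete. Everything in it is an illustrative assumption rather than part of any ITSM standard: the 1 to 3 impact and urgency scale, the additive priority mapping, the function names priority_level and pareto_vital_few, and the made-up incident counts.

```python
from collections import Counter


def priority_level(impact: int, urgency: int) -> int:
    """Map impact and urgency (1 = high, 2 = medium, 3 = low) to a priority.

    Assumed mapping: priority = impact + urgency - 1, which gives
    1 (most critical) through 5 (least critical). Real organizations
    define their own matrix.
    """
    if impact not in (1, 2, 3) or urgency not in (1, 2, 3):
        raise ValueError("impact and urgency must be 1, 2, or 3")
    return impact + urgency - 1


def pareto_vital_few(cause_counts, threshold=0.8):
    """Return the top causes that together reach `threshold` of all incidents.

    Causes are taken in descending order of incident count until their
    cumulative share of the total meets or exceeds the threshold.
    """
    total = sum(cause_counts.values())
    if total == 0:
        return []
    selected = []
    running = 0
    ranked = sorted(cause_counts.items(), key=lambda kv: kv[1], reverse=True)
    for cause, count in ranked:
        selected.append(cause)
        running += count
        if running / total >= threshold:
            break
    return selected


# Made-up data: number of incidents attributed to each cause category.
incident_counts = Counter({
    "expired certificate": 10,
    "disk full": 7,
    "config drift": 2,
    "dns failure": 1,
})

print(pareto_vital_few(incident_counts))
print(priority_level(impact=1, urgency=2))
```

With this sample data, two of the four cause categories account for 17 of the 20 incidents, which is the kind of concentration the Pareto Principle suggests looking for before deciding where to focus problem investigations.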