Ethernet Switch Reliability
I served as an ITIL Problem Manager for a couple of years, both managing the process and leading Root Cause Analysis (RCA) efforts.
The Problem Management Process I managed: the flow a typical Problem traverses, how we prioritize, an example Problem Record from the tool we use to track Problems (Redmine), what we do during review meetings, and how we report monthly and quarterly (more examples) to management.
Templates for reporting Problems upstream
I view Problem Management as an IT-specific instance of Risk Management, and I view its theoretical underpinnings in ways consonant with the following:
My interest in the larger Enterprise Risk Management space:
My favorite methodology for managing an RCA, with a backing checklist, comes from Advance7 and is described in detail in their book Rapid Problem Resolution (RPR) and in various white papers. We categorize RCA efforts into flavors depending on three influences: reproducibility, staff resources, and tool fit.
I facilitate a hands-on workshop in which participants split into small groups and practice a simplified version of the RPR Methodology along with analysis skills, working through real-world RCAs. See the Seminars page on this site.
What Takes Us Down?, published in the October 2012 issue of ;login:. My analysis of this data set suggests that timely Patching and proactive Testing can convert Unplanned incidents into Planned events, although I admit that the argument isn't compelling.
Summary of the data set:
I've charted statistics extracted from the database in several ways, none of which tells a persuasive story to me. Note that the database starts in October 2010 and ends in June 2012.
In 2016, we experienced a range of Problems with our IDF Ethernet Switch inventory. In these documents, I report upward to management on how our experience compares with industry norms, providing education along the way on how this equipment works.
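If each event in a data set like this one is tagged as Planned or Unplanned, one way to probe the patching-and-testing argument is to track the unplanned share of events month by month: if the conversion works, that share should trend downward. This is a minimal sketch only; the `Event` record and its `month` and `planned` fields are hypothetical, since the real database's schema is not described here.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical outage record; field names are illustrative, not the real schema."""
    month: str      # "YYYY-MM"
    planned: bool   # True if scheduled (e.g. a patch window), False if unplanned


def unplanned_share_by_month(events: Iterable[Event]) -> dict[str, float]:
    """Fraction of each month's events that were unplanned, keyed by month in sorted order."""
    totals = Counter()
    unplanned = Counter()
    for e in events:
        totals[e.month] += 1
        if not e.planned:
            unplanned[e.month] += 1
    return {m: unplanned[m] / totals[m] for m in sorted(totals)}
```

Counting totals per month (rather than only unplanned events) keeps the metric from rising or falling simply because overall event volume changed.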
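One common way to check whether a fleet's failures match industry norms is to compare the observed annualized failure rate (AFR) with the rate implied by a vendor's published MTBF. The sketch below makes the simplifying assumption of a constant failure rate, so expected failures per year are fleet size × 8760 / MTBF, and uses a Poisson tail probability to gauge how surprising the observed count is. The fleet size, MTBF, and failure count in the `__main__` section are hypothetical placeholders, not our actual 2016 figures.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def expected_failures(fleet_size: int, mtbf_hours: float, years: float = 1.0) -> float:
    """Failures expected from a fleet over `years`, assuming a constant failure rate."""
    return fleet_size * years * HOURS_PER_YEAR / mtbf_hours


def annualized_failure_rate(failures: int, fleet_size: int, years: float = 1.0) -> float:
    """Observed failures per device-year."""
    return failures / (fleet_size * years)


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam): chance of k or more failures under the model."""
    below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    fleet, mtbf, observed = 200, 200_000.0, 12
    lam = expected_failures(fleet, mtbf)
    print(f"expected failures/year: {lam:.2f}")
    print(f"vendor-implied AFR:     {HOURS_PER_YEAR / mtbf:.3%}")
    print(f"observed AFR:           {annualized_failure_rate(observed, fleet):.3%}")
    print(f"P(>= {observed} failures):  {poisson_tail(observed, lam):.3f}")
```

A small tail probability suggests the fleet is failing faster than the vendor figure implies; a large one suggests the observed count is within the range of ordinary chance.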
Last modified: 2017-09-30