Definitions


Agreed Service Time

The agreed hours when the service is to be available.

Example(s):
  • 24x7x365
  • Business Hours (M-F, 7 am ET to 7 pm ET, except holidays)
  • Supplier / Customer defined


Asset Management

A systematic approach to the governance and realization of value from the things that a group or entity is responsible for, over their whole life cycles. It may apply both to tangible assets (physical objects such as buildings or equipment) and to intangible assets (such as human capital, intellectual property, goodwill or financial assets).


Availability

Ability of a Configuration Item or IT Service to perform its agreed Function when required. Availability is usually calculated as a percentage. This calculation is often based on Agreed Service Time and Downtime.

Availability %           Downtime per year   Downtime per month (30 days)   Downtime per week   Downtime per day
90% ("one nine")         36.5 days           72 hours                       16.8 hours          2.4 hours
99% ("two nines")        3.65 days           7.20 hours                     1.68 hours          14.4 minutes
99.9% ("three nines")    8.76 hours          43.2 minutes                   10.1 minutes        1.44 minutes
99.99% ("four nines")    52.56 minutes       4.32 minutes                   1.01 minutes        8.64 seconds
99.999% ("five nines")   5.26 minutes        25.9 seconds                   6.05 seconds        864 milliseconds
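
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming availability is simply (Agreed Service Time - Downtime) / Agreed Service Time; the function names are hypothetical and not part of any ITIL tooling.

```python
def availability_percent(agreed_service_minutes: float, downtime_minutes: float) -> float:
    """Availability as a percentage of the Agreed Service Time."""
    if agreed_service_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    uptime = agreed_service_minutes - downtime_minutes
    return uptime / agreed_service_minutes * 100


def allowed_downtime_minutes(availability_pct: float, period_minutes: float) -> float:
    """Downtime budget for a period at a given availability target."""
    return period_minutes * (1 - availability_pct / 100)


MINUTES_PER_DAY = 24 * 60

# A 30-day month of 24x7 service with 43.2 minutes of downtime.
print(round(availability_percent(30 * MINUTES_PER_DAY, 43.2), 3))

# Yearly downtime budget, in hours, for a "three nines" target.
print(round(allowed_downtime_minutes(99.9, 365 * MINUTES_PER_DAY) / 60, 2))
```

For a service that is not 24x7, pass only the minutes that fall inside the Agreed Service Time as the first argument.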


Break fix

A temporary fix that administrators apply to bring an application back to a working state until they can devise a better solution or perform a permanent change during the maintenance window.


Change Management

A process to ensure that standardized methods and procedures are used for efficient and prompt handling of all changes, in order to minimize the impact of change-related Incidents upon service quality, and consequently to improve the day-to-day operations of the organization.


Configuration Management

A systems engineering process for establishing and maintaining consistency of a product's performance, functional, and physical attributes with its requirements, design, and operational information throughout its life.


Configuration Item (CI)

Any Component that needs to be managed in order to deliver a Service. Information about each CI is recorded in a Configuration Record within the Configuration Management System and is maintained throughout its Lifecycle by Configuration Management. CIs are under the control of Change Management. CIs typically include IT Services, hardware, software, buildings, people and formal documentation such as Process documentation and SLAs.


Customer Experience (CX)

Customer experience is the product of an interaction between an organization and a customer over the duration of their relationship. This interaction is made up of three parts: the customer journey, the touchpoints the customer interacts with, and the environments the customer experiences (including digital environment) during their experience. A good customer experience means that the individual's experience during all points of contact matches the individual's expectations.


Downtime

The time when a Configuration Item or service is not available during its Agreed Service Time. The Availability of a service is often calculated from Agreed Service Time and Downtime.
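
Because Downtime only counts while the service is supposed to be available, an outage that spans nights or weekends has to be clipped to the Agreed Service Time. The sketch below assumes a Monday to Friday, 07:00 to 19:00 window and ignores holidays and time zones; the function name and constants are hypothetical.

```python
from datetime import datetime, time, timedelta

# Assumed Agreed Service Time: Monday-Friday, 07:00-19:00.
# Holidays and time zones are deliberately ignored in this sketch.
SERVICE_START = time(7, 0)
SERVICE_END = time(19, 0)


def downtime_in_service_hours(outage_start: datetime, outage_end: datetime) -> timedelta:
    """Return the part of an outage that overlaps the agreed service window."""
    total = timedelta()
    day = outage_start.date()
    while day <= outage_end.date():
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            window_start = datetime.combine(day, SERVICE_START)
            window_end = datetime.combine(day, SERVICE_END)
            overlap_start = max(outage_start, window_start)
            overlap_end = min(outage_end, window_end)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total


# Outage from Friday 18:00 to Monday 08:00: only one hour on Friday and
# one hour on Monday fall inside the service window.
print(downtime_in_service_hours(datetime(2021, 3, 5, 18, 0), datetime(2021, 3, 8, 8, 0)))
```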


Escalation Model

The process that describes to whom, when, and how to escalate. It may also state the conditions under which to escalate.
[Figure: escalation example (image blurred intentionally)]


Event Management

Event Management, as defined by ITIL, is the process that monitors all events that occur through the IT infrastructure. It allows for normal operation and also detects and escalates exception conditions.

An event can be defined as any detectable or discernible occurrence that has significance for the management of the IT Infrastructure or the delivery of IT service and evaluation of the impact a deviation might cause to the services. Events are typically notifications created by an IT service, Configuration Item (CI) or monitoring tool.

Examples:
  • Security or network intrusions
    • failed login attempts
    • traffic from an unexpected IP
    • DoS attacks
    • database access
    • edits to critical system files
      • syslog
      • /sbin/*
      • /boot
      • /etc/passwd
      • /etc/group
  • Configuration changes
    • OS files
      • /etc/hosts
      • /etc/resolv.conf
    • OS commands
      • ifconfig
      • server restarts
    • Configuration changes to product, service or application including addition of gear
      • Edits to application configuration file
      • application restarts
      • tnsnames.ora
      • httpd.conf
      • Tomcat config file
  • Abnormalities or failures of key configuration items
    • NFS server connectivity
    • Partition usage
    • Memory usage
    • Swap usage
    • SNMP connectivity
    • ICMP connectivity
    • CPU utilization
    • network and port connectivity
    • I/O thresholds
    • Dependency availability
    • All crit, alert, emerg, and panic syslog events
    • Loss of redundancy
  • KPIs – For every KPI failure, there should be an accompanying event from another category such as abnormalities or failures of key configuration items.
    • Reliability
    • Availability
    • Time Between Failures
    • Response Time
    • VIP failures
    • Any other measure of success that is defined for the product, service or application.
  • Process effectiveness
    • Too many / few processes
    • Process taking too long
    • Process returns errors
    • Capacity
    • Software licensing usage
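
As one illustration of the examples above, the item "All crit, alert, emerg, and panic syslog events" can be detected from the priority value that prefixes a raw syslog message. In the syslog protocol that value is facility * 8 + severity, so the severity is the value modulo 8. The sketch below assumes messages arrive with a leading `<PRI>` field; the function names are hypothetical, and "panic" is treated as the older alias for emerg.

```python
import re
from typing import Optional

# Syslog severity keywords, indexed by their numeric level (0 is most severe).
SEVERITY_NAMES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# Severities that should raise an exception event ("panic" is an alias for emerg).
ESCALATE = {"emerg", "alert", "crit"}

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def syslog_severity(line: str) -> Optional[str]:
    """Return the severity keyword of a raw syslog line, or None if it has no PRI field."""
    match = PRI_PATTERN.match(line)
    if match is None:
        return None
    pri = int(match.group(1))
    return SEVERITY_NAMES[pri % 8]  # PRI = facility * 8 + severity


def is_exception_event(line: str) -> bool:
    """True when the line carries a crit, alert, or emerg severity."""
    return syslog_severity(line) in ESCALATE


print(is_exception_event("<34>Oct 11 22:14:15 mymachine su: 'su root' failed"))  # PRI 34: severity 2 (crit)
print(is_exception_event("<13>Oct 11 22:14:15 mymachine app: started"))          # PRI 13: severity 5 (notice)
```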


Impact

Impact is often based on how Service Levels will be affected. Impact and Urgency are used to assign priority.

Types:
  • High: Business Unit, floor, branch, LOB, or multiple VIPs
  • Medium: small group of users or a single VIP
  • Low: a single user


Incident Management

A process to restore normal service operation within Service Level Agreement limits as quickly as possible and minimize the adverse impact of the Incident on business operations, thus ensuring that the best possible levels of service quality and availability are maintained.


ITIL

ITIL, an acronym for Information Technology Infrastructure Library, is a set of detailed practices for IT service management (ITSM) that focuses on aligning IT services with the needs of the business.


Intermediate Distribution Frame (IDF)

Intermediate Distribution Frame is a wiring rack located between the MDF (main distribution frame) and the intended end user devices (telephones, routers, PCs, etc.). Cables run from the outside world to the MDF and then to the IDFs.


Main Distribution Frame (MDF)

Main Distribution Frame is a wiring rack that connects outside lines with internal lines. It is used to connect public or private lines coming into the building to internal networks. In a telco central office (CO), the MDF is generally in close proximity to the telephone switch.


Mean Time Between Failures (MTBF)

A Metric for measuring and reporting Reliability. MTBF is the average time that a Configuration Item or service can perform its agreed Function without interruption. This is measured from when the CI or service starts working, until it next fails.


Mean Time between System / Service Incidents (MTBSI)

The mean elapsed time between the occurrence of one system or service failure and the next.


Mean Time to Restore Service (MTRS)

The average time taken to restore a Configuration Item or service after a Failure. MTRS is measured from when the CI or service fails until it is fully Restored and delivering its normal functionality.


Mean Time to Repair (MTTR)

The average time taken to repair a Configuration Item or service after a Failure. MTTR is measured from when the CI or service fails until it is repaired. MTTR does not include the time required to Recover or Restore.
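
The metrics MTBF, MTBSI and MTRS can all be derived from the same incident log. The sketch below assumes a hypothetical, time-ordered list of (failed_at, restored_at) pairs expressed in hours; MTTR is left out because it would need a separate "repaired at" timestamp that this log does not record.

```python
from statistics import mean

# Hypothetical incident log: (failed_at, restored_at), in hours since
# measurement began, ordered by time.
incidents = [(100.0, 102.0), (300.0, 301.0), (700.0, 703.0)]


def mtrs(log):
    """Mean time from failure until service is fully restored."""
    return mean(restored - failed for failed, restored in log)


def mtbf(log):
    """Mean time from one restoration until the next failure."""
    return mean(next_failed - restored
                for (_, restored), (next_failed, _) in zip(log, log[1:]))


def mtbsi(log):
    """Mean time from one failure until the next failure."""
    return mean(next_failed - failed
                for (failed, _), (next_failed, _) in zip(log, log[1:]))


print(mtrs(incidents), mtbf(incidents), mtbsi(incidents))
```

MTBF and MTBSI need at least two incidents, since both are averages over the gaps between consecutive failures.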


Network Diagram

A graphical representation of the current state of the network. It usually comes in two flavors: physical and logical.
[Figure: network diagram example (image blurred intentionally)]


Operational Level Agreement (OLA)

An Agreement between a service provider and another part of the same organization. An OLA supports the service provider's delivery of services to customers. The OLA defines the goods or services to be provided and the responsibilities of both parties.


Priority

The value given to an Incident, Problem or Change to indicate its relative importance in order to ensure the appropriate allocation of resources and to determine the timeframe within which action is required. Priority is based upon a coherent and up-to-date understanding of business impact, urgency, and sometimes technical severity.


Problem Management

A process to minimize the adverse impact of incidents and problems on the business that are caused by errors within the IT infrastructure, and to prevent recurrence of incidents related to these errors.


Reliability

A measure of how long a Configuration Item or service can perform its agreed Function without interruption. Usually measured as MTBF or MTBSI. The term Reliability can also be used to state how likely it is that a Process, Function, etc., will deliver its required outputs.


Service Level Agreement (SLA)

An Agreement between a service provider and a customer. The SLA describes the service, documents Service Level Targets, and specifies the responsibilities of the service provider and the customer. A single SLA may cover multiple services or multiple customers.


SLA Calculator

The SLA target is based on Severity unless the contract explicitly states otherwise. Targets are expressed in minutes; "biz" denotes business minutes.

                           Severity
Customer Severity Levels   Sev 1   Sev 2   Sev 3       Sev 4       Sev 5
3                          30      120     480 (biz)   N/A         N/A
4                          30      120     120         480 (biz)   N/A
5                          30      120     120         480 (biz)   480 (biz)
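
The table above can be expressed as a simple lookup. This is a sketch only: the values are copied from the table, while the dictionary layout and the function name `sla_target` are assumptions made for illustration.

```python
from typing import Optional, Tuple

CLOCK = "minutes"
BIZ = "business minutes"

# Number of customer severity levels -> {severity: (target, unit)}.
# Combinations marked N/A in the table are simply absent.
SLA_TARGETS = {
    3: {1: (30, CLOCK), 2: (120, CLOCK), 3: (480, BIZ)},
    4: {1: (30, CLOCK), 2: (120, CLOCK), 3: (120, CLOCK), 4: (480, BIZ)},
    5: {1: (30, CLOCK), 2: (120, CLOCK), 3: (120, CLOCK), 4: (480, BIZ), 5: (480, BIZ)},
}


def sla_target(customer_levels: int, severity: int) -> Optional[Tuple[int, str]]:
    """Return (target, unit) for a severity, or None where the table says N/A."""
    if customer_levels not in SLA_TARGETS:
        raise ValueError(f"no SLA row for {customer_levels} severity levels")
    return SLA_TARGETS[customer_levels].get(severity)


print(sla_target(3, 3))  # Sev 3 for a customer with three severity levels
print(sla_target(3, 4))  # N/A in the table
```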


Service Level Management

Service Level Management (SLM) aims to negotiate Service Level Agreements with the customers and to design services in accordance with the agreed service level targets. This ITIL process is also responsible for ensuring that all Operational Level Agreements and Underpinning Contracts are appropriate, and to monitor and report on service levels.


Severity Code

A simple code assigned to Incidents, Problems and Changes, indicating their underlying complexity and their impact on resources. Used in conjunction with Business Impact and Business Urgency, it is one of the factors for allocating priorities.

Severity   Description                                  Examples
1          A critical incident with very high impact    • A customer-facing service, like DNS, is down for all customers
                                                        • Confidentiality or privacy is breached
                                                        • Customer data loss
2          A major incident with significant impact     • A customer-facing service is unavailable for a subset of customers
                                                        • Core functionality (e.g. git push, issue create) is significantly impacted
3          A minor incident with low impact             • A minor inconvenience to customers, workaround available
                                                        • Usable performance degradation


Severity Code Calculator

Severity is based on priority except where the customer or contract uses a different number of severity levels.

                           Priority
Customer Priority Levels   Critical   High   Medium      Low         Normal
3                          30         120    480 (biz)   N/A         N/A
4                          30         120    120         480 (biz)   N/A
5                          30         120    120         480 (biz)   480 (biz)


Urgency

A measure of business criticality of an Incident, Problem or Change where there is an effect upon business deadlines. The urgency reflects the time available for repair or avoidance before the impact is felt by the business. Together with impact, and perhaps technical severity, it is the major means of assigning priority for dealing with Incidents, Problems or Changes.

Types:
  • High: an activity which has a direct financial, brand or security impact on the business organization.
  • Medium: an activity which directly supports the execution of a business service.
  • Low: an activity that does not directly support a business service and is not time sensitive.
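
Combining Impact and Urgency into a Priority is commonly done with a small matrix. The glossary does not prescribe one, so the mapping below is purely a hypothetical example and should be replaced with whatever the organization or contract defines.

```python
# Hypothetical matrix: these combinations are an assumption for illustration,
# not taken from the glossary. Keys are (impact, urgency).
PRIORITY_MATRIX = {
    ("High", "High"): "Critical",
    ("High", "Medium"): "High",
    ("Medium", "High"): "High",
    ("High", "Low"): "Medium",
    ("Medium", "Medium"): "Medium",
    ("Low", "High"): "Medium",
    ("Medium", "Low"): "Low",
    ("Low", "Medium"): "Low",
    ("Low", "Low"): "Low",
}


def assign_priority(impact: str, urgency: str) -> str:
    """Look up the priority for an impact/urgency pair."""
    try:
        return PRIORITY_MATRIX[(impact, urgency)]
    except KeyError:
        raise ValueError(f"unknown impact/urgency: {impact!r}, {urgency!r}") from None


# A single VIP (Medium impact) affected by something with direct
# financial consequences (High urgency).
print(assign_priority("Medium", "High"))
```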
 
1999 - 2021 paultclark.com