NFV Quality Management Framework Proposal

Eric Bauer
May 11th 2015
COPYRIGHT © 2015 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Summary
• ETSI is driving standards for network function virtualization (NFV), including “Network Function
Virtualization Service Quality Metrics” (published in December ’14)
• TM Forum is driving standards for SLA management, including GB917 “SLA Management Handbook,
Volume 2, Concepts and Principles” and TR178 “Enabling End-to-End Cloud SLA Management”
• QuEST Forum drives TL 9000 Measurements Handbook defining objective and quantitative
measurements (e.g., outages) for quality management of telecom networks and equipment
• NFV Strategic Initiative was chartered by QuEST Forum’s Executive Board, and that group is now
working on a draft “NFV Quality Management Framework” that enables objective and
quantitative prediction, control and quality improvement of NFV-based services and
applications
- This presentation covers the 4/29/15 review draft of that Framework
QuEST Forum Executive Board NFV Strategic Initiative Team
The Network Function Virtualization Vision
[Figure: network function deployment Today… versus Tomorrow…, where network functions become cloud-based applications, a.k.a. virtualized network functions (VNFs). Organizations shown: VNF Service Provider Organization; NFV Infrastructure Service Provider Organization.]
Figure from ETSI NFV Whitepaper
http://portal.etsi.org/nfv/nfv_white_paper.pdf
Fundamental Changes due to NFV
1. Decoupling Software from Hardware
2. Shared compute, memory, storage and networking infrastructure
3. Automated Resource and Application Lifecycle Management
4. Automated Network Service Lifecycle Management
5. Dynamic Operation
6. Increasingly Complex Multivendor Environment
[Figure: Today… Traditional Network Function Deployment, with PNF 1 and PNF 2 each stacking SW and OS on its own HW. Tomorrow… Virtual Network Function Deployment, with VNF 1 and VNF 2 (SW and OS) running on shared NFV Infrastructure under NFV MANO.]
Objectives of the NFV Quality Measurement Framework
NFV quality measurements should…
1. Be Quantitative and Objective
2. Be Future Proof
3. Recognize Different Resource Service Quality Expectations
4. Recognize Different Application and Service Architectures
5. Support Leading Service Quality Indicators
6. Enable Side-by-Side Physical and Virtual Quality Comparisons
7. Follow the Principle of Simplicity (Parsimony)
Sample Motivational Scenario
1. One organization offers VoLTE on NFV infrastructure shared by other tenants (e.g., EPC, IP-TV, enterprise communications)
2. VoLTE provider buys best-in-breed VNFs from several suppliers
3. VoLTE provider and partners write the policies and descriptors that configure and chain VNFs…
4. NFV Infrastructure service provider buys COTS servers, storage and switches from several equipment suppliers
5. NFV Infrastructure service provider’s systems from other suppliers automatically apply policies and descriptors to configure and operate VoLTE provider’s (and other tenants’) service…
How does one rapidly localize faults, drive root cause analysis and decide on corrective actions in this shared, decoupled, flexible, dynamic and multi-vendor environment?
Underlay figure from “Network Functions Virtualization (NFV); Architectural Framework,” GS NFV 002 V1.2.1 (2014-12)
End-to-End Quality Management Framework (TR178 Style View)
[Figure: customer and provider roles along the end-to-end service chain]
• Cloud Service User: Customer Role for the Application Service
• CSP: Network Provider: Provider Role
• Cloud Service Customer (CSC): Provider Role for the Network Service, Integrator Role, and Customer Role toward each underlying service
• Provider Roles facing the CSC:
- VNF 1 … VNF N Service, from Cloud Service Developers (a.k.a., VNF Suppliers)
- Functional Component (Entity) as-a-Service
- MANO Service: Automated Lifecycle Management and Orchestration Services
- NFVI Service: Virtual Machine and Virtual Networking Services
• Cloud Service Provider (CSP): NFV Infrastructure, Management & Orchestration, Functional Component as-a-Service
• Primary functionality of VNF drives TL product category assignment
NFV Quality Management Framework
NFV Quality Measurement Framework enables objective and quantitative quality measurements across NFV reference points to support end-to-end quality and SLA management
[Figure: customer and provider role chain (Cloud Service User; CSP: Network Provider; Cloud Service Customer (CSC) in Provider and Integrator Roles for the Network Service; Cloud Service Developers, a.k.a. VNF Suppliers; Cloud Service Provider (CSP): NFV Infrastructure, Management & Orchestration, Functional Component as-a-Service), with an SLA at each customer/provider boundary: VNF-1 SLA, VNF-N SLA, FC aaS SLA, MANO SLA, INFRA SLA]
Outage Metrics in NFV Quality Measurement Framework
[Figure: customer and provider role chain with SLAs (User Service SLA, VNF-1 SLA, VNF-N SLA, FC aaS SLA, MANO SLA, INFRA SLA), annotated with outage metrics]
Callouts:
• User Service SLA: (User) Service Impact Outages (SO)
• VNF 1 … VNF N: Primarily Service Impact Outage (SO)
• Likely to be Service Impact Outage (SO) ([NFV_SQM] TcaaS outage)
• (Primarily) Network Element Impact Outages (SONE)
Roles shown: Cloud Service User; CSP: Network Provider; Cloud Service Customer (CSC); Cloud Service Developers (a.k.a., VNF Suppliers); Cloud Service Provider (CSP): NFV Infrastructure, Management & Orchestration, Functional Component as-a-Service
Transaction Metrics in NFV Quality Measurement Framework
[Figure: customer and provider role chain with SLAs (User Service SLA, VNF-1 SLA, VNF-N SLA, FC aaS SLA, MANO SLA, INFRA SLA), annotated with transaction metrics]
Callouts:
• VNF 1 … VNF N: Service Reliability and Latency
• Lifecycle Management Action Reliability and Latency
• Fault Notification Accuracy (Reliability) and Timeliness (Latency)
• Virtual Machine and Virtual Network Reliability and Latency
Roles shown: Cloud Service User; CSP: Network Provider; Cloud Service Customer (CSC); Cloud Service Developers (a.k.a., VNF Suppliers); Functional-Component (Entity) as-a-Service; Cloud Service Provider (CSP): NFV Infrastructure, Management & Orchestration, Functional Component as-a-Service
Transaction Quality Model
[Figure: an Input Event (Lifecycle Management Request) triggers Processing, a non-instantaneous processing action with some Latency, whose Output is either a Correct Response or a Failure Event: Incorrect Response, Unacceptably Late Response, or No Response]
• Operation Latency is the elapsed time between the triggering event and the corresponding correct or incorrect response
• Operation Reliability is the ratio of correct responses to the sum of correct, incorrect, unacceptably late and no response events
• Incorrect Response DPM + Unacceptably Late DPM + Non-Response DPM = Operation Reliability DPM
• DPM = defective operations per million requests
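The reliability and DPM definitions above can be illustrated with a minimal Python sketch. It is not part of the Framework: the names `Operation`, `Outcome`, `classify`, `reliability_report` and the `max_latency_s` threshold are hypothetical, and the rule that a response arriving after the threshold counts as unacceptably late regardless of its correctness is an assumption made here for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional


class Outcome(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"
    LATE = "unacceptably late"
    NO_RESPONSE = "no response"


@dataclass(frozen=True)
class Operation:
    """One request observed at a reference point (hypothetical record)."""
    latency_s: Optional[float]  # None means no response was ever observed
    is_correct: bool = True


def classify(op: Operation, max_latency_s: float) -> Outcome:
    """Map an observed operation to one of the four outcomes in the model."""
    if op.latency_s is None:
        return Outcome.NO_RESPONSE
    if op.latency_s > max_latency_s:
        # Assumption: lateness takes precedence over correctness.
        return Outcome.LATE
    return Outcome.CORRECT if op.is_correct else Outcome.INCORRECT


def reliability_report(ops: Iterable[Operation], max_latency_s: float) -> Dict[str, float]:
    """Compute per-failure-mode DPM, their sum, and operation reliability."""
    counts = Counter(classify(op, max_latency_s) for op in ops)
    requests = sum(counts.values())
    if requests == 0:
        raise ValueError("no operations observed")

    def dpm(outcome: Outcome) -> float:
        # Defective operations per million requests.
        return 1_000_000 * counts[outcome] / requests

    report = {
        "incorrect_dpm": dpm(Outcome.INCORRECT),
        "late_dpm": dpm(Outcome.LATE),
        "no_response_dpm": dpm(Outcome.NO_RESPONSE),
        # Ratio of correct responses to all four outcome types.
        "reliability": counts[Outcome.CORRECT] / requests,
    }
    report["operation_reliability_dpm"] = (
        report["incorrect_dpm"] + report["late_dpm"] + report["no_response_dpm"]
    )
    return report


# 1000 requests: 997 correct, 1 incorrect, 1 late, 1 without a response.
ops = [Operation(0.2)] * 997 + [
    Operation(0.3, is_correct=False),
    Operation(9.0),
    Operation(None),
]
report = reliability_report(ops, max_latency_s=5.0)
print(report)
```

Counting outcomes once and deriving every figure from the same totals keeps the three per-mode DPM values and the overall Operation Reliability DPM consistent with each other.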
Measure Operational Quality (Reliability and Latency) Across Standard NFV Reference Points
Examples:
• Network Service Lifecycle Management (7.1.2)
• Network Service Fault Management (7.1.5)
• VNF Lifecycle Management (7.2.4)
• VNF Fault Management (7.2.8)
• Virtualized Resources Management (7.3.3)
• Virtualized Resources Fault Management (7.3.5)
• Virtual Machine and Virtual Networking Reliability and Latency
Section numbers in callouts (e.g., 7.1.2) are from “Network Functions Virtualization (NFV); Management and Orchestration,” GS NFV-MAN 001 V1.1.1 (2014-12)
Proposed Lifecycle Management Errors
[TL_9000] Procedural Error: An error that is the direct result of human intervention or error.
(Proposed) Lifecycle Management Error: An error that is the direct result of policy, management, or orchestration.
Contributing factors can include but are not limited to…
Each [TL_9000] factor is listed with its (proposed) lifecycle management counterpart and an Example of Elevated VNF Service Risk due to Lifecycle Management Error:
a) deviations from accepted practices or documentation
- Example: Failing to continuously enforce anti-affinity placement rules for VNFCs can lead to both primary and protecting VNFC instances appearing in a single NFV infrastructure failure group
b) inadequate training
- Lifecycle management: not generally applicable
c) unclear, incorrect, or out-of-date documentation
- Lifecycle management: faulty or out of date: automation scripts; service, VNF or resource descriptors; etc.
- Example: Proper execution of faulty or out of date scripts can produce faulty and higher risk (e.g., simplex) VNF configurations
d) inadequate or unclear displays, messages, or signals
- Lifecycle management: inadequate, insufficient or stale FCAPS input data
- Example: Inadequate, insufficient or stale performance information can produce faulty elastic capacity management decisions
e) inadequate or unclear hardware labeling
- Lifecycle management: not generally applicable
f) miscommunication
- Lifecycle management: not generally applicable
g) non-standard configurations
- Example: Configuring non-standard third party software to monitor, manage, backup or control a VNF instance
h) insufficient supervision or control
- Example: Failing to diligently monitor alarms and correct unsuccessful VNF repair actions can leave impacted VNF simplex exposed
i) user characteristics such as mental attention, physical health, physical fatigue, mental health, and substance abuse
- Lifecycle management: faulty execution of policy by a management or orchestration element
- Example: Faulty execution of automation scripts can produce faulty and higher risk (e.g., simplex) VNF configurations
(Proposed) lifecycle management factors with no [TL_9000] counterpart:
• tardy execution of lifecycle management action
- Example: Insufficient automated management and orchestration capacity or other causes result in late execution of VNF repair, capacity change or other lifecycle management action, thereby prolonging service risk to target VNF
• risky operational policies
- Example: Failing to maintain sufficient spare application capacity online can yield poor user service quality when unforecast surges in offered workload occur during capacity change lead time intervals
Standardizing the definition of lifecycle management errors enables richer conversations
about roles, responsibilities and accountabilities…before outages occur
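The anti-affinity example above (primary and protecting VNFC instances landing in a single NFV infrastructure failure group) lends itself to a mechanical check. The following is a minimal Python sketch under stated assumptions: `Placement`, `anti_affinity_violations`, and the group and instance names are hypothetical and do not come from the Framework or from any NFV management API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Placement:
    """Where one VNFC instance currently runs (hypothetical record)."""
    vnfc_instance: str
    anti_affinity_group: str  # instances in the same group must be kept apart
    failure_group: str        # infrastructure unit that can fail as a whole


def anti_affinity_violations(
    placements: Iterable[Placement],
) -> Dict[Tuple[str, str], List[str]]:
    """Return instances that share both an anti-affinity group and a failure group."""
    by_key: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for p in placements:
        by_key[(p.anti_affinity_group, p.failure_group)].append(p.vnfc_instance)
    # More than one instance under the same key means the rule is broken.
    return {key: names for key, names in by_key.items() if len(names) > 1}


placements = [
    Placement("vnfc-primary", "pair-1", "rack-A"),
    Placement("vnfc-protecting", "pair-1", "rack-A"),  # same failure group
    Placement("vnfc-other", "pair-2", "rack-A"),
]
violations = anti_affinity_violations(placements)
print(violations)
```

Because the slide stresses continuous enforcement, a check of this kind would presumably be rerun after every lifecycle management action that moves or recreates a VNFC instance, not only at initial placement.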
Outlook
• 4/29/15 draft of NFV Quality Management Framework will be reviewed on 5/21/15 at QuEST
Forum meeting; goal is to baseline this non-normative document 3Q15
• ETSI NFV work item in support of NFV Quality Management Framework will be considered later this
month
• QuEST Forum will continue to work with TM Forum, ETSI NFV, NIST and other appropriate SDOs to
enable standardized, objective and quantitative metrics to facilitate rapid and accurate fault
localization, root cause analysis, and end-to-end SLA management
Questions?
Eric.Bauer@alcatel-lucent.com