Matt Rey, MBCP, MBCI Resiliency Best Practices InfraGard Meeting – May 2013

Resiliency Best Practices
InfraGard Meeting – May 2013
Matt Rey, MBCP, MBCI
matthew.rey@ebay.com
May 2013
Goals
• Review eBay Inc. Resiliency’s approach to business continuity management (BCM), mixed in with some best practices
• Discussion
InfraGard – May 2013
A Family of Brands Enabling Commerce
eBay Inc. Resiliency Mission
Mission Statement
The mission of the eBay Enterprise Resiliency Team is to provide a holistic management process
by which businesses plan for continuing critical operations in order to:
• Ensure the availability of eBay.com and adjacent business unit sites (e.g., PayPal.com, mobile.de)
• Identify potential threats and their likelihood of impact on operations, and plan and respond accordingly
• Enable an effective response that safeguards the interests of our key stakeholders, our reputation, and our value-creating activities
• Enable executives to manage operations under adverse conditions via appropriate resilience strategies, recovery objectives, and operational risk management considerations
• Foster a culture of resiliency and preparedness throughout the enterprise
BCM Program Alignment
eBay’s Shared Behaviors
• Be the Customer
• Simplify & Clarify
• Debate, Decide & Deliver
• Be Open, Honest & Direct
• Do the Right Thing
Industry Standards & Best Practices
• Disaster Recovery Institute Int’l (DRII) – Professional Practices
• Business Continuity Institute (BCI) – Good Practice Guidelines (GPG)
• Int’l Organization for Standardization (ISO) – ISO 22301 Business Continuity Management Systems
Identification of Regulatory Guidelines & Requirements
• Domestic regulatory standards and requirements (e.g. FFIEC, SEC)
• Regulatory requirements in each jurisdiction where the business has a presence
– Depends on the type of business activities
– Can sometimes extend to other aspects/locations of the organization
BCM Program Organizational Structure
eBay Inc. Resiliency
Dedicated Team (~20)
- Program Governance, Oversight, and Maturity
- Business Engagement & Program Implementation Support
- Training & Awareness
- Toolset Management
- Management Reporting
Business Unit Roles
- Business Executive Sponsor(s)
- …
- Crisis Management Support Roles
- Business Continuity Coordinator
- Disaster Recovery Coordinator
- Business Continuity Planners
- Disaster Recovery Planners
- …
Shared & Dedicated Roles
- Program Implementation
- Plan Documentation
- Strategy Development
- Plan Testing
- Training & Awareness
Dedicated Team Structure
eBay Inc. Resiliency
• PMO, Training
• Safety & Security
• Crisis Management
• Business Continuity
• [IT] Disaster Recovery
• Program Tools
• Training & Awareness
• Testing
Business Role Structure
To support the varying size and complexity of the business units within scope, a structure specific to the
BCM program has been developed to organize planning efforts and management reporting.
• Business Unit / Vertical Executive Sponsor and Coordinators (e.g. eBay)
• Business Area / Brand Executive Sponsor and Coordinators (e.g. Product Development)
• Department / Team Planners (e.g. Mobile App Dev. - QA)
Planning Lifecycle
The initial engagement of business units follows a general flow that begins with identifying and training program coordinator roles. Once the roles have been identified and trained, a timeline for program implementation and planning activities can be developed and reviewed by the various stakeholders for approval.
Role Identification & Training
• Identify BC Planning roles
• Conduct training sessions
• Finalize planning timeline
Risk Assessment & Business Impact Analysis
• Identification of Potential Impacts
• Threat Ranking
• Business Unit Criticality Ranking
• Dependency (e.g. Technology) Criticality Ranking
Business Continuity Planning
• Department/Team-Level Planning
• Recovery Resource Requirements
• Recovery Strategies and Procedures
Gap Analysis & Mitigation Planning
Plan Testing
• Plan Testing Along Maturity Model
• Action Item Tracking & Follow-up
• BC Plan & Strategy Improvement
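The Threat Ranking step in the lifecycle above can be illustrated with a minimal sketch. It assumes a simple likelihood x impact score on hypothetical 1 to 5 scales; the presentation does not say how threats are scored, and the `Threat` class, `rank_threats` function, and sample threats are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    """A threat identified during risk assessment (hypothetical structure)."""
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Likelihood x impact: one common risk-scoring convention, assumed here.
        return self.likelihood * self.impact


def rank_threats(threats: list[Threat]) -> list[Threat]:
    """Order threats from highest to lowest score; ties go to higher impact, then name."""
    for t in threats:
        if not (1 <= t.likelihood <= 5 and 1 <= t.impact <= 5):
            raise ValueError(f"{t.name}: likelihood and impact must be between 1 and 5")
    return sorted(threats, key=lambda t: (-t.score, -t.impact, t.name))


if __name__ == "__main__":
    # Hypothetical sample threats, not actual program data.
    sample = [
        Threat("Power outage", likelihood=4, impact=3),
        Threat("Earthquake", likelihood=2, impact=5),
        Threat("Pandemic", likelihood=2, impact=4),
    ]
    for t in rank_threats(sample):
        print(f"{t.score:>2}  {t.name}")
```

Breaking ties on impact keeps rare but severe threats ahead of frequent, milder ones with the same score, which tends to matter more for continuity planning.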
Business Recovery Strategies
Documenting a Business Continuity Plan (BCP) is great, but is the strategy viable?
Existing Capabilities vs. Recovery Strategy: Where’s the Beef!
A gap analysis identifies existing capabilities and compares them to what is actually required to execute the identified recovery strategies. This stage usually surfaces larger issues and projects that the business unit must undertake to enable its departments and functions to continue or recover critical activities. Examples include:
• Enabling team members with remote working capabilities
• Ensuring technology capabilities meet business needs (e.g. availability, data backup)
• Pre-identification of recovery locations for critical areas such as Customer Service and distribution centers
• Pre-coordination of process-diversion strategies for teams with a presence in multiple locations
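The gap analysis described above can be sketched as a comparison between the recovery time each strategy requires and the recovery time existing capabilities can actually deliver. This is a minimal illustration under assumed inputs: the `Requirement` class, `find_gaps` function, capability names, and hour values are hypothetical, not the program's actual data model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """A recovery need identified in the BIA/BCP (hypothetical structure)."""
    function: str              # business function, e.g. "Customer Service"
    capability: str            # capability the recovery strategy relies on
    required_rto_hours: float  # recovery time objective from the BIA


def find_gaps(requirements: list[Requirement],
              existing: dict[str, float]) -> list[tuple[str, str, float, float]]:
    """Return (function, capability, required hours, achievable hours) for each gap.

    `existing` maps a capability name to the recovery time, in hours, that the
    current capability can deliver. A capability that does not exist at all is
    treated as never recoverable (infinite hours), so it always shows as a gap.
    """
    gaps = []
    for req in requirements:
        achievable = existing.get(req.capability, math.inf)
        if achievable > req.required_rto_hours:
            gaps.append((req.function, req.capability, req.required_rto_hours, achievable))
    return gaps


if __name__ == "__main__":
    # Hypothetical example data.
    needs = [
        Requirement("Customer Service", "alternate recovery location", 24),
        Requirement("Finance", "remote access", 8),
        Requirement("Distribution", "process diversion", 48),
    ]
    current = {"alternate recovery location": 72, "remote access": 4}
    for gap in find_gaps(needs, current):
        print(gap)
```

Each gap returned here corresponds to one of the "larger issues and projects" the slide mentions, such as pre-identifying a recovery location that can be brought up fast enough.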
Crisis Management
To support the varying size and complexity of eBay locations around the world, a layered crisis management structure has been developed to organize response efforts according to the needs of the event.
• Corporate & Executive Team: Corporate & Executive Management, Senior Corporate Functions (Security, Facilities, etc.)
• Regional Teams: Regional Business Unit Executive Management, Senior Regional Corporate Functions (Security, Facilities, etc.)
• Site-based Teams: Local Business Unit Management, Local Site Support (Security, Facilities, etc.)
Event escalation: from Site-based Teams, to Regional Teams, to the Corporate & Executive Team
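The layered escalation model above can be sketched as a small decision function. The escalation rules here (more than one site goes regional; more than one region or a severe event goes corporate) are hypothetical, as is the assumption that lower tiers stay engaged once an event escalates; the presentation only says the structure is organized around the needs of the event.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Crisis management layers from the slide, lowest to highest."""
    SITE = 1       # Site-based Teams
    REGIONAL = 2   # Regional Teams
    CORPORATE = 3  # Corporate & Executive Team


def response_tier(sites_affected: int, regions_affected: int, severe: bool) -> Tier:
    """Choose the tier that should lead the response (hypothetical escalation rules)."""
    if severe or regions_affected > 1:
        return Tier.CORPORATE  # severe or multi-region events escalate to corporate
    if sites_affected > 1:
        return Tier.REGIONAL   # several sites within one region
    return Tier.SITE           # a single site handles the event locally


def engaged_tiers(lead: Tier) -> list[str]:
    """Assume lower tiers stay engaged when an event escalates past them."""
    return [tier.name for tier in Tier if tier <= lead]


if __name__ == "__main__":
    lead = response_tier(sites_affected=3, regions_affected=1, severe=False)
    print(lead.name, engaged_tiers(lead))
```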
Technology Disaster Recovery
Website Disaster Recovery Best Practices
• Geographically diverse redundancy planning based on website criticality and revenue
• Defined escalation and support structures
• Recovery plan documentation (both system- and data center-based)
• Testing along maturity path
Non-website Disaster Recovery Best Practices
• Standards and policies to meet business needs
• Accountability for technology owners and business units
• Alignment of system and application recovery capabilities with business unit requirements (BIA/BCP)
• System dependency analysis
• Recovery plan documentation
• Testing along maturity path
Participation
- Technology management
- Monitoring and support teams
- Subject Matter Experts (SMEs)
- Vendors (as determined by organizational dependencies)
- Alternate Work Force (AWF)
Testing
Policy & Standards
• Define a testing policy that meets the needs of the organization.
• Define the testing frequency, taking into consideration plan types, plan criticality, and exercise maturity.
• Ensure [all] policies and standards are reviewed and approved by the appropriate corporate officers.
• Ensure [all] policies and standards are available to the organization via multiple channels.
Testing Goals & Best Practices
• All critical plans should be tested annually according to a testing maturity path.
• All key staff members should participate to become familiar with plans and strategies.
• Tests should validate plans and strategies.
• Issues found during testing should be documented appropriately and followed up afterward.
Participation
- Teams and key staff members identified in the targeted plans
- Support roles as required (technology, security, facilities, etc.)
- Local authorities
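The testing-frequency policy above can be sketched as a scheduling check. Only the annual cadence for critical plans comes from the slide; the non-critical interval, the plan tuples, and the rule that each exercise steps up one level along the maturity path (tabletop, semi-functional, functional, the exercise types named under Training & Awareness) are assumptions for illustration. A real program would likely advance only after the previous exercise went well.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical policy values: annual testing for critical plans (from the slide),
# a two-year interval for non-critical plans (assumed).
TEST_INTERVAL_DAYS = {"critical": 365, "non-critical": 730}

# Exercise types from the presentation, ordered as an assumed maturity path.
MATURITY_PATH = ["tabletop", "semi-functional", "functional"]


def overdue_plans(plans: list[tuple[str, str, Optional[date]]], today: date) -> list[str]:
    """Names of plans whose next test date has passed.

    Each plan is (name, criticality, last_tested); a plan that has never been
    tested (last_tested is None) is always overdue.
    """
    overdue = []
    for name, criticality, last_tested in plans:
        if last_tested is None:
            overdue.append(name)
        elif last_tested + timedelta(days=TEST_INTERVAL_DAYS[criticality]) < today:
            overdue.append(name)
    return overdue


def next_exercise(last_exercise: Optional[str]) -> str:
    """Next exercise type along the maturity path; stays at the top once reached."""
    if last_exercise is None:
        return MATURITY_PATH[0]
    i = MATURITY_PATH.index(last_exercise)
    return MATURITY_PATH[min(i + 1, len(MATURITY_PATH) - 1)]


if __name__ == "__main__":
    # Hypothetical plans and dates.
    plans = [
        ("Customer Service BCP", "critical", date(2012, 5, 1)),
        ("Finance BCP", "non-critical", date(2012, 1, 1)),
        ("QA BCP", "critical", None),
    ]
    print(overdue_plans(plans, today=date(2013, 5, 15)))
    print(next_exercise("tabletop"))
```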
Training & Awareness
Primary Goals
• Engage staff at all levels of the organization in the BCM program.
• Foster a culture of resiliency throughout the organization.
• Train coordinators and other key roles throughout the program.
Example Approaches
• Training
- in-person meetings
- campus-wide sessions
- online training sessions
- eLearning modules
- Industry training opportunities
• Awareness Campaigns
- BCM Awareness Week
- Ad-hoc campaigns to promote the program or specific components
- Desk-drops, posters, flyers
- Intranet site with training content
- Validation exercises
• Testing
- Testing is the best training!
- Tabletops, semi-functional exercises, functional exercises, etc.
BCM Program Toolset
The right tools to enable the organization, to enable success.
Core Program Tools
• Notification Tool – Enabling communication with staff with check-in capabilities.
• Planning Tool – A flexible planning tool to capture business plans and other documentation.
• Crisis Management Tool – Crisis/incident management software that enables team members and coordinators to collaborate during an incident (e.g. mapping, chat, document sharing, team status, logging). Sometimes combined with the Planning Tool.
• Data Feeds – Fueling the program toolset, data feeds are critical to supplying the foundational data used for planning and
analysis within the program (e.g. facility information, people information, system details).
• Coordinator Email DLs – Enabling quick communication with business coordinators.
• Satellite Phones – Distributed to executives and key roles to enable communication.
• Travel Registry – Logging of business travel to assist in the identification of potentially impacted staff.
Extending the Footprint of the Organization
• Remote Access – Enabling staff to work from anywhere in the world with a working internet connection.
• Collaboration Tools – Audio and video conferencing to enable virtual collaboration during events.
• Instant Messaging – Instant messaging to enable communication.
• Unified Communications – Extending presence and availability to external devices and networks (e.g. telephony solutions, mobile apps).
Outside Programs Worth Consideration
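• GETS/WPS (Government Emergency Telecommunications Service / Wireless Priority Service) – Priority calling services distributed to executives and key roles to enable communication when networks are congested.
• CEAS (Corporate Emergency Access System) – Enabling access to restricted areas during events for certain metropolitan areas (e.g. NY, MA).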
Incident Response (High Level)
Situational Awareness (immediate)
• Data gathering on potentially impacted locations and staff using program tools.
• Continuous event monitoring for updates on details, arrests, lock-downs, etc.
• Create a log in the crisis management tool to begin crisis management & collaboration activities.
Establishing Communication with Staff
• Confirming the well-being of all staff based in the area as well as travelers in the area.
• Identifying where local (and travelling) staff are located during and after the event – providing shelter-in-place as required.
• Offering assistance to anyone impacted by the event (travel needs, counseling, etc.).
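Confirming the well-being of local staff and travelers combines the people data feeds, the Travel Registry, and check-in responses from the Notification Tool. The sketch below assumes hypothetical inputs (an `Employee` record with a home site, a registry mapping names to the location each traveler is visiting, and a set of names that have checked in); it is not the interface of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    home_site: str  # from the people/facility data feeds (hypothetical field)


def potentially_impacted(staff: list[Employee],
                         travel_registry: dict[str, str],
                         impacted_sites: set[str]) -> set[str]:
    """Staff based at an impacted site plus travelers currently logged there.

    Staff based at an impacted site are included even if they are travelling,
    which errs on the side of contacting too many people rather than too few.
    """
    impacted = {e.name for e in staff if e.home_site in impacted_sites}
    impacted |= {name for name, location in travel_registry.items()
                 if location in impacted_sites}
    return impacted


def unaccounted_for(impacted: set[str], check_ins: set[str]) -> list[str]:
    """People who have not yet confirmed their well-being, sorted for follow-up."""
    return sorted(impacted - check_ins)


if __name__ == "__main__":
    # Hypothetical staff, travel, and check-in data.
    staff = [Employee("Ana", "Boston"), Employee("Ben", "San Jose"), Employee("Cy", "Boston")]
    travel = {"Ben": "Boston"}
    impacted = potentially_impacted(staff, travel, {"Boston"})
    print(unaccounted_for(impacted, check_ins={"Ana"}))
```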
Establishing Communication with Management & Stakeholders
• Communicating with local management to further identify any employee impact, as well as the immediate plan of action for
office operations.
• Communicating with corporate management to provide status on the well-being of staff and status of office operations.
• Determining a schedule for further communications and meetings to coordinate response activities.
Situational Awareness (longer term)
• Communication with local management to understand any changes in impacts to staff and business operations.
• Ensuring critical functions have strategies to continue their workload while the office is closed (as necessary).
Thank You!
Questions?