Predictive Analytics with Oracle Data Mining Vinay Deshmukh Bryan Hodge

Predictive Analytics with Oracle
Data Mining
Vinay Deshmukh
Senior Director
Oracle Applications Labs
vinay.deshmukh@oracle.com
Bryan Hodge
Global Leader Customer Intelligence
Customer Support Services
bryan.hodge@oracle.com
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
2
We run the applications that run Oracle
We drive enhancements based on our experience
We share best practices with our customers
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Value Chain Opportunities and Risks
Large and Diverse Customer
Base
Opportunity & Risk
Assessment
Transition to the cloud
Complex Global Hardware
Value Chain
600+ global spares warehouses
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal
4
Opportunity and Risk assessment using ODM
Discover hidden/subtle data
patterns
Augment Value Chain
Planning –both forward and
reverse
Identify inter-relationships
among data elements
Oracle Data
Mining
Quantify likelihood of
opportunity/risk
Rewind the clock and
compare model accuracy
against actuals.
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal
5
Oracle Advanced Analytics Evolution
• New algorithms (EM,
PCA, SVD
• SQLDEV/Oracle Data
Miner 4.0 “work flow”
GUI launched with SQL
script generation and
• ODM 11g & 11gR2 adds
SQL Query node (R
AutoDataPrep (ADP), text integration)
mining, perf. improvements
• OAA/ORE 1.3 + 1.4
• SQLDEV/Oracle Data Miner launched adding
• Oracle Data Mining
3.2 “work flow” GUI
several new scalable R
10g & 10gR2
launched
algorithms
introduces SQL dm • Integration with “R” and • Oracle Adv. Analytics
• Oracle Data Mining functions, 7 new SQL introduction/addition of
for Hadoop Connector
• Oracle acquires
dm
algorithms
and
9.2i launched – 2
Oracle R Enterprise
launched with scalable
Thinking Machine
new Oracle Data
algorithms (NB
• Product renamed “Oracle BDA algorithms
Corp’s dev. team +
Miner
“Classic”
and AR) via Java
• 7 Data Mining “Darwin” data
Advanced Analytics (ODM +
wizards driven GUI
API
“Partners”
ORE)
mining software
Analytical SQL in the Database
1998
1999
2002
2004
2005
2008
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
2011
2015
Oracle Advanced Analytics
Performance and Scalability with Low Total Cost of Ownership
Traditional Analytics
Oracle Advanced Analytics
Data Import
Data remains in the Database
 Scalable, parallel Data Mining algorithms in SQL kernel
Data Mining
Model “Scoring”
 Fast parallelized native SQL data mining functions, SQL data preparation and efficient
execution of R open-source packages
Data Prep. &
Transformation
 High-performance parallel scoring of SQL data mining functions and R open-source
models
avings
Data Mining
Model Building
Fastest way to deliver enterprise-wide predictive analytics
Data Prep &
Transformation
 Database scoring engine
Data Extraction
 Integrated GUI for Predictive Analytics
Lowest TCO
Model “Scoring”
Embedded Data Prep
Model Building
Data Preparation
Hours, Days or Weeks
Secs, Mins or Hours
 Eliminate data duplication
 Eliminate separate analytical servers
 Leverage investment in Oracle IT
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Data Miner
SQL Developer 4.0 Extension
Free OTN Download
• Easy to Use
– Oracle Data Miner GUI for data analysts
– “Work flow” paradigm
• Powerful
– Multiple algorithms & data transformations
– Runs 100% in-DB
– Build, evaluate and apply models
• Automate and Deploy
– Save and share analytical workflows
– Generate SQL scripts for deployment
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
More Data Variety—Better Predictive Models
100%
• Increasing sources of
relevant data can boost
model accuracy
Naïve Guess or
Random
Responders
Model with “Big Data” and
hundreds -- thousands of input
variables including:
• Demographic data
• Purchase POS transactional
data
• “Unstructured data”, text &
comments
• Spatial location data
• Long term vs. recent historical
behavior
• Web visits
• Sensor data
• etc.
100%
Model with 20 variables
Model with 75 variables
Model with 250 variables
0%
Population Size
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Advanced Analytics Algorithms
Function
Algorithms
Applicability
Classification
Logistic Regression (GLM)
Decision Trees
Naïve Bayes
Support Vector Machines (SVM)
Classical statistical technique
Popular / Rules / transparency
Embedded app
Wide / narrow data / text
Regression
Linear Regression (GLM)
Support Vector Machine (SVM)
Classical statistical technique
Wide / narrow data / text
Anomaly
Detection
One Class SVM
Unknown fraud cases or anomalies
Attribute
Importance
Minimum Description Length (MDL)
Principal Components Analysis (PCA)
Attribute reduction, Reduce data noise
Association
Rules
Apriori
Market basket analysis / Next Best Offer
Clustering
Hierarchical k-Means
Hierarchical O-Cluster
Expectation-Maximization Clustering (EM)
Product grouping / Text mining
Gene and protein analysis
Feature
Extraction
Nonnegative Matrix Factorization (NMF)
Singular Value Decomposition (SVD)
Text analysis / Feature reduction
A1 A2 A3 A4 A5 A6 A7
F1 F2 F3 F4
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
In-Database Advanced Analytics
Independent Samples T-Test
• Query compares the mean of AMOUNT_SOLD between
MEN and WOMEN Grouped By CUST_INCOME_LEVEL ranges
• Returns observed t value and its related two-sided significance (<.05 = significant)
SELECT substr(cust_income_level,1,22) income_level,
avg(decode(cust_gender,'M',amount_sold,null)) sold_to_men,
avg(decode(cust_gender,'F',amount_sold,null))
sold_to_women,
stats_t_test_indep(cust_gender, amount_sold,
'STATISTIC','F') t_observed,
stats_t_test_indep(cust_gender, amount_sold)
two_sided_p_value
FROM sh.customers c, sh.sales s
WHERE c.cust_id=s.cust_id
GROUP BY rollup(cust_income_level)
ORDER BY 1;
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Case Study:
Support Cancellation Early Warning
Bryan Hodge
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
12
Case Study: Support Cancellation Early Warning
$21B
Premier Support Revenue
550K
Contracts to be Renewed
8M Product Lines
Very diverse customer base
Broad range of products
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
13
Challenge:
Predict the small percentage of contracts/lines
that are at risk in order to focus resources
appropriately , and minimize losses
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
14
Business Solution
• Developed a cancellation early
warning system
• Embedded system generated risk
assessment into Forecasting Tool
• Sales Rep uses to help forecast &
engage management
• Manager uses to inform forecast
judgement
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
15
Two Phase Approach
Tribal Knowledge
Oracle Data Miner
• Used sales rep experience to
identify risk attributes
• Analyzed one year of outcomes to
Train decision tree model
– E.g. Age of product, size of deal
• Profiled contract base
• Established thresholds for Low,
Medium & High risk per attribute
– Cancelled or Renewed
• ‘Wound the clock back’ on six
months more data
• Scored the six months data to
generate predictions
• Algorithm to balance attributes into
overall risk assessment
• Assessed Accuracy at 85%
= (True Positive + True Negatives ) /
Number of Observations
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
16
Oracle Data Miner - Details
Warehouse star schema with
enhanced attributes
Attribute Importance Analysis
ODM Analysis
Trained decision tree model &
assessed accuracy
Saved results in warehouse fact
for use in Forecasting Tool
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal
17
Business Benefits
Challenges
• Identified hidden relationships
across many attributes
• Ensuring statistically significant
data volumes in tree branches
• Improved quality of risk assessment
• Preparation of data to ‘wind the
clock back’
• Early intervention for customer sat
• Reduced cancellation rates
– Bottom line improvements
• Avoiding bias during data prep.
• Handling partially populated
attributes
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
18
Case Study:
Predicting Spare Parts@risk
Vinay Deshmukh vinay.deshmukh@oracle.com
Senior Director
Oracle Applications Lab
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
19
Case Study: Identify Spare parts @risk of short supply
Global Spares Warehouses
$2.3B
Hardware Service Revenue
600+
Large deployment of Value
Chain Planning . Augment
VCP capabilities with Oracle
Data Mining
Broad range of products
Very diverse customer base
1.5 million part-location pairs
supersessions & substitutions
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
20
Problem Statement
Ensure a high level of service to our hardware customers by
identifying the parts at risk of short supply at the
warehouses closest to them and take proactive steps to
remedy the shortage risk . Augment current Value Chain
Planning Capabilities to provide risk assessment of
parts@risk.
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Value Chain Planning Solution
Transformational Tools
Deployed
Service
Parts
Planning
Deployed
Deployed
Global Order
Promising
Deployed
Production
Scheduling
Deployed
Demand Signal
Management
Planning
Analytics
Network Design
and
Risk Management
Sales and
Operations
Planning
Collaborative
Planning and VMI
Trade Promotion
Planning and
Optimization
Supply and Distribution
Planning and Event-driven
Simulation
Deployed
Demand
Management and
Advanced
Forecasting
• Single source of truth
• Integrated with ERP
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Deployed
Solution Approach
1. Augment Value Chain Planning using ODM Model
2.Exception Reporting
3.Customer Report
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Model Attributes
Supply
Demand
Forecast Accuracy
• On hand
• Safety stock mean/std dev
• MAPE
• External repair orders
• Forecast mean/std dev
• Volatility
• Projected available balance
• Shipments
• Intermittency
• Days of supply
• Backorders
Item Attributes
• Cost
• PLC Code
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal
24
Prototype Assumptions for the Model: 1 of 2
• 4 Models based on – AMER (US), AMER (Non US), EMEA and APAC
• Data used for training the model was Feb , Mar and Apr 2014 with May 2014 as the target
• Input data used in the model - Apr, May, Jun 2014 with Jul 2014 as the target
• Average and Std Dev used for time phased inputs to the Model – Forecast and Safety
Stock. Latest value for projected available balance used.
• Current Backorders and Onhand considered
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Prototype Assumptions for the Model: 2 of 2
• For remaining parameters, 3 month average value used
• Item is marked at RISK if
•
(backorder > 0 OR pab_qty < 0 OR ss_qty > oh_qty)
• Item is marked as ‘Not at RISK’ if
•
(pab_qty between 0 and 0.25 )
OR (oh_qty - ss_qty) between 0 and 0.25
•
Remaining records were deleted. This tolerance logic was applied to restrict the count
of ‘NO’ records in the training data
• The final output shows the items at risk along with the orgs where planning exceptions
are generated in the latest run of the corresponding Value Chain Plan for spares
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Accuracy Analysis
EMEA
Total Cases: 2097
False YES: 0 (0%)
False NO: 204 (10%)
Accuracy: 90%
AMER
Total Cases: 2097
False YES: 1 (0%)
False NO: 433 (21%)
Accuracy: 79%
APAC
Total Cases: 1902
False YES: 0 (0%)
False NO: 127 (7%)
Accuracy: 93%
Latin America
Total Cases: 2358
False YES: 1 (0%)
False NO: 443 (19%)
Accuracy: 81%
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal
27
Exception Reporting
•
OBIEE Report to show priority exceptions generated for the parts-at-risk predicted
by the ODM Model (built on Oracle Advanced Planning Command Center , Oracle
Value Chain Planning Suite)
•
ODM Output stored in Value Chain Planning data model by specifying the region
and organization
•
VCP (Advanced Planning Command Center) reports latest exceptions for the
respective plan for the parts-at-risk predicted
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Detailed Solution – APCC Report
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Customer Report
•
This report shows the contract, base-model and party impacted by the parts-at-risk
predicted by ODM Model
•
Based on the part-at-risk, the model is fetched utilizing Demantra Data.
•
'EXPIRED', 'CANCELLED', 'TERMINATED‘ contracts are filtered out
•
Premier Customers impacted by the parts @ risk are identified
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Case Study:
Predicting Hardware Opportunities
Vinay Deshmukh vinay.deshmukh@oracle.com
Senior Director
Oracle Applications Lab
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
31
Problem Statement
• Predict the outcome of (non-Cloud) Hardware Opportunities
whose expected revenue is greater than $1 million
–Includes both Direct and Indirect sales channel
• For the opportunities predicted to be won , provide early
visibility to suppliers and contract manufacturers by
leveraging the capabilities of Value Chain Planning Suite .
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Solution Approach
• Train the ODM model with the historical data
– Opportunities from 31-Mar-2011 to 31-Aug-2013
– Trained the model with opportunities that were Won or Lost between 31-Mar-2011 and
31-Aug-2013
– Additional computed attributes used - product weight, customer weight, partner weight
• Predict the likely outcome of opportunities open as of 1-Sep-2013
using the model
• Test the prediction by comparing against actual wins and losses for
predicted opportunities
• Future: Use the predicted opportunities in Value Chain Planning as
causal factors to improve forecast accuracy
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Model Attributes
Customer
Product
• Account Weight
• Product Line
• Annual Revenue Category
• Primary Competitor
• Number of Employees
• Product Group
Industry
• Top level Industry
• Product Weight
Geography
Partner
Opportunity
• LOB Code
• Channel Type
• Cycle Time
• Region
• Partner Type
• Opportunity Status
• Country
• Partner Weight
• Expected Revenue
• Sales Method
• Opened Date
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal
34
Calculating weights using Bayesian approach
• ((Customer 'x' Won Opty / Total Won Opty) * (Customer 'x' Total Opty / Total Opty)) /
(Σ(Customer 'x' Won Opty / Total Won Opty) * (Customer 'x' Total Opty / Total Opty))
• ((Product 'y' Won Opty / Total Won Opty) * (Product 'y' Total Opty / Total Opty) ) /
(Σ(Product 'y' Won Opty / Total Won Opty) * (Product 'y' Total Opty / Total Opty))
• ((Partner 'z' Won Opty / Total Won Opty) * (Partner 'z' Total Opty / Total Opty)) /
(Σ(Partner 'z' Won Opty / Total Won Opty) * (Partner 'z' Total Opty / Total Opty))
Where x = number of customers, y = Number of products, z = Number of partners
• Note: Direct and Indirect Partner weights are calculated separately. For customer weights = 0 they
are replaced by the median of the customer weights
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
The Metrics
• False Positives
Model predicted that an event will occur but the event did not occur over the risk horizon
• False Negatives
Model predicted than an event will not occur over the risk horizon but the event did occur
Model accuracy = 1 - [(false positives + false negatives)/ total observations]
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Accuracy Achieved (Direct + Indirect channels)
Actual
• Average Accuracy
– 73.0 %
Lost
Won
Total
Correct %
• Overall Accuracy
– 78.2 %
Lost
1813
1114
2927
61.9406
• Accuracy of winning the deal
– 84.0 %
Won
1309
6882
8191
84.0191
Total
3122
7996
11118
Predicted
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Conclusion
Oracle Applications Lab
Predictive Analytics with Oracle Data Mining
Challenge
Solution
• Predict Contract Lines@risk
• Predict Spare Parts@risk
• Predict H/W opportunity Wins
• Oracle Data Mining for predictive analytics
• Augment Oracle Value Chain Planning capabilities provided by Oracle
Demantra and Oracle Advanced Planning Command Center
• OBIEE
Benefits
• Reduced Cancellation Rates
• Improve Service Delivery Performance to hardware spares customers
• Early demand visibility to suppliers
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal
38
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
40