How to perform predictive analysis on your web analytics tool data
FREE Webinar by Tatvic
A GACP and GTMCP company
June 19th, 2013
#tatvicwebinar
Before we start...
Q&A
Our speakers
• Carolina Araripe, Inbound Marketing Strategist @Tatvic (http://linkd.in/YazvVn)
• Amar Gondaliya, Data Model Engineer @Tatvic (http://linkd.in/16cpDQI)
• Kushan Shah, Web Analyst @Tatvic (http://linkd.in/18rfFfV)
Talking about Analytics…
Analytics
• Descriptive: What has happened?
• Predictive: Predicts the outcome or future
• Prescriptive: What should happen?
In other words…
Predictive Analytics
“Technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions.”
Source: Siegel, E. (2013), “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.”
Outline of this webinar
Predictive Analytics
• Tool: R
• Data: Google Analytics
• Model: Logistic Regression
• Visualization
Introduction to R
What
• Open source statistical computing language, widely used by organizations to solve business problems.
Applications
• Data Analysis
• Statistical Tests
• Data Visualization
• Predictive Model
• Forecasting
Why
• Easy to integrate
• Data frame
• Pre-developed packages
How to get started
• Download and install
• Choose and download a user-friendly GUI: RStudio
R Packages
Categories of Packages
• Data Extraction
• Data Visualization
• Time Series
• Machine Learning
For this webinar
• RGoogleAnalytics
Usage: To extract Google Analytics data into R
Contributors: Michael Pearmain, Nick Mihailovski, Amar Gondaliya and Vignesh Prajapati
• ggplot2
Usage: Build plots and charts
Contributor: Hadley Wickham
Google Analytics data
Extracting your GA data into R
Figure: sequence between the user performing data extraction, the Google OAuth2 Authorization Server, and the Google Analytics API
• Access Token Request
• Access Token Response
• Call API for list of profiles
• Call API for query
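The extraction sequence can be sketched as request construction only, with nothing sent over the network. This is a minimal Python sketch rather than the R code used in the webinar. The endpoint URLs are assumptions based on Google's OAuth2 token endpoint and the Analytics API v3 of that period, which may no longer be available. The refresh-token grant, the helper names `token_request` and `api_request`, and the profile ID `ga:12345` are illustrative assumptions, not part of the slides.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoints: verify against Google's current documentation before use.
TOKEN_URL = "https://oauth2.googleapis.com/token"
PROFILES_URL = ("https://www.googleapis.com/analytics/v3/management/"
                "accounts/~all/webproperties/~all/profiles")
QUERY_URL = "https://www.googleapis.com/analytics/v3/data/ga"


def token_request(client_id, client_secret, refresh_token):
    """Build the access token request (hypothetical helper, not sent here)."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode("ascii")
    return Request(TOKEN_URL, data=body, method="POST")


def api_request(url, access_token, params=None):
    """Build an authorized GET request for the profile list or a data query."""
    if params:
        url = url + "?" + urlencode(params)
    return Request(url, headers={"Authorization": "Bearer " + access_token})


# Usage with placeholder credentials; the access token would normally come
# from the JSON body of the access token response.
token_req = token_request("my-client-id", "my-client-secret", "my-refresh-token")
profiles_req = api_request(PROFILES_URL, "placeholder-access-token")
query_req = api_request(QUERY_URL, "placeholder-access-token", {
    "ids": "ga:12345",              # hypothetical profile (view) ID
    "start-date": "2013-05-01",
    "end-date": "2013-05-31",
    "metrics": "ga:transactions",
})
```

Passing any of these objects to `urllib.request.urlopen` would perform the actual call; in R, the RGoogleAnalytics package named earlier handles this exchange.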
Business Problem
Projected Growth of Retail eCommerce in US
US Retail eCommerce Sales 2011-2016 (in billion $)
2011    $194.70
2012    $225.50
2013    $258.90
2014    $296.70
2015    $338.90
2016    $384.90
Source: http://www.emarketer.com/Article/Retail-Ecommerce-Set-Keep-Strong-Pace-Through-2017/1009836
Business Problem
Product return
“Returns are on the rise, up 19% from 2007. For every US$1 spent on merchandise, 9¢ are returned.”
“Average return rate for ecommerce retailers varies from 3-12%.”
Source: Time Magazine, Sept. 4th, 2012
Product Return Impact (per day)
Average Return Rate        9%         7%
Average Order Value        $100       $100
Orders Per Day             500        500
Total Income               $50,000    $50,000
Loss due to returns        $4,500     $3,500
Revenue post loss          $45,500    $46,500
Increase in Revenue/day    -----      $1,000
Increase in Revenue with recovered returns in long run
Month (x30)     $30,000
Year (x365)     $365,000
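The per-day arithmetic on this slide can be reproduced with a short calculation. This is a minimal Python sketch using only the figures from the slide; the function name `revenue_after_returns` is an illustrative choice.

```python
def revenue_after_returns(orders_per_day, average_order_value, return_rate_percent):
    """Daily revenue left after subtracting the value of returned orders."""
    total_income = orders_per_day * average_order_value
    loss_due_to_returns = total_income * return_rate_percent / 100
    return total_income - loss_due_to_returns


# Figures from the slide: 500 orders per day at $100 each.
revenue_at_9_percent = revenue_after_returns(500, 100, 9)
revenue_at_7_percent = revenue_after_returns(500, 100, 7)

gain_per_day = revenue_at_7_percent - revenue_at_9_percent
gain_per_month = gain_per_day * 30
gain_per_year = gain_per_day * 365
```

Lowering the return rate from 9% to 7% is what produces the daily gain; the monthly and yearly figures are that same gain multiplied by 30 and 365.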
Data Introduction
Transactional Data
• Pre Purchase Data: Browsing behavior up to shopping cart
• In Purchase Data: Purchase behavior from shopping cart to thank you page
Modeling
• Loading Input Data
• Introducing Model Variables
• Model Creation
• Model Performance
• Applying Model to Test Data
Machine Learning Tech.
Supervised Learning
Generates a function that maps inputs (labeled data) to desired outputs (e.g., Spam Detection)
Supervised Learning Model
• Training Data (Variables and Labels) → Machine Learning Algorithm → Predictive Model
• Test Data (Variables) → Predictive Model → Predicted Outcome labels
• Labels are right answers from historical data
• E.g., Spam Detection. Input Data: Contains emails marked Spam/No Spam
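The training data and test data roles in this diagram can be illustrated by splitting a labeled data set. This is a minimal Python sketch rather than the R code used in the webinar. The 30% test fraction, the helper `split_labeled_data`, and the toy email records are assumptions for illustration only.

```python
import random


def split_labeled_data(rows, test_fraction=0.3, seed=42):
    """Shuffle labeled rows and split them into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labeled data: (variables, label), label 1 = spam, 0 = not spam.
emails = [({"num_links": i % 5, "has_offer": i % 2}, i % 2) for i in range(10)]

training_data, test_data = split_labeled_data(emails)

# The algorithm learns from variables and labels in the training data.
# The fitted model then sees only the variables of the test data, and the
# held-back labels are used to check its predicted outcome labels.
test_variables = [variables for variables, _ in test_data]
test_labels = [label for _, label in test_data]
```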
Feature engineering
Going beyond algorithms and using domain knowledge to add new variables to the model
• E.g.: Products purchased as gifts are less likely to be returned
• Create a new variable with binary values: 1 = product purchased as gift, 0 = otherwise
• Products purchased in the holiday season are more likely to be returned
• Based on purchase date, create a new variable with binary values: 1 = product purchased in November or December, 0 = otherwise
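The two engineered variables described here can be derived from a transaction record in a few lines. This is a minimal Python sketch rather than the R code used in the webinar. The field names `purchased_as_gift` and `purchase_date` and the output names `is_gift` and `is_holiday_season` are hypothetical.

```python
from datetime import date


def add_engineered_features(transaction):
    """Return a copy of the transaction with two binary variables added."""
    enriched = dict(transaction)
    # 1 if the product was purchased as a gift, 0 otherwise.
    enriched["is_gift"] = 1 if transaction["purchased_as_gift"] else 0
    # 1 if the purchase date falls in November or December, 0 otherwise.
    in_holiday_season = transaction["purchase_date"].month in (11, 12)
    enriched["is_holiday_season"] = 1 if in_holiday_season else 0
    return enriched


# Hypothetical transactions for illustration.
holiday_gift = add_engineered_features(
    {"purchased_as_gift": True, "purchase_date": date(2012, 12, 20)})
summer_order = add_engineered_features(
    {"purchased_as_gift": False, "purchase_date": date(2012, 6, 3)})
```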
Predictor/Response Variables
Figure: scatter plot of Price of House ($), the response variable (axis from 0 to 800,000), against Size of House (sq ft), the predictor variable (axis from 0 to 5,000)
Generalized Linear Models
glm(formula, family, data)
Formula
Response ~ Predictor (this argument specifies which variables are the independent (predictor) variables and which variable is the dependent (response) variable)
Family
Binomial (since the output variable, product return, is defined as a binary value, 0 or 1, we use the binomial family)
Data
Train data set: this data set contains values of all 18 variables (i.e., the values of both the dependent and the independent variables are given). This data set is also called labeled data.
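To show what a binomial-family fit computes, here is a minimal Python sketch of logistic regression. It is not R's `glm`, which fits by iteratively reweighted least squares; plain gradient descent is swapped in here to keep the example short. The two predictors, the 12 toy rows, and the learning rate and epoch count are illustrative assumptions, not the webinar's 18-variable data set.

```python
import math


def predict_probability(weights, x):
    """Probability of a return given intercept weights[0] and coefficients."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=0.5, epochs=5000):
    """Fit intercept and coefficients by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict_probability(weights, x) - y
            gradient[0] += error
            for j, value in enumerate(x):
                gradient[j + 1] += error * value
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


# Toy labeled data. Predictors: [is_gift, is_holiday_season].
# Response: 1 = product returned, 0 = not returned.
rows = [[1, 0], [1, 0], [1, 0], [1, 0], [1, 1], [1, 1],
        [0, 0], [0, 0], [0, 1], [0, 1], [0, 1], [0, 1]]
labels = [0, 0, 0, 1, 0, 1,
          0, 1, 1, 1, 1, 0]

weights = fit_logistic(rows, labels)
gift_risk = predict_probability(weights, [1, 0])
holiday_risk = predict_probability(weights, [0, 1])
```

On this toy data the gift coefficient comes out negative and the holiday coefficient positive, matching the direction of the feature engineering assumptions.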
Summary
Figure: Number of Transactions by Probability of Product Returns, split into > 60% and ≤ 60%
• Call customer before shipping
• Send discount coupon to encourage the customer to make a future purchase
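The split shown on this slide amounts to counting transactions on each side of the 60% threshold. This is a minimal Python sketch rather than the R code used in the webinar. The function name `count_by_return_risk` and the predicted probabilities are hypothetical.

```python
def count_by_return_risk(probabilities, threshold=0.60):
    """Count transactions above and at-or-below the return probability threshold."""
    above = sum(1 for p in probabilities if p > threshold)
    return {"above_threshold": above,
            "at_or_below_threshold": len(probabilities) - above}


# Hypothetical predicted return probabilities for five transactions.
predicted = [0.10, 0.60, 0.61, 0.95, 0.35]
counts = count_by_return_risk(predicted)
```

A transaction at exactly 60% falls in the lower group, consistent with the ≤ 60% label.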
ggplot2
• Geometric Shapes
• Scales and Coordinate Systems
• Plot Annotations
Q&A Round
Thank you!
Carolina Araripe
carolina@tatvic.com
+91 7600-515-354
+1 276-644-0456