Improving Batch Application Service Through Tuning and Parallelism

Improving Batch
Application Service
Through Tuning and
Parallelism
Dan Squillace
Mainframe Support Manager
SAS Institute
Cary, NC USA
Dan.Squillace@sas.com
Copyright © 2005, SAS Institute Inc. All rights reserved.
Some Business Drivers for Performance
Improvement Acidities
 Increasing data volumes
• More customers
• More data about each customer needed for increasingly
sophisticated analytics which aid better and more timely
decision-making.
 Decreasing processing window
• Improve BI application availability by shortening ETL
elapsed time.
 Increasing pressure to reduce costs
• Lower resource requirements
• Improve competetive position
Copyright © 2005, SAS Institute Inc. All rights reserved.
Session Overview
 This session focuses on processing
improvements beneficial to handling large data
volumes.
 Performance improvement areas
• CPU optimization
• Reducing I/O
• Improved overlap and parallelism
• Elapsed time optimization (Not the same
 Focus Areas
• DATA STEP tuning
• New SAS9 features
Copyright © 2005, SAS Institute Inc. All rights reserved.
Session Outline
 Don’t forget the basics! - A ShortTuning Case
Study






DATA Step Views
PROC SUMMARY w/DATA Step View
DATA Step hash table functions
SAS Parallel Data Engine (SPDE)
SAS/Connect Pipes
Wrap-up
Copyright © 2005, SAS Institute Inc. All rights reserved.
Back to Basics:
High-Volume DATA Step Optimization
 Before implementing parallel operations, make
sure basic processing flow is efficient
 When processing high volumes of data, even
apparently small changes can have a large effect
 The following customer case study illustrates
several points.
Copyright © 2005, SAS Institute Inc. All rights reserved.
Program processes 36 million MXG TYPE74
records (436 CPU seconds 9672 G6)
DATA FILE.A;
SET INFILE1.TYPE74;
KOUNT = 1 ;
IF VOLSER = '.' OR VOLSER = ' '
THEN DELETE ;
IF SYSTEM = '888K' OR SYSTEM = '888Z' OR
SYSTEM = '888Q' OR SYSTEM = '888V' OR
SYSTEM = '888P' THEN DO ;
IF DATEPART(SYNCTIME) < '03APR04'D THEN
SYNCTIME = SYNCTIME - '06:00:00.00'T ;
IF DATEPART(SYNCTIME) > '02APR04'D THEN
SYNCTIME = SYNCTIME - '05:00:00.00'T ;
END ;
SYMNUM = 0 ;
IF DATEPART(SYNCTIME) < '17MAY04'D THEN DO ;
IF DEVNR > 58FFX AND DEVNR < 5FFFX THEN SYMNUM
= 111;
IF DEVNR > 6FFFX AND DEVNR < 7FFFX THEN SYMNUM
= 456;
IF DEVNR > 7FFFX
THEN SYMNUM = 234;
IF DEVNR => 5000X AND DEVNR < 5200X THEN SYMNUM
= 234;
IF DEVNR => 5FFFX AND DEVNR < 7000X THEN SYMNUM
= 876;
END;
IF DATEPART(SYNCTIME) > '17MAY04'D THEN DO ;
IF DEVNR > 4FFFX AND DEVNR < 7000X THEN SYMNUM
= 223;
IF DEVNR > 6FFFX AND DEVNR < 7FFFX THEN SYMNUM
= 456;
IF DEVNR > 7FFFX
THEN SYMNUM = 234;
END;
Copyright © 2005, SAS Institute Inc. All rights reserved.
TIPPCT = (IORATE * (AVGCONMS +AVGDISMS))/10 ;
FORMAT TIPPCT 5.2 ;
IF SYMNUM = 0 THEN DELETE ;
IO_1111 = 0 ; IO_4563 = 0 ; IO_234 = 0 ; IO_8765 = 0 ;
IO_22355 = 0 ;
IF SYMNUM = 1111 THEN IO_1111 = IORATE ;
IF SYMNUM = 4563 THEN IO_4563 = IORATE ;
IF SYMNUM = 234 THEN IO_234 = IORATE ;
IF SYMNUM = 8765 THEN IO_8765 = IORATE ;
IF SYMNUM = 22355 THEN IO_22355 = IORATE ;
DATE = DATEPART(SYNCTIME) ; FORMAT DATE DATE7. ;
INTE = TIMEPART(SYNCTIME) ; FORMAT INTE TIME19.2 ;
EMCTYPE = 'ESCON' ;
IF SYMNUM = 22355 THEN EMCTYPE = 'FICON' ;
IF IORATE < 10
THEN DELETE ;
KEEP VOLSER DEVNR TIPPCT DATE INTE
SYMNUM IO_1111 IO_4563 IO_234 IO_8765
SYNCTIME IO_22355 EMCTYPE
IORATE
AVGRSPMS AVGIOQMS AVGPNDMS AVGCONMS AVGDISMS
AVGPNCHA AVGPNCUB AVGPNDEV AVGPNDIR
PCTDVCON PCTDVUSE KOUNT ;
Do filtering as early as possible
TIPPCT = (IORATE * (AVGCONMS +AVGDISMS))/10 ;
FORMAT TIPPCT 5.2 ;
IF SYMNUM = 0 THEN DELETE ;
IO_1111 = 0 ; IO_4563 = 0 ; IO_234 = 0 ; IO_8765 = 0 ;
IO_22355 = 0 ;
IF SYMNUM = 1111 THEN IO_1111 = IORATE ;
IF SYMNUM = 4563 THEN IO_4563 = IORATE ;
IF SYMNUM = 234 THEN IO_234 = IORATE ;
IF SYMNUM = 8765 THEN IO_8765 = IORATE ;
IF SYMNUM = 22355 THEN IO_22355 = IORATE ;
DATE = DATEPART(SYNCTIME) ; FORMAT DATE DATE7. ;
INTE = TIMEPART(SYNCTIME) ; FORMAT INTE TIME19.2 ;
EMCTYPE = 'ESCON' ;
IF SYMNUM = 22355 THEN EMCTYPE = 'FICON' ;
IF IORATE < 10
THEN DELETE;
KEEP VOLSER DEVNR TIPPCT DATE INTE
SYMNUM IO_1111 IO_4563 IO_234 IO_8765
SYNCTIME IO_22355 EMCTYPE
IORATE
AVGRSPMS AVGIOQMS AVGPNDMS AVGCONMS AVGDISMS
AVGPNCHA AVGPNCUB AVGPNDEV AVGPNDIR
PCTDVCON PCTDVUSE KOUNT ;
Copyright © 2005, SAS Institute Inc. All rights reserved.
Move to top of DATA
Step
 CPU Time reduction
67%
Additional Steps
 Put KEEP= as DATA SET option to bring in fewer
variables into the DATA step. Note: This
decreases CPU time, but not I/O time.
 Use IF-THEN-ELSE or SELECT instead of just
IF-THEN.
 Eliminated redundant DATEPART function calls.
 Cumulative CPU time reduction:
80%
Copyright © 2005, SAS Institute Inc. All rights reserved.
Final Step
 Move filtering of blank VOLSER and IORATE
<10 to WHERE clause DATA SET option.
 Total cumulative CPU time reduction:
86%
Net savings of 368 CPU
seconds
Copyright © 2005, SAS Institute Inc. All rights reserved.
The Value of CPU Time Reduction
 Always important on the mainframe because it is
inherently a multi-workload beast.
 Often considered unimportant (or less so
anyway) on Windows and UNIX platforms
because of dedicated nature of systems.
Elapsed time is often more important.
 Changing with increasing use of server
virtualization. Affects how many virtual servers
can run on a physical platform.
• Logical Partitions or Domains on UNIX systems
• Virtual Machines on Windows and Linux systems
Copyright © 2005, SAS Institute Inc. All rights reserved.
Some General Strategies for Improving
Processing of Large Data Volumes
 Reduce volume of data passed (e.g. keep only
required variables in intermediate files)
 Reduce number of data basses
 Eliminate or reduce use of non-linearly scalable
techniques such as sorting.
 Exploit memory
 Exploit processing overlap and parallelism
Copyright © 2005, SAS Institute Inc. All rights reserved.
Exploiting New SAS Features
 We’ll use two scenarios from common
processing challenges encountered when
processing transaction data for performance and
service level reporting purposes.
 The improvements made to the processing
strategy for these scenarios …..
• Reduce number of data basses
• Eliminate or reduce use of non-linearly scalable
techniques such as sorting.
• Exploit memory
• Exploit processing overlap and parallelism
Copyright © 2005, SAS Institute Inc. All rights reserved.
General Scenario Chrematistics
 Very high data volumes (millions of records, tens
or hundreds of Gigabytes
 Multiple summarizations desired
 Detail records retained only for exceptional
cases.
Copyright © 2005, SAS Institute Inc. All rights reserved.
Scenario One
 High-volume transaction data, say from web log,
CICS, DB2, SAP
 Desired summarized file for service level
management, accounting, performance and
capacity management.
 Not interested in keeping every detail transaction
record.
Copyright © 2005, SAS Institute Inc. All rights reserved.
DATA Step Views
 Can be used to eliminate
a data passes
 Runs two tasks in parallel,
but does not multiprocess
 In this case, eliminates
one pass of the data.
Copyright © 2005, SAS Institute Inc. All rights reserved.
data lib.a / view=lib.a;
infile ……;
input x ……;
run;
proc sort data=lib.a; by x;
run;
SAS DATA Step View caveats
 Can inhibit use of indexed I/O; Data Set Option
WHERE clause cannot use index with a DATA
Step view.
 DATA Step views are sensitive not only to SAS
release and version levels, but sometimes to
maintenance levels.
Copyright © 2005, SAS Institute Inc. All rights reserved.
DATA Step Views with Proc Summary
 Eliminate data passes
and saves disk space.
 Eliminate sort
 Can produce multiple
summarization data sets
in one pass
 Benefits from large region
size (enough to hold
crossings)
 SUMMARY in SAS 9.1
• Multithreaded
• Does not keep n-way in
memory unless needed.
Copyright © 2005, SAS Institute Inc. All rights reserved.
data lib.a / view=lib.a;
infile ……;
input a b x y……;
run;
proc summary data=lib.a;
CLASS statement;
TYPES statement;
OUTPUT statement(s);
run;
SAS9 Threaded Procedures






SORT
SUMMARY/MEANS
TABULATE
REPORT
SQL
REG, GLM, LOESS, DMREG,DMINE
Copyright © 2005, SAS Institute Inc. All rights reserved.
Scenario Two
 High Volume Event data (time-oriented (e.g.
ARM log)
 Transactions must be constructed from multiple
event records
• Type S – transaction start ( ID, start time, code, )
• Type E – transaction end ( ID, end time, CPU time)
Copyright © 2005, SAS Institute Inc. All rights reserved.
Data arrival pattern
Start 1
Start 2
End 1
(write out 1)
Start 3
End 2
(write out 2)
Start 4
Start 5
End 4
(write out 4)
End 5
(write out 5)
End 3 (write out 3)
Copyright © 2005, SAS Institute Inc. All rights reserved.
DATA Step Hash Table Support
(New in SAS9)
 Can replace lookup formats
 Can have entries dynamically added, modified,
and removed
 For this Scenario, use a Hash Table to
accumulate transaction records from start and
end events.
Copyright © 2005, SAS Institute Inc. All rights reserved.
data transactions view=transactions;
declare hash transactions();
transactions.defineKey("tr_id");
transactions.defineData("tr_start", "tr_code“);
transactions.defineDone();
input type @;
if type = 'S' then do;
input tr_id tr_code tr_start;
rc=transactions.add();
end;
else if type='E' then do;
input tr_id tr_end tr_cpu;
rc = transactions.find();
response = tr_end - tr_start;
output;
rc = transactions.remove();
end;
Copyright © 2005, SAS Institute Inc. All rights reserved.
The Scalable Parallel Data Engine (SPDE)
 New in SAS 9.1
 Included with BASE
 Available on all 9.1 platforms
 Advantages
• Parallel data loading and index creation
• Parallel reads and searches
• Uses multiple indices to resolve a search
Copyright © 2005, SAS Institute Inc. All rights reserved.
SPDE – Scalable Performance Data Engine
SAS® System
data
Scalable Performance Data Engine
metadata
data1
data2
data3
data4
index
Hybrid index
Bitmap/B-tree
Bitmap/B-tree
Copyright © 2005, SAS Institute Inc. All rights reserved.
SAS SPDE implementation on z/OS
 USS thread services
 USS directory-based file systems
• zFS
• hFS
• NFS file systems
 Exploitation
• Define file system
• Change LIBNAME engine specification
Copyright © 2005, SAS Institute Inc. All rights reserved.
SPDE data set allocation on z/OS
 NFS – follow same guidelines as for Open
Systems
 HFS – Use separate HFS file systems for DATA
and INDEX components; perhaps multiple for
DATA. Spread HFS’s across Shark (ESS 2105)
loops.
 zFS - No special considerations! Use multivolume zFS particularly if
• Storage system has Parallel Access Volumes (PAV)
• ESS 2105-800 has Arrays Across Loops feature
Copyright © 2005, SAS Institute Inc. All rights reserved.
Scalability – SAS 9.1
SAS Scalable Architecture in SAS
Foundation
Scalable Performance Data Access
SAS
Teradata Sybase
DB2
Oracle
Scalable SAS/ACCESS
Piping
Piping
SAS
SAS
SAS
CONNECT
CONNECT
CONNECT
CPU 1
CPU 2
Threaded Procedures
THREAD 1
THREAD 2
THREAD N…
Copyright © 2005, SAS Institute Inc. All rights reserved.
Remote
Host
MP Connect Pipes




New in SAS9
Uses TCP/IP socket engine
Superior to DATA Step View approach
Provides true multi-processing
Copyright © 2005, SAS Institute Inc. All rights reserved.
/* ----- DATA STEP - PROCESS P1 ------ */
/* ---- SUMMARY - PROCESS P2 ----- */
SIGNON P1 SASCMD='!SASCMD';
RSUBMIT P1 WAIT=NO;
LIBNAME OUTLIB SASESOCK ":PIPE1";
SIGNON P2 SASCMD='!SASCMD';
RSUBMIT P2 WAIT=NO;
LIBNAME INLIB SASESOCK ":PIPE1";
data outlib.transactions;
declare hash transactions();
transactions.defineKey("tr_id");
transactions.defineData("tr_start", "tr_code“);
transactions.defineDone();
proc summary data=inlib.transactions;
CLASS statement;
TYPES statement;
OUTPUT statement(s);
run;
input type @;
if type = 'S' then do;
input tr_id tr_code tr_start;
rc=transactions.add();
end;
PROC PRINT;RUN;
ENDRSUBMIT;
WAITFOR _ALL_ P1 P2;
else if type='E' then do;
input tr_id tr_end tr_cpu;
rc = transactions.find();
response = tr_end - tr_start;
output;
rc = transactions.remove();
end;
ENDRSUBMIT;
Copyright © 2005, SAS Institute Inc. All rights reserved.
In Summary……
 Remember the importance of basic SAS
program tuning skills which have been wellknown for years.
 Take advantage of the significant SAS9 features
which can help you
• Improve response and turnaround times
• Improve availability times for BI applications by
shortening the batch window.
• Reduce costs by cutting resource consumption and
utilizing the most effective combination of CPU,
memory, and I/O resources
Copyright © 2005, SAS Institute Inc. All rights reserved.
Copyright © 2005, SAS Institute Inc. All rights reserved.
31