make connections • share ideas • be inspired

SAS High Performance Analytics and Architecture
Börje Edlund, SAS Institute Nordic EIA
Borje.Edlund@sas.com, Twitter @BorjeEdlund
Copyright © 2013, SAS Institute Inc. All rights reserved.

Agenda: High Performance Analytics and Architecture
• Examples of today's problem areas
• Architecture for managing data for analysis, high-performance analytics, and use of the results to realize the value
• Some details on what is new in HPA, and on the new HPA capabilities included in the SAS analytics products
• Demonstration of a simple example

Problem areas (Finance, Telecom, Retail, Public sector, Industry)

SAS Forum 2011
[Chart: high capacity vs. low capacity. Nutek: 5 billion (2007). Naturvårdsverket: 2.5 billion (2005).]

More data, faster analyses, faster value, new opportunities
Compare a weather forecast: would you want a new forecast every other hour, or only once a week?
The value does not lie in being faster and more exact. The value lies in what that then does for the business:
• More models are built and maintained, giving more individualized offers and better premium pricing
• Analysts have time to test and build better models against all the data; a better model brings better profit
• New pricing is ready every morning instead of every month, so nobody has to work with old prices
• Daily inventory forecasts, or signals during the day, instead of weekly ones
• Analyze all bad payers instead of a sample
• Free up time for other work, such as analyses for strategic initiatives
• The ability to do things that were not possible before: a better business model and better execution
• Real-time adjustment of the customer offer through continuous scoring
• Analyzing new data, such as social media, together with existing data gives better knowledge of customer behavior
• Run and stress-test financial risks ad hoc, not just once per night or week: a market advantage
• View and visualize all the data at once, regardless of size, to find relationships and understand what stands out, without writing an SQL query

Example logical architecture
[Diagram components: Source data; ETL/ELT; Event Stream Processing; Hadoop; Data Appliance; SAS HPA; SAS Visual Analytics; SAS EG, SAS DM, SAS EM; Analytics lifecycle management; Data Gov.; Deploy model; SAS Decision Manager/RtDM; Business process]

SAS Data Management Console

Data management in brief (SAS Data Management)

Connectors & Access Engines
Transparent access to data stored on a variety of platforms and formats (>60 different sources)
• Data residing in applications as well as metadata stores
• Structured, semi-structured and unstructured data
• Data 'at rest' and data streams

Event stream processing
Continuously analyzes data as it is received, for real-time decision making
• Transaction fraud detection
• Real-time analysis of social data streams
• Personalized online offers based on navigation

SAS 9.4: a subset of the SAS/ACCESS engines
• SAS/Access to PC Files
• SAS/Access to Teradata
• SAS/Access to Oracle
• SAS/Access to SQL Server
• SAS/Access to DB2
• SAS/Access to Vertica
• SAS/Access to PostgreSQL
• SAS/Access to SybaseIQ
• SAS/Access to GreenPlum
• SAS/Access to Netezza
• SAS/Access to Hadoop
• SAS/Access to Aster nCluster
SAS In-Database
• SAS Embedded Process
• Scoring Accelerator
• SAS co-located storage

ESP introduction: What is SAS Event Stream Processing?
• DATA IN (called Events) enters the engine
• The engine processes data "on the flow"
• DATA OUT (Events) leaves the engine
• Very high speed, low latency

ESP introduction: "On the flow"?
Batch engine:
1. Prepare data
2. Run process
3. Get results
4. Go to step 1
Stream engine:
1. Run process
2. Continuous loop:
   a. Receive data in
   b. Process data
   c. Push results out

ESP introduction: Process data
Between DATA IN (called Events) and DATA OUT (Events), SAS ESP applies:
• Filtering
• Calculations
• Aggregation
• Joins
• Procedural logic
• Thresholding
• Pattern detection

ESP concept: "dataflow centric", i.e. not ETL but data in motion
[Diagram: three event streams (DATA IN) enter SAS ESP through source windows and pass through filter, calculation, join and threshold windows before leaving as DATA OUT event streams.]
Design of the rule model (called "Continuous Query") using components (called "Windows")
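The stream-engine loop and the idea of a continuous query built from windows can be pictured with a small sketch. This is plain Python, not the SAS ESP API: the function names (source_window, filter_window, compute_window, threshold_window), the event fields (id, amount, doubled) and the sample events are all hypothetical and chosen only for illustration. Each "window" is a generator that receives an event, processes it, and pushes the result on, so nothing is prepared in batch first.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Event = Dict[str, Any]  # hypothetical event shape: a plain dictionary


def source_window(events: Iterable[Event]) -> Iterator[Event]:
    """Hand events on one at a time, as they arrive."""
    for event in events:
        yield event


def filter_window(stream: Iterable[Event],
                  keep: Callable[[Event], bool]) -> Iterator[Event]:
    """Pass on only the events for which keep(event) is true."""
    for event in stream:
        if keep(event):
            yield event


def compute_window(stream: Iterable[Event],
                   calc: Callable[[Event], Event]) -> Iterator[Event]:
    """Derive a new event from each incoming event."""
    for event in stream:
        yield calc(event)


def threshold_window(stream: Iterable[Event],
                     field: str, limit: float) -> Iterator[Event]:
    """Emit an event only when the given field exceeds the limit."""
    for event in stream:
        if event[field] > limit:
            yield event


# Illustrative input; in a real stream these would arrive continuously.
incoming = [
    {"id": 1, "amount": 30.0},
    {"id": 2, "amount": -5.0},
    {"id": 3, "amount": 80.0},
    {"id": 4, "amount": 60.0},
]

# Wire the windows together: source -> filter -> calculation -> threshold.
query = threshold_window(
    compute_window(
        filter_window(source_window(incoming), lambda e: e["amount"] > 0),
        lambda e: {**e, "doubled": e["amount"] * 2},
    ),
    field="doubled",
    limit=100.0,
)

# Events are pulled through the whole chain one by one.
alerts = list(query)
print([event["id"] for event in alerts])
```

Because generators are lazy, each event travels through every window before the next one is read, which mirrors the "receive, process, push out" loop of a stream engine rather than the "prepare, run, collect" cycle of a batch engine.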
SAS Decision Manager: use cases
• Identify fraudulent activity
• Process new loan applications
• Personalize online experience
• Recommend drugs & dosage
• Identify dangerous driving

Infrastructure capable of BIG analytics
• Reliable analytics infrastructure
• Analytics engine
• High performance computing: grid computing, in-database, in-memory analytics
• Architecture flexibility: Desktop - SMP - MPP - Grid
• Deployment flexibility: on premise, cloud, appliance

Many SAS solutions use HPA technology
SAS Enterprise Miner, SAS Visual Analytics, SAS Fraud Framework, SAS Integrated Marketing, SAS Forecasting, SAS Anti Money Laundering, SAS HPRisk, SAS Price Optimization, SAS Data Management, SAS Credit Scoring, SAS Social Media Analytics, SAS Data Quality, SAS Scoring Accelerator, SAS/BASE, SAS/STAT, and more.

SAS® High-Performance Analytics: some analytical areas and example uses

Statistics
• Binary target & continuous no. predictions
• Linear, nonlinear, & mixed linear modeling

Data Mining
• Complex relationships
• Tree-based classification
• Variable selection

Text Mining
• Parsing large-scale text collections
• Extract entities
• Auto. stemming & synonym detection

Forecasting
• Large-scale, multiple hierarchy problems

Econometrics
• Probability of events
• Severity of random events

Optimization
• Local search optimization
• Large-scale linear & mixed integer problems

SAS® HPA procedure examples (release 12.3), SAS 9.4, June
• SAS® High-Performance Statistics: HPLOGISTIC, HPREG, HPLMIXED, HPNLIN, HPSPLIT, HPGENSELECT
• SAS® High-Performance Econometrics: HPCOUNTREG, HPSEVERITY, HPQLIM
• SAS® High-Performance Optimization: HPLSO; select features in OPTMILP, OPTLP, OPTMODEL
• SAS® High-Performance Data Mining: HPREDUCE, HPNEURAL, HPFOREST, HP4SCORE, HPDECIDE
• SAS® High-Performance Text Mining: HPTMINE, HPTMSCORE
• SAS® High-Performance Forecasting: HPFORECAST
• Common set: HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY, HPIMPUTE, HPBIN, HPCORR
These are available at no cost to customers running SAS 9.4 in SMP mode (provided you have a SAS installation with SAS/STAT, SAS Enterprise Miner, SAS/ETS, SAS/OR and so on).
In Q4 (SAS 9.4M1), additional HPA-enabled functions will follow.

High-performance nodes in Enterprise Miner (SAS 9.4 HP-enabled nodes)
• HP Explore
• HP Transform
• HP Variable Selection
• HP Impute
• HP Regression
• HP Neural Network
• HP Data Partition
• HP Forest
• HP Text Miner
• HP Decision Tree

HP PROCs on a single server (SMP mode)

    libname disk base "/filesys";
    proc hpreg data=disk.source;
       /* analytic stuff… */
    run;

What happens, in order:
• SAS starts the HPREG procedure
• Multiple threads are launched to process the incoming data
• As execution continues, temporary data is written out to utility files on disk
[Diagram: operating system hosting the SAS process; disks under "/filesys"; temp/utility files to support SAS.]
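The single-server flow, where multiple threads are launched to process the incoming data, can be pictured with a short Python sketch. It is an assumption-laden illustration of the general split, compute, combine pattern and says nothing about how HPREG is implemented internally: the functions partial_stats, combine and fit_line, the choice of a one-variable least-squares fit, and the generated data are all hypothetical. Each worker thread summarizes its own chunk of rows, and the partial summaries are added together before the coefficients are solved once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Row = Tuple[float, float]                         # (x, y)
Stats = Tuple[float, float, float, float, float]  # n, sum_x, sum_y, sum_xx, sum_xy


def partial_stats(rows: Sequence[Row]) -> Stats:
    """Summarize one chunk of rows; this is the work done by one thread."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return n, sx, sy, sxx, sxy


def combine(parts: List[Stats]) -> Stats:
    """Add the per-chunk summaries column by column."""
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*parts))
    return n, sx, sy, sxx, sxy


def fit_line(rows: Sequence[Row], workers: int = 4) -> Tuple[float, float]:
    """Least-squares slope and intercept from chunk-wise summaries."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    chunk = -(-len(rows) // workers)  # ceiling division
    chunks = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, chunks))
    n, sx, sy, sxx, sxy = combine(parts)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Illustrative data lying exactly on the line y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(100)]
slope, intercept = fit_line(data, workers=4)
print(slope, intercept)
```

The sketch shows the structure of the computation, not its speed: Python threads do not run this arithmetic in parallel the way native threads would. The same combine step is what makes the pattern carry over to several machines, since each node only has to send back its small summary.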
HP PROCs in a distributed architecture (MPP): Hadoop HDAT, shared-rack example

    libname a sashdat;
    option set=GRIDHOST="NAMENODE";
    proc hpreg data=a.source;
       /* analytic stuff… */
       performance nodes=all;
    run;

Steps, in order:
• SAS starts the HPREG procedure
• Because of the GRIDHOST setting and the proper access engine setting, multi-threaded processes are started on the grid nodes
• As the processes start up, data is lifted into RAM from HDFS
• Processing occurs in parallel against the in-memory data
• Results return to the initiating process on the SAS server
[Diagram: SAS server (operating system, SAS process); Hadoop NameNode; Node 1, Node 2 ... Node N, each with a process and its data.]

SAS® Scoring Accelerator: business value

Past approach (OpSys1, flat file extract, SQL Server, SAS, customer selection)
• Daily process begins with flat file creation at 6:30am; SLA delivered at ~9:30am
• File transferred to SQL Server, limited to ~350K customer records based on specific criteria
• 300-step process to support the data mining life cycle
• 30 minutes to score ~350K customers
• Runs in ~3 hours

In-database approach (OpSys1, Teradata EDW with SAS Scoring Accelerator)
• Daily process begins at 4:00am with EDW load
• All operational data loaded directly to the EDW; no flat file or intermediate processing is needed
• 10-step process
• Scoring and customer selection done in-database against ALL customer rows
• 4 minutes to score ~40M customers
• Runs in 12 minutes

Business value
• Scope of customer analysis: 350K vs. 40M
• Monthly collections: $1M-$3M per month
• Approximately 13% incremental lift

Demonstration of the difference between running an analysis the old way and the new HPA way!

Questions?