SQL for IR Staff (You can do it…We can help! ©) A crash course for novices in appropriately selecting and transforming data from relational databases using SAS® Proc SQL October 19, 2008 SAIR @ Nashville, TN. Evan Davies The College of William and Mary Chartered 1693 Dennis Catley Why are you here? • You may have some combination of: • Enough training to run someone else’s query and to wonder about its syntax… • Any knowledge gained from point & click environments, help files, or ‘SQL for dummies’-type books… • The need to join two or more tables to select data for analysis… • The desire to return to work with some new skills to justify the SAIR trip! SAS and SQL Knowledge Evaluation • The reality is that it is hard to assess where anyone fits along any learning continuum. STARTUP • Find the workshop datasets and programs at their storage location; • Invoke SAS interactive session; • Explain interactive windows and program flow; • Explain the necessary LIBNAME statement to target the sample data sets; • Execute a simple print job to confirm operations on everyone’s computer Proc SQL under • SQL first proposed by IBM in 1970s; popularized and standardized in late-1980s. • Incorporated into SAS system starting with version 6.18 (experimental). • SAS Proc SQL has a few more features than ANSI SQL and is slightly different from other flavors. • Wholly integrated into the SAS programming environment; SAS capable of addressing nonSAS relational databases natively. Reasons for a Crash Course • You often have no dedicated instruction in a small I.R. shop. • Your I.T. staff provide limited support and are sometimes puzzled by your requests. • Available (i.e. free) resources don’t concentrate on selecting data for analysis and often use business sales examples. Mango chutney, anyone? More Reasons… • You may be transitioning into an institutional RDBMS, like Banner or Peoplesoft, with a data warehouse or mart being attached to it. • Your drag & drop SQL generator or prewritten programs may be capable of answering questions, but you are not sure how to modify the queries and you spend time dumping data into intermediate applications for more processing. Today’s Effort • We will concentrate on selecting data because your job focus is institutional research and analysis. • We will use IR-related examples. • We can’t teach you all about your own data or your campus access methods. • We have lots of material to cover during the workshop, so we may have to defer questions occasionally. • SQL wasn’t built in a day. Your mastery of it will not be achieved in a day… Concept Bridging An IR data selection task Common SQL learning roadblock A resulting table with valid data Code to produce desired result Concept Bridging Topics 1. 2. 3. 4. 5. 6. 7. 8. Terminology is the primary key Lose your mind in a single query Conditions are optimal Groups, having, functions Cohorts versus survivors Keys to the kingdom Just passing through The power of sub-queries An Important FYI • Data are never changed when you run a SQL query…..never. • You may bring the institution’s computing services to a halt, or dim the lights for a square mile, but the data are fine. Terminology is the primary key The IR professional’s need: • Knowing the proper names for the parts of the SQL statement is the first building block of SQL learning. It serves to: – Help you understand their functions, different forms, and relationships, – Help you plan your data selection process, – Help you talk about problems more precisely to the DBAs, data warehouse staff, and/or the IT group (who are immersed in buckets of data and often need brought into your ‘problem arena’) SQL Concepts Explained • Table – a collection of data, a SAS data set, ORACLE table, or excel spreadsheet, for example • Database – most often a collection of related tables • Columns – you can think in the Excel sense • Rows – again, think Excel • DBA – Data Base Analyst (your friend if you behave, and stay away from Cartesian joins). • Metadata – information about data, info about a particular table for instance At VT I have a superb group of data warehouse people who serve a similar role for me but are more data and information literate than a typical DBA. Those types can be invaluable!!! (know how to kiss up) Tables & Rows & Columns (oh my!) The 4 records below are a Table. columns rows Seinfeld Kramer Benes Costanza T T T P M M F M Associate Assistant Professor Instructor PHD MFA MM MARCH Columns - think vertical…..rows - think horizontal Lose your mind in a single query The IR reality: • Rapid demand for data from relatively new sources (like warehouses and changing RDBMS), • the ad-hoc specialty nature of many requests, • your own novice SQL status, • concerns about the logical relationships between tables • and concerns about the cleanliness and completeness of data all are legitimate reasons to: • develop and test queries in iterative stages and steps, and to • follow good programming practices like using indentation, white spacing, and documentation Introducing SQL • SQL Stands for Structured Query Language • Typically built via as a SELECT-FROMWHERE construction • Repeat after me… “SELECT, FROM, WHERE” • Again… “SELECT, FROM, WHERE” • Is it SQL (see-kwil) or SQL (ess-que-ell)? Who cares? Select, From & Where Explained • SELECT – is used to choose which columns you desire, & in what order you want them to appear • FROM – references the table(s) that contains the columns you need • WHERE – is employed to select which rows you want to include Jumping Off to Select Statements Sample group of faculty records (FACULTY table) Lname tenure_code gender rank SSN ---------------------------------------------------Catley N M Lecturer 123456789 Davies T M Assistant 987654321 Superman P M Professor 111222333 Seinfeld T M Associate 444555666 Kramer T M Instructor 777888999 Benes T F Professor 999888777 Costanza N M Instructor 666555444 Newman P M Assistant 333222111 Susan N F Associate 123454321 Select all columns and rows from a table Proc sql; Select * from faculty; Quit; Note there is no WHERE clause. SELECT and FROM are required. All other statements, including WHERE, are optional. Seldom will you NOT use a WHERE clause, you typically are looking for a subset of both rows and columns Result set from prior query Lname tenure_code gender rank SSN ---------------------------------------------------Catley N M Lecturer 123456789 Davies T M Assistant 987654321 Harriman P M Professor 111222333 Seinfeld T M Associate 444555666 Kramer T M Instructor 777888999 Benes T F Professor 999888777 Costanza N M Instructor 666555444 Newman P M Assistant 333222111 Susan N F Associate 123454321 The * syntax with no WHERE clause results in all columns and rows being selected. The * symbol means select all columns. The Where Clause The where clause in SQL has 2 main functions. • Filtering the rows chosen, i.e. providing row selection criteria. • Providing join commonality between 2 or more tables (keys will be discussed later) Filtering Rows – a Use of the WHERE Clause SELECT lname, tenure_code FROM faculty WHERE tenure_code = ‘T’; SELECT is choosing AND ordering columns, and WHERE is choosing which rows to keep. SQL uses single quotes around text values (most database systems will also accept double quotes). Numeric values should not be enclosed in quotes. Coding for Readability • You will notice indentation used when we write our SQL code. Is it syntactically required? • No -- it is not, but it makes the code much easier to read and to understand!! Result set of previous query Lname tenure_code ----------------------Davies T Seinfeld T Kramer T Benes T What’s one change we may wish to see in this result set? We Now Want All Tenure-track SELECT lname , tenure_code FROM faculty WHERE tenure_code in ( ‘T’,’P’) ORDER BY lname; Note the addition of an ORDER BY, and use of the IN operator. Note also that we could override the default ORDER with a DESC option, and move the Z’s to the top. order by lname desc; Result set after ordering by lname Lname tenure_code ---------------------Benes T Davies T Harriman P Kramer T Newman P Seinfeld T Sample group of faculty records (FACULTY table) Lname tenure_code gender rank dept ---------------------------------------------------Catley N M Lecturer Accounting Davies T M Assistant Biology Superman P M Professor Biology Seinfeld T M Associate Biology Kramer T M Instructor Accounting Benes T F Professor Accounting Newman N F Associate Accounting Could We Get a List of Depts? (from the faculty table) SELECT dept FROM faculty ORDER BY dept ; dept -------------------Accounting Accounting Accounting Accounting Biology Biology Biology How do we solve this inadequacy? Distinct Keyword select distinct dept from faculty order by dept ; The DISTINCT keyword removes duplicate items from the result set. Jumping Off to Conditions Conditions are optimal The IR Professional’s need: • Available logical operators, to select data precisely: • IN and NOT IN (require comma separators) • EXISTS, IS NULL, NOT EQUAL TO (<>) , LIKE (%) Note: In some versions of SQL the <> operator may be written as != Roadblock: if your intuition tells you that there is a better way to construct your WHERE clause, then there likely is. This is where your SQL friends, DBAs, and Data Warehouse contacts come in. Ask them. Ask us, we’re an email or phone call away! The LIKE Operator The LIKE (%) condition is used to specify a search for a pattern in a column. Samples The following SQL statement will return persons with first names that start with a ‘P': SELECT * FROM Persons WHERE FirstName LIKE ‘P%‘; The following SQL statement will return persons with first names that end with an 'a': SELECT * FROM Persons WHERE FirstName LIKE '%a' ; BETWEEN …. AND The BETWEEN ... AND operator selects a range of data between two values. These values can be numbers, text, or dates. Roadblock: The BETWEEN...AND operator is treated differently in different databases. NOT BETWEEN can be used as well. SELECT * FROM Persons WHERE LastName NOT BETWEEN ‘Catley' AND ‘Clemente' ; Jumping Off to Groups and Functions GROUP, HAVING, Functions The IR Professional’s need: •Grouping data allows flexible aggregation and summary data. (an IR must!) •SQL provides 2 related statements for summarizing data. GROUP BY and HAVING. •The HAVING clause is reserved for aggregate functions. It is usually placed near the end of a SQL block which also enhances code readability. (think of HAVING as a final filter) The Having Clause Lname gender salary department ---------------------------------------Davies M 88000 Biology Seinfeld M 77000 Accounting Catley M 45000 Accounting Benes F 60000 Biology Susan F 70000 Biology Wilson F 45000 Accounting Kelly F 50000 Accounting Given this table called FACULTY We’d like to know which departments have a mean salary for males or females above $50,000. Sample of Having Clause SELECT dept , gender , avg(salary) FROM faculty GROUP BY dept , gender HAVING avg(salary) > 50000; Results of prior query dept gender ----------------------------------------Accounting M 61000 Biology F 65000 Biology M 88000 What can we do to improve this? A couple quick improvements SELECT dept , gender , count(*) as faculty , avg(salary) as Mean_salary FROM faculty GROUP BY dept , gender HAVING avg(salary) > 50000 ORDER BY dept , gender; Two summary stats used, COUNT and AVG Result of prior Query dept gender faculty Mean_salary ------------------------------------------------------Accounting M 2 61000 Biology F 2 65000 Biology M 1 88000 Be careful! SELECT dept , gender , avg(salary) FROM faculty HAVING avg(salary) > 50000 A having clause used with no group by…..any guesses? Result of prior query dept gender ----------------------------Biology M 62142.86 Accounting M 62142.86 Accounting M 62142.86 Biology F 62142.86 Biology F 62142.86 Accounting F 62142.86 Accounting F 62142.86 What a mess!!! GROUP BY is usually coupled with HAVING. Be Careful 2 (the sql sequel) SELECT dept , gender , count(*) as faculty , avg(salary) as Mean_salary FROM faculty GROUP BY dept HAVING avg(salary) > 50000 ORDER BY dept , gender ; Grouping by dept alone may not be what you want? Result of Prior Query dept gender faculty Mean_salary ------------------------------------------------Accounting F 4 54250 Accounting F 4 54250 Accounting M 4 54250 Accounting M 4 54250 Biology F 3 72666.67 Biology F 3 72666.67 Biology M 3 72666.67 Always examine what you expect to get versus what you *DO* get ! Some of the Common Aggregate Functions AVG(Column) Returns the average value of a column COUNT (Column) Returns the number of rows (without a NULL value) of a column COUNT(*) Returns the number of selected rows FIRST(Column) Returns the value of the first record in the specified field LAST(Column) Returns the value of the last record in the specified field MAX(Column) Returns the highest value of a column MIN(Column) Returns the lowest value of a column SUM(Column) Returns the total sum of a column Who has the Highest Salary in Each Department? SELECT lname , dept , gender , salary FROM faculty GROUP BY dept HAVING salary = max(salary) ORDER BY dept ; This is basically evaluating each row against a summary stat from the query. Results of Prior Query lname dept gender salary ---------------------------------Seinfeld Accounting M 77000 Davies Biology M 88000 Functions SELECT dept , gender , count(*) as faculty , avg(salary) as Mean_salary FROM salaries WHERE substr(dept,1,1) = 'A' GROUP BY dept , gender HAVING avg(salary) > 50000 ORDER BY dept , gender ; Table Aliases • A table alias is a temporary alternate name for a table, a nickname of sorts. • You can specify table aliases in the FROM clause • The AS keyword is optional, pros include maybe enhanced readability, cons include the added keystrokes • Table aliases are used in joins to qualify column names so that the correct columns will be processed. • Table aliases MUST be used when multiple (2 or more) tables have the same column names, but are optional otherwise Storing Results Proc sql; Create table Engineering as SELECT a.lname , a.retire_sum , b.dept FROM faculty_age a , faculty_depts b WHERE a.lname = b.lname and college = ‘Engineering’; Quit; Note the use of table ‘aliases’ to assign a short reference to each table being used in a join. Jumping Off to Joins.. After the Break! Cohorts vs. Survivors The IR professional’s need: • IR data selection generally operates on a ‘cohort’ model. Much SQL instruction is geared toward developing ‘survivor’ datasets. While efficient, the survivor model is not useful for many of our purposes. • We concatenate (union) rather than joining datasets for many analytical purposes, like year-to-year studies. Cohorts vs. Survivors Explicit Join form of the select statement SELECT Source1 variables , Source2 variables FROM Source1 JOIN Source2 ON Join key criteria (columns) WHERE Row selection criteria (rows) Cohorts vs. Survivors • A cohort group is the result of an OUTER join of the Left, Right, or Full type. • A survivor group is the result of an INNER join. The Simple Explanation Cohorts vs. Survivors • Inner join – Retrieves only matching rows that “survive” the join selection criteria. • Outer Left join – The first dataset mentioned (left of the join keyword) is the cohort and the second (right) is transactional. All rows from the cohort are returned with information, as available, added from the transaction when rows from both tables match on the key column(s). • Outer Right join – The second dataset mentioned is the cohort and the first is the transactional. • Outer Full join – The resulting table has cohort records from both data sets, including “matching” rows as well as “non-matching” rows. Survivors R L A B C D E FR SO JR SR UN INNER JOIN A B D E FR SO SR UN VA MA TX VA A VA B MA D TX E VA G AR Cohorts R L A B C D E FR SO JR SR UN OUTER LEFT JOIN A FR VA B SO MA C JR D SR TX E UN VA A VA B MA D TX E VA G AR Cohorts R L A B C D E OUTER RIGHT JOIN FR SO JR SR UN A B D E G FR SO SR UN VA MA TX VA AR A VA B MA D TX E VA G AR Cohorts R L A B C D E OUTER FULL JOIN FR SO JR SR UN A B C D E G FR SO JR SR UN VA MA TX VA AR A VA B MA D TX E VA G AR Cohorts vs. Survivors Implicit Join form of the select statement *acquire admissions h.s. region tables; SELECT L.*, R.hs_name, R.hs_type, R.hs_recruit_region FROM admissions_applicant_200410 L , admissions_geo_2004 R WHERE L. high_school _ code = R.geo_hs_code and L.term_code = '200410' No JOIN ON statement ! Can’t do a SAS outer join with this syntax… Cohorts vs. Survivors Select expanded to join three datasets *acquire admissions cohort and h.s. region from freeze tables; SELECT * FROM admissions L LEFT JOIN gorsgeo ResultRtable from first join ON L.sbgi_code = R.gorsgeo_code LEFT JOIN Admissions_officer A ON L.sbgi_code = A.sbgi_code_key WHERE L.term = '200410' and R.gorsgeo_status_ind IS NULL Cohorts vs. Survivors Concatenation of data tables create table alldata as select a.pidm_key as pidm, a.reports_frp from ssn_cohort a OUTER UNION CORRESPONDING select b.pidm, b.reports_frp from HHMI b ; SQL assumes that your variable names and data characteristics match… 1 2 3 4 4 5 6 1 2 3 4 4 5 6 Cohorts vs. Survivors Non-duped concatenation of data tables create table alldata as select a.pidm_key as pidm, a.reports_frp from ssn_cohort a UNION CORRESPONDING select b.pidm, b.reports_frp from HHMI b ; All row information must be the same… 1 2 3 4 4 5 6 1 2 3 4 5 6 Jumping Off to Join Statements Keys to the kingdom The IR professional’s need: • You may need to manipulate variables to make them successful join keys. Those variables may not be the obvious primary key for the table in which they reside, and they currently may not have the necessary values for a successful join. • There are attributes about data tables and keys that can materially affect how quickly and successfully you can use them in join operations. Discovering this meta-data is worth the time and effort it will take. Tools can help. Keys to the kingdom Let’s review the notion of keys • Keys are always made up of variables (columns), not observations (rows). Keys and variables are always associated with the join criteria, not the row selection criteria, in SQL. • A key column is any variable that can reliably join together tables of data that are somehow related. Sometimes more than one column must be combined to form the equivalent key column for another table’s key. Occasionally, data must be transformed in order to form an equivalent key column. Keys to the kingdom Key and Variable Transformation • The ‘select *’ syntax is only marginally useful. While it saves you from typing in column names, using it brings redundancy if you want to alias or transform variables, and may result in system renamed variables. • Use aliases to shorten, rename, or otherwise better identify columns for joining to other tables. • You may have to re-label an aliased variable in your analysis package with a label command. Keys to the kingdom Key and Variable Transformation • There are many arithmetic and text-related operations that you can use on columns in the selection process to make them match other keys, or to transform them. Frequently used are: • Substringing: Select substr(course,1,4) as course_abbrev • Scanning: Select = scan(packlevlhr,2,'{') as hrs_earn • Addition: Select sat_verb+sat_math as sat_total Key and Variable Transformation • Any arithmetic operation including (): Select (s_contrib+f_contrib)/(aid_awarddiscret_aid) as contrib_ratio • Concatenation: Select course_name || course_number || course_section as course_key • Direct Substitution: Select ‘Registered’ as student_status Creating a New Column from Scratch Proc SQL; Select lname , ‘tenure-track’ as faculty_type from faculty where tenure_code in ( ‘T’,’P’); Quit; Additional Select Control Syntax Proc SQL; Select lname label = ‘Last Name’ , ‘tenure-track’ as faculty_type format = $25. from faculty where tenure_code in ( ‘T’,’P’); Quit; • Note: faculty_type could also be set to a single flag variable such as ‘T’, and using a format = $1. Jumping Off to Keys and Variable Transformation Keys to the kingdom Using a case statement to transform a key SELECT course_number, course_identification, case when academic_period = '200410' then '20039' when academic_period = '200420' then '20041' when academic_period = '200510' then '20049' else '?' end as term, actual_enrollment FROM schedule_offering WHERE school in (‘UG' , ‘GA') and faculty_sts = ‘TR' Keys to the kingdom Other case statements case when sexappl in (' ','N') then ‘U’ else sexappl end as gender, case when bethn IS NULL then ‘NR’ else bethn end as ethnic_desc, case when reg_dapp = ‘999’ AND app_code <> ‘UNCL’ then ‘200’ when ug_appst = '200001' then '300‘ when ug_appst = '200006' then '400' when lw_appst = '200007' then '500' when gs_mapp = '200009' then ‘700‘ else '???‘ end as init_enroll_status Keys to the kingdom The notion of keys • Tables can have primary keys. A primary key has two special conditions – It is not allowed to be empty (null) and it is not allowed to have repeating values, which assists in referential integrity. Not all tables have keys that meet these criteria, nor should that be expected as a norm in IR data joins. • Keys can be indexed, which means the join operation runs faster when those keys are referenced. Conversely, if they are not referenced, the join can be very, very slow. Keys to the kingdom Key findings • You need to develop a plan for finding out about primary keys and indexed key variables in large data tables to avoid long execution times. • Your data base analyst might help you with information about “explain plans” and SQL developer tools that may be located within your primary analysis package or available separately. TOAD is a highly useful tool for an Oracle-based RDBMS. Keys to the kingdom Keys to the kingdom Keys to the kingdom Keys to the kingdom Just Passing Through The IR professional’s need: • You may have direct connections through SQLnet to the processor of your relational database as well as the computing power of your particular analysis package. Recognizing this fact and managing that connection is important for faster, and sometimes, any results. • There may be differences between SQL implementations in the two locations. Just Passing Through Remote connection or ‘Pass-through’ • The concept of ‘pass-through’ refers to connections made between your computer (and analysis application) and the server where your RDBMS keeps all the data. DATA Just Passing Through Remote connection or ‘Pass-through’ • When you join tables that are all on your computer, joins are made promptly. SQL RESULT Just Passing Through Remote connection or ‘Pass-through’ • When you join tables that are all on your server, joins are made rapidly. SQL RESULT Just Passing Through Remote connection or ‘Pass-through’ • When you join tables that are in both places, joins are made s l o w l y SQL SQL FCP All join processing initially produces a full "Cartesian product". Just Passing Through ‘Pass-through’ solution #1 • To the greatest extent possible, define your first table as a select statement that uses tables already on the server. SQL RESULT Just Passing Through ‘Pass-through’ solution #1 SELECT test_score1 , test_score2 , test_score1 + test_score2 as totsat, r.person_key FROM test_slot L right join admissions_application R on l.person_key = r.person_key WHERE r.term = 200510 and r.level = ‘UG’ Just Passing Through ‘Pass-through’ solution #2 • Ask your DBA for table space on your server to transfer cohort files for joins SQL RESULT Just Passing Through ‘Pass-through’ solution #3 select test_score1 as verbal, test_score2 as math, person_key from test_slot where person_key in (‘238540’, ‘342453’ ‘223472’, ‘349432’, ‘546326’, ‘743324’) • Most systems will allow 800~1000 values in an ‘IN’ statement before balking. Just Passing Through Pass-through Awareness • Proprietary RDBMS implementations of SQL can differ in the way they store and compare values, especially dates and null values. If you know there are values which should join, and they do not, check for ‘store and compare’ issues. • For example, Oracle-based date values and operator symbols are different from those used in SAS and MS_Access. Just Passing Through Pass-through awareness • Where to_date(controlling_date,'DD-MON-RRRR') = to_date('19-MAR-2001','DD-MON-RRRR') • Where control_date LE ‘19-MAR-2001’ • = <= >= <> • Where stu_id in (‘23453’ ‘64354’ ‘43564’) • Where stu_id in (‘23453’,‘64354’,‘43564’) Just Passing Through SAS Pass-through Code proc sql; connect to odbc as mydb (datasrc="ODSP" user=esdav2 password=xyz); create table person as select * from connection to mydb ( select r.person_uid, tax_id, full_name_lfmi, birth_date, nation_of_citizenship, visa_type_desc from person l right join admissions_application r on l.person_uid = r.person_uid where r.academic_period = 200510 ); quit; The power of sub-queries The IR professional’s need: • Using sub-queries reduces processing time by reducing the number of rows available to join in the main join. • The Effective-Dated Row concept in many RDBMS tables make using a sub-query an effective way to select only valid rows. • Using sub-queries may help you keep selections on the server side of a pass-through connection. The power of sub-queries • AKA ‘inner query’ – all you need is a pair of parentheses! • A sub-query is just a select query expression that is nested within another query expression. • They are particularly useful when forming cohorts for study, where part of the cohort criteria is in a separate table or tables from the main data of interest. They can return a single value or multiple values to the main query. • Often used in WHERE clauses and HAVING expressions, but can occur just about anywhere. Start slow and build on the use of non-correlated sub-queries. Develop a small sample of your large tables, and test your sub-query there. Then you can move on to correlated sub-queries. The logic of sub-queries SELECT L.key , L.status , R.gender FROM Student L , Demogr R WHERE L.key = R.key and L.term = 200410 and R.visa = ‘F1’ L R JOIN The logic of sub-queries SELECT L.key , L.status , R.gender FROM Student L , (SUBQUERY from Demogr) R WHERE L.key = R.key and L.term = 200410 and R.visa = ‘F1’ L R JOIN The logic of sub-queries SELECT L.key , L.status , R.gender FROM Student L , (SUBQUERY from Demogr) R WHERE L.key = R.key and L.term = 200410 R R L JOIN The logic of sub-queries SELECT L.key , L.status , R.gender FROM Student L , (SELECT key, gender from Demogr WHERE visa = ‘F1’ ) R WHERE L.key = R.key and L.term = 200410 R L Notice that the subquery took the place of a table… JOIN The power of sub-queries SELECT s.person_key, m.* FROM demographics m WHERE m.person_key = ( SELECT person_key from registration where academic_period = 200420 and registration_ind = ‘Y’ ) s and m.legacy_status in (‘M’, ‘F’, ‘P’) Notice that this sub-query took the place of a where criteria equality… The power of sub-queries select a.fac_id , (a.age + a.yos) as retire_sum , b.dept from faculty a , faculty_depts b where a.fac_id = (select distinct academicfac.fac_id from academicfac where academicfac.serv_award = ‘Dist’) and a.fac_id = b.fac_id ; The sub-query first builds a table of one column, faculty ids, which belong to faculty who have won at least one Distinguished Service award. Then this sub-group joined to the first table, which is then joined to the department table. The power of sub-queries Selecting effective-dated data • Effective-dated rows occur in many places in modern databases, like address tables, term-domicile tables, and course distribution requirement tables. • An ED table contains one or more timedated rows per key variable, with a single valid (active) row at a given time. There may be rows with future time dates, and not all time-periods have a separate row! The power of CORRELATED sub-queries Effective-dated CORRELATED sub-query • SELECT z.identity_key, A.student _status_desc, FROM identity z, student A WHERE z.identity_key = A.student _key and A.student _term_eff = ( SELECT max(b. student _term_eff) FROM student B WHERE B.student _key = A.student _key and B.student_term_eff <= 200510 ) Notice that this sub-query is a join equality in the where statement… Jumping Off to Sub-queries Recap of Concept Bridging Topics 1. 2. 3. 4. 5. 6. 7. 8. Terminology is the primary key Lose your mind in a single query Conditions are optimal Groups, having, functions Cohorts versus survivors Keys to the kingdom Just passing through The power of sub-queries SQL for IR Staff (You can do it…We hope we helped! ©) Thanks for attending. This presentation is available after 10/25/2008 at: http://web.wm.edu/ir/conferencepres.html edavies@wm.edu Evan Davies dennis.catley@vt.edu Dennis Catley The College of William and Mary Chartered 1693 SAS is a registered trademark of SAS Institute Inc, Cary, NC 27513
© Copyright 2025