Statistics

Statistics is not just numbers in a table or graphs on paper! Statistics is the means by which society, industry and science manage uncertainty and make discoveries!

Sources of the example figures:
● Costello et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nature Biotechnology, 2014.
● Morningstar Stock Report, morningstar.fi
● The spatial patterns of the four leading interannual components extracted from climate data. A. Ilin, H. Valpola and E. Oja. Exploratory Analysis of Climate Data Using Source Separation Methods. Neural Networks, 19(2):155-167, 2006.
● José Caldas, Nils Gehlenborg, Ali Faisal, Alvis Brazma, and Samuel Kaski. Probabilistic retrieval and visualization of biologically relevant microarray experiments. Bioinformatics, 25:i145–i153, 2009.
● Jaakko Peltonen and Samuel Kaski. Generative Modeling for Maximizing Precision and Recall in Information Visualization. In Geoffrey Gordon, David Dunson, and Miroslav Dudik, eds., Proceedings of AISTATS 2011, the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP, vol. 15, 2011.

STATISTICS

The roots of statistics lie in probability theory, which began with the study of games of chance.

Unskilled use of measurements and statistics can lead to wrong and misleading conclusions.

Statistics is versatile data analysis that includes managing randomness and variation, filtering information out of data, and modeling. Statistics is closely connected to data mining and machine learning. An important current direction is computational statistics, which searches data sets for interesting nonlinear features and solves complex models by means of, among other things, advanced and distributed optimization and computation.
Our teaching introduces the central theory, the most important data collection and analysis methods, and their computer-based application. Distributions, prediction, hypothesis testing, time series analysis, multivariate methods, data visualization, learning from multiple sources...

Oakland A's GM Billy Beane is handicapped with the lowest salary constraint in baseball. If he ever wants to win the World Series, Billy must find a competitive advantage. Billy is about to turn baseball on its ear when he uses statistical data to analyze and place value on the players he picks for the team.
"geek-stats book turned into a movie with a lot of heart"
"persuasively exposed front office tension between ... old school 'eye-balling' of players and newer models of data-driven statistical analysis"
(Texts from IMDB, Wikipedia)

● Blaise Pascal, b. 1623
● Thomas Bayes, b. 1702
● Pierre-Simon Laplace, b. 1749
● Carl Friedrich Gauss, b. 1777
● Karl Pearson, b. 1857
● Ronald Fisher, b. 1890

[Figure: a map of prominent statisticians, each shown with their institution: Stephen L. Portnoy, Alan Agresti, Irene Gijbels, Noel Cressie, Christian P. Robert, Harvey Goldstein, Hirotugu Akaike, Jon A. Wellner, Jerome H. Friedman, Iain M. Johnstone, Peter Hall, Hira Lal Koul, Peter Diggle, Dan-Yu Lin, Gareth O. Roberts, David Donoho, Joseph G. Ibrahim, James Berger, Donald Rubin, James Stephen Marron, Norman R. Draper, Ingram Olkin, Jianqing Fan, Bernard W. Silverman, Michael B. Woodroofe, Peter J. Rousseeuw, Ole Barndorff-Nielsen, Enno Mammen, David B. Dunson, Nancy Reid, Kanti V. Mardia, Alexandre Tsybakov, Paul Rosenbaum, Marc Hallin, Marc Yor, Raymond Carroll, Bruce Lindsay, Bradley Efron, George Box, Hans-Georg Muller, Peter J. Bickel, Erich Leo Lehmann, Alan Gelfand, Murad Taqqu, William E. Strawderman, David O. Siegmund, Wolfgang Karl Härdle, Peter Buhlmann, Ricardo Fraiman, Adrian Raftery, Andrew Gelman, John W. Tukey, Persi Diaconis, David A. Freedman, Luc Devroye, Robert Tibshirani, David Ruppert, Peter M. Robinson, Theodore W. Anderson, Leo Breiman, Holger Dette, George Casella, Richard David Gill, Trevor Hastie. The final entry reads: You, University of Tampere.]

ON THE WORK OF A STATISTICIAN

A statistician works in collaboration with experts from other fields. Application areas and some specialized fields of statistics:
● Engineering and natural sciences (technometrics, chemometrics)
● Biology (biometrics, see http://www.uta.fi/hes/tutkimus/tutkimusryhmat/Biometria.html)
● Medicine (epidemiology)
● Economics (econometrics)
● Social and behavioral sciences (demometrics, psychometrics)

See also http://www.uta.fi/opiskelu/selvitykset/matematiikka_tilastotiede_sijoittuminen.pdf

Elective studies can affect placement in a particular field. On choosing elective studies, see the study guide, p.
51 or http://www10.uta.fi/opas/koulutus.htm?opsId=102&uiLang=fi&lang=fi&lvv=2013&koulid=19

An example of job duties: http://www.luonnontieteet.fi/tyo/tilastotiede
Examples of job duties and employers: http://www.uta.fi/rekrytointi/opiskelijalle_ja_tyonhakijalle/uraseuranta/oppiainekoosteet/tilastotiede.html

ON THE WORK OF A STATISTICIAN: GRADUATES' VIEWS

The University of Tampere Career and Recruitment Services (http://www.uta.fi/rekrytointi) follows up on how graduates find their place in working life: http://www.uta.fi/opiskelu/tyoelama/seurannat/index.html

The most recent follow-up covers those who completed a master's degree in 2011: http://www.uta.fi/opiskelu/tyoelama/seurannat/maisterit/index/sijoittumisseuranta%202011.pdf (one year after graduation, all statistics students were in permanent or fixed-term employment or working as grant-funded researchers).

Accounts by people who studied mathematics and statistics of their studies and their placement in working life are collected in "Loogista päättelyä ja tiedon analysointia" (Logical reasoning and data analysis), a publication of the Division of Academic and International Affairs: http://www.uta.fi/opiskelu/selvitykset/matematiikka_tilastotiede_sijoittuminen.pdf

Job titles mentioned: "researcher at a government research institute", "mathematician in the research unit of a government agency", "corporate quality manager", "data mining analyst".

CBDA
International Master's Degree Programme in Computational Big Data Analytics
Make BIG SENSE out of BIG DATA

● Lawyers Are Turning to Big Data Analysis (The National Law Journal, July 2015)
● Big data for big business - analytics are no longer optional (The Globe and Mail, August 2015)
● Intel Unveils Analytics Technologies for Big Data, IoT (eWeek, August 2015)
● Put big data to work with Cortana Analytics (TechRepublic, July 2015)
● How the age of Big Data made statistics the hottest job around (Canadian Business, April 2015)
● What can big data do for small startups? (VentureBurn, August 2015)
● Why big data isn't always the answer (ComputerWorld, August 2015)
● Data Scientist: The Sexiest Job of the 21st Century (Harvard Business Review, October 2012)
● Making Sense of Our Big Data World: Statistics for the 99% (Business 2 Community, August 2015)
● 'Big data' useful but caution is still needed (Daily Record, August 2015)
● Growth in big data draws women to statistics (FWC.com, February 2015)
● How To Identify A Good/Bad Data Scientist In A Job Interview? (LinkedIn, August 2015)
● Why your kids will want to be data scientists (CNBC, June 2014)

Statistics studies in the CBDA programme:

Large data sets contain many kinds of variation. Expertise is needed to get from mere measurements to models and understanding. It is hard to tell just by looking which possible trends are "real" and which are mere coincidence. Computers can search for possible trends among large sets of alternatives, but they must be told how to assess the quality of what they find.

The statistics studies in CBDA cover:
● what kinds of statistical structures and trends one could look for
● how to measure whether they are "real"
● tools for finding them and for presenting the results

Master's programme in Computational Big Data Analytics (CBDA)

General Studies in Master's Degree Programmes given in English 2015–18, 1–22 ECTS

General studies in the Master's degree programmes given in English differ depending on the student's educational background. Please choose only one of the three options A, B or C below.

A) General studies for international students 12–22 cr
Compulsory studies 12 cr
● SISYY006 Orientation, 2 cr
● SISYY005 Study Skills and Personal Study Planning, 2 cr
● KKENMP3 Scientific Writing, 5 cr
● KKSU1 Finnish Elementary Course 1, 3 cr
Free-choice studies 0–10 cr
● YKYYKV1 Finnish Society and Culture, 3–5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr

B) General studies for students with education in Finnish and BSc degree taken outside SIS 9–18 cr
Compulsory studies 9–13 cr
The Swedish course is required only if no Swedish studies were taken in the Bachelor's degree.
● SISYY006 Orientation, 2 cr
● SISYY005 Study Skills and Personal Study Planning, 2 cr
● KKENMP3 Scientific Writing, 5 cr
● KKRULUK Ruotsin kielen kirjallinen ja suullinen viestintä, 4 cr
Free-choice studies 0–5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr

C) General studies for students who have taken their BSc degree at SIS 1–11 cr
Compulsory studies 1 cr
Basics of Information Literacy (1 cr) is not required; only Personal Study Planning (1 cr) from SISYY005 is required.
● SISYY005 Study Skills and Personal Study Planning, 2 cr
Free-choice studies 0–10 cr
Scientific Writing is recommended if the Master's thesis is written in English.
● KKENMP3 Scientific Writing, 5 cr
● YKYYV07 Introduction to Science and Research, 2–5 cr

Advanced Studies in Big Data Analytics 85 cr

Compulsory Advanced Courses in Big Data Analytics 50 cr
● MTTTS11 Master's Seminar and Thesis, 40 cr
● MTTTS12 Introduction to Bayesian Analysis 1, 5 cr
● TIETS01 Algorithms, 5 cr

Advanced Courses in Methods of Computational Data-Analytics 15– cr
● TIETS07 Neurocomputing, 5 cr
● TIETS11 Data Mining, 5 cr
● TIETS31 Knowledge Discovery, 5–10 cr
● TIETS39 Machine Learning Algorithms, 5 cr
● TIETS33 Advanced Course in Computer Science, 1–10 cr

Advanced Courses in Methods of Statistical Data-Analytics 20– cr
● MTTTS13 Introduction to Bayesian Analysis 2, 5 cr
● MTTTS14 Statistical Modeling 1, 5 cr
● MTTTS15 Statistical Modeling 2, 5 cr
● MTTTS16 Learning from Multiple Sources, 5 cr
● MTTTS17 Dimensionality Reduction and Visualization, 5 cr
● MTTTS18 Time Series Analysis 1, 5 cr
● MTTTS19 Advanced Regression Methods, 5 cr
● MTTTS21 Statistical Inference 2, 5 cr
● MTTS1 Other course (advanced)

Other and optional Studies in Big Data Analytics Programme 13–29 cr

Compulsory Introductory Studies 5 cr
● TIETA17 Introduction to Big Data Processing, 5 cr

Complementing Studies
Complementing studies are determined based on previous education.

Optional Studies
Recommended studies in Applications of Data-Analytics
● TIETS05 Digital Image Processing, 5 cr
● MTTTS20 Basics of Financial Data-Analysis and Risk Theory, 5 cr
● ITIS13 Information retrieval methods, 5 cr
● ITIS16 Information practices literature, 5–20 cr
● MTTA3 Internship, 2–10 cr

CBDA Courses Fall 2015

(The Master's thesis and seminar run every fall and spring.)

● I: Introduction to Bayesian Analysis 1. Prior and posterior distributions, Bayes estimators, posterior predictive distribution, interval estimation and hypothesis testing, single-parameter models, simple multiparameter models.
● I: Introduction to Big Data Processing. Typical characteristics and common applications of big data; basics of distributed file systems, databases and computing; practical data processing skills with MapReduce / Apache Hadoop.
● I-II: Learning from Multiple Sources. Data fusion, transfer learning, multitask learning, multiview learning, and learning under covariate shift.
● I-II: Knowledge Discovery. Phases of the process of knowledge discovery and its nature; basic data preprocessing, data mining and postprocessing tasks and methods; application in practical knowledge discovery tasks; advanced methods in knowledge discovery; data management issues.
● II: Time Series Analysis 1. Simple time series models, stationary time series models (ARMA), nonstationary and seasonal time series models (SARIMA), time series regression, periodogram.
● I-IV: Information practices literature. Literature package on either: Information practices; Information retrieval systems; Interactive information retrieval; Task-based information retrieval.

CBDA Courses Spring 2016

● III: Introduction to Bayesian Analysis 2. Markov chains, MCMC methods, model checking and comparison, commonly used statistical models, such as hierarchical and regression models, binomial and count data models.
● III: Data Mining. Premises, objectives, relevance, and basic methods of data mining; properties of data and measurements, preprocessing methods, some data mining algorithms and their applications, for instance, for classification and prediction of data.
● III-IV: Dimensionality Reduction and Visualization. Properties of high-dimensional data; feature selection; linear feature extraction; graphical excellence; human perception; nonlinear dimensionality reduction; neighbor embedding methods; graph visualization.
● IV: Statistical Inference 2. Roles of modeling in statistical inference, principles of data reduction, estimation (risk, loss of estimators, ...), large sample properties, likelihood-based methods, likelihood-based tests and confidence regions.
● IV: Machine Learning Algorithms. Basic and advanced machine learning methods for data mining, pattern recognition and other problems.
● I-IV: Information practices literature. Literature package on either: Information practices; Information retrieval systems; Interactive information retrieval; Task-based information retrieval.

CBDA Statistics Courses

Fall 2016 (preliminary!)
● I: Introduction to Bayesian Analysis 1. Prior and posterior distributions, Bayes estimators, posterior predictive distribution, interval estimation and hypothesis testing, single-parameter models, simple multiparameter models.
● I-II: Learning from Multiple Sources. Data fusion, transfer learning, multitask learning, multiview learning, and learning under covariate shift.
● II: Possibly "Basics of financial data analysis and risk theory 5cr", or another course.

Spring 2017 (preliminary!)
● III: Statistical Modeling 1. Multinomial and ordinal regression, nonlinear regression, parametric survival analysis, counting process models, semiparametric hazard models.
● III-IV: Dimensionality Reduction and Visualization. Properties of high-dimensional data; feature selection; linear feature extraction; graphical excellence; human perception; nonlinear dimensionality reduction; neighbor embedding methods; graph visualization.
● IV: Statistical Modeling 2. Normal mixed model and extensions, growth curve models, models for panel discrete (binary, count, categorical) observations, analysis of missing data, mixture or latent class regression, hierarchical and latent structure models.

Statistics is the management of information and uncertainty. As long as there is uncertainty in the world, there will be a need for statistics.
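
The CBDA statistics slide above asks how to measure whether a trend that a computer found by searching many alternatives is "real". One common way to do this, sketched below, is a permutation test on the strongest correlation found. This is only an illustration and is not taken from the course material: the function names (pearson, max_abs_corr, permutation_p_value) and the simulated data are hypothetical, and only the Python standard library is assumed.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_abs_corr(features, target):
    """Strength of the 'best trend': largest |correlation| over all features."""
    return max(abs(pearson(f, target)) for f in features)


def permutation_p_value(features, target, n_perm=200, seed=0):
    """Compare the best trend with the best trends found after shuffling.

    Shuffling the target destroys any real relationship, so the shuffled
    runs show how strong the best trend tends to be by coincidence alone.
    """
    rng = random.Random(seed)
    observed = max_abs_corr(features, target)
    shuffled = list(target)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_abs_corr(features, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed data as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Simulated data (an assumption for illustration): 50 candidate features,
# 30 observations each, all pure noise.
data_rng = random.Random(1)
n_obs, n_features = 30, 50
features = [[data_rng.gauss(0, 1) for _ in range(n_obs)]
            for _ in range(n_features)]

# Case 1: the target is unrelated to every feature.
noise_target = [data_rng.gauss(0, 1) for _ in range(n_obs)]
obs_noise, p_noise = permutation_p_value(features, noise_target)

# Case 2: the target really depends on the first feature.
signal_target = [v + data_rng.gauss(0, 0.3) for v in features[0]]
obs_signal, p_signal = permutation_p_value(features, signal_target)

print(f"noise:  best |r| = {obs_noise:.2f}, p = {p_noise:.3f}")
print(f"signal: best |r| = {obs_signal:.2f}, p = {p_signal:.3f}")
```

The point of the design is that the search itself is repeated on every shuffled data set. The p-value therefore accounts for having looked at many candidate trends, which a test on the single best correlation would ignore.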