Information Visualization Talk

Information Visualization – a talk by Prof. G Benoît
March 4, 2014, Simmons College

Volume of data
Popularization of the Topic
Creating Tools
Powerful computers
Our work lives …

[SLIDE 1]

Welcome. I call this talk the “future of information” for several reasons.
• One is that we already live in a graphically intense world. The typical American sees more than 3500 visual messages a day;
• The Web has been around for a generation - students come to school with computers, software, and the Internet as part of the topography of their lives, not as new skills and modes of thought to be learned;
• The topic is popular - public awareness grows as people create their own graphics and as more graphics created by designers become accessible - the web is full of examples, as are the streets; popular press journals promulgate the idea of Big Data as a benefit [1] (without real analysis of the liabilities);
• Strata and TED [2] talks online, along with people appropriating the term for their own blogs and websites, increase the number of avenues towards the subject;
• More raw data are being designed, blurring the traditional lines of “information graphics”, “graphic design”, “data visualization”, “information visualization”, and now Big Data, Data Analytics, Visual Analytics, decision making.

Literatures and Definitions

Let’s start with some definitions:

“Information visualization is the study of (interactive) visual representations of abstract data to reinforce human cognition. The abstract data include both numerical and non-numerical data, such as text and geographic information.”

This is pretty accurate.
And a cursory review of literatures from different times and domains suggests commonalities of interpretation but also differences: should the visualization represent abstract phenomena (such as “ideas”), or stimulate subconscious mental processes [cognition], or underscore the volume of data that could be related, or rely on a few design principles [Ilinsky’s YouTube introduction] [3], or expand to incorporate new flexible displays, or build on what is already expected? Do some data lend themselves to certain design models? Time and Space are two usual criteria.

What kind of rhetoric captures the goals of visualization? From the designers’ perspective, the choice of expression, audience, and data stores dictates the process; for others, the historical roots of data mining + visualization are integral to the visualization and interpretation; others emphasize the large volume of data [and that leads to outlandishly complex displays]; others focus on the end-user’s interpretive strategies; and finally, some “hard-core” computer scientists limit the whole to the computational aspects.

1. Harvard Magazine. (2014, Mar-Apr). Making sense of big data.
2. e.g., http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html
3. https://www.youtube.com/watch?v=nrsdgvauqKg

A functional definition might be extracted from who engages in InfoVis, from the more technical (computer science) to the more applied and popular (public) items - http://www.infovis-wiki.net/index.php/Information_Visualization

1999: Use of computer-supported, interactive visualization of abstract data to amplify cognition [Card, et al.]

Information visualization utilizes [sic] computer graphics and interaction to assist humans in solving problems [Purchase et al., 2008, p. 58. Terrible writing]

“Info Vis… is a special kind of visualization.
Visualization is a part of computer graphics, which is in turn a subset of computer science … Info Vis is visualization of abstract data… [s]hould be seen in contrast to scientific visualization, which deals with physically-based data … Visualization of abstract data is not straightforward …” [Voigt, 2002]

In information visualization, the graphical models may represent abstract concepts and relationships that do not necessarily have a counterpart in the physical world, e.g., information describing user accesses to pages of an Internet portal or records describing selected properties of different car brands and models. Typically, each data unit describes multiple related attributes (usually more than four) that are not of a spatial or temporal nature. Although spatial and temporal attributes may occur, the data exists in an abstract (conceptual) data space [Ferreira and Levkowitz, 2003].

The study of how to effectively present information visually. Much of the work in this field focuses on creating innovative graphical displays for complicated datasets, such as census results, scientific data, and databases. An example problem would be deciding how to display the pages on a website or the files on a hard disk. Visualization techniques include selective hiding of data, layering data, taking advantage of 3-dimensional space, using scaling techniques to provide more space for more important information (e.g. Fisheye views), and taking advantage of psychological principles of layout, such as proximity, alignment, and shared visual properties (e.g. color) [Usability First, 2003].

Information visualization, an increasingly important subdiscipline within HCI, focuses on graphical mechanisms designed to show the structure of information and improve the cost of access to large data repositories.
In printed form, information visualization has included the display of numerical data (e.g., bar charts, plot charts, pie charts), combinatorial relations (e.g., drawings of graphs), and geographic data (e.g., encoded maps). Computer-based systems, such as the information visualizer and dynamic queries, have added interactivity and new visualization techniques (e.g., 3D, animation) [Averbuch, 2004].

Visual representations of the semantics, or meaning, of information. In contrast to scientific visualization, information visualization typically deals with nonnumeric, nonspatial, and high-dimensional data [Chen, 2005]

** The upshot is that “Information Visualization” is the graphic rendering of abstract phenomena; that the rendering shares computational approaches and visual languages with other areas, such as graphic design, information graphics, and illustration; and that the definitions of information visualization vary by domain, history of computing use, and history of applying graphics, with very little reading of others’ literatures.

What is involved in general?

Sketching what you want to see - what underlying model of design? Do you draw from graphic design principles? Or from computing models? Is your goal to display all the data, establishing combinations of extracted data based on some design or domain-specific needs?
• Do you use only the raw data or do you add extra-textual data (metadata) as well as visual cues, such as “real-world metaphors” that situate the viewer?
• Do you have to know how to program or do you use third-party software?
• Is there room for you to participate if you program your own designs or if you can’t program at all?

The Process

First we need a reason: why are these data going to be displayed?
Commonly people design to express a lot of data to be viewed, then interpreted, often for some specific need, such as decision-making, decision-support, or uncovering unanticipated relationships in the data, discoverable by visual means.
• Explain - situate the viewer to understand; provide statistical evidence for the reason for the links; collapse multiple layers and dimensions of data into a single, computer-based image
• Leads to integration of graphic design; simplified computing; emphasizes the idea of links or relationships (usually among semantic tokens)
• Predict - depending on the underlying model, predict why something happened, or didn’t; predict how something might change by altering some inputs;
• hypothesis generation; hypothesis testing for decision-making
• risk analysis, but obviously anything that is a concern for the domain expert group.
• Expose undiscovered - show the immediately obvious represented by a graphic difference, or expose visual representations of the cause-and-effect of changing inputs

The Data

The data component is similar to the processes of “KDD” or Data Mining and Text Mining. Large volumes of data are extracted, often from heterogeneous data sets, and merged into a data warehouse. The warehouse consists of prepared data (“cleansed data”), which are then subjected to algorithms that find relationships among the data. This is vital: the reason for the relationships usually is bound to the domain: medical doctors may need links between biomedical processes, expressed as terms extracted from the collection. For example, “myocardial infarction” may be the text expression of data from one collection (one dimension of the data); combine this with patients’ records about treatments (say prescriptions), but then add another dimension such as recovery time, and perhaps drug interactions … what might happen?
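The combination of dimensions just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three record sets, the patient ids, the drug names, and the function name mean_recovery_by_drug are all made up for the example, and a real warehouse would involve far more cleansing and far more data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, made-up records standing in for three separate collections.
diagnoses = [
    {"patient": "p1", "term": "myocardial infarction"},
    {"patient": "p2", "term": "myocardial infarction"},
    {"patient": "p3", "term": "myocardial infarction"},
    {"patient": "p4", "term": "fracture"},
]
prescriptions = [
    {"patient": "p1", "drug": "drug A"},
    {"patient": "p2", "drug": "drug B"},
    {"patient": "p3", "drug": "drug A"},
    {"patient": "p4", "drug": "drug B"},
]
recovery_days = {"p1": 30, "p2": 45, "p3": 34, "p4": 60}


def mean_recovery_by_drug(term):
    """Join the three sources on patient id and average recovery time per drug."""
    # Dimension 1: patients whose record carries the diagnostic term.
    patients = {d["patient"] for d in diagnoses if d["term"] == term}
    by_drug = defaultdict(list)
    # Dimensions 2 and 3: prescriptions and recovery time for those patients.
    for rx in prescriptions:
        pid = rx["patient"]
        if pid in patients and pid in recovery_days:
            by_drug[rx["drug"]].append(recovery_days[pid])
    return {drug: mean(days) for drug, days in by_drug.items()}


summary = mean_recovery_by_drug("myocardial infarction")
for drug, days in sorted(summary.items()):
    print(f"{drug}: {days:.1f} days")
```

A table like summary is the kind of merged subset a visualization would then draw; whether any difference between the drugs means anything is still a question for the domain expert.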
Perhaps an unanticipated link between a drug, something unknown in a patient record, and the strength of the prescription reveals a greater longevity. So here the most obvious use of visualization of scientific data is to expose the otherwise abstract notion of recovering from a heart attack.

The Display

The data themselves are separate from the presentation of the data; and separate still is the idea of interacting with the data. Only recently have the “hard core” computer science folk begun to look at the graphic design of the data. This idea is supported by changes in course curricula and activities across graduate courses in information visualization. The shift is noticeably towards the static visual representation of a complex problem, called “information graphics.” This is counter-intuitive because the static imagery of data seems to derive from explanatory texts and arts (such as the 18th century French Enlightenment minds d’Alembert and Diderot’s Encyclopédie) to illustration (emphatically not fine art, where the concept of representation versus abstraction took a different turn); computing has always been about programmatic, that is algorithmic, solutions to human problems. We see, too, in the computer science literature at times ignorance of, or avoidance of, the graphic design literature or attempts to create from scratch what designers have studied and applied for years [4]. Good examples are found in the industry standard Information Visualization conference and journal.

4. http://simile-widgets.org/timeline/

Another trend is to popularize [5] the fusion of arts with data. We see this in trends such as ArtScience, a ridiculous fusion, but with grant-funded fury, by David Edwards at Harvard; or the way MIT allows students to design the official website, changing the feeling and look daily, provided certain features are present; or how Mitsubishi incorporates visual story-telling in product development.
But there is more to consider: for example, the study of visualizations (from information graphics) has become a publishing trend, such as McCandless’ Visual miscellaneum: transit maps of the world. Or that we are here today discussing the topic.

Data, Models, Tools

The key here is that we have data standards and models (SQL, XML, flat files); algorithms and models for extracting data (parsing); techniques for creating combinations among large data sets that may be meaningful individually and in the aggregate; and computing tools suited for all levels of participation [6]. For example, we could design something using Adobe Illustrator. Equally, we could sketch our own visualizations for the web using HTML4 or 5 and easy-to-use 3rd party JavaScript drawing libraries (such as raphael.js [7], Chart.js [8], processing.js [9], d3js [10]). Or we could work programmatically, incorporating easy scripts such as PHP (to access flat file, XML, and relational database stores and to draw common plots, e.g., jpgraph [11]) and HTML5’s Canvas, or incorporating Scalable Vector Graphics files (.svg; see for example the JavaScript + SVG tiled maps from PolyMaps [12]) that we adopt from Illustrator or create on-the-fly, again incorporating (usually free) tools to plot what one expects to see, given the baseline (such as time; e.g., timeline widgets) … or go further, using high-powered libraries and standards such as Java3D and JavaFX [13] to create our own tools - or use a host of OpenSource and proprietary tools. Notice that there are a lot of OpenSource tools - but frankly, with some effort on your own part, you could create cool stuff, too - and there are a lot of proprietary tools that expect (require) the “Cloud” (e.g., iCharts [14]) - a horrible idea. [Fight the Cloud with all your being because it is the Borg!]
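To make “create on-the-fly” concrete, here is a minimal sketch that writes a small SVG bar chart as plain text. It is an assumption-laden illustration, not one of the tools named above: it uses Python’s standard library instead of PHP or a JavaScript library, the function name bar_chart_svg and its parameters are invented for this example, and the values are made up.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(data, bar_width=40, gap=10, height=120):
    """Return a minimal SVG bar chart as a string; data is a list of (label, value)."""
    top = max(value for _, value in data)  # tallest bar fills the full height
    width = len(data) * (bar_width + gap) + gap
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height + 20}">'
    ]
    for i, (label, value) in enumerate(data):
        bar_height = round(height * value / top)
        x = gap + i * (bar_width + gap)
        y = height - bar_height  # SVG's y axis points down, so bars start lower
        parts.append(
            f'<rect x="{x}" y="{y}" width="{bar_width}" height="{bar_height}" fill="steelblue"/>'
        )
        parts.append(
            f'<text x="{x + bar_width / 2}" y="{height + 15}" '
            f'font-size="10" text-anchor="middle">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up values, for illustration only.
svg = bar_chart_svg([("2011", 12), ("2012", 30), ("2013", 21)])
print(svg)
```

Saving the printed text to a file ending in .svg and opening it in a browser shows three bars. The point is only that an SVG file is ordinary text that any scripting language can produce from a data store.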
Keep in mind that what the public and managers see as something new - Big Data and Visualization - has been around since 1946, when American Demographics published an edition with a graphic on the cover to demonstrate to statisticians the idea that data could be represented as a visual instead of lists of numbers. And quickly to this mix we add companies, such as IBM [15], SPSS, and SAS, who had either engineers or statisticians willing and able to help domain experts (car sales, medical studies, economists, even the FBI) ingest incomprehensibly large data sets to extract interesting events, or combinations of data that suggest to the domain expert an important trend or a significant anomaly. To increase the confidence of one’s interpretation, these activities provided statistical analyses. To be sure, there are times when accidents of data sets happen (called “lift”) and the domain expert must intervene where “events” are identified that could never happen. Today the trend continues with OpenSource statistics packages such as SQL and R [16], integration of statistical products like R with vendor software (such as Tableau [17], or with SQL as “NoSQL” proprietary scripting languages such as Cypher [18]), and the results of high-powered schools’ student activities (for instance, MIT’s CSAIL projects and Harvard’s CS50 course).

5. http://www.webdesignerdepot.com/2009/06/50-great-examples-of-data-visualization/
6. Created from looking at a lot of sites (such as https://www.cmu.edu/teaching/technology/tools/informationvisualization/, http://www.creativebloq.com/infographic/20-freedata-visualisation-tools-5133780) to identify some trends.
7. http://raphaeljs.com
8. http://www.chartjs.org
9. http://processingjs.org
10. http://d3js.org
11. http://jpgraph.net
12. http://polymaps.org
13. http://www.oracle.com/technetwork/java/javafx/overview/index.html
14. http://www.icharts.net
15. For instance “ManyEyes” http://www-958.ibm.com/software/data/cognos/manyeyes/

Old wine in new skins

A voice that’s not usually heard says that “Big Data” and “Information Visualization” are really, by their own definitions, a continuation of information retrieval (IR) [establishing relationships between document collection representation, query representation, a framework for their matching, and a relationship between queries and documents] in order to locate (“known-entity search”) and learn (formerly “browsing”, today “discovery”), along with interactivity and cognition (or meaning-construction), but with an emphasis on the language of the relationship: from textual, 2-dimensional representations [lists of relevancy ranked items] to multidimensional, multi-layered representations using a visual language. To this standard model [19], I add V for the visual component. We might represent this as Q, D, F, R(qi, dj) + V.

Some Questions

• If, on the one hand, the InfoVis trend is merely a popularization of data mining activities, what is its relationship to established fields and research? How do IR and DM map to IV?
• On the other hand, if the world of “Big Data” takes over, that is, volumes of data so vast they outpace human comprehension and so require some treatment for our understanding, how will how we think, communicate, evaluate, and prepare records all be transformed? Have they already been transformed?

Treading lightly?

There have been many attempts to visualize data from libraries, archives, museums, and the like.
And there are trends to visualize purchasing habits (look at Amazon’s “people who bought X also looked at…” - these are called “recommender systems”); just as there are tools, projects, and trends common in established research and practice domains, of which geographic information systems [20], chemistry [21], and math are obvious examples. Notice, too, that many “informatics” programs do the same work, with the same tools, resources, and visualizations; bioinformatics and health informatics leap to mind.

But … we should consider, too, the influences of technological change, innovation, and adoption. Just how much is imposed from without? What situations coerce and require us to yield rights to participate in a highly digitized world - and can we stand up against them? That is what constitutes the sanctioned approach to information and data - driven from outside, and driven by volume, not necessarily utility, obviating work domains - a scythe mowing down all before it.

16. http://www.r-project.org
17. http://www.tableausoftware.com/products
18. http://www.neo4j.org
19. See Baeza-Yates (1998) Modern information retrieval. New York: ACM Press and class notes for LIS466, Information Retrieval (web.simmons.edu/~benoit/lis466/ index.html).
20. Any GIS application; similar commercial products Gapminder.com; even Google Earth
21. JMOL (3d models of chemical structures)

Some established trends

The counter and opportunity is knowledge. But without genuine communication there can be no discussion about opportunities, liabilities, benefits, or learning more. Therefore, what should people in “information professions” do?

1. Learn about graphic design: know the widely-adopted principles of composition, typography, and color theory
2. Learn about the history of visuals: mass communication [posters, advertisements, television]; “fine art” versus “low art”
   a. Both of these ideas are easily accessible in standard texts, such as Meggs’ or Janson; but even more so through companies with vested interests in an informed customer base, such as Adobe [cite]
   b. But you need to be an informed consumer, too; many sites are inaccurate or have a limited scope
3. Learn more about data models and how they’re manipulated
   a. relational databases
   b. XML
   c. full-text retrieval (IR)
   d. trends in Data Mining, Text Mining, and the informatics movements
   e. Boolean, extended Boolean, algebraic, probabilistic models - these are important in OPACs
4. Learn about fundamental issues related to the adoption of innovation and systems design
   a. Examine how your institution makes decisions about enterprise-wide information systems
   b. Can you argue for other models, say a “data centric” model versus the current “add another portal” model?
5. Programming, scripting
   a. Understand the relationship at a tactile level between extracting data, creating meaningful subsets, and then translating this whole into a visual language
6. Master a few of the popular 3rd party tools and literatures
   a. There are many OpenSource and proprietary software products
   b. Find a literature that suits your level and needs: perhaps the technical industry’s Information Visualization or ACM SIG-VIS; or AMIA’s or Pacific Symposium of Biocomputing’s research; perhaps more popularizing trends such as a university or library system publication (such as Harvard Magazine); or something in-between

So how is IV your future?

The theme of the talk is that information visualization is your future. How? Information systems in general do not emerge from the bottom-up; usually they’re imposed from beyond.
While originally librarians were full participants in the creation of forward-looking data models (MARC was very far-sighted for its day), the trend, for myriad reasons too complex for today’s talk, has been to centralize - at first into OPACs (RLIN, OCLC, etc.), then to integrated services (multiple otherwise independent information systems linked either at the data-level or through the interface as a portal); now to multiple silos of data supplied by 3rd party vendors that require additional staff and systems to link them … finally, to the idea of large, astoundingly large, sets of heterogeneous data that can be extracted and combinations made that could be meaningful to the user. The volume is so great that traditional lists are not sufficient; there need to be other means to express the data themselves and the links between the data, and to do so in a visual language. There need to be people who (a) create these systems, (b) understand these systems and can explain the benefits, liabilities, and use of these in new settings, and how to evaluate their usefulness, and (c) can be valuable contributors to the design and integration of new visualization systems.

Consider this transition: a spreadsheet program that has pie charts, bar graphs and the like. We grew up with this as an everyday tool. Now consider relational databases. FileMaker Pro is commonly employed in offices by staff; MySQL and Oracle are used in settings from small offices through large businesses, increasingly with visualization tools - still bound, tho, to the idea of pie charts and graphs. Notice, then, the ubiquity of these products and tools … so why are InfoVis and Big Data causing a stir? The reason is the shift towards analysis of the data - the skills of analysis are increasingly statistics-based or numeric.
Quantification of data and its domination in work and study have shifted what passes for “legitimate knowledge” to, arguably, only a quantified, empiricist base. The humane, qualitative, discursive are, some believe, an avenue to add value. Visualization also provides a chance to participate as a designer; imagine developing alternatives to the FishEye or building a system that demonstrates usefulness to your clients/patron-base.

Conclusions?

The field is wide-open and growing in popularity. It’s a combination of well-defined behaviors but with a lack of stability in principles, tho some are gelling. I don’t think they’re quite right, tho …

There are ways to learn more about the topic, the specific activities, and inspiration…

The rest of these slides is a gallery of visual solutions, often drawing from the same dataset…

Visit the Information Visualization class, LIS593d. Your ideas and questions are welcome; do you want to participate on some info vis projects? Let me know - I have the resources but not the folk!

Thanks.
Gerry Benoit, benoit@simmons.edu
http://web.simmons.edu/~benoit/index.html

––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––

References

Averbuch, M. (2004). As You Like It: Tailorable Information Visualization. Database Visualization Research Group, Tufts University.
Card, S., Mackinlay, J., & Shneiderman, B. (1999). Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann Publishers.
Chen, C. (2005). Top 10 Unsolved Information Visualization Problems. IEEE Computer Graphics and Applications, 25(4), 12-16. ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=31454&arnumber=1463074&count=14&index=3
Ferreira de Oliveira, M.D., & Levkowitz, H. (2003). From Visual Data Exploration to Visual Data Mining: A Survey. IEEE Transactions on Visualization and Computer Graphics, 9(3), 378-394. doi.ieeecomputersociety.org/10.1109/TVCG.2003.1207445
Gee, A.G., Yu, M., & Grinstein, G.G. [nd]. Dynamic and Interactive Dimensional Anchors for Spring-Based Visualizations. Technical Report, Computer Science, University of Massachusetts Lowell.
Keim, D.A., Mansmann, F., Schneidewind, J., & Ziegler, H. (2006). Challenges in Visual Data Analysis. Proceedings of Information Visualization (IV 2006), IEEE, pp. 9-16.
Plaisant, C. (2001, Nov.). Information Visualization - Lecture Notes.
Purchase, H.C., Andrienko, N., Jankun-Kelly, T.J., & Ward, M. (2008). Theoretical Foundations of Information Visualization. In A. Kerren, J.T. Stasko, J. Fekete, & C. North (Eds.), Information Visualization: Human-Centered Issues and Perspectives. Lecture Notes in Computer Science, vol. 4950. Springer-Verlag, Berlin, Heidelberg, 46-64. dx.doi.org/10.1007/978-3-540-70956-5_3
Usability First (2003). Usability Glossary. Retrieved 2003. www.usabilityfirst.com/glossary/main.cgi?function=display_term&term_id=5
Voigt, R. (2002). An Extended Scatterplot Matrix and Case Studies in Information Visualization. Master’s thesis, Hochschule Magdeburg-Stendal. www.vrvis.at/via/resources/DA-RVoigt/masterthesis.html (Classification and Definition of Terms: www.vrvis.at/vis/resources/DA-RVoigt/node4.html)
Wikipedia (2005). Information visualization. en.wikipedia.org/wiki/Information_visualization

http://www.matthiasdittrich.com/projekte/narratives/visualisation/index.html
http://www.wikimindmap.org
http://www.datavisualization.fr/blog/2011/12/data-visualization-in-2011-a-recap.html
http://nordisk.pp.ru/dizain-menedgment/

file name: InfoVisTalk-GB-2014.rtf 3/3/14, 10:10 AM