class: center, middle, inverse, title-slide

# Storytelling with Data Visualization
## Autodesk - technical meeting
### Anabelle Laurent
### November 30, 2021

---

### Why is Data Visualization important? 📊

---

### Why is Data Visualization important? 📊

- **Universal way** to communicate information
- Provides a clear and **effective message**
- Helps find **patterns and trends** and **spot extreme values**
- Makes data **memorable**
- Maintains the audience's interest

---

### What makes a good visualization? 🤔

---

### What makes a good visualization? 🤔

- Reveals a **trend** or **relationship** between variables
- Always has, at minimum, a **caption**, **axes**, **scales** and **symbols**
- Uses distinct and legible symbols (i.e., use contrast)
- Has a caption that conveys as much information as possible
- No noise: keep information to a minimum
- Uses the **correct graph type** for the kind of data to be presented

---

### Disclaimer

This workshop does not provide code, but all the plots were made using RStudio (see the last slides for more details)

<center><img src="images/ggplot2_masterpiece.png" style="width: 70%" /> </center>

[Artwork by @allison_horst](https://github.com/allisonhorst/stats-illustrations)

---

# Visualizing distribution

<center><img src="images/histogram.png" style="width: 70%" /> </center>

[Artwork by @allison_horst](https://github.com/allisonhorst/stats-illustrations)

---

### Visualizing distribution: histograms

.right-column[
![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-1-1.png)<!-- -->
]

.left-column[
Histograms plot the distribution of a single quantitative variable.

Try different bin widths for best visual appearance.
- Small bin width -> peaky and busy histogram
- Large bin width -> features might disappear
]

---

### Visualizing distribution: density plot

.right-column[
![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-2-1.png)<!-- -->
]

.left-column[
Try different bandwidths for best visual appearance

- Small bandwidth -> peaky and busy density
- Large bandwidth -> smooth density that might look like a Gaussian
]

---

### Visualizing multiple distributions

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

---

### Visualizing multiple distributions

.right-column[
![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-4-1.png)<!-- -->
]

.left-column[
- The peaks of a density plot show where the points are most concentrated
- For several distributions, density plots work better than histograms.
]

---

### Visualizing multiple distributions

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

### Visualizing multiple distributions: ridgeline plot

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---

### Visualizing multiple distributions: ridgeline plot

.right-column[
![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-7-1.png)<!-- -->
]

.left-column[
A ridgeline plot shows the distribution of a numeric variable for several groups. It works best with many groups (at least 5-6) or when the distributions overlap each other.
]

---

### Visualizing distributions: boxplot

<center><img src="images/read_boxplot.jpeg" style="width: 100%" /> </center>

A boxplot can summarize the distribution of a numeric variable for several groups

---

### Visualizing distributions: boxplot

.right-column[
![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-8-1.png)<!-- -->
]

.left-column[
A boxplot does not show the number of observations.
]

---

### Visualizing distributions: boxplot with jitter

.right-column[
![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-9-1.png)<!-- -->
]

.left-column[
Boxplots with jitter show:

- the distribution of the data
- whether the groups are balanced or unbalanced in terms of number of observations.
]

---

### Visualizing distributions: boxplot with jitter

.right-column[
![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-10-1.png)<!-- -->
]

.left-column[
Avoiding overlapping points makes the plot easier to read
]

---

### Visualizing distributions: violin plot

.right-column[
![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-11-1.png)<!-- -->
]

.left-column[
- Violins are equivalent to density estimates
- They are useful to represent bimodal data.
]

---

# Visualizing associations among quantitative variables

---

### Relationship between 2 numeric variables: scatterplot

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-12-1.png)<!-- -->

---

### Relationship between 2 numeric variables: scatterplot + linear fit

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-13-1.png)<!-- -->

---

##### Relationship between 2 numeric variables: scatterplot + quadratic fit

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

⚠️ A linear fit is widely used, but it is not always the best fit; try a quadratic fit too.

---

### Relationship between 2 numeric variables: scatterplot

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-15-1.png)<!-- -->

---

### Multi-panel plots

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

Split a single plot using one variable with many levels

---

### Multi-panel plots

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-17-1.png)<!-- -->

Split a single plot using the combinations of two discrete variables.
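
The code behind the figure above is not part of this deck. As a minimal sketch of this kind of two-variable split, assuming ggplot2's built-in `mpg` data (not the workshop data) with `drv` and `year` as illustrative discrete variables, `facet_grid()` draws one panel per combination:

```r
library(ggplot2)

# Assumption: mpg ships with ggplot2 and stands in for the workshop data;
# drv (drive train) and year (model year) are the two discrete variables.
p_grid <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  # One row of panels per drive train, one column per model year.
  # Axis scales are shared across panels by default.
  facet_grid(rows = vars(drv), cols = vars(year)) +
  labs(x = "Engine displacement (L)", y = "Highway miles per gallon")

p_grid
```

Panels share the same axes by default, which keeps them directly comparable; `scales = "free"` would give each panel its own axes.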
---

### Multi-panel plots

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-18-1.png)<!-- -->

⚠️ Different scales can lead to misinterpretation

---

### Bubble plot

A bubble plot is a scatterplot that encodes a third numerical variable as the size of the points

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-19-1.png)<!-- -->

---

# Tell a story with your data 📖

---

### Tell a story with your data

Before data visualization you must:

- Know your audience
- Know the level of data detail expected
- Give enough context
- Ask yourself: What do I want my audience to know/remember from the data I am presenting?

---

### Tell a story with your data

.right-column[
<center><img src="images/2020_08_14_penguins.png" style="width: 80%" /> </center>
]

.left-column[
Don't be repetitive but be consistent (theme, color scheme, font size etc.)
]

---

### Tell a story with your data

Guide your audience by pointing out specific values

<center><img src="images/2019_10_08_powerlifting.png" style="width: 80%" /> </center>

---

### Tell a story with your data

Guide your audience by pointing out specific values

<center><img src="images/2020_foodconsumption.png" style="width: 70%" /> </center>

---

### Tell a story with your data

Customize your plot using highlighting

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-20-1.png)<!-- -->

---

### Tell a story with your data

Customize your plot using highlighting + text

![](Autodesk_StoryTelling_files/figure-html/unnamed-chunk-21-1.png)<!-- -->

---

### Interactive graphics with ggplotly
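
A minimal sketch of how a static ggplot can be made interactive, assuming the `gapminder` data and an illustrative bubble plot rather than the figure shown in the workshop: build the plot as usual, then pass it to `ggplotly()`.

```r
library(ggplot2)
library(plotly)
library(gapminder)

# Assumption: a single year of gapminder is used as example data.
gap_2007 <- subset(gapminder, year == 2007)

# `text` is not a ggplot2 aesthetic (ggplot2 warns that it is ignored),
# but ggplotly() reads it to fill the hover tooltip.
p <- ggplot(gap_2007,
            aes(x = gdpPercap, y = lifeExp,
                size = pop, color = continent, text = country)) +
  geom_point(alpha = 0.7) +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)")

# Convert the static plot to an interactive htmlwidget;
# the tooltip shows the country name plus the x and y values.
p_interactive <- ggplotly(p, tooltip = c("text", "x", "y"))
p_interactive
```

Hovering over a point then shows the country name, and the legend entries can be clicked to hide or show a continent.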
---

### Data visualization using interactive web-app

One case study: [ISOFAST web-app](https://analytics.iasoybeans.com/cool-apps/ISOFAST/)

Problem Statement:

- Make the most of the cumulative experiment data collected since 2006
- Need data-driven insights (overview at the network level and not only farm level)

Development of ISOFAST

- Audience: farmers, local agronomists, researchers
- Easy-to-navigate user interface
- Effective data visualizations
- Economic analysis for adaptive decision making

---

### R libraries used for this presentation

```r
library(ggplot2)      # static plots (grammar of graphics)
library(dplyr)        # data manipulation
library(tidyr)        # reshaping data
library(gapminder)    # example dataset
library(gghighlight)  # highlighting selected data in a ggplot
library(ggrepel)      # non-overlapping text labels
library(dygraphs)     # interactive time-series charts
library(plotly)       # interactive graphics, including ggplotly()
```

---

### Resources to go deeper into Data Viz

- [Claus Wilke's book](https://clauswilke.com/dataviz/index.html)
- [Rob Kabacoff's book](https://rkabacoff.github.io/datavis/)
- [Mario Döbler & Tim Großmann's book](https://www.barnesandnoble.com/w/the-data-visualization-workshop-second-edition-mario-d-bler/1136609407) (available online through the ISU Library)
- [Cédric Scherer's blog](https://www.cedricscherer.com/top/dataviz/)
- [From Data to Viz's website](https://www.data-to-viz.com/)
- [Plotly R package](https://plotly.com/r/)
- [Shiny tutorial](https://shiny.rstudio.com/tutorial/)
- Check the hashtag **#tidytuesday** on Twitter if you are looking for inspiration & R code.
- [Shiny app about Tidy Tuesday tweets](https://nsgrantham.shinyapps.io/tidytuesdayrocks/)

---

### Accurate

<center><img src="images/r_rollercoaster.png" style="width: 85%" /> </center>

[Artwork by @allison_horst](https://github.com/allisonhorst/stats-illustrations)

---

### Thank you for your attention

<center><img src="images/lastslide.jpg" style="width: 60%" /> </center>

✉️ my email: **alaurent@iastate.edu**

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).