Intro to Data Viz

Lecture 4

Dr. Elijah Meyer

NC State University
ST 511 - Fall 2024

2024-08-28

Checklist

– Are you keeping up with the prepare material?

– Are you posting on Slack?

– Take advantage of TA office hours!

– Quiz-1 released at 5:00pm (on Moodle)

– HW-1 released Thursday afternoon (on Moodle; due on Gradescope)

HW-1

There is a Workflow and Formatting section

– Put your name in the YAML at the top of the document

– Pipes (%>%, |>) and the ggplot layer operator (+) should be followed by a new line

– You should be consistent with stylistic choices, e.g. %>% vs |>

We will add more rules once we explore the tidyverse stylings later in the semester!
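
A minimal sketch of the formatting rules above, assuming the dplyr package is installed. The filter condition and the names four_cyl_summary and mean_mpg are illustrative only and are not taken from HW-1.

```r
library(dplyr)

# One step per line: every |> is followed by a new line,
# and the same pipe (|>) is used throughout.
four_cyl_summary <- mtcars |>
  filter(cyl == 4) |>
  summarise(mean_mpg = mean(mpg))

four_cyl_summary
```

The same pipeline written on a single line would run identically; the line breaks only make each step easier to read.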

HW-1

Change in due date!

HW-1 is now due on Sunday Sep 8th at 11:59pm

We may not cover some HW-1 topics in depth until next week

Goals for today

– Finish summary statistics

– Understand the fundamentals of ggplot

– Build appropriate visualizations

– More practice with R

Warm Up

– What are the variables?

– What patterns or trends can you take away from this graph?

What types of plots can we make?

Golden Rule: we let the type of variable(s) dictate the appropriate plot

– Quantitative

– Categorical

When we go through how to make graphs in R, we are going to be mindful of the type of variable(s) we are using.

How do we make graphs?

The process

mtcars

You want to create a visualization. The first thing we need to do is set up the canvas…

The process

    mtcars |>
      ggplot()

The process

    mtcars |>
      ggplot(
        aes(x = variable.name, y = variable.name)
      )

aes: describe how variables in the data are mapped to your canvas

The process

+ means “and”

When working with ggplot functions, we add to our canvas using +

The process

    mtcars |>
      ggplot(
        aes(x = variable.name, y = variable.name)
      ) +
      geom_point()
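
To make the steps above concrete, here is a sketch that fills in the template with two mtcars columns. The choice of wt and mpg and the object names canvas and scatter are assumptions made for illustration.

```r
library(ggplot2)

# Step 1: set up the canvas and map variables to it (no layers yet)
canvas <- mtcars |>
  ggplot(
    aes(x = wt, y = mpg)
  )

# Step 2: "and" a layer of points onto the canvas with +
scatter <- canvas +
  geom_point()

canvas   # draws axes for wt and mpg, but no data
scatter  # draws the same axes with one point per car
```

Printing canvas on its own shows why the geom is needed: the axes appear, but nothing is drawn until a layer is added.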

Scatter plot

– Two quantitative variables

data |>
  ggplot(
    aes(x = , 
        y =)
  ) +
  geom_point()
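
As one possible filled-in version of this template, the sketch below uses the built-in faithful data set, where waiting and eruptions are both quantitative. The data set and the name p_scatter are assumptions chosen for illustration.

```r
library(ggplot2)

# faithful: waiting time between eruptions and eruption length, both in minutes
p_scatter <- faithful |>
  ggplot(
    aes(x = waiting,
        y = eruptions)
  ) +
  geom_point()

p_scatter
```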

Histogram

– One quantitative variable

data |>
  ggplot(
    aes(x =)
  ) +
  geom_histogram()
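
A hedged example of the histogram template, using mpg from mtcars. The binwidth of 2.5 is an arbitrary choice for this sketch; without it, geom_histogram() falls back to 30 bins and prints a message suggesting you pick a better value.

```r
library(ggplot2)

# One quantitative variable on the x axis; the counts are computed for us
p_hist <- mtcars |>
  ggplot(
    aes(x = mpg)
  ) +
  geom_histogram(binwidth = 2.5)

p_hist
```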

Bar plot

– One categorical variable

data |>
  ggplot(
    aes(x = )
  ) +
  geom_bar() # or geom_col() when the data already contain the bar heights (also needs y =)
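
One way to fill in the bar plot template, assuming we treat the number of cylinders in mtcars as a categorical variable. Because cyl is stored as a number, the sketch wraps it in factor(); the name p_bar is illustrative.

```r
library(ggplot2)

# factor(cyl) tells ggplot to treat 4, 6, and 8 as categories, not numbers
p_bar <- mtcars |>
  ggplot(
    aes(x = factor(cyl))
  ) +
  geom_bar()

p_bar
```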

Segmented bar plot

– Two categorical variables

data |>
  ggplot(
    aes(x = ,
        fill =)
  ) +
  geom_bar()
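
A sketch of the segmented bar plot template with two mtcars variables treated as categorical: cylinders (cyl) and transmission type (am). The variable choice and the name p_segmented are assumptions for illustration.

```r
library(ggplot2)

# x picks the bars; fill splits each bar into colored segments
p_segmented <- mtcars |>
  ggplot(
    aes(x = factor(cyl),
        fill = factor(am))
  ) +
  geom_bar()

p_segmented
```

By default the segments are stacked counts. Using geom_bar(position = "fill") instead rescales every bar to the same height so the segments show proportions.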

Boxplot

– One quantitative; One categorical

data |>
  ggplot(
    aes(x = ,
        y =)
  ) +
  geom_boxplot()
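
As an illustration of the boxplot template, the sketch below uses the built-in iris data set, with Species as the categorical variable and Sepal.Length as the quantitative one. The data set and the name p_box are assumptions, not part of the lecture.

```r
library(ggplot2)

# One box per species, summarizing the distribution of sepal length
p_box <- iris |>
  ggplot(
    aes(x = Species,
        y = Sepal.Length)
  ) +
  geom_boxplot()

p_box
```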

aug-28 AE

In summary

– summarise() is used to calculate summary statistics

– na.rm = TRUE is a common argument that removes NA values before a calculation

– ggplot() sets up our canvas

– aes() maps variables from our data set to the canvas

– a geom_*() function tells R what type of picture we want to paint
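
To illustrate the first two points, here is a small sketch using the built-in airquality data set, whose Ozone column contains missing values. It assumes the dplyr package is installed; the names ozone_summary, mean_with_na, and mean_without_na are made up for this example.

```r
library(dplyr)

ozone_summary <- airquality |>
  summarise(
    mean_with_na = mean(Ozone),                   # NA, because Ozone has missing values
    mean_without_na = mean(Ozone, na.rm = TRUE)   # NAs are dropped before averaging
  )

ozone_summary
```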

Recreate: For next time