Using dplyr - basics

Basics of the dplyr package
Author: Quantilogy

Published: January 20, 2024

dplyr - The workhorse of data manipulation

If I am ever asked which R package is my favorite, I often think of the package I load last in my script, and that package is almost always dplyr. dplyr is part of the Tidyverse group of packages by Posit (formerly RStudio). In this post, I will go through the main functions of dplyr, which greatly simplified data manipulation and analysis for me when I was getting started in R programming.

Loading packages

I like to load the individual packages that I need rather than the entire tidyverse meta-package, but whether to load dplyr or tidyverse is up to the user. We will load only dplyr. I will also load the palmerpenguins package to use the Palmer penguins dataset. If you have not installed these packages before, you will have to install them first (using the install.packages function).
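As a minimal sketch of the installation step, the snippet below installs only the packages that are not already available. The variable names needed and missing_pkgs are my own, and it assumes a CRAN mirror is already configured for your R session.

```r
# Packages used in this post
needed <- c("dplyr", "palmerpenguins")

# requireNamespace() returns FALSE (rather than erroring) when a package is absent
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

# Install only what is missing; does nothing if both are already present
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Running this more than once is harmless, since packages that are already installed are skipped.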

The palmerpenguins dataset provides information about the penguins seen foraging near Palmer Station in Antarctica. Specifically (from the palmerpenguins package):

Size measurements, clutch observations, and blood isotope ratios for adult foraging Adélie, Chinstrap, and Gentoo penguins observed on islands in the Palmer Archipelago near Palmer Station, Antarctica. Data were collected and made available by Dr. Kristen Gorman and the Palmer Station Long Term Ecological Research (LTER) Program.

library(dplyr)
library(palmerpenguins)

penguins <- palmerpenguins::penguins

Exploring the dataset

The first step should be to take a peek at the data using the head function (from base R):

head(penguins)
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>

The description of each column of the dataset is as follows:

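  • species - Species of the penguin observed
  • island - The island on which the penguin was observed
  • bill_length_mm - Bill length of the penguin (in millimeters)
  • bill_depth_mm - Bill depth of the penguin (in millimeters)
  • flipper_length_mm - Flipper length of the penguin (in millimeters)
  • body_mass_g - Body mass of the penguin (in grams)
  • sex - Sex of the penguin
  • year - Year of the study in which the penguin was observed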
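Because head truncates the tibble to the columns that fit on screen, it can help to look at every column at once. A small sketch using glimpse, which dplyr makes available, prints one line per column with its type and first few values:

```r
library(dplyr)
library(palmerpenguins)

penguins <- palmerpenguins::penguins

# One row of output per column: name, type, and the first few values
glimpse(penguins)

# Column names and dimensions, for a quick cross-check against the list above
names(penguins)
dim(penguins)
```

This shows the sex and year columns that the head output had to hide.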

The questions we want to answer are:

  • What is the average bill length of all the penguins?
  • What are bill lengths by species for the penguins?
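One possible way to approach these two questions with dplyr is sketched below, using summarise for the overall average and group_by followed by summarise for the per-species figures. The result names overall, by_species, and mean_bill_length_mm are my own choices, and I am assuming that "bill lengths by species" means the average per species. Since some rows have missing measurements (row 4 in the head output, for example), na.rm = TRUE is needed to avoid getting NA back.

```r
library(dplyr)
library(palmerpenguins)

penguins <- palmerpenguins::penguins

# Question 1: average bill length across all penguins
overall <- penguins %>%
  summarise(mean_bill_length_mm = mean(bill_length_mm, na.rm = TRUE))

# Question 2: average bill length within each species
by_species <- penguins %>%
  group_by(species) %>%
  summarise(mean_bill_length_mm = mean(bill_length_mm, na.rm = TRUE))

overall
by_species
```

The first result is a one-row tibble, and the second has one row per species.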
