library(dplyr)
library(palmerpenguins)
penguins <- palmerpenguins::penguins
Using dplyr - basics
dplyr - The workhorse of data manipulation
If I am ever asked which R package is my favorite, I think of the package I load last in my scripts, and that package is almost always dplyr. dplyr is part of the Tidyverse group of packages by Posit (formerly RStudio). In this post, I will go through the main functions of dplyr, which greatly simplified data manipulation and analysis for me when I was getting started in R programming.
Loading packages
I prefer to load the individual packages I need rather than the entire tidyverse meta-package, but whether to load dplyr or tidyverse is up to you. Here we will load only dplyr. I will also load the palmerpenguins package to use the Palmer penguins dataset. If you have not installed these packages before, you will need to do so first (using the install.packages command).
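As a minimal sketch of the installation step, the snippet below installs only the packages that are not already available. The variable names pkgs and missing_pkgs are illustrative, and the snippet assumes a CRAN mirror is configured for your R session.

```r
# Packages used in this post
pkgs <- c("dplyr", "palmerpenguins")

# Check which of them are not yet installed
# (requireNamespace returns FALSE when a package cannot be found)
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only what is missing; this is a no-op if everything is present
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Installation is needed only once per machine, whereas library() has to be called in every new session.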
The palmerpenguins dataset provides information about penguins observed foraging near Palmer Station in Antarctica. Specifically (from the palmerpenguins package):
Size measurements, clutch observations, and blood isotope ratios for adult foraging Adélie, Chinstrap, and Gentoo penguins observed on islands in the Palmer Archipelago near Palmer Station, Antarctica. Data were collected and made available by Dr. Kristen Gorman and the Palmer Station Long Term Ecological Research (LTER) Program.
Exploring the dataset
The first step should be to take a peek at the dataset using the head function (from base R):
head(penguins)
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>
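Since head hides the last two columns here, a few other quick looks can help. The sketch below is one possible way to do this, using dim and names from base R and glimpse, which dplyr makes available; it is not the only approach.

```r
library(dplyr)
library(palmerpenguins)

# Number of rows and columns in the dataset
dim(penguins)

# Names of all columns, including those truncated in the head() output
names(penguins)

# One line per column, showing its type and the first few values
glimpse(penguins)
```

glimpse is handy for wide tables because it prints columns down the page, so none of them are cut off.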
The description of each column of the dataset is as follows:
- species - Species of the penguin observed
- island - The island on which the penguin was observed
- bill_length_mm - Bill length of the penguin (in millimeters)
- bill_depth_mm - Bill depth of the penguin (in millimeters)
- flipper_length_mm - Flipper length of the penguin (in millimeters)
The questions we want to answer are:
- What is the average bill length of all the penguins?
- What are bill lengths by species for the penguins?
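As a rough sketch of where this is heading, the two questions could be approached with dplyr's summarise and group_by as shown below. The object names overall and by_species and the column name mean_bill_length_mm are illustrative choices, and na.rm = TRUE is an assumption about how to treat the rows with missing measurements (such as row 4 in the head output).

```r
library(dplyr)
library(palmerpenguins)

# Question 1: average bill length across all penguins
# na.rm = TRUE drops missing values before averaging
overall <- penguins %>%
  summarise(mean_bill_length_mm = mean(bill_length_mm, na.rm = TRUE))

# Question 2: average bill length for each species
by_species <- penguins %>%
  group_by(species) %>%
  summarise(mean_bill_length_mm = mean(bill_length_mm, na.rm = TRUE))

overall
by_species
```

Without na.rm = TRUE, mean returns NA whenever any value in the column is missing, which is why it is set explicitly here.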