class: center, middle, inverse, title-slide # R in Grenoble tutorials ## data manipulation with
{dplyr}
### M. Rolland ### 2019-05-17 --- class: inverse, middle # the tidyverse --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/tidyverse.PNG") background-size: 950px background-position: 0% 0% # the tidyverse <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> .pull-left[ www.tidyverse.org ] -- .pull-right[ ```r library(tidyverse) ``` ] --- class: inverse, middle # **tidy data** = one row per observation and one column per variable --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Tidy data -- Not Tidy ``` ## ident t1 t2 t3 ## 1 1 10.4 4.8 15.7 ## 2 2 14.1 6.3 7.7 ## 3 3 12.2 10.2 12.7 ## 4 4 6.1 8.9 8.5 ## 5 5 8.3 9.2 10.9 ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Tidy data .pull-left[ Not Tidy ``` ## ident t1 t2 t3 ## 1 1 10.4 4.8 15.7 ## 2 2 14.1 6.3 7.7 ## 3 3 12.2 10.2 12.7 ## 4 4 6.1 8.9 8.5 ## 5 5 8.3 9.2 10.9 ``` ] .pull-right[ Tidy ``` ## ident time measure ## 1 1 t1 10.4 ## 2 2 t1 14.1 ## 3 3 t1 12.2 ## 4 4 t1 6.1 ## 5 5 t1 8.3 ## 6 1 t2 4.8 ## 7 2 t2 6.3 ## 8 3 t2 10.2 ## 9 4 t2 8.9 ## 10 5 t2 9.2 ## 11 1 t3 15.7 ## 12 2 t3 7.7 ## 13 3 t3 12.7 ## 14 4 t3 8.5 ## 15 5 t3 10.9 ``` ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Tidy data .pull-left[ Not Tidy ``` ## ident t1 t2 t3 ## 1 1 10.4 4.8 15.7 ## 2 2 14.1 6.3 7.7 ## 3 3 12.2 10.2 12.7 ## 4 4 6.1 8.9 8.5 ## 5 5 8.3 9.2 10.9 ``` ] .pull-right[ Tidy ``` ## ident time measure *## 1 1 t1 10.4 *## 2 2 t1 14.1 *## 3
3 t1 12.2 *## 4 4 t1 6.1 *## 5 5 t1 8.3 ## 6 1 t2 4.8 ## 7 2 t2 6.3 ## 8 3 t2 10.2 ## 9 4 t2 8.9 ## 10 5 t2 9.2 ## 11 1 t3 15.7 ## 12 2 t3 7.7 ## 13 3 t3 12.7 ## 14 4 t3 8.5 ## 15 5 t3 10.9 ``` ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Tidy data .pull-left[ Not Tidy ``` ## ident t1 t2 t3 ## 1 1 10.4 4.8 15.7 ## 2 2 14.1 6.3 7.7 ## 3 3 12.2 10.2 12.7 ## 4 4 6.1 8.9 8.5 ## 5 5 8.3 9.2 10.9 ``` ] .pull-right[ Tidy ``` ## ident time measure ## 1 1 t1 10.4 ## 2 2 t1 14.1 ## 3 3 t1 12.2 ## 4 4 t1 6.1 ## 5 5 t1 8.3 *## 6 1 t2 4.8 *## 7 2 t2 6.3 *## 8 3 t2 10.2 *## 9 4 t2 8.9 *## 10 5 t2 9.2 ## 11 1 t3 15.7 ## 12 2 t3 7.7 ## 13 3 t3 12.7 ## 14 4 t3 8.5 ## 15 5 t3 10.9 ``` ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Tidy data .pull-left[ Not Tidy ``` ## ident t1 t2 t3 ## 1 1 10.4 4.8 15.7 ## 2 2 14.1 6.3 7.7 ## 3 3 12.2 10.2 12.7 ## 4 4 6.1 8.9 8.5 ## 5 5 8.3 9.2 10.9 ``` ] .pull-right[ Tidy ``` ## ident time measure ## 1 1 t1 10.4 ## 2 2 t1 14.1 ## 3 3 t1 12.2 ## 4 4 t1 6.1 ## 5 5 t1 8.3 ## 6 1 t2 4.8 ## 7 2 t2 6.3 ## 8 3 t2 10.2 ## 9 4 t2 8.9 ## 10 5 t2 9.2 *## 11 1 t3 15.7 *## 12 2 t3 7.7 *## 13 3 t3 12.7 *## 14 4 t3 8.5 *## 15 5 t3 10.9 ``` ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Tidy data -- ```r tidy <- tidyr::gather(not_tidy, "t1", "t2", "t3", key = "time", value = "measure") ``` --- class: inverse, middle # tibbles --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% #
tibbles -- * Modern reimagining of the data.frame -- * Tibbles *are* data.frames but modify some older behaviours to make life a little easier -- * Preferred data format in the tidyverse -- * No need to worry about this! --- class: clear background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% ```r # load starwars data data(starwars) # view data starwars ``` ``` ## # A tibble: 87 x 13 ## name height mass hair_color skin_color eye_color birth_year gender ## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> ## 1 Luke~ 172 77 blond fair blue 19 male ## 2 C-3PO 167 75 <NA> gold yellow 112 <NA> ## 3 R2-D2 96 32 <NA> white, bl~ red 33 <NA> ## 4 Dart~ 202 136 none white yellow 41.9 male ## 5 Leia~ 150 49 brown light brown 19 female ## 6 Owen~ 178 120 brown, gr~ light blue 52 male ## 7 Beru~ 165 75 brown light blue 47 female ## 8 R5-D4 97 32 <NA> white, red red NA <NA> ## 9 Bigg~ 183 84 black light brown 24 male ## 10 Obi-~ 182 77 auburn, w~ fair blue-gray 57 male ## # ... 
with 77 more rows, and 5 more variables: homeworld <chr>, ## # species <chr>, films <list>, vehicles <list>, starships <list> ``` --- class: inverse, middle # {dplyr} --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # {dplyr} -- * Data manipulation package of the `tidyverse` -- * Uses a short list of verbs (`filter()`, `arrange()`, `select()`, `mutate()`, `summarise()`) instead of base R operators such as `$` and `[ ]`, for clearer code -- * These simple verbs can be combined into complex operations using the `%>%` (pipe) operator -- * `group_by()` lets you perform operations by group -- * Combine tables with `left_join()`, `right_join()`, `inner_join()`, `full_join()`, `bind_rows()` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # dplyr functions All dplyr functions work similarly: -- 1. input data frame -- 2. what to do with the data frame -- 3. output data frame -- ```r data_out <- dplyr_function(data_in, action) ``` --- class: inverse .left-column2[ ## **filter()** ## arrange() ## select() ## mutate() ## summarise() and group_by() ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # filter() -- * Extract observations based on one or several conditions (= a subset of rows) -- * Comparison and logical operators accepted in `filter()`: `==`, `!=`, `<`, `>`, `<=`, `>=`, `is.na()`, `!is.na()`, `%in%`, `!`, `|`, `&`, `xor()` -- * Do not confuse `=` (naming an argument) with `==` (testing equality)!
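Here is a minimal sketch of these operators on a hypothetical toy table. `toy` and its columns are invented for illustration and are not part of `starwars`. The sketch assumes {dplyr} is installed. It also shows a detail worth remembering: `filter()` drops rows where the condition evaluates to `NA`.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration (not from starwars)
toy <- tibble(
  name  = c("a", "b", "c", "d"),
  mass  = c(120, 80, NA, 150),
  group = c("x", "y", "x", "z")
)

# Comparison combined with a logical AND: keeps only row "a"
heavy_x <- filter(toy, mass > 100 & group == "x")

# Membership test with %in%: keeps rows "b" and "d"
in_set <- filter(toy, group %in% c("y", "z"))

# Missing values are tested with is.na(), not with == NA: keeps row "c"
no_mass <- filter(toy, is.na(mass))

# xor(): TRUE when exactly one condition holds; row "c" gives NA and is dropped
either <- filter(toy, xor(mass > 100, group == "x"))

# Using = instead of == raises an error (exact message depends on the dplyr version)
res <- try(filter(toy, mass = 100), silent = TRUE)
inherits(res, "try-error")
```

By contrast, `filter(toy, mass == NA)` would return no rows at all, because any comparison with `NA` yields `NA`.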
--- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # filter() * Single condition -- ```r new_data <- filter(starwars, mass > 100) ``` -- ``` ## # A tibble: 10 x 13 ## name height mass hair_color skin_color eye_color birth_year gender ## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> ## 1 Dart~ 202 136 none white yellow 41.9 male ## 2 Owen~ 178 120 brown, gr~ light blue 52 male ## 3 Chew~ 228 112 brown unknown blue 200 male ## 4 Jabb~ 175 1358 <NA> green-tan~ orange 600 herma~ ## 5 Jek ~ 180 110 brown fair blue NA male ## 6 IG-88 200 140 none metal red 15 none ## 7 Bossk 190 113 none green red 53 male ## 8 Dext~ 198 102 none brown yellow NA male ## 9 Grie~ 216 159 none brown, wh~ green, y~ NA male ## 10 Tarf~ 234 136 brown brown blue NA male ## # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>, ## # vehicles <list>, starships <list> ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # filter() * Multiple conditions -- ```r new_data <- filter(starwars, mass > 100 & gender == "female") ``` -- ``` ## # A tibble: 0 x 13 ## # ... with 13 variables: name <chr>, height <int>, mass <dbl>, ## # hair_color <chr>, skin_color <chr>, eye_color <chr>, birth_year <dbl>, ## # gender <chr>, homeworld <chr>, species <chr>, films <list>, ## # vehicles <list>, starships <list> ``` -- ```r new_data <- filter(starwars, mass > 100 & gender %in% c("hermaphrodite", "none")) ``` -- ``` ## # A tibble: 2 x 13 ## name height mass hair_color skin_color eye_color birth_year gender ## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> ## 1 Jabb~ 175 1358 <NA> green-tan~ orange 600 herma~ ## 2 IG-88 200 140 none metal red 15 none ## # ... 
with 5 more variables: homeworld <chr>, species <chr>, films <list>, ## # vehicles <list>, starships <list> ``` --- class: inverse .left-column2[ ## filter() ## **arrange()** ## select() ## mutate() ## summarise() and group_by() ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # arrange() -- * Change order of observations -- * In increasing order -- ```r new_data <- arrange(starwars, height) ``` -- ``` ## # A tibble: 87 x 13 ## name height mass hair_color skin_color eye_color birth_year gender ## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> ## 1 Yoda 66 17 white green brown 896 male ## 2 Ratt~ 79 15 none grey, blue unknown NA male ## 3 Wick~ 88 20 brown brown brown 8 male ## 4 Dud ~ 94 45 none blue, grey yellow NA male ## 5 R2-D2 96 32 <NA> white, bl~ red 33 <NA> ## 6 R4-P~ 96 NA none silver, r~ red, blue NA female ## 7 R5-D4 97 32 <NA> white, red red NA <NA> ## 8 Sebu~ 112 40 none grey, red orange NA male ## 9 Gasg~ 122 NA none white, bl~ black NA male ## 10 Watto 137 NA black blue, grey yellow NA male ## # ... 
with 77 more rows, and 5 more variables: homeworld <chr>, ## # species <chr>, films <list>, vehicles <list>, starships <list> ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # arrange() * Use `desc(var)` to arrange in decreasing order -- ```r new_data <- arrange(starwars, desc(height)) ``` -- ``` ## # A tibble: 87 x 13 ## name height mass hair_color skin_color eye_color birth_year gender ## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> ## 1 Yara~ 264 NA none white yellow NA male ## 2 Tarf~ 234 136 brown brown blue NA male ## 3 Lama~ 229 88 none grey black NA male ## 4 Chew~ 228 112 brown unknown blue 200 male ## 5 Roos~ 224 82 none grey orange NA male ## 6 Grie~ 216 159 none brown, wh~ green, y~ NA male ## 7 Taun~ 213 NA none grey black NA female ## 8 Rugo~ 206 NA none green orange NA male ## 9 Tion~ 206 80 none grey black NA male ## 10 Dart~ 202 136 none white yellow 41.9 male ## # ... 
with 77 more rows, and 5 more variables: homeworld <chr>, ## # species <chr>, films <list>, vehicles <list>, starships <list> ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # arrange() * Multiple variables can be used: rows are sorted by the first variable, and ties are broken by the following ones -- ```r new_data <- arrange(starwars, hair_color, height) ``` -- ``` ## # A tibble: 87 x 13 ## name height mass hair_color skin_color eye_color birth_year gender ## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> ## 1 Mon ~ 150 NA auburn fair blue 48 female ## 2 Wilh~ 180 NA auburn, g~ fair blue 64 male ## 3 Obi-~ 182 77 auburn, w~ fair blue-gray 57 male ## 4 Watto 137 NA black blue, grey yellow NA male ## 5 Shmi~ 163 NA black fair brown 72 female ## 6 Barr~ 166 50 black yellow blue 40 female ## 7 Lumi~ 170 56.2 black yellow blue 58 female ## 8 Eeth~ 171 NA black brown brown NA male ## 9 Land~ 177 79 black dark brown 31 male ## 10 Bigg~ 183 84 black light brown 24 male ## # ...
with 77 more rows, and 5 more variables: homeworld <chr>, ## # species <chr>, films <list>, vehicles <list>, starships <list> ``` --- class: inverse .left-column2[ ## filter() ## arrange() ## **select()** ## mutate() ## summarise() and group_by() ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # select() -- * select specific variables of your data (i.e. a subset of columns) -- * variables can be named explicitly -- ```r new_data <- select(starwars, name, height, mass) ``` -- ``` ## # A tibble: 87 x 3 ## name height mass ## <chr> <int> <dbl> ## 1 Luke Skywalker 172 77 ## 2 C-3PO 167 75 ## 3 R2-D2 96 32 ## 4 Darth Vader 202 136 ## 5 Leia Organa 150 49 ## 6 Owen Lars 178 120 ## 7 Beru Whitesun lars 165 75 ## 8 R5-D4 97 32 ## 9 Biggs Darklighter 183 84 ## 10 Obi-Wan Kenobi 182 77 ## # ... with 77 more rows ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # select() * as a range of consecutive variables -- ```r new_data <- select(starwars, name:mass) ``` -- ``` ## # A tibble: 87 x 3 ## name height mass ## <chr> <int> <dbl> ## 1 Luke Skywalker 172 77 ## 2 C-3PO 167 75 ## 3 R2-D2 96 32 ## 4 Darth Vader 202 136 ## 5 Leia Organa 150 49 ## 6 Owen Lars 178 120 ## 7 Beru Whitesun lars 165 75 ## 8 R5-D4 97 32 ## 9 Biggs Darklighter 183 84 ## 10 Obi-Wan Kenobi 182 77 ## # ...
with 77 more rows ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # select() * by position (column index) -- ```r new_data <- select(starwars, 1, 3, 10:12) ``` -- ``` ## # A tibble: 87 x 5 ## name mass species films vehicles ## <chr> <dbl> <chr> <list> <list> ## 1 Luke Skywalker 77 Human <chr [5]> <chr [2]> ## 2 C-3PO 75 Droid <chr [6]> <chr [0]> ## 3 R2-D2 32 Droid <chr [7]> <chr [0]> ## 4 Darth Vader 136 Human <chr [4]> <chr [0]> ## 5 Leia Organa 49 Human <chr [5]> <chr [1]> ## 6 Owen Lars 120 Human <chr [3]> <chr [0]> ## 7 Beru Whitesun lars 75 Human <chr [3]> <chr [0]> ## 8 R5-D4 32 Droid <chr [1]> <chr [0]> ## 9 Biggs Darklighter 84 Human <chr [1]> <chr [0]> ## 10 Obi-Wan Kenobi 77 Human <chr [6]> <chr [1]> ## # ... with 77 more rows ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # select() * drop columns with `-` -- ```r new_data <- select(starwars, -name) ``` -- ``` ## # A tibble: 87 x 12 ## height mass hair_color skin_color eye_color birth_year gender homeworld ## <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> ## 1 172 77 blond fair blue 19 male Tatooine ## 2 167 75 <NA> gold yellow 112 <NA> Tatooine ## 3 96 32 <NA> white, bl~ red 33 <NA> Naboo ## 4 202 136 none white yellow 41.9 male Tatooine ## 5 150 49 brown light brown 19 female Alderaan ## 6 178 120 brown, gr~ light blue 52 male Tatooine ## 7 165 75 brown light blue 47 female Tatooine ## 8 97 32 <NA> white, red red NA <NA> Tatooine ## 9 183 84 black light brown 24 male Tatooine ## 10 182 77 auburn, w~ fair blue-gray 57 male Stewjon ## # ...
with 77 more rows, and 4 more variables: species <chr>, ## # films <list>, vehicles <list>, starships <list> ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # select() * there are many helper functions, such as `starts_with()`, `ends_with()`, `contains()`, etc. -- ```r new_data <- select(starwars, contains("color")) ``` -- ``` ## # A tibble: 87 x 3 ## hair_color skin_color eye_color ## <chr> <chr> <chr> ## 1 blond fair blue ## 2 <NA> gold yellow ## 3 <NA> white, blue red ## 4 none white yellow ## 5 brown light brown ## 6 brown, grey light blue ## 7 brown light blue ## 8 <NA> white, red red ## 9 black light brown ## 10 auburn, white fair blue-gray ## # ... with 77 more rows ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # select() * all of these can be combined -- ```r new_data <- select(starwars, name, contains("color"), 12) ``` -- ``` ## # A tibble: 87 x 5 ## name hair_color skin_color eye_color vehicles ## <chr> <chr> <chr> <chr> <list> ## 1 Luke Skywalker blond fair blue <chr [2]> ## 2 C-3PO <NA> gold yellow <chr [0]> ## 3 R2-D2 <NA> white, blue red <chr [0]> ## 4 Darth Vader none white yellow <chr [0]> ## 5 Leia Organa brown light brown <chr [1]> ## 6 Owen Lars brown, grey light blue <chr [0]> ## 7 Beru Whitesun lars brown light blue <chr [0]> ## 8 R5-D4 <NA> white, red red <chr [0]> ## 9 Biggs Darklighter black light brown <chr [0]> ## 10 Obi-Wan Kenobi auburn, white fair blue-gray <chr [1]> ## # ... with 77 more rows ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # select() * beware of name conflicts: other packages also define `select()` (e.g. {MASS}) or `filter()` (e.g. {stats}), and the package attached last masks the others
-- * call the function with its namespace: `dplyr::select()` * or reassign it: `select <- dplyr::select` --- class: inverse .left-column2[ ## filter() ## arrange() ## select() ## **mutate()** ## summarise() and group_by() ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # mutate() * add a new column (variable) that is a function of other variables -- ```r new_data <- mutate(starwars, BMI = mass / height^2) ``` -- ``` ## # A tibble: 87 x 14 ## name height mass hair_color skin_color eye_color birth_year gender ## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> ## 1 Luke~ 172 77 blond fair blue 19 male ## 2 C-3PO 167 75 <NA> gold yellow 112 <NA> ## 3 R2-D2 96 32 <NA> white, bl~ red 33 <NA> ## 4 Dart~ 202 136 none white yellow 41.9 male ## 5 Leia~ 150 49 brown light brown 19 female ## 6 Owen~ 178 120 brown, gr~ light blue 52 male ## 7 Beru~ 165 75 brown light blue 47 female ## 8 R5-D4 97 32 <NA> white, red red NA <NA> ## 9 Bigg~ 183 84 black light brown 24 male ## 10 Obi-~ 182 77 auburn, w~ fair blue-gray 57 male ## # ...
with 77 more rows, and 6 more variables: homeworld <chr>, ## # species <chr>, films <list>, vehicles <list>, starships <list>, ## # BMI <dbl> ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # mutate() ```r # move BMI to the first column to view it (height is in cm, hence the tiny values) new_data <- select(new_data, BMI, everything()) ``` -- ``` ## # A tibble: 87 x 14 ## BMI name height mass hair_color skin_color eye_color birth_year ## <dbl> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> ## 1 0.00260 Luke~ 172 77 blond fair blue 19 ## 2 0.00269 C-3PO 167 75 <NA> gold yellow 112 ## 3 0.00347 R2-D2 96 32 <NA> white, bl~ red 33 ## 4 0.00333 Dart~ 202 136 none white yellow 41.9 ## 5 0.00218 Leia~ 150 49 brown light brown 19 ## 6 0.00379 Owen~ 178 120 brown, gr~ light blue 52 ## 7 0.00275 Beru~ 165 75 brown light blue 47 ## 8 0.00340 R5-D4 97 32 <NA> white, red red NA ## 9 0.00251 Bigg~ 183 84 black light brown 24 ## 10 0.00232 Obi-~ 182 77 auburn, w~ fair blue-gray 57 ## # ... with 77 more rows, and 6 more variables: gender <chr>, ## # homeworld <chr>, species <chr>, films <list>, vehicles <list>, ## # starships <list> ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # mutate() * you can refer to newly created variables within the same call * keep only the newly created variables by using `transmute()` instead of `mutate()` -- ```r new_data <- transmute(starwars, height_m = height/100, BMI = mass / height_m^2) ``` -- ``` ## # A tibble: 87 x 2 ## height_m BMI ## <dbl> <dbl> ## 1 1.72 26.0 ## 2 1.67 26.9 ## 3 0.96 34.7 ## 4 2.02 33.3 ## 5 1.5 21.8 ## 6 1.78 37.9 ## 7 1.65 27.5 ## 8 0.97 34.0 ## 9 1.83 25.1 ## 10 1.82 23.2 ## # ...
with 77 more rows ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # mutate() functions * arithmetic operators: `+`, `-`, `*`, `/`, `^` * modular arithmetic: `%/%`, `%%` * logs: `log()`, `log2()`, `log10()` * offsets: `lead()`, `lag()` * cumulative aggregates: `cumsum()`, `cumprod()`, `cummin()`, `cummax()`, `cummean()` * logical comparisons: `==`, `!=`, `<`, `<=`, `>`, `>=` * ranking: `min_rank()`, `row_number()`, `dense_rank()`, `percent_rank()`, `cume_dist()`, `ntile()` -- * Any base R or custom function that returns a **vector** with the same length as the number of rows (a single value is recycled) --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # mutate() functions -- ```r BMI_function <- function(height, mass){ bmi <- mass / height^2 return(bmi) } ``` -- ```r new_data <- transmute(starwars, height_m = height/100, BMI = BMI_function(height_m, mass)) ``` -- ``` ## # A tibble: 87 x 2 ## height_m BMI ## <dbl> <dbl> ## 1 1.72 26.0 ## 2 1.67 26.9 ## 3 0.96 34.7 ## 4 2.02 33.3 ## 5 1.5 21.8 ## 6 1.78 37.9 ## 7 1.65 27.5 ## 8 0.97 34.0 ## 9 1.83 25.1 ## 10 1.82 23.2 ## # ... with 77 more rows ``` --- class: inverse .left-column2[ ## filter() ## arrange() ## select() ## mutate() ## **summarise()** and **group_by()** ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # summarise() & group_by() -- * `summarise()` collapses data to a single row -- ```r new_summary <- summarise(starwars, mean_height = mean(height, na.rm = TRUE), sd_height = sd(height, na.rm = TRUE)) ``` -- ``` ## # A tibble: 1 x 2 ## mean_height sd_height ## <dbl> <dbl> ## 1 174.
34.8 ``` -- * location: `mean(x)`, `median(x)` * spread: `sd(x)`, `IQR(x)`, `mad(x)` * rank: `min(x)`, `quantile(x, 0.25)`, `max(x)` * position: `first(x)`, `nth(x, 2)`, `last(x)` * count: `n()`, `sum(!is.na(x))`, `n_distinct(x)` * any base R or custom function that returns **one summary value** --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # summarise() & group_by() * useful when combined with `group_by()` to apply summaries by group -- ```r starwars <- group_by(starwars, species) ``` -- ``` ## # A tibble: 87 x 13 ## # Groups: species [38] ## name height mass hair_color skin_color eye_color birth_year gender ## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> ## 1 Luke~ 172 77 blond fair blue 19 male ## 2 C-3PO 167 75 <NA> gold yellow 112 <NA> ## 3 R2-D2 96 32 <NA> white, bl~ red 33 <NA> ## 4 Dart~ 202 136 none white yellow 41.9 male ## 5 Leia~ 150 49 brown light brown 19 female ## 6 Owen~ 178 120 brown, gr~ light blue 52 male ## 7 Beru~ 165 75 brown light blue 47 female ## 8 R5-D4 97 32 <NA> white, red red NA <NA> ## 9 Bigg~ 183 84 black light brown 24 male ## 10 Obi-~ 182 77 auburn, w~ fair blue-gray 57 male ## # ...
with 77 more rows, and 5 more variables: homeworld <chr>, ## # species <chr>, films <list>, vehicles <list>, starships <list> ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # summarise() & group_by() * useful when combined with `group_by()` to apply summaries by group ```r starwars <- group_by(starwars, species) ``` ``` ## # A tibble: 87 x 13 *## # Groups: species [38] ## name height mass hair_color skin_color eye_color birth_year gender ## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> ## 1 Luke~ 172 77 blond fair blue 19 male ## 2 C-3PO 167 75 <NA> gold yellow 112 <NA> ## 3 R2-D2 96 32 <NA> white, bl~ red 33 <NA> ## 4 Dart~ 202 136 none white yellow 41.9 male ## 5 Leia~ 150 49 brown light brown 19 female ## 6 Owen~ 178 120 brown, gr~ light blue 52 male ## 7 Beru~ 165 75 brown light blue 47 female ## 8 R5-D4 97 32 <NA> white, red red NA <NA> ## 9 Bigg~ 183 84 black light brown 24 male ## 10 Obi-~ 182 77 auburn, w~ fair blue-gray 57 male ## # ... with 77 more rows, and 5 more variables: homeworld <chr>, ## # species <chr>, films <list>, vehicles <list>, starships <list> ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # summarise() & group_by() ```r new_summary <- summarise(starwars, mean_height = mean(height, na.rm = TRUE), sd_height = sd(height, na.rm = TRUE)) ``` -- ``` ## # A tibble: 38 x 3 ## species mean_height sd_height ## <chr> <dbl> <dbl> ## 1 <NA> 160 42.7 ## 2 Aleena 79 NaN ## 3 Besalisk 198 NaN ## 4 Cerean 198 NaN ## 5 Chagrian 196 NaN ## 6 Clawdite 168 NaN ## 7 Droid 140 52.0 ## 8 Dug 112 NaN ## 9 Ewok 88 NaN ## 10 Geonosian 183 NaN ## # ... 
with 28 more rows ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # summarise() & group_by() -- ```r mean_BMI <- function(mass, height){ # compute height in m height_m <- height / 100 # compute BMI BMI <- mass / height_m^2 # return mean BMI return(mean(BMI, na.rm = TRUE)) } ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # summarise() & group_by() ```r summarise(starwars, n = n(), miss_height = sum(is.na(height)), miss_mass = sum(is.na(mass)), mean_BMI = mean_BMI(mass, height)) ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # summarise() & group_by() ```r summarise(starwars, n = n(), miss_height = sum(is.na(height)), miss_mass = sum(is.na(mass)), mean_BMI = mean_BMI(mass, height)) ``` ``` ## # A tibble: 38 x 5 ## species n miss_height miss_mass mean_BMI ## <chr> <int> <int> <int> <dbl> ## 1 <NA> 5 1 4 15.1 ## 2 Aleena 1 0 0 24.0 ## 3 Besalisk 1 0 0 26.0 ## 4 Cerean 1 0 0 20.9 ## 5 Chagrian 1 0 1 NaN ## 6 Clawdite 1 0 0 19.5 ## 7 Droid 5 1 1 32.7 ## 8 Dug 1 0 0 31.9 ## 9 Ewok 1 0 0 25.8 ## 10 Geonosian 1 0 0 23.9 ## # ... 
with 28 more rows ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # summarise() & group_by() ```r summarise(starwars, n = n(), miss_height = sum(is.na(height)), miss_mass = sum(is.na(mass)), mean_BMI = mean_BMI(mass, height)) ``` ``` ## # A tibble: 38 x 5 ## species n miss_height miss_mass mean_BMI ## <chr> <int> <int> <int> <dbl> ## 1 <NA> 5 1 4 15.1 ## 2 Aleena 1 0 0 24.0 ## 3 Besalisk 1 0 0 26.0 ## 4 Cerean 1 0 0 20.9 ## 5 Chagrian 1 0 1 NaN ## 6 Clawdite 1 0 0 19.5 *## 7 Droid 5 1 1 32.7 ## 8 Dug 1 0 0 31.9 ## 9 Ewok 1 0 0 25.8 ## 10 Geonosian 1 0 0 23.9 ## # ... with 28 more rows ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # ungroup() * remove the grouping with `ungroup()` -- ```r starwars <- ungroup(starwars) ``` --- class: inverse, middle # the pipe operator %>% --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # the pipe operator %>% --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-magrittr.png") background-size: 150px background-position: 90% 8% # the pipe operator %>% * tool from the **{magrittr}** package -- * fully integrated in the tidyverse -- * used to express a sequence of operations -- * read it as "then" when reading the code -- <br> <br> <br> .right[ .small[ Details + history of the pipe: http://adolfoalvarez.cl/plumbers-chains-and-famous-painters-the-history-of-the-pipe-operator-in-r/ ] ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px
background-position: 90% 8% # the pipe operator %>% * Sequence of **{dplyr}** operations -- ```r humans <- filter(starwars, species == "Human") BMI_variables <- select(humans, name, gender, height, mass) BMI_data <- mutate(BMI_variables, height_m = height / 100, BMI = mass / height_m^2) BMI_data_sorted <- arrange(BMI_data, desc(BMI)) ``` -- * Becomes -- ```r BMI_data <- starwars %>% filter(species == "Human") %>% select(name, gender, height, mass) %>% mutate(height_m = height / 100, BMI = mass / height_m^2) %>% arrange(desc(BMI)) ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # the pipe operator %>% -- * Any R function can be included in the pipe flow -- * Use `.` as a placeholder for the data coming from the previous step, e.g. to pass it to a non-tidyverse R function -- ```r species_data <- starwars %>% group_by(species) %>% summarise(mean_BMI = mean_BMI(mass, height)) %>% t(.) %>% as.data.frame(.)
``` -- ``` ## V1 V2 V3 V4 V5 V6 ## species <NA> Aleena Besalisk Cerean Chagrian Clawdite ## mean_BMI 15.14960 24.03461 26.01775 20.91623 <NA> 19.48696 ## V7 V8 V9 V10 V11 V12 ## species Droid Dug Ewok Geonosian Gungan Human ## mean_BMI 32.65613 31.88776 25.82645 23.88844 16.76141 25.48618 ## V13 V14 V15 V16 V17 V18 ## species Hutt Iktotchi Kaleesh Kaminoan Kel Dor Mirialan ## mean_BMI 443.42857 <NA> 34.07922 16.78076 22.63468 18.79562 ## V19 V20 V21 V22 V23 V24 ## species Mon Calamari Muun Nautolan Neimodian Pau'an Quermian ## mean_BMI 25.61728 <NA> 22.64681 24.67038 18.85192 <NA> ## V25 V26 V27 V28 V29 V30 ## species Rodian Skakoan Sullustan Tholothian Togruta Toong ## mean_BMI 24.72518 12.88625 26.56250 14.76843 17.99015 24.46460 ## V31 V32 V33 V34 V35 V36 ## species Toydarian Trandoshan Twi'lek Vulptereen Wookiee Xexto ## mean_BMI <NA> 31.30194 17.35892 50.92802 23.19128 <NA> ## V37 V38 ## species Yoda's species Zabrak ## mean_BMI 39.02663 26.12245 ``` --- class: inverse, middle # Do **not** use the pipe when ## * there are more than ~10 steps ## * there are multiple inputs/outputs ## * the structure is complex (pipes are linear) --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # base R vs {dplyr} -- * **{dplyr}** ```r BMI_data <- starwars %>% filter(species == "Human") %>% select(name, gender, height, mass) %>% mutate(height_m = height / 100, BMI = mass / height_m^2) %>% arrange(desc(BMI)) ``` -- * Base R ```r humans <- starwars[starwars$species == "Human" & !is.na(starwars$species), ] # filter BMI_data <- humans[, c("name", "gender", "height", "mass")] # select BMI_data$height_m <- BMI_data$height / 100 # mutate BMI_data$BMI <- BMI_data$mass / BMI_data$height_m^2 # mutate BMI_data <- BMI_data[order(BMI_data$BMI, decreasing = TRUE), ] # arrange ``` --- background-image:
url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Exercises .left-column[ ## Ex 1 ] .right-column[ * Find which character has green skin and measures less than 1m ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Solution * Find which character has green skin and measures less than 1m ```r starwars %>% filter(skin_color == "green" & height < 100) %>% select(name) ``` ``` ## # A tibble: 1 x 1 ## name ## <chr> ## 1 Yoda ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Exercises .left-column[ ## Ex 1 ## Ex 2 ] .right-column[ * Find from which planet the most starwars characters come from * List all characters from that planet ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Solution * Find from which planet the most starwars characters come from ```r starwars %>% filter(!is.na(homeworld)) %>% group_by(homeworld) %>% summarise(n = n()) %>% arrange(desc(n)) ``` ``` ## # A tibble: 48 x 2 ## homeworld n ## <chr> <int> ## 1 Naboo 11 ## 2 Tatooine 10 ## 3 Alderaan 3 ## 4 Coruscant 3 ## 5 Kamino 3 ## 6 Corellia 2 ## 7 Kashyyyk 2 ## 8 Mirial 2 ## 9 Ryloth 2 ## 10 Aleen Minor 1 ## # ... 
with 38 more rows ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Solution * List all characters from that planet ```r starwars %>% filter(homeworld == "Naboo") %>% select(name) ``` ``` ## # A tibble: 11 x 1 ## name ## <chr> ## 1 R2-D2 ## 2 Palpatine ## 3 Jar Jar Binks ## 4 Roos Tarpals ## 5 Rugor Nass ## 6 Ric Olié ## 7 Quarsh Panaka ## 8 Gregar Typho ## 9 Cordé ## 10 Dormé ## 11 Padmé Amidala ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Exercises .left-column[ ## Ex 1 ## Ex 2 ## Ex 3 ] .right-column[ * Which species has the heaviest mean weight? * Which species have a mean weight between 80kg and 100kg? * In the mean weight classification, which rank are Ewoks? (heaviest species: rank = 1) ] --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Solution * Which species has the heaviest mean weight? ```r starwars %>% group_by(species) %>% summarise(mean_weight = mean(mass, na.rm = TRUE)) %>% arrange(desc(mean_weight)) %>% mutate(rank = dense_rank(desc(mean_weight))) ``` ``` ## # A tibble: 38 x 3 ## species mean_weight rank ## <chr> <dbl> <int> ## 1 Hutt 1358 1 ## 2 Kaleesh 159 2 ## 3 Wookiee 124 3 ## 4 Trandoshan 113 4 ## 5 Besalisk 102 5 ## 6 Neimodian 90 6 ## 7 Kaminoan 88 7 ## 8 Nautolan 87 8 ## 9 Mon Calamari 83 9 ## 10 Human 82.8 10 ## # ... with 28 more rows ``` --- background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png") background-size: 150px background-position: 90% 8% # Solution * Which species have a mean weight between 80kg and 100kg? 
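`between(x, left, right)` is a shortcut for `x >= left & x <= right`: both bounds are included, which is why species with a mean weight of exactly 80 kg (e.g. Geonosian, Zabrak) are kept in the result. A minimal check on a few made-up values, assuming {dplyr} is installed:

```r
library(dplyr)

# made-up values around the bounds (not from the starwars data)
x <- c(79.9, 80, 90, 100, 100.1)

# between() keeps both bounds: same result as x >= 80 & x <= 100
between(x, 80, 100)                                  # FALSE TRUE TRUE TRUE FALSE
identical(between(x, 80, 100), x >= 80 & x <= 100)  # TRUE
```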
```r
starwars %>%
  group_by(species) %>%
  summarise(mean_weight = mean(mass, na.rm = TRUE)) %>%
  filter(between(mean_weight, 80, 100))
```

```
## # A tibble: 10 x 2
##    species      mean_weight
##    <chr>              <dbl>
##  1 Cerean              82
##  2 Geonosian           80
##  3 Human               82.8
##  4 Kaminoan            88
##  5 Kel Dor             80
##  6 Mon Calamari        83
##  7 Nautolan            87
##  8 Neimodian           90
##  9 Pau'an              80
## 10 Zabrak              80
```

---
background-image: url("https://raw.githubusercontent.com/r-in-grenoble/r-in-grenoble.github.io/master/images/hex-dplyr_transparent.png")
background-size: 150px
background-position: 90% 8%

# Solution

* In this ranking, what rank do Ewoks have? (heaviest species: rank = 1)

```r
starwars %>%
  group_by(species) %>%
  summarise(mean_weight = mean(mass, na.rm = TRUE)) %>%
  arrange(desc(mean_weight)) %>%
  mutate(rank = dense_rank(desc(mean_weight))) %>%
  filter(rank == 1 | species == "Ewok")
```

---
class: center, middle

# **#Thanks!**

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).

Don't forget to visit our website: https://r-in-grenoble.github.io/

And subscribe to the R in Grenoble mailing list: https://listes.univ-grenoble-alpes.fr/sympa/subscribe/r-in-grenoble?previous_action=info