dplyr package study notes

My summary

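  1. Pick observations (rows) by their values: filter().
  2. Pick variables (columns) by their names: select().
  3. Reorder the rows: arrange().
  4. Create new variables as functions of existing variables: mutate().
  5. Collapse many values down to a single summary statistic: summarize().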
library(dplyr)
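A minimal sketch of how the five verbs chain together, assuming dplyr 1.0 or later and the starwars dataset that ships with dplyr. Each verb takes a data frame and returns a new one, so the pipe simply feeds one result into the next step. The object name `tall_humans` is illustrative only.

```r
library(dplyr)

# Each step takes a data frame and returns a new one
tall_humans <- starwars %>%
  filter(species == "Human", !is.na(height)) %>%   # rows: humans with a known height
  select(name, height, mass) %>%                   # columns: keep three of them
  mutate(height_m = height / 100) %>%              # new column: height in metres
  arrange(desc(height))                            # rows: tallest first

# summarise() on an ungrouped table collapses it to a single row
tall_humans %>%
  summarise(n = n(), mean_height_m = mean(height_m))
```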

Take a look at the example dataset starwars (it ships with dplyr):

starwars

# A tibble: 87 × 14
   name          height  mass hair_color skin_color eye_color birth_year sex   gender homeworld species films vehicles
   <chr>          <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>  <chr>     <chr>   <lis> <list>  
 1 Luke Skywalk…    172    77 blond      fair       blue            19   male  mascu… Tatooine  Human   <chr> <chr>   
 2 C-3PO            167    75 NA         gold       yellow         112   none  mascu… Tatooine  Droid   <chr> <chr>   
 3 R2-D2             96    32 NA         white, bl… red             33   none  mascu… Naboo     Droid   <chr> <chr>   
 4 Darth Vader      202   136 none       white      yellow          41.9 male  mascu… Tatooine  Human   <chr> <chr>   
 5 Leia Organa      150    49 brown      light      brown           19   fema… femin… Alderaan  Human   <chr> <chr>   
 6 Owen Lars        178   120 brown, gr… light      blue            52   male  mascu… Tatooine  Human   <chr> <chr>   
 7 Beru Whitesu…    165    75 brown      light      blue            47   fema… femin… Tatooine  Human   <chr> <chr>   
 8 R5-D4             97    32 NA         white, red red             NA   none  mascu… Tatooine  Droid   <chr> <chr>   
 9 Biggs Darkli…    183    84 black      light      brown           24   male  mascu… Tatooine  Human   <chr> <chr>   
10 Obi-Wan Keno…    182    77 auburn, w… fair       blue-gray       57   male  mascu… Stewjon   Human   <chr> <chr>   
# ℹ 77 more rows
# ℹ 1 more variable: starships <list>
# ℹ Use `print(n = ...)` to see more rows
  1. filter(): filter rows (keep only the observations that satisfy a condition)
starwars %>% 
  filter(species == "Droid")

# A tibble: 6 × 14
  name   height  mass hair_color skin_color  eye_color birth_year sex   gender    homeworld species films     vehicles
  <chr>   <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> <chr>     <chr>     <chr>   <list>    <list>  
1 C-3PO     167    75 NA         gold        yellow           112 none  masculine Tatooine  Droid   <chr [6]> <chr>   
2 R2-D2      96    32 NA         white, blue red               33 none  masculine Naboo     Droid   <chr [7]> <chr>   
3 R5-D4      97    32 NA         white, red  red               NA none  masculine Tatooine  Droid   <chr [1]> <chr>   
4 IG-88     200   140 none       metal       red               15 none  masculine NA        Droid   <chr [1]> <chr>   
5 R4-P17     96    NA none       silver, red red, blue         NA none  feminine  NA        Droid   <chr [2]> <chr>   
6 BB8        NA    NA none       none        black             NA none  masculine NA        Droid   <chr [1]> <chr>   
# ℹ 1 more variable: starships <list>
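A short sketch of how filter() combines conditions, assuming dplyr 1.0 or later: comma-separated conditions are joined with AND, `|` expresses OR, and any row where the condition evaluates to NA is dropped rather than kept. The object names are illustrative.

```r
library(dplyr)

# Comma-separated conditions are combined with AND
droids_tall <- starwars %>%
  filter(species == "Droid", height > 100)

# | expresses OR; here it keeps the same rows as species %in% c("Droid", "Wookiee")
droids_or_wookiees <- starwars %>%
  filter(species == "Droid" | species == "Wookiee")

# Rows where the condition is NA are dropped, so no missing heights survive
known_height <- starwars %>%
  filter(height > 100)
```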
  2. select(): select columns
starwars %>% 
  select(name, ends_with("color"))

# A tibble: 87 × 4
   name               hair_color    skin_color  eye_color
   <chr>              <chr>         <chr>       <chr>    
 1 Luke Skywalker     blond         fair        blue     
 2 C-3PO              NA            gold        yellow   
 3 R2-D2              NA            white, blue red      
 4 Darth Vader        none          white       yellow   
 5 Leia Organa        brown         light       brown    
 6 Owen Lars          brown, grey   light       blue     
 7 Beru Whitesun Lars brown         light       blue     
 8 R5-D4              NA            white, red  red      
 9 Biggs Darklighter  black         light       brown    
10 Obi-Wan Kenobi     auburn, white fair        blue-gray
# ℹ 77 more rows
# ℹ Use `print(n = ...)` to see more rows
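Besides ends_with(), select() accepts other tidyselect helpers and a leading minus to drop columns. A minimal sketch, assuming dplyr 1.0 or later (which re-exports these helpers); the object names are illustrative.

```r
library(dplyr)

# Helpers can be mixed with bare column names
starwars %>% select(name, starts_with("birth"))   # name, birth_year
starwars %>% select(name, contains("_"))          # name plus every column whose name contains "_"

# A leading minus drops columns instead of keeping them
no_lists <- starwars %>% select(-films, -vehicles, -starships)

# everything() is a common way to move one column to the front
species_first <- starwars %>% select(species, everything())
```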
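  3. mutate(): add a new column to a data frame or overwrite an existing one, and return the modified data frame (a new copy; the original is unchanged unless you assign the result back).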
starwars %>% 
  mutate(bmi = mass / ((height / 100)^2)) %>%
  select(name:mass, bmi)

# A tibble: 87 × 4
   name               height  mass   bmi
   <chr>               <int> <dbl> <dbl>
 1 Luke Skywalker        172    77  26.0
 2 C-3PO                 167    75  26.9
 3 R2-D2                  96    32  34.7
 4 Darth Vader           202   136  33.3
 5 Leia Organa           150    49  21.8
 6 Owen Lars             178   120  37.9
 7 Beru Whitesun Lars    165    75  27.5
 8 R5-D4                  97    32  34.0
 9 Biggs Darklighter     183    84  25.1
10 Obi-Wan Kenobi        182    77  23.2
# ℹ 77 more rows
# ℹ Use `print(n = ...)` to see more rows
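Two details of mutate() worth seeing in code: reusing an existing name overwrites that column, and later expressions in the same call already see the updated value. The sketch below, assuming dplyr 1.0 or later, also shows that mutate() does not change the input; `with_bmi` is an illustrative name.

```r
library(dplyr)

with_bmi <- starwars %>%
  mutate(
    height = height / 100,   # reusing a name overwrites the column (cm -> m)
    bmi    = mass / height^2 # this expression already sees height in metres
  )

# Only the returned copy has bmi; starwars itself is untouched
"bmi" %in% names(starwars)   # FALSE
"bmi" %in% names(with_bmi)   # TRUE
```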
  4. arrange(): reorder whole rows by the given columns (or expressions). Ascending by default; use desc(x) for descending order.
starwars %>% 
  arrange(desc(mass))

# A tibble: 87 × 14
   name          height  mass hair_color skin_color eye_color birth_year sex   gender homeworld species films vehicles
   <chr>          <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>  <chr>     <chr>   <lis> <list>  
 1 Jabba Desili…    175  1358 NA         green-tan… orange         600   herm… mascu… Nal Hutta Hutt    <chr> <chr>   
 2 Grievous         216   159 none       brown, wh… green, y…       NA   male  mascu… Kalee     Kaleesh <chr> <chr>   
 3 IG-88            200   140 none       metal      red             15   none  mascu… NA        Droid   <chr> <chr>   
 4 Darth Vader      202   136 none       white      yellow          41.9 male  mascu… Tatooine  Human   <chr> <chr>   
 5 Tarfful          234   136 brown      brown      blue            NA   male  mascu… Kashyyyk  Wookiee <chr> <chr>   
 6 Owen Lars        178   120 brown, gr… light      blue            52   male  mascu… Tatooine  Human   <chr> <chr>   
 7 Bossk            190   113 none       green      red             53   male  mascu… Trandosha Trando… <chr> <chr>   
 8 Chewbacca        228   112 brown      unknown    blue           200   male  mascu… Kashyyyk  Wookiee <chr> <chr>   
 9 Jek Tono Por…    180   110 brown      fair       blue            NA   NA    NA     Bestine … NA      <chr> <chr>   
10 Dexter Jetts…    198   102 none       brown      yellow          NA   male  mascu… Ojom      Besali… <chr> <chr>   
# ℹ 77 more rows
# ℹ 1 more variable: starships <list>
# ℹ Use `print(n = ...)` to see more rows
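arrange() also takes several sort keys, applied left to right, and for local data frames it always puts missing values last, even inside desc(). A minimal sketch assuming dplyr 1.0 or later; the object names are illustrative.

```r
library(dplyr)

# Sort by species first, then by mass (heaviest first) within each species
by_species_mass <- starwars %>%
  arrange(species, desc(mass)) %>%
  select(name, species, mass)

# Missing values are sorted to the end even inside desc()
by_mass_desc <- starwars %>% arrange(desc(mass))
tail(by_mass_desc$mass)   # the NAs come last
```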
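  5. group_by(): marks a data frame as split into groups, so that subsequent summarise(), mutate(), filter() and similar verbs compute within each group independently instead of over the whole table at once.
  6. summarise(): collapses a grouped (or ungrouped) data frame into one summary row per group (or a single row for the whole table).
    Summary values are computed with aggregate functions: n(), mean(), sum(), max(), first(), …
    After group_by() you get one row per group; without grouping, one row for the whole table.
    By default summarise() drops the last grouping level, so with a single grouping variable (as below) the result is ungrouped; with several grouping variables it stays grouped by the remaining ones.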
starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(
    n > 1,
    mass > 50
  )

# A tibble: 9 × 3
  species      n  mass
  <chr>    <int> <dbl>
1 Droid        6  69.8
2 Gungan       3  74  
3 Human       35  81.3
4 Kaminoan     2  88  
5 Mirialan     2  53.1
6 Twi'lek      2  55  
7 Wookiee      2 124  
8 Zabrak       2  80  
9 NA           4  81  
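To see the grouping rules in action, the sketch below (assuming dplyr 1.0 or later, where the `.groups` argument exists) groups by two variables: by default summarise() removes only the last one, while `.groups = "drop"` returns an ungrouped result. The last pipeline shows group_by() with mutate(), where the mean is computed per species rather than over the whole table; the column name `mass_vs_species_mean` is illustrative.

```r
library(dplyr)

# Two grouping variables: by default only the last one (gender) is dropped
by_sex_gender <- starwars %>%
  group_by(sex, gender) %>%
  summarise(n = n())
group_vars(by_sex_gender)   # "sex"

# .groups = "drop" returns an ungrouped tibble
flat <- starwars %>%
  group_by(sex, gender) %>%
  summarise(n = n(), .groups = "drop")
group_vars(flat)            # character(0)

# group_by() + mutate(): each row is compared with its own species' mean mass
starwars %>%
  group_by(species) %>%
  mutate(mass_vs_species_mean = mass - mean(mass, na.rm = TRUE)) %>%
  ungroup() %>%
  select(name, species, mass, mass_vs_species_mean)
```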

Renaming columns

  1. To simply rename A to B, use rename().
  2. To rename many columns at once by applying a function to their names, use rename_with().

Build an example data frame

df <- tibble(old1 = 1:3, old2 = letters[1:3], old3 = rnorm(3))  # rnorm() is random, so old3 differs between runs

df
# A tibble: 3 × 3
   old1 old2    old3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976

Using rename()

df2 <- df %>% rename(
  new1 = old1,          # new name on the left, old name on the right
  新列2  = old2         # non-ASCII (e.g. Chinese) names work too
)

df2
# A tibble: 3 × 3
   new1 新列2    old3
  <int> <chr>   <dbl>
1     1 a      0.0896
2     2 b     -0.921 
3     3 c     -0.122
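When the old and new names are stored as strings, for example in a lookup vector, rename() also accepts all_of() on a named character vector (new name = "old name"); any_of() does the same but silently skips old names that are not present. This follows the pattern shown in dplyr's rename() documentation, assuming dplyr 1.0 or later; `lookup` and `lookup_extra` are hypothetical names, and `df` is recreated so the block runs on its own.

```r
library(dplyr)

df <- tibble(old1 = 1:3, old2 = letters[1:3], old3 = rnorm(3))

# Hypothetical lookup vector: names are the new names, values are the old ones
lookup <- c(new1 = "old1", new2 = "old2")
df %>% rename(all_of(lookup))

# any_of() ignores entries whose old name does not exist; all_of() would error here
lookup_extra <- c(new1 = "old1", new9 = "old9")
df %>% rename(any_of(lookup_extra))
```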

Using rename_with()

Add a prefix to every column name

df %>% rename_with(~ paste0("pre_", .x))

# A tibble: 3 × 3
  pre_old1 pre_old2 pre_old3
     <int> <chr>       <dbl>
1        1 a          -0.448
2        2 b          -0.729
3        3 c          -0.976

Add a suffix to every column name

df %>% rename_with(~ paste0(.x, "_suf"))   

# A tibble: 3 × 3
  old1_suf old2_suf old3_suf
     <int> <chr>       <dbl>
1        1 a          -0.448
2        2 b          -0.729
3        3 c          -0.976

Rename only columns 2 and 3

df %>% rename_with(toupper, .cols = 2:3)

# A tibble: 3 × 3
   old1 OLD2    OLD3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976
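`.cols` is not limited to positions: it accepts any tidyselect expression, such as a predicate on the column type. Arguments given after `.cols` are passed on to `.fn`, so a base function like gsub() can be used without writing a lambda. A minimal sketch assuming dplyr 1.0 or later; `df` is recreated so the block is self-contained.

```r
library(dplyr)

df <- tibble(old1 = 1:3, old2 = letters[1:3], old3 = rnorm(3))

# Select columns by type: only the numeric columns are renamed
df %>% rename_with(toupper, .cols = where(is.numeric))   # OLD1, old2, OLD3

# Extra arguments after .cols are forwarded to .fn (here gsub's pattern and replacement)
df %>% rename_with(gsub, .cols = everything(), pattern = "old", replacement = "col")
```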

Rename only the columns whose names start with old

df
# A tibble: 3 × 3
   old1 old2    old3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976

df %>% rename_with(~ sub("^old", "new", .x), starts_with("old"))

# A tibble: 3 × 3
   new1 new2    new3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976

Question 1: I want to give specific columns new names in one go

df
# A tibble: 3 × 3
   old1 old2    old3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976

df %>% rename_with(
  .fn = ~ c("x1", "x2"),   # ignores the old names; must return exactly one new name per selected column
  .cols = 1:2
)
# A tibble: 3 × 3
     x1 x2      old3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976
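The rename_with() trick above works because `.fn` returns exactly one name per selected column. Two arguably more direct alternatives, sketched here assuming dplyr 1.0 or later: rename() accepts column positions on the right-hand side, and base R can assign into names(). The copy `df_base` is an illustrative name; the base R form modifies that object in place.

```r
library(dplyr)

df <- tibble(old1 = 1:3, old2 = letters[1:3], old3 = rnorm(3))

# rename() with positions: new name on the left, column position on the right
df %>% rename(x1 = 1, x2 = 2)

# Base R equivalent: assign into names() of a copy
df_base <- df
names(df_base)[1:2] <- c("x1", "x2")
```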

Appendix

Code:
https://github.com/wPencil/MyNotes/blob/8dfb444bfc0542f051f971fb62b4b8078911d36f/R/dplyr_learn_notes.R
