English 中文(简体)
dplyr根据缺失的数值安排功能类型
原标题:dplyr arrange() function sort by missing values

我正试图通过Hadley Wickhams R进行数据科学工作,并经常就以下问题开展工作:,你可以安排()开始所有缺失的价值观? (Hint: use is.na()” 我使用的是 航班数据集,载于nyclews13。 鉴于这一安排将所有未知的数值归到数据底,我无法确定,在所有变量的缺失价值中,如何做相反的。 我认识到,这个问题可以用基本的R码来回答,但我特别关心如何利用dplyr和向安排发出呼吁,并且是.na()的职能。 感谢。

最佳回答

我们可以用<代码>desc加以总结,以便从一开始就获得缺失的数值。

flights %>% 
    arrange(desc(is.na(dep_time)),
           desc(is.na(dep_delay)),
           desc(is.na(arr_time)), 
           desc(is.na(arr_delay)),
           desc(is.na(tailnum)),
           desc(is.na(air_time)))

只有在根据这些变量得出的变量中才发现了NA值。

names(flights)[colSums(is.na(flights)) >0]
#[1] "dep_time"  "dep_delay" "arr_time"  "arr_delay" "tailnum"   "air_time" 

Instead of passing each variable name at a time, we can also use NSE arrange_

nm1 <- paste0("desc(is.na(", names(flights)[colSums(is.na(flights)) >0], "))")

r1 <- flights %>%
        arrange_(.dots = nm1) 

r1 %>%
   head()
#year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
#  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>
#1  2013     1     2       NA           1545        NA       NA           1910        NA      AA    133    <NA>
#2  2013     1     2       NA           1601        NA       NA           1735        NA      UA    623    <NA>
#3  2013     1     3       NA            857        NA       NA           1209        NA      UA    714    <NA>
#4  2013     1     3       NA            645        NA       NA            952        NA      UA    719    <NA>
#5  2013     1     4       NA            845        NA       NA           1015        NA      9E   3405    <NA>
#6  2013     1     4       NA           1830        NA       NA           2044        NA      9E   3716    <NA>
#Variables not shown: origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
#  time_hour <time>.

Update

With the newer versions of tidyverse (dplyr_0.7.3, rlang_0.1.2) , we can also make use of arrange_at, arrange_all, arrange_if

nm1 <- names(flights)[colSums(is.na(flights)) >0]
r2 <- flights %>% 
          arrange_at(vars(nm1), funs(desc(is.na(.))))

或使用<条码>

f <- rlang::as_function(~ any(is.na(.)))
r3 <- flights %>% 
          arrange_if(f, funs(desc(is.na(.))))


identical(r1, r2)
#[1] TRUE

identical(r1, r3)
#[1] TRUE
问题回答

他刚才告诉你:

arrange(flights, desc(is.na(dep_time)))

其他冰 sh或t:

arrange(flights, !is.na(dep_time))

arrange(flights, -is.na(dep_time))

以下各行各行各行各通:<代码>。 NA:

flights %>% 
    arrange(desc(rowSums(is.na(.))))

    # A tibble: 336,776 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1   2013     1     2       NA           1545        NA       NA           1910
2   2013     1     2       NA           1601        NA       NA           1735
3   2013     1     3       NA            857        NA       NA           1209
4   2013     1     3       NA            645        NA       NA            952
5   2013     1     4       NA            845        NA       NA           1015
6   2013     1     4       NA           1830        NA       NA           2044
7   2013     1     5       NA            840        NA       NA           1001
8   2013     1     7       NA            820        NA       NA            958
9   2013     1     8       NA           1645        NA       NA           1838
10  2013     1     9       NA            755        NA       NA           1012
# ... with 336,766 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
#   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
#   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>

@akrun的解决方案是罚款。 然而,arrange_ is deprecated SE texts of main verbs. 为避免这种情况,我们可以使用<代码>eval。

nmf <- names(flights)[colSums(is.na(flights)) > 0]
rules = paste0("!is.na(", nmf, ")")
rc <- paste(rules, collapse = ",")
arce <-  paste("arrange(flights," , rc , ")")
expr <- parse(text = arce)
ret <- eval(expr)

Using the data frame (x) the code should be

arrange(df, !is.na(x))





相关问题
How do I sort enum members alphabetically in Java?

I have an enum class like the following: public enum Letter { OMEGA_LETTER("Omega"), GAMMA_LETTER("Gamma"), BETA_LETTER("Beta"), ALPHA_LETTER("Alpha"), private final String ...

Grokking Timsort

There s a (relatively) new sort on the block called Timsort. It s been used as Python s list.sort, and is now going to be the new Array.sort in Java 7. There s some documentation and a tiny Wikipedia ...

Sorting twodimensional Array in AS3

So, i have a two-dimensional Array of ID s and vote count - voteArray[i][0] = ID, voteArray[i][1] = vote count I want the top 3 voted items to be displayed in different colors, so i have a 2nd Array -...

Linq operations against a List of Hashtables?

I m working with a set of legacy DAO code that returns an IList, where each Hashtable represents the row of a dynamically executed SQL query. For example, the List might contain the following records/...

C++ Array Sort Me

Stuck on an array sorter. Have to sort numbers from largest to smallest. I m trying two loops (one nested in the other). Here s the code: int counter=0; // inner counter int counter2=0; // outer ...

Can I Nest OrderBy in .NET?

This doesn t seem to work as I intend. VB.NET: Dim x = Model.Discussions.OrderByDescending(Function(d) d.Messages.OrderByDescending(Function(m) m.Sent).First.Sent) For Each d As Discussion In x ....

sorting elements javascript

I m looking for a way to sort my elements, but it isn t as easy as it sounds. Please let me explain My elements are grouped per 6 elements (thumbnails), each x represents a thumbnail However all ...

热门标签