Creating a similarity measure that is weighted by column range

Consider the following data frame:

test <- structure(list(X1 = c(1L, 2L, 3L, 4L, 2L, 5L), X2 = c(2L, 3L, 
4L, 5L, 3L, 6L), X3 = c(3L, 4L, 4L, 5L, 3L, 2L), X4 = c(2L, 4L, 
6L, 5L, 3L, 8L), X5 = c(1L, 3L, 2L, 4L, 6L, 4L)), .Names = c("X1", 
"X2", "X3", "X4", "X5"), class = "data.frame", row.names = c(NA, 
-6L))

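Each column corresponds to a respondent, and each row contains the rating that the respondent assigned to a particular object. Note that the range of ratings may differ between respondents.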

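I am trying to build a similarity measure in which the distance between two columns is weighted by the range of those columns. This is what I have tried so far: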

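m <- test
# Mean Manhattan distance between respondents (columns), hence t(m)
d <- dist(t(m), "manhattan", diag=FALSE, upper=TRUE)/nrow(m)
# Range of all values in the whole table
maxmin <- max(m, na.rm=TRUE) - min(m, na.rm=TRUE)
WeightedAgreement <- as.matrix((-1 * d + maxmin) / maxmin)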

By this measure, the similarity between X1 and X3 is 0.761: ((1.666 * -1) + 7) / 7 = 0.761.

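The problem with my formula is that it uses the range of all values in the table, so maxmin is always 7, which biases the similarity values. I need the similarity to use the range of the two columns being compared, not the range of the whole table. For columns 1 and 3, maxmin should be 4 (5 - 1), and the similarity between X1 and X3 should be 0.583.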
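The two numbers quoted above can be checked by hand for a single pair of respondents. The following is only a sketch of that arithmetic: the vectors are copied from the data frame above, and the names `d13`, `range_table`, `range_pair`, `sim_table` and `sim_pair` are made up for this illustration.

```r
# Ratings of respondents X1 and X3, copied from the data frame above
x1 <- c(1, 2, 3, 4, 2, 5)
x3 <- c(3, 4, 4, 5, 3, 2)

# Mean absolute (Manhattan) difference between the two respondents
d13 <- sum(abs(x1 - x3)) / length(x1)

# Range over the whole table (8 - 1) versus range over these two columns only
range_table <- 7
range_pair  <- max(x1, x3) - min(x1, x3)

# Similarity under each choice of range
sim_table <- (range_table - d13) / range_table
sim_pair  <- (range_pair - d13) / range_pair
c(sim_table, sim_pair)
```

The first value is roughly 0.76 (the table-wide range) and the second roughly 0.58 (the pairwise range), matching the figures in the question.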

Best answer

If I understand correctly, I think you should define maxmin as follows:

maxmin <- outer(names(m), names(m),
                Vectorize(function(i,j) max(m[c(i,j)], na.rm = TRUE) -
                                        min(m[c(i,j)], na.rm = TRUE)))

#      [,1] [,2] [,3] [,4] [,5]
# [1,]    4    5    4    7    5
# [2,]    5    4    4    6    5
# [3,]    4    4    3    6    5
# [4,]    7    6    6    6    7
# [5,]    5    5    5    7    5
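
The answer only shows how to build maxmin. One way to plug it back into the question's formula, using base R only, might look like the sketch below. It assumes the distance is meant to be taken between columns (hence `t(m)`), rebuilds `test` from the question so the block runs on its own, and converts the `dist` object to a full matrix so that it lines up with the 5 x 5 maxmin matrix.

```r
# Data frame from the question; the name `test` follows the question's code
test <- data.frame(
  X1 = c(1, 2, 3, 4, 2, 5),
  X2 = c(2, 3, 4, 5, 3, 6),
  X3 = c(3, 4, 4, 5, 3, 2),
  X4 = c(2, 4, 6, 5, 3, 8),
  X5 = c(1, 3, 2, 4, 6, 4)
)
m <- test

# Mean absolute difference between respondents (columns), as a full matrix
d <- as.matrix(dist(t(m), method = "manhattan")) / nrow(m)

# Pairwise range: max minus min over the two columns being compared
maxmin <- outer(names(m), names(m),
                Vectorize(function(i, j) max(m[c(i, j)], na.rm = TRUE) -
                                         min(m[c(i, j)], na.rm = TRUE)))
dimnames(maxmin) <- dimnames(d)  # outer() returns an unnamed matrix here

# Similarity scaled by the pairwise range; the diagonal is 1 because d is 0 there
WeightedAgreement <- (maxmin - d) / maxmin
round(WeightedAgreement["X1", "X3"], 3)
```

For X1 and X3 this gives 0.583, the value the question asks for.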

Other answers

OK, here is an alternative solution. The code is:

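library(proxy)  # its dist() accepts a user-defined function
m <- test
# Mean Manhattan distance between respondents (columns)
d <- dist(t(m), "manhattan", diag=FALSE, upper=TRUE)/nrow(m)
# Range of the values in the two columns being compared
f <- function(x,y) max(x,y, na.rm=TRUE) - min(x,y, na.rm=TRUE)
maxmin <- dist(t(m), f, upper=TRUE, diag=TRUE)
RawAgreementWeighted <- as.matrix((-1 * d + maxmin) / maxmin)
diag(RawAgreementWeighted) <- 1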

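Basically, I had to use the function f to create a distance matrix of the pairwise maxmin values. I was only able to do this with the function "dist" from the package "proxy".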




