Calculate cosine similarity in SAS/IML
  • Posted: 2012-05-24 16:14:39
  • Tags: sas

I have a matrix in SAS/IML. For each pair of rows (say, vectors A and B), I want to compute the cosine similarity,

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

so the result should be a square matrix with as many rows as the original matrix.
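For concreteness, here is the formula worked by hand for two hypothetical three-element rows. In the square result matrix, entry (i, j) holds this value for rows i and j, so the matrix is symmetric and its diagonal is 1 for any nonzero row.

```latex
\begin{aligned}
A &= (1, 0, 1), \qquad B = (1, 1, 0) \\
A \cdot B &= 1 \cdot 1 + 0 \cdot 1 + 1 \cdot 0 = 1 \\
\lVert A \rVert &= \sqrt{1^2 + 0^2 + 1^2} = \sqrt{2}, \qquad
\lVert B \rVert = \sqrt{1^2 + 1^2 + 0^2} = \sqrt{2} \\
\cos(A, B) &= \frac{1}{\sqrt{2}\,\sqrt{2}} = \frac{1}{2}
\end{aligned}
```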

If I pass a vector to the EUCLID function, I get a vector back, so the function seems to operate on each element of the vector separately. Indeed, the SAS documentation (http://support.sas.com/documentation/cdl/en/imlug/64248/HTML/default/viewer.htm#imlug_lanngref_sect340.htm) says:

If you call a Base SAS function with a matrix argument, the function will usually act elementwise on each element of teh [sic] matrix.

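This seems strange: why would anyone want to compute a summary statistic on each element of a vector separately? It will always just return the element itself. Is there a way to get the Euclidean norm of a vector?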

Euclidean norm aside, is there a more efficient way to do this?

proc iml;
 use fundstr;
 read all var _all_ into wgts;

 nrows=nrow(wgts);
 d=j(nrows,nrows,0);

 do i = 1 to nrows;
   do j = i to nrows;
      tmp = wgts[i,]*wgts[j,]`; /** need to divide by the norm of each vector **/
      d[i,j] = tmp;
      d[j,i] = tmp;
   end;
 end;
quit;
Best answer

Use matrix operations, and think of the problem as $(A/\lVert A \rVert) \cdot (B/\lVert B \rVert)$.

The first step is to divide each row by its Euclidean norm, which is just sqrt(ssq(wgts[i,])). You can use the "sum of squares" subscript reduction operator (##) to compute this for all rows at once without writing a loop: sqrt(wgts[ ,##]); (See http://blogs.sas.com/content/iml/2012/05/23/compute-statistics-for-each-row-by-using-subscript-operators/ for an explanation and examples of subscript reduction operators.)
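A minimal sketch of both forms mentioned above, on a hypothetical 2x2 matrix `x`: SSQ gives the Euclidean norm of a single row (or of any vector), and the `##` subscript reduction operator gives the norm of every row in one statement.

```sas
proc iml;
/* hypothetical example matrix; each row is one vector */
x = {3  4,
     5 12};

/* norm of a single row: SSQ sums the squares of all elements, sqrt(9+16) = 5 */
n1 = sqrt(ssq(x[1,]));

/* norms of all rows at once: x[ ,##] is a column of row sums of squares (25, 169) */
rowNorm = sqrt(x[ ,##]);

print n1, rowNorm;   /* n1 = 5; rowNorm = 5 and 13 */
quit;
```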

The dot products of all pairs of rows are then given by the single matrix product A*A`, where A is the scaled matrix.
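Written out element by element, with $w_i$ denoting row $i$ of `wgts` and $a_{ik}$ the entries of the scaled matrix A, this is why one matrix product yields every pairwise cosine:

```latex
\begin{aligned}
a_{ik} &= \frac{w_{ik}}{\lVert w_i \rVert}, \qquad
\lVert w_i \rVert = \sqrt{\textstyle\sum_k w_{ik}^2} \\
(A A^\top)_{ij} &= \sum_k a_{ik}\, a_{jk}
  = \frac{\sum_k w_{ik}\, w_{jk}}{\lVert w_i \rVert \, \lVert w_j \rVert}
  = \cos(w_i, w_j)
\end{aligned}
```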

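wgts = ranuni(j(5,5));     /* 5x5 matrix of random uniform values */
norm = sqrt(wgts[ ,##]);   /* Euclidean norm of each row */
A = wgts/norm;             /* divide each row by its norm */
d = A*A`;                  /* cosine similarity of every pair of rows */
print d;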

If you want to compare this with the (inefficient) solution that uses loops, here it is:

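/* continues the previous snippet: wgts is the same matrix */
nrows=nrow(wgts);
d=j(nrows,nrows,0);
do i = 1 to nrows;
   normi = sqrt(wgts[i,##]);          /* norm of row i */
   do j = i to nrows;
      normj = sqrt(wgts[j,##]);       /* norm of row j (recomputed on every pass) */
      tmp = wgts[i,]*wgts[j,]` / (normi * normj);
      d[i,j] = tmp;                   /* d is symmetric, so fill both halves */
      d[j,i] = tmp;
   end;
end;
print d;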
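To see concrete numbers, here is the vectorized approach from the answer applied to a small hypothetical 3x2 matrix `W` whose cosines are easy to check by hand. Like the answer's code, it assumes a SAS/IML release that supports elementwise division of a matrix by a column vector.

```sas
proc iml;
/* hypothetical example matrix; each row is one vector */
W = {1 0,
     1 1,
     0 2};

rowNorm = sqrt(W[ ,##]);   /* row norms: 1, sqrt(2), 2 */
A = W / rowNorm;           /* scale each row to unit length */
D = A * A`;                /* D[i,j] = cosine similarity of rows i and j */

print D[format=7.4];
quit;
```

Rows 1 and 3 are orthogonal, so D[1,3] is 0; rows 1 and 2, and rows 2 and 3, are 45 degrees apart, so those entries are about 0.7071; every diagonal entry is 1.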

By the way, you'll be happy to hear that in the next release of SAS/IML, …

Related questions
SAS stack overflow: PROC SQL reading dictionary.columns

I have a program in which I am reading dictionary.columns. There is a big program with lot of code before and after the program segment in which I read dictionary.column. The program used to work ...

SAS using encrypted (PWENCODE) in EMAILPW= option

My code works fine using plain text code, but fails when I use an encrypted password filename File email emailsys = VIM emailid= "&pa_usr" emailpw= "{sasenc}39AAD23E148A9555508AC84447181DFF" ; ...

How do I change the label in a data step header?

In SAS you can do. data a(rename=(a=b) ); a = 1; run; to rename a variable in the data step data statement (or data step header as I call it). What's the syntax to change the label? I tried ...

What's the easiest way to use SQLite with SAS?

I want to investigate how to access SQLite DB from SAS. What's the easiest way of doing this? Is there a SAS product that we can license to do that? I don't want to use ODBC drivers as that seems to ...

Computing Compounded Return in SAS

I have a dataset of date(monthly), person and return(monthly). I need to calculate the compounded monthly return of the dataset from April Year t to March Year t+1 for each person. For example, ...

Exchange Server and SAS 9.1.3/9.2

has anyone successfully interacted with a microsoft exchange server in SAS 9.1.3 or 9.2? i know it can be done with SAS Ent. Guide 4.x, but i'm not interested in that route if regular SAS can do it.

Logging SAS scripts

I've been developing a lot of Java, PHP and Python. All of which offer great logging packages (Log4J, Log or logging respectively). This is a great help when debugging applications. Especially if the ...
