Programmatically determine the relative "popularities" of a list of items (books, songs, movies, etc)

Given a list of (say) songs, what is the best way to determine their relative "popularity"?

My first thought is to use Google Trends. This list of songs:

  1. Subterranean Homesick Blues
  2. Empire State of Mind
  3. California Gurls

yields this Google Trends report: http://www.google.com/trends?q=%22Subterranean+Homesick+Blues%22%2C+%22empire+state+of+mind%22%2C+%22California+Gurls%22&ctab=0&geo=all&date=mtd&sort=2 (to find out what is popular now, I restricted the report to the last 30 days)

http://s3.amazonaws.com/instagal/original/image001.png?1275516612

Empire State of Mind is more popular than California Gurls, and Subterranean Homesick Blues is far less popular than either.

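So this works pretty well, but what happens when your list is 100 or 1000 songs long? Google Trends only lets you compare 5 terms at a time, so short of a huge round-robin, what is the right approach?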

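Another option is to run a Google search for each song and see which one returns the most results, but that does not really measure the same thing.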

Answers

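Great question. One song by Britney Spears might be hugely popular for 2 months and then be forgotten, while another song by Elvis might stay popular for 30 years. How do you distinguish the two quantitatively? We would like to think that sustained popularity counts for more than a "flash in the pan", but how do you get that result?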

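First, I would normalize around the release date. Right now Subterranean Homesick Blues might not be that popular (not in my house, mind you), but normalizing back to 1965 might give a different result.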
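One way to read "normalize around the release date" is to index each song's interest series by time elapsed since release instead of by calendar date. The following is a minimal sketch under that assumption; the function name `align_to_release`, the dates, and the scores are all made up for illustration.

```python
from datetime import date

def align_to_release(series, release):
    """Re-index (date, score) pairs by whole weeks since the release date.

    Observations dated before the release are dropped. If two observations
    fall in the same week, the later one in the input wins.
    """
    aligned = {}
    for day, score in series:
        offset = (day - release).days
        if offset >= 0:
            aligned[offset // 7] = score
    return aligned

# Hypothetical weekly interest scores; not real data.
old_song = [(date(1965, 3, 8), 40), (date(1965, 3, 15), 70), (date(1965, 3, 22), 65)]
new_song = [(date(2010, 5, 11), 55), (date(2010, 5, 18), 90), (date(2010, 5, 25), 80)]

old_aligned = align_to_release(old_song, date(1965, 3, 8))
new_aligned = align_to_release(new_song, date(2010, 5, 11))
```

Both series are then keyed by week 0, 1, 2, so they can be compared week for week even though they were released decades apart.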

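Since most songs rise in popularity, plateau, and then decline, let's pick the stretch where they plateau. One could assume that over that period the two series are stationary, uncorrelated, and normally distributed. Then you can simply run a t-test to determine whether the means differ.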
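The test for a difference in means could be, for instance, Welch's t-test, a variant that does not require the two series to have equal variance. Here is a minimal sketch using only the standard library; the plateau samples are invented, and the choice of Welch's variant is an assumption rather than something the answer specifies.

```python
import math
from statistics import mean, variance

def welch_t(xs, ys):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    nx, ny = len(xs), len(ys)
    # Squared standard errors of the two sample means.
    vx, vy = variance(xs) / nx, variance(ys) / ny
    t = (mean(xs) - mean(ys)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df

# Hypothetical weekly scores taken from each song's plateau; not real data.
plateau_a = [62, 58, 65, 60, 61, 59]
plateau_b = [48, 52, 47, 50, 51, 49]

t_stat, dof = welch_t(plateau_a, plateau_b)
```

A large absolute value of `t_stat` relative to the t distribution with `dof` degrees of freedom suggests the means differ. Looking up the p-value requires a t-distribution table or a statistics library.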

There are probably less restrictive ways to measure how much two time series differ, but I have not run across them yet.

Anyone?

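You could search for the item on Twitter and see how many times it is mentioned. Or look it up on Amazon and see how many people have reviewed it and what ratings they gave. Both Twitter and Amazon have APIs.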

I would definitely work around Google's "limitation".

In general, the comparison function used by sorting algorithms is "binary":

  • input: 2 elements
  • output: true/false

Yours is:

  • input: 5 elements
  • output: relative weights of each element

So you only need a linear number of calls to the API (whereas sorting normally takes O(N log N) calls to the comparison function).

You can parallelize the calls, but do read the user guidelines regarding the number of requests you are allowed to submit.

After that, once they have all been "rated", you can do a simple sort locally.

To gather the data properly, you would:

  • Shuffle your list
  • Pop the first 5 elements
  • Call the API
  • Insert them sorted in the result (use insertion sort here)
  • Pick the median
  • Pop the first 4 elements (or fewer if fewer are available)
  • Call the API with the median and those 4 elements
  • Go back to the insert step until you run out of elements
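The steps above can be sketched as follows. `compare_batch` is a hypothetical stand-in for the real Trends request: it takes up to 5 terms and returns a weight per term that is only meaningful within that one call. Rescaling each batch through the shared median element is my assumption about how ratings from different calls are put on one scale, and it requires the median's weight to be non-zero.

```python
import bisect
import random

def rate_items(items, compare_batch, batch_size=5, seed=None):
    """Return (items ranked from most to least popular, number of API calls)."""
    pool = list(items)
    if not pool:
        return [], 0
    random.Random(seed).shuffle(pool)

    # First call: this batch defines the common scale.
    first, pool = pool[:batch_size], pool[batch_size:]
    scores = dict(compare_batch(first))
    calls = 1
    ranked = sorted((score, term) for term, score in scores.items())  # ascending

    while pool:
        pivot = ranked[len(ranked) // 2][1]  # current median term
        fresh, pool = pool[:batch_size - 1], pool[batch_size - 1:]
        weights = compare_batch([pivot] + fresh)
        calls += 1
        # Map this batch onto the common scale via the shared pivot.
        scale = scores[pivot] / weights[pivot]  # assumes weights[pivot] > 0
        for term in fresh:
            scores[term] = weights[term] * scale
            bisect.insort(ranked, (scores[term], term))  # sorted insertion

    return [term for _, term in reversed(ranked)], calls

# Made-up "true" popularity, used only to fake the API for this demo.
TRUE_POPULARITY = {f"song {i}": float(i + 1) for i in range(23)}

def fake_compare_batch(terms):
    """Mimic a Trends-style reply: the top term in the batch gets 100."""
    top = max(TRUE_POPULARITY[t] for t in terms)
    return {t: 100.0 * TRUE_POPULARITY[t] / top for t in terms}

ranking, calls = rate_items(TRUE_POPULARITY, fake_compare_batch, seed=0)
```

Using the median as the pivot keeps the shared term away from the extremes, which should limit how often it is dwarfed by, or dwarfs, the new terms in a batch. With real data, rounding in the returned weights will add noise that this sketch ignores.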

If your list is 1,000 songs long, that comes to 250 calls to the API, which is nothing excessive.
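The figure of 250 can be checked with a little arithmetic: the first call rates 5 songs, and every later call rates 4 new songs alongside the median. A small sketch (the function name `api_calls` is just for illustration):

```python
import math

def api_calls(n_items, batch_size=5):
    """Number of batched calls needed when each later call reuses one pivot."""
    if n_items <= 0:
        return 0
    if n_items <= batch_size:
        return 1
    # One initial call, then batch_size - 1 new items per call.
    return 1 + math.ceil((n_items - batch_size) / (batch_size - 1))

calls_for_1000 = api_calls(1000)  # 1 + ceil(995 / 4) = 250
calls_for_100 = api_calls(100)    # 1 + ceil(95 / 4) = 25
```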




