Using scikit-learn to predict good content on a website

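I have one year of data from a website. I'd like to train a machine learning algorithm to predict the success of new content based on certain variables (such as word count, posting date, etc.).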
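Before any model can be trained, each post has to become a row of numbers. The sketch below assumes each post is a dict with hypothetical `body` and `posted` fields (neither name comes from the question) and encodes word count, day of week and month as features. A real dataset would likely have more columns, and one-hot encoding the weekday may work better than a raw integer.

```python
from datetime import datetime

import numpy as np

# Hypothetical post records; the field names are illustrative, not from the question.
posts = [
    {"body": "Short news item about the site", "posted": "2012-03-05"},
    {"body": "A much longer tutorial " * 20, "posted": "2012-03-10"},
]


def extract_features(post):
    """Turn one post into a numeric feature vector: [word_count, weekday, month]."""
    word_count = len(post["body"].split())
    posted = datetime.strptime(post["posted"], "%Y-%m-%d")
    # weekday(): 0 = Monday ... 6 = Sunday
    return [word_count, posted.weekday(), posted.month]


X = np.array([extract_features(p) for p in posts])
print(X)
```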

I'd like to take a new piece of content, get suggestions about some of its characteristics, and possibly help the website do better.
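One possible way to get "suggestions" out of a trained model, sketched here under the assumption that a random forest is used, is to look at which features it relies on and compare predictions for a draft before and after a change. The data below is synthetic and the feature names are hypothetical; in this toy set "good" posts are simply the long ones, so word count should dominate.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
n = 500
# Synthetic stand-in data; columns are hypothetical features.
word_count = rng.randint(50, 2000, size=n)
weekday = rng.randint(0, 7, size=n)
month = rng.randint(1, 13, size=n)
X = np.column_stack([word_count, weekday, month])
y = (word_count > 800).astype(int)  # toy rule: longer posts are "good"

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Which features does the model lean on?
for name, importance in zip(["word_count", "weekday", "month"], model.feature_importances_):
    print(f"{name}: {importance:.2f}")

# "What if" check for a new draft: same weekday/month, different length.
draft = np.array([[400, 2, 6]])
longer = np.array([[1200, 2, 6]])
print("P(good) as drafted:", model.predict_proba(draft)[0, 1])
print("P(good) if longer: ", model.predict_proba(longer)[0, 1])
```

Feature importances only say what the model uses, not what causes success, so such suggestions are a starting point rather than a guarantee.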

In addition, I'd like to keep adding future data to the training set and keep retraining the algorithm so that the model improves over time.

My question is: how can I use scikit-learn to achieve this?

Answers

What you have is a binary classification problem: you have to decide whether a given input is good or not.
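A binary classifier needs a 0/1 target. The question does not say how "success" is measured, so this sketch assumes a hypothetical `page_views` count per post and labels a post as good when it beats the median. The threshold is an arbitrary choice; a business-specific cut-off may make more sense.

```python
import numpy as np

# Hypothetical success metric per post (e.g. page views); not from the question.
page_views = np.array([120, 5400, 860, 40, 2300, 310])

# Label a post "good" (1) if it beats the median, otherwise "not good" (0).
threshold = np.median(page_views)
y = (page_views > threshold).astype(int)
print(threshold, y)
```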

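There are many different algorithms you could use, and scikit-learn makes it very easy to switch between them, so you can see what works and what doesn't.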

Off the top of my head, some of the methods I would try:

  • SVM
  • Random forests (Forest of randomized trees in scikits)
  • Regression (Ridge, Lasso, IRLS, logistic)
  • Naive Bayes
  • k nearest neighbors
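Because every scikit-learn estimator exposes the same `fit`/`score` interface, trying the methods above is mostly a matter of swapping one object for another. This sketch uses synthetic data from `make_classification` as a stand-in for the website data and assumes a recent scikit-learn (it imports from `sklearn.model_selection`, which the 0.10 docs linked below predate). It leaves out Lasso (a regressor rather than a classifier) and IRLS.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the website data (features X, good/not-good labels y).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale-sensitive models (SVM, k-NN) are wrapped with a StandardScaler.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Random forest": RandomForestClassifier(random_state=0),
    "Ridge": RidgeClassifier(),
    "Logistic regression": LogisticRegression(),
    "Naive Bayes": GaussianNB(),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # same call for every estimator
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```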

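How do you assess the quality of a method? Use cross-validation (10-fold if you have enough data, otherwise 5-fold). There is a section on it in the manual (5.1).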
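The rule of thumb above translates directly into a cross-validation call. This is a sketch on synthetic data; the 1000-sample cut-off for "enough data" is an arbitrary assumption, and stratified folds are used so each fold keeps roughly the same share of good posts.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# 10 folds with enough data, otherwise 5 (the 1000 cut-off is an assumption).
n_folds = 10 if len(y) >= 1000 else 5
cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)

scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
print(f"{n_folds}-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```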

Adding new data to the training set will require you to retrain your model. Depending on the computing power you have at hand, this may or may not be a problem. If you already have a lot of examples, adding a single one won't change much, so wait until you have a handful of new examples before retraining. That will save computation time.
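The "wait for a handful of new examples" advice can be sketched as a small buffer that refits the model only when enough new labelled posts have arrived. `BatchRetrainer` is a hypothetical helper written for illustration, not a scikit-learn class, and the batch size of 50 is an arbitrary assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class BatchRetrainer:
    """Hypothetical helper: buffer new examples, refit once `batch_size` arrive."""

    def __init__(self, model, X, y, batch_size=50):
        self.model = model
        self.X = np.asarray(X)
        self.y = np.asarray(y)
        self.batch_size = batch_size
        self.pending_X, self.pending_y = [], []
        self.n_fits = 0
        self._refit()

    def _refit(self):
        # Full (offline) retrain on everything collected so far.
        self.model.fit(self.X, self.y)
        self.n_fits += 1

    def add(self, x, label):
        self.pending_X.append(x)
        self.pending_y.append(label)
        if len(self.pending_y) >= self.batch_size:
            # Fold the buffered examples into the training set, then retrain.
            self.X = np.vstack([self.X, np.array(self.pending_X)])
            self.y = np.concatenate([self.y, np.array(self.pending_y)])
            self.pending_X, self.pending_y = [], []
            self._refit()


rng = np.random.RandomState(0)
X0 = rng.rand(200, 3)
y0 = (X0[:, 0] > 0.5).astype(int)
retrainer = BatchRetrainer(RandomForestClassifier(n_estimators=50, random_state=0),
                           X0, y0, batch_size=50)

# 120 new posts arrive: the model is refit after the 50th and 100th only.
for _ in range(120):
    x = rng.rand(3)
    retrainer.add(x, int(x[0] > 0.5))

print(retrainer.n_fits, len(retrainer.y), len(retrainer.pending_y))
```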

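Algorithms that learn from the whole training set at once are called offline algorithms. Online algorithms, on the other hand, learn each time a new example is presented. If you really need that, try an online method, such as nearest neighbors.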
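As far as I know, scikit-learn's `KNeighborsClassifier` has no incremental update method, so this sketch swaps in a different online learner: `SGDClassifier`, whose `partial_fit` updates the model with each new batch instead of retraining from scratch. The data is synthetic and the batch size of 25 is an arbitrary assumption.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
model = SGDClassifier(random_state=0)  # default hinge loss: a linear SVM trained online
classes = np.array([0, 1])  # every label must be declared on the first partial_fit call

# Simulate new content arriving in small batches over time.
for _ in range(20):
    X_batch = rng.randn(25, 3)
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    model.partial_fit(X_batch, y_batch, classes=classes)  # incremental update

# Evaluate on posts the model has never seen.
X_new = rng.randn(200, 3)
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)
accuracy = model.score(X_new, y_new)
print(f"accuracy on fresh posts: {accuracy:.3f}")
```

The trade-off is that only some estimators support `partial_fit`; the others have to be retrained in batches.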

If you need example code, the scikit-learn documentation is very helpful:

  • http://scikit-learn.org/0.10/auto_examples/linear_model/logistic_l1_l2_sparsity.html#example-linear-model-logistic-l1-l2-sparsity-py
  • http://scikit-learn.org/0.10/modules/linear_model.html#ridge-regression
  • http://scikit-learn.org/0.10/user_guide.html




