Is there a list of known web crawlers? [closed]

We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 5 years ago.

I'm trying to get accurate download numbers for some files on a web server. I look at the user agents, and some are clearly bots or web crawlers, but for many I'm not sure whether they are web crawlers or not. They are causing many downloads, so it's important for me to know.

Is there a list of known web crawlers somewhere, with documentation such as user agent, IPs, behavior, etc.?

I'm not interested in the official ones, like Google's, Yahoo's, or Microsoft's. Those are generally well behaved and self-identified.

Best answer

I usually use http://www.user-agents.org/ as a reference. Hope this helps you out.

You can also try http://www.robotstxt.org/db.html or http://www.botsvsbrowsers.com.

Answers

I'm maintaining a list of crawlers' user-agent patterns at https://github.com/monperrus/crawler-user-agents/.

It's collaborative; you can contribute to it with pull requests.
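As a rough illustration of how a list of user-agent patterns could be applied to download counts, here is a minimal Python sketch. The patterns in `CRAWLER_PATTERNS` and the sample requests are made up for the example (they are not copied from the list above), and the `(path, user_agent)` input shape is an assumption about how you parse your server log.

```python
import re
from collections import Counter

# Hypothetical sample patterns for illustration only; in practice you would
# load patterns from a maintained crawler list instead of hard-coding them.
CRAWLER_PATTERNS = [r"Googlebot", r"bingbot", r"[Ss]pider", r"crawler", r"curl/"]

# Combine all patterns into one regex so each user agent is scanned once.
CRAWLER_RE = re.compile("|".join(f"(?:{p})" for p in CRAWLER_PATTERNS))


def is_crawler(user_agent: str) -> bool:
    """Return True if the user agent matches any known crawler pattern."""
    return bool(CRAWLER_RE.search(user_agent))


def count_downloads(requests):
    """Split download counts per path into (human, bot) counters.

    `requests` is an iterable of (path, user_agent) pairs, assumed to be
    already extracted from the access log.
    """
    human, bot = Counter(), Counter()
    for path, user_agent in requests:
        target = bot if is_crawler(user_agent) else human
        target[path] += 1
    return human, bot


# Made-up sample data.
sample = [
    ("/files/a.zip", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0"),
    ("/files/a.zip", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("/files/a.zip", "curl/8.0.1"),
    ("/files/b.zip", "ExampleSpider/1.0"),
]

human, bot = count_downloads(sample)
print("human:", dict(human))
print("bot:", dict(bot))
```

Keep in mind that this only catches crawlers that identify themselves; a bot sending a browser-like user agent will still be counted as human.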

http://www.robotstxt.org/db.html is a good place to start. They have an automatable raw feed if you need that too. http://www.botsvsbrowsers.com/ is also helpful.

Unfortunately, we've found that bot activity is too numerous and varied to filter accurately. If you want accurate download counts, your best bet is to require JavaScript to trigger the download. That's basically the only thing that is going to reliably filter out the bots. It's also why all site traffic analytics engines these days are JavaScript-based.




