disallow certain url in robots.txt [closed]
  • Time: 2010-05-17 10:08:06
  • Tags: robots.txt
Closed. This question is off-topic. It is not currently accepting answers.

Want to improve this question? Update the question so it's on-topic for Stack Overflow.

Closed 11 years ago.

We implemented a rating system on a site a while back that involves a link to a script. However, with the vast majority of ratings on the site at 3/5 and the ratings spread very evenly across 1-5, we're beginning to suspect that search engine crawlers and the like are getting through. The URLs used look like this:

http://www.thesite.com/path/to/the/page/rate?uid=abcdefghijk&value=3

When we started, we added the following to our robots.txt:

User-agent: *
Disallow: /rate

Is this incorrect, or are Googlebot and others simply ignoring our robots.txt?

Best answer

You should use POST for actions that change things, as search engines usually do not submit forms. Additionally, this will prevent users who download your website recursively (e.g. with wget) from submitting tons of votes.

Depending on your site, handling voting through JavaScript might be a solution, too.

Regarding your robots.txt: it has to be in the root path, i.e. http://www.thesite.com/robots.txt, and Disallow rules are matched from the start of the URL path, so if your rating system is at /blah/rate you need to use Disallow: /blah/rate instead of Disallow: /rate.
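To see the prefix matching in action, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the "Googlebot" user agent string are only illustrative assumptions; the rules and the URL are the ones from the question. Real crawlers may apply extra matching rules beyond what this parser implements, so treat it as a quick sanity check rather than proof of how any particular bot behaves.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, user_agent="Googlebot"):
    """Return True if the given robots.txt lines let user_agent fetch url.

    is_allowed is a hypothetical helper for this example, not part of any site.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from a list of lines, no network access
    return parser.can_fetch(user_agent, url)


# The rating URL from the question.
RATE_URL = "http://www.thesite.com/path/to/the/page/rate?uid=abcdefghijk&value=3"

# The rules as originally deployed: only paths that START with /rate are blocked.
original_rules = ["User-agent: *", "Disallow: /rate"]

# The rules with the full path, as suggested in the answers.
full_path_rules = ["User-agent: *", "Disallow: /path/to/the/page/rate"]

print(is_allowed(original_rules, RATE_URL))   # True: the rating URL is not blocked
print(is_allowed(full_path_rules, RATE_URL))  # False: the rating URL is blocked
```

With the original rule, only something like http://www.thesite.com/rate would be disallowed, which matches what the other answers say.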

Other answers

Looks incorrect to me. You're only disallowing access to http://www.thesite.com/rate (and pages below it, IIRC). Plus, some crawlers ignore robots.txt!

Better to make it so that ratings are only ever altered in response to a POST, rather than a GET. Search engine crawlers generally follow links with GET and do not submit POST requests.
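A rough sketch of the POST-only idea, in Python and independent of any web framework. Everything here is hypothetical (the handle_rate function, the in-memory ratings store, the status codes chosen); only the uid and value parameter names come from the URL in the question. The point is just that a GET, which is what a crawler following a link sends, never changes a rating.

```python
from urllib.parse import parse_qs

# Hypothetical in-memory store: uid -> list of submitted rating values.
ratings = {}

VALID_VALUES = {"1", "2", "3", "4", "5"}


def handle_rate(method, body):
    """Return an HTTP status code; record a vote only for POST requests."""
    if method != "POST":
        return 405  # Method Not Allowed: GET requests never alter ratings

    fields = parse_qs(body)  # form-encoded body, e.g. "uid=abc&value=3"
    uid = fields.get("uid", [""])[0]
    value = fields.get("value", [""])[0]
    if not uid or value not in VALID_VALUES:
        return 400  # Bad Request: missing uid or rating outside 1-5

    ratings.setdefault(uid, []).append(int(value))
    return 204  # No Content: vote recorded


# A crawler following the old link would issue a GET and be rejected.
print(handle_rate("GET", "uid=abcdefghijk&value=3"))   # 405
# A form submission from a real user goes through.
print(handle_rate("POST", "uid=abcdefghijk&value=3"))  # 204
```

In a real site the same check would live in whatever script currently serves the rate URL, and the rating link would become a small form or a JavaScript request.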

User-agent: *
Disallow: /path/to/the/page/rate

You have to use the full path.

Might want to read up here a bit: http://www.javascriptkit.com/howto/robots.shtml




