robots.txt to restrict search engines indexing specified keywords for privacy

I have a large directory of individual names, along with generic, publicly available and category-specific information, that I want indexed by search engines as much as possible. Listing these names on the site itself is not a concern to people, but some don't want to appear in search results when they "Google" themselves.

We want to keep listing these names on a page AND keep the page indexed, BUT have search engines not index the specified names or keywords.

Can this be done page by page, or would setting up two pages be a better workaround?

Options available:

  • PHP could censor keywords if the user agent is a robot/search engine
  • .htaccess could keep robots away from the uncensored content while letting them reach a second, censored version
  • Meta tags defining words not to index?
  • JavaScript could hide keywords from robots while keeping them visible to users
Answers

I will go through the options and tell you some problems I can see:

PHP: If you don't mind trusting the user agent, this will work well. I am unsure how some search engines will react to being shown different content from what regular visitors see.
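A minimal sketch of the user-agent approach, written in JavaScript (Node) rather than PHP purely for illustration. The entry shape (`name`, `category`, `optOut`) and the crawler pattern are assumptions; the pattern only catches crawlers that identify themselves, and any client can send a fake user agent.

```javascript
// Substrings that commonly appear in crawler User-Agent headers (illustrative, not exhaustive).
const BOT_PATTERN = /bot|crawler|spider|slurp/i;

function isLikelyBot(userAgent) {
  return BOT_PATTERN.test(userAgent || "");
}

// Hypothetical directory entries look like { name, category, optOut }.
// Crawlers get opted-out names replaced; human visitors see every entry unchanged.
function entriesForRequest(entries, userAgent) {
  if (!isLikelyBot(userAgent)) {
    return entries;
  }
  return entries.map((entry) =>
    entry.optOut ? { ...entry, name: "(name withheld)" } : entry
  );
}
```

The category information stays on the page for crawlers, so the page itself remains indexable; only the opted-out names are swapped out.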

htaccess: You would probably need to redirect the bot to a different page. You could use URL parameters, but this would be no different than a pure PHP solution. The bot would index the page it is redirected to, not the page you want it to index. You may be able to use the rewrite engine to overcome this.
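To make the redirect/rewrite difference concrete, here is a hypothetical routing sketch in JavaScript (the `pages/` and `pages/censored/` directories are made up). With a redirect, the crawler ends up on, and indexes, the censored URL; with an internal rewrite, it stays on the original URL but receives the censored file, which is roughly what a mod_rewrite rule without the `[R]` flag does.

```javascript
// Illustrative crawler pattern; relies on the (spoofable) User-Agent header.
const BOT_PATTERN = /bot|crawler|spider|slurp/i;

// Redirect: the crawler is told to request a different URL,
// so "/censored/..." is the address that ends up in the index.
function redirectForBots(path, userAgent) {
  if (BOT_PATTERN.test(userAgent || "")) {
    return { status: 302, location: "/censored" + path };
  }
  return { status: 200, file: "pages" + path };
}

// Internal rewrite: the URL the crawler requested is answered directly,
// but the response body comes from the censored copy of the page.
function rewriteForBots(path, userAgent) {
  const dir = BOT_PATTERN.test(userAgent || "") ? "pages/censored" : "pages";
  return { status: 200, file: dir + path };
}
```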

meta tags: Even if you could use meta tags to make the bot ignore certain words, there would be no guarantee that every search engine honors them, since there is no agreed "standard" for such meta tags. But that doesn't matter, since I don't know of any way to make a bot ignore certain words or phrases using meta tags.

JavaScript: No bot I have ever heard of executes (or even reads) JavaScript when looking at a page, so I don't see hiding keywords with JavaScript working: the bot reads the raw HTML and still sees them. You could, however, write the content you want hidden from bots into the page with JavaScript; bots won't be able to see it, but neither will users who have JavaScript disabled.

I would go the PHP route.

You can tell robots to skip indexing a particular page by adding the ROBOTS meta tag:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

UPDATE: The ways to restrict indexing of particular words I can think of are:

  1. Use JS to add those words to the page (see below).
  2. Add a module to the server that strips those words from the rendered page.

JavaScript could be something like this:

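<p>
  <span id="secretWord">
    <script type="text/javascript">
      // Build the word at runtime so it never appears literally in the HTML source.
      // Split it into fragments and/or use hex escapes ("\x44" is "D").
      // Example: writes the hypothetical name "John Doe".
      document.write("Jo" + "hn" + " \x44oe");
    </script>
  </span>
</p>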

The server module is probably the best option. In ASP.NET it should be fairly easy to do; I'm not sure about PHP, though.
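The stripping step such a server module would perform can be sketched as a plain function over the rendered HTML. This is a JavaScript illustration, not an ASP.NET or PHP module; `stripWords` and the `[withheld]` placeholder are hypothetical, and a real module would still need to decide whether to apply it to every response or only to crawler requests.

```javascript
// Escape regex metacharacters so names containing characters like "." match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every listed word or name in the rendered page with a placeholder.
// Matching is case-insensitive; \b keeps "Ann" from matching inside "Annual".
function stripWords(html, words, placeholder = "[withheld]") {
  if (words.length === 0) {
    return html;
  }
  const pattern = new RegExp(
    "\\b(?:" + words.map(escapeRegExp).join("|") + ")\\b",
    "gi"
  );
  return html.replace(pattern, placeholder);
}
```

Because this is plain text replacement over HTML, a listed name that also appears inside an attribute or URL would be replaced there too.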

What's not clear from your post is whether you want to protect your names and keywords from Google or from all search engines. Google is generally well-behaved, so you can use the ROBOTS meta tag to prevent that page from being indexed. But that won't stop search engines that ignore the ROBOTS tag from indexing your site.

Other approaches you did not suggest:

  • Having the content of the page fetched with client-side JavaScript.
  • Force the user to complete a CAPTCHA before displaying the text. I recommend the reCAPTCHA package, which is easy to use.
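A minimal sketch of the first idea: the page ships without the protected names and fetches them after it loads. The `/protected-names.json` endpoint and the `names` element id are hypothetical, and this only keeps the names away from crawlers that do not run JavaScript.

```javascript
// Hypothetical endpoint that returns the protected names as a JSON array of strings.
const NAMES_URL = "/protected-names.json";

// Escape text before inserting it into HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Fetch the names after the page loads and build <li> markup for them.
// fetchImpl defaults to the global fetch; it is a parameter so it can be stubbed.
async function loadProtectedNames(fetchImpl = fetch, url = NAMES_URL) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Could not load names: HTTP " + response.status);
  }
  const names = await response.json();
  return names.map((name) => "<li>" + escapeHtml(name) + "</li>").join("");
}

// In the browser, something like:
// loadProtectedNames().then((html) => {
//   document.getElementById("names").innerHTML = html;
// });
```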

Of all these, the reCAPTCHA approach is probably the best, as it also protects against ill-behaved spiders. But it is also the most burdensome for your users.
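If you go the CAPTCHA route, the server should only send the names after checking the user's answer with Google. This sketch assumes the current reCAPTCHA server-side verification endpoint (`siteverify`, taking `secret` and `response` parameters and returning a JSON `success` field); `verifyCaptcha` and `namesSection` are hypothetical helper names.

```javascript
// reCAPTCHA server-side verification endpoint (assumed current API).
const VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify";

// Returns true only if Google confirms the token the browser submitted
// (the widget sends it in the "g-recaptcha-response" form field).
async function verifyCaptcha(token, secret, fetchImpl = fetch) {
  if (!token) {
    return false;
  }
  const response = await fetchImpl(VERIFY_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ secret, response: token }).toString(),
  });
  if (!response.ok) {
    return false;
  }
  const result = await response.json();
  return result.success === true;
}

// Render the protected names only after a successful check.
async function namesSection(token, secret, names, fetchImpl = fetch) {
  const ok = await verifyCaptcha(token, secret, fetchImpl);
  return ok ? names.join(", ") : "Complete the CAPTCHA to see the names.";
}
```

The secret key never leaves the server, and a crawler that cannot solve the CAPTCHA only ever receives the fallback text.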




