I have a list of links that I want crawled. I would like all other links that the crawler finds on its own not to be crawled.
One direction I looked into: create a robots.txt that disallows all pages except those in my sitemap. The documentation I found on writing such a file says I can disallow parts of the site like this:
Allow: /folder1/myfile.html
Disallow: /folder1/
But the links I do want crawled are not in a particular folder. I could give the crawler a huge robots.txt that is effectively a copy of my sitemap (a sketch of what I mean is below), but that doesn't seem reasonable. What would you recommend?
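For illustration, the "huge file" approach would look roughly like this (the paths are made-up placeholders, and as far as I know Allow is not part of the original robots.txt standard, so only some crawlers such as Googlebot honor it):

User-agent: *
Allow: /about.html
Allow: /products/widget-a.html
Allow: /blog/2013/03/some-post.html
Disallow: /

With hundreds or thousands of URLs in the sitemap, this file would grow one Allow line per page, which is the part that feels unreasonable to me.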