YQL scrape entire website/domain

I'm trying to scrape a set of links and content from a domain.

The equivalent query in Google would be

"site:www.newswebsite.com search_term"

I've seen a few things that come close, but I can't quite get a search working across a whole website and then filter the results by the search term.

Is this possible without a custom data table?

Best answer

I got to the bottom of it in the end.

select title,abstract,url,date from search.web(0) where query="search_term" and sites="www.website1.com,www.website2.com,www.website3.com" | sort(field="date") | reverse()

This searches three sites and orders the results by date, newest first. There is an alternative way to reverse the sort, but this seems to work for now. I think it's the descending option within the sort, i.e. sort(field="date", descending="true").

Very useful, even if I do say so myself.
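
If you need to build the statement above for different search terms or site lists, here is a minimal Python sketch. It only assembles the YQL string and URL-encodes it; it does not contact any service. The function names and the "q" / "format" parameter names are my own assumptions for illustration, and it assumes the search term contains no double quotes.

```python
from urllib.parse import urlencode


def build_site_search_query(term, sites, newest_first=True):
    """Build a YQL statement that searches `sites` for `term`, sorted by date.

    Hypothetical helper: it mirrors the query in this answer and assumes
    `term` contains no double-quote characters.
    """
    site_list = ",".join(sites)
    query = (
        'select title,abstract,url,date from search.web(0) '
        f'where query="{term}" and sites="{site_list}" '
        '| sort(field="date")'
    )
    if newest_first:
        # sort() is ascending, so reverse() puts the newest results first
        query += " | reverse()"
    return query


def build_request_params(yql):
    """URL-encode the statement; the parameter names here are assumptions."""
    return urlencode({"q": yql, "format": "json"})


if __name__ == "__main__":
    yql = build_site_search_query(
        "search_term", ["www.website1.com", "www.website2.com"]
    )
    print(yql)
    print(build_request_params(yql))
```

Passing newest_first=False leaves off the reverse() step, so the oldest results come first.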

Other answers

Christian Heilmann just wrote a fairly nice writeup on YQL and getting information back from an HTML datasource on the 24ways website.




