What are good techniques for retrieving a list of keywords for the top news stories of the day?

I am working on an application where I would like to retrieve a list of the day's top news stories from some source (such as the BBC) and parse these for keywords that I can use against my own tag data. There are obviously lots of web services and APIs out there, but which routes would you suggest?
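A minimal sketch of the final step, matching extracted keywords against your own tags, assuming the tags are plain strings. `match_keywords_to_tags` is a hypothetical helper, not part of any service's API; a tag counts as a match when its words appear as a contiguous run inside some keyword, so the tag "York" matches the keyword "new york".

```python
import re


def _words(text):
    """Lower-case text and split it into words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_keywords_to_tags(keywords, tags):
    """Return the tags (in their original order) found in any extracted keyword.

    A tag matches when its words occur as a contiguous run inside a keyword.
    """
    keyword_words = [_words(k) for k in keywords]
    matched = []
    for tag in tags:
        tag_words = _words(tag)
        n = len(tag_words)
        if n == 0:
            continue  # ignore empty or punctuation-only tags
        for words in keyword_words:
            # Slide a window of the tag's length across the keyword's words.
            if any(words[i:i + n] == tag_words for i in range(len(words) - n + 1)):
                matched.append(tag)
                break
    return matched


if __name__ == "__main__":
    todays_keywords = ["new york", "bolt gun", "flu"]
    my_tags = ["York", "flu", "gun crime", "Bolt Gun"]  # hypothetical tag data
    print(match_keywords_to_tags(todays_keywords, my_tags))
    # ['York', 'flu', 'Bolt Gun']
```

Exact matching would miss "York" here; the sliding window trades a little precision for that recall.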

One thing I was considering is periodically downloading the RSS feed of BBC News and parsing the content using the Yahoo term extractor. This seems like a good solution to me, but the term extractor is for non-commercial use only and my application is commercial.
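Periodically downloading and parsing the feed needs only the Python standard library. A minimal sketch, assuming an RSS 2.0 feed with `<item>`, `<title>` and `<description>` elements; the URL is the one quoted in this question (from 2009) and may no longer resolve. Running it on a schedule is left to cron or a similar job runner.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Feed URL quoted in the question; it may have moved since 2009.
FEED_URL = "http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml"


def fetch_feed(url=FEED_URL, timeout=10):
    """Download the raw RSS document as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_items(rss_xml):
    """Return (title, description) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    return [
        (
            item.findtext("title", default="").strip(),
            # Missing descriptions come back as an empty string.
            item.findtext("description", default="").strip(),
        )
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    # Placeholder feed so the example runs offline; use fetch_feed() for real data.
    sample = """<rss version="2.0"><channel>
      <item><title>Example headline one</title>
            <description>Example summary one.</description></item>
      <item><title>Example headline two</title></item>
    </channel></rss>"""
    for title, description in parse_items(sample):
        print(title, "|", description)
```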

YQL looks promising, but I'm not sure how easy it will be to condense the data down to keywords.
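If no commercially licensed term extractor is available, a crude stand-in is to count non-stopword words and adjacent word pairs across all of the day's headlines and keep the ones that recur. This is a simple frequency heuristic, not the algorithm Yahoo's service uses; the stopword list, `min_count` and `top_n` are illustrative assumptions, and the headlines in the example are made up.

```python
import re
from collections import Counter

# Deliberately tiny stopword list for illustration; a real one would be longer.
STOPWORDS = frozenset(
    "a an and are as at be by for from has have in is it its of on or over "
    "says the to was were will with".split()
)


def tokenize(text):
    """Lower-case text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_keywords(texts, min_count=2, top_n=20):
    """Rank unigrams and bigrams by how many times they occur across `texts`.

    Only terms seen at least `min_count` times are returned, most frequent first.
    Punctuation is dropped, so a bigram may span a comma or colon.
    """
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Unigrams: every token that is not a stopword.
        terms = [t for t in tokens if t not in STOPWORDS]
        # Bigrams: adjacent token pairs where neither word is a stopword.
        terms += [
            f"{a} {b}"
            for a, b in zip(tokens, tokens[1:])
            if a not in STOPWORDS and b not in STOPWORDS
        ]
        counts.update(terms)
    return [term for term, n in counts.most_common(top_n) if n >= min_count]


if __name__ == "__main__":
    headlines = [
        "Flu jab for parents",
        "New York flu cases rise",
        "Parents warned over flu",
    ]
    print(extract_keywords(headlines))  # ['flu', 'parents']
```

Terms that recur across several headlines tend to be the day's big stories, which is roughly what a tag matcher needs.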

All suggestions welcome, both for the news source and the keyword/tag extraction, and for both commercial and non-commercial uses.

Update:

Building on the suggestion in one of the answers, here's the YQL for grabbing the keywords from the top UK news stories on the BBC:

select content 
from search.termextract 
where context in (
    select title 
    from rss 
    where url='http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml'
) 

which returns something like:

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="46" yahoo:created="2009-11-13T11:49:05Z" yahoo:lang="en-US" yahoo:updated="2009-11-13T11:49:05Z" yahoo:uri="http://query.yahooapis.com/v1/yql?q=select+content+from+search.termextract+where+context+in+%28select+title+from+rss+where+url%3D%27http%3A%2F%2Fnewsrss.bbc.co.uk%2Frss%2Fnewsonline_uk_edition%2Ffront_page%2Frss.xml%27+%29">
    <results>
        <Result xmlns="urn:yahoo:cate">new york</Result>
        <Result xmlns="urn:yahoo:cate">bolt gun</Result>
        <Result xmlns="urn:yahoo:cate">stalker</Result>
        <Result xmlns="urn:yahoo:cate">russia</Result>
        <Result xmlns="urn:yahoo:cate">moon</Result>
        <Result xmlns="urn:yahoo:cate">hijack</Result>
        <Result xmlns="urn:yahoo:cate">yacht</Result>
        <Result xmlns="urn:yahoo:cate">balloon</Result>
        <Result xmlns="urn:yahoo:cate">parents</Result>
        <Result xmlns="urn:yahoo:cate">bruce forsyth</Result>
        <Result xmlns="urn:yahoo:cate">flu</Result>
        <!-- ... remaining results omitted ... -->
    </results>
</query>
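To turn a response like this into a plain list of keywords, note that each `Result` element sits in the default namespace `urn:yahoo:cate`, so it has to be looked up by its namespace-qualified name. A minimal sketch using the standard library; removing duplicates is an assumption about what you would want, since the same term can be extracted from more than one headline. The public YQL service has since been retired by Yahoo, so this mainly applies to saved responses or services returning the same shape.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns attribute on each <Result> element above.
CATE_NS = "urn:yahoo:cate"


def keywords_from_termextract(xml_text):
    """Return the distinct keywords in a search.termextract response, in order."""
    root = ET.fromstring(xml_text)
    # ElementTree names namespaced elements "{uri}localname".
    results = root.iter(f"{{{CATE_NS}}}Result")
    keywords = ((el.text or "").strip() for el in results)
    # dict.fromkeys drops duplicates (and we skip blanks) while keeping first-seen order.
    return list(dict.fromkeys(k for k in keywords if k))


if __name__ == "__main__":
    response = """<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="3">
      <results>
        <Result xmlns="urn:yahoo:cate">new york</Result>
        <Result xmlns="urn:yahoo:cate">flu</Result>
        <Result xmlns="urn:yahoo:cate">flu</Result>
      </results>
    </query>"""
    print(keywords_from_termextract(response))  # ['new york', 'flu']
```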

Ultimately, though, I don't think I can use this within a commercial app due to the restrictions on the term extraction service.

Answers

You say YQL looks promising, so I'm sure you've investigated this already. You can use two YQL services together: search.termextract will give you the keywords from the results of a query made with search.news:

select * from search.termextract where context in (select abstract from search.news where query="election")

You'd have to fiddle around to make the where part of the query specific to the latest news.
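Outside YQL, one way to restrict results to the latest news is to filter feed items on their `<pubDate>`. A minimal sketch, assuming RSS 2.0 items with RFC 822 dates; `recent_titles` is a hypothetical helper, the 24-hour window is an arbitrary default, and items with a missing or unparseable date are skipped rather than guessed at.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def recent_titles(rss_xml, max_age=timedelta(days=1), now=None):
    """Return titles of RSS <item>s published no more than `max_age` before `now`."""
    now = now or datetime.now(timezone.utc)
    titles = []
    for item in ET.fromstring(rss_xml).iter("item"):
        try:
            published = parsedate_to_datetime(item.findtext("pubDate", default=""))
        except (TypeError, ValueError):
            continue  # missing or unparseable date: skip rather than guess
        if published.tzinfo is None:
            # A date with no usable zone (e.g. "-0000") is treated as UTC here.
            published = published.replace(tzinfo=timezone.utc)
        if now - published <= max_age:
            titles.append(item.findtext("title", default="").strip())
    return titles


if __name__ == "__main__":
    feed = (
        "<rss><channel>"
        "<item><title>Recent</title><pubDate>Fri, 13 Nov 2009 11:00:00 GMT</pubDate></item>"
        "<item><title>Old</title><pubDate>Mon, 09 Nov 2009 08:00:00 GMT</pubDate></item>"
        "<item><title>Undated</title></item>"
        "</channel></rss>"
    )
    noon = datetime(2009, 11, 13, 12, 0, tzinfo=timezone.utc)
    print(recent_titles(feed, now=noon))  # ['Recent']
```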

From the service's documentation: "The Term Extraction service is limited to 5,000 queries per IP address per day and to noncommercial use."
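Spread evenly, 5,000 queries per day works out to one query every 86,400 / 5,000 = 17.28 seconds. A minimal client-side guard, assuming the quota resets at midnight UTC (the quoted text does not say where the day boundary falls); `DailyQuota` is a hypothetical helper, and the injectable clock exists only so the rollover can be tested.

```python
from datetime import datetime, timezone


class DailyQuota:
    """Client-side counter that allows at most `limit` calls per UTC calendar day."""

    def __init__(self, limit=5000, clock=None):
        self.limit = limit
        # The clock is injectable so the day rollover can be tested.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._day = None
        self._used = 0

    def try_acquire(self):
        """Record one call and return True, or return False if today's quota is spent."""
        today = self._clock().date()
        if today != self._day:
            # New day (assumed to start at midnight UTC): reset the counter.
            self._day, self._used = today, 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


if __name__ == "__main__":
    quota = DailyQuota(limit=2)
    print([quota.try_acquire() for _ in range(3)])  # [True, True, False]
```

Checking `try_acquire()` before each request keeps a runaway polling loop from burning the whole day's allowance.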




