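I'd like to know whether there is a better way to extract information from a web page than parsing the HTML for what I'm searching for, e.g. extracting a movie's rating.
I'm currently using the Indy HTTP components to get the page and StrUtils to parse the text, but this approach is limited.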
I find that when dealing with good websites, plain old regular expressions are highly intuitive and simple to use, and IMDB is a good website.
For example, the movie rating on an IMDB movie page is in a <DIV> with class="star-box-giga-star". That's VERY easy to extract using a regular expression. The following regular expression extracts the movie rating from the raw HTML into capture group 1:
star-box-giga-star[^>]*>([^<]*)<
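It's not pretty, but it works. The regex looks for the "star-box-giga-star" class id, then for the > that ends the opening <DIV> tag, and captures everything before the next <.
To create new regexes like this, you should use a web browser that can inspect elements, such as Chrome. In Chrome you can simply open the page and right-click the part you want to grab to inspect it. Then look around for easily identifiable elements you can use to build a good regex. In this case the "star-box-giga-star" class is obviously easy to identify! Because good sites use CSS, and CSS needs identifiable elements to style content properly, you usually have no trouble finding such identifiable elements on good sites.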
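A minimal Delphi sketch of applying that regex, assuming Delphi XE or later, where the System.RegularExpressions unit and TRegEx are available (older versions need a third-party regex library). ExtractRating is a hypothetical helper name and the HTML string is a made-up fragment standing in for a downloaded page.

```pascal
program ExtractRatingRegex;
{$APPTYPE CONSOLE}

uses
  System.RegularExpressions; // TRegEx ships with Delphi XE and later

// Applies the star-box-giga-star regex to raw HTML and returns capture
// group 1, or 'N/A' when the pattern does not match.
function ExtractRating(const Html: string): string;
var
  M: TMatch;
begin
  M := TRegEx.Match(Html, 'star-box-giga-star[^>]*>([^<]*)<');
  if M.Success then
    Result := M.Groups[1].Value
  else
    Result := 'N/A';
end;

begin
  // Hypothetical HTML fragment standing in for a downloaded page
  WriteLn(ExtractRating('<div class="star-box-giga-star">7.5</div>'));
end.
```

With the sample fragment this prints 7.5. In practice you would pass in the page text you already fetch with Indy.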
Processing
As of the time of posting, the only RSS feeds on the site are:
However, you can request a new one by contacting them.
RSS feed processing resources:
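Since an RSS feed is just XML, it can be read with an XML parser instead of string searching. A minimal sketch, assuming Delphi's Xml.XMLDoc unit with the default MSXML vendor on Windows (hence the CoInitialize call) and the usual RSS 2.0 layout (rss > channel > item > title). RssItemTitles and the feed string are hypothetical.

```pascal
program ListRssTitles;
{$APPTYPE CONSOLE}

uses
  System.Classes, Winapi.ActiveX, Xml.XMLIntf, Xml.XMLDoc;

// Collects the <title> of every <item> in an RSS 2.0 document
// (rss > channel > item > title). The caller owns the returned list.
function RssItemTitles(const Xml: string): TStringList;
var
  Doc: IXMLDocument;
  Channel, Node, Title: IXMLNode;
  I: Integer;
begin
  Result := TStringList.Create;
  try
    Doc := LoadXMLData(Xml);
    Channel := Doc.DocumentElement.ChildNodes.FindNode('channel');
    if Channel <> nil then
      for I := 0 to Channel.ChildNodes.Count - 1 do
      begin
        Node := Channel.ChildNodes[I];
        if Node.NodeName = 'item' then
        begin
          Title := Node.ChildNodes.FindNode('title');
          if Title <> nil then
            Result.Add(Title.Text);
        end;
      end;
  except
    Result.Free; // do not leak the list if parsing fails
    raise;
  end;
end;

var
  Titles: TStringList;
begin
  CoInitialize(nil); // MSXML is COM-based, so console apps must initialize COM
  try
    // Hypothetical two-item feed standing in for a downloaded one
    Titles := RssItemTitles(
      '<rss version="2.0"><channel><title>Feed</title>' +
      '<item><title>First</title></item>' +
      '<item><title>Second</title></item>' +
      '</channel></rss>');
    try
      Write(Titles.Text);
    finally
      Titles.Free;
    end;
  finally
    CoUninitialize;
  end;
end.
```

The channel's own <title> is skipped because only <item> children are examined.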
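When scraping a site, you can't rely on the information remaining available. IMDB can detect your scraping and try to block you, or change the format frequently to make it harder.
Therefore you should always try to use a supported API or RSS feed, or at least get permission from the site to aggregate its data, and make sure you're abiding by its terms. Often you'll have to pay for this kind of access. Scraping a website without permission can expose you to liability in two legal areas (terms of service and intellectual property).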
http://www.imdb.com
You may not use data mining, robots, screen scraping, or similar online data gathering and extraction tools on our website.
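To answer your question, the better way is to use the methods the site provides. For non-commercial use, and if you abide by the terms, you can download the IMDB database directly and use the data from it instead of scraping the site. Simply updating your copy of the database regularly is a better solution than scraping. You could even serve the data from your own site. The data is available as separate lists.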
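As for using the HTML, I suppose you could convert any HTML into valid XML and then use an XML parser, possibly with XPath, or write your own code (which is what I do).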
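A minimal sketch of the XML/XPath route in Delphi, assuming the page has already been converted to well-formed XML and that the default MSXML DOM vendor on Windows is in use (it exposes IDOMNodeSelect for XPath queries). SelectText and the sample markup are hypothetical.

```pascal
program XPathRating;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, Winapi.ActiveX, Xml.XMLIntf, Xml.XMLDoc, Xml.xmldom;

// Runs an XPath query against well-formed (X)HTML and returns the text of
// the first matching node, or 'N/A' if nothing matches.
function SelectText(const Xhtml, XPath: string): string;
var
  Doc: IXMLDocument;
  Selector: IDOMNodeSelect;
  Node: IDOMNode;
begin
  Result := 'N/A';
  Doc := LoadXMLData(Xhtml);
  // IDOMNodeSelect is implemented by the MSXML DOM vendor (the Windows default)
  if not Supports(Doc.DOMDocument, IDOMNodeSelect, Selector) then
    Exit;
  Node := Selector.selectNode(XPath);
  if (Node <> nil) and (Node.firstChild <> nil) then
    Result := Node.firstChild.nodeValue;
end;

begin
  CoInitialize(nil); // MSXML is COM-based
  try
    // Hypothetical page that has already been converted to valid XML
    WriteLn(SelectText(
      '<html><body><div class="star-box-giga-star">7.5</div></body></html>',
      '//div[@class="star-box-giga-star"]'));
  finally
    CoUninitialize;
  end;
end.
```

Compared with the regex approach, the XPath query states the structure you want (a div with a given class) rather than the exact character sequence, but it only works once the HTML is well-formed XML.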
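All the answers posted cover your general question well. I usually follow a strategy similar to the one Cosmin detailed: I use WinInet and regex for most of my web mining needs.
But let me add my two cents on your specific question about getting movie ratings. IMDBAPI.COM provides a query interface that returns JSON, which is ideal for this kind of lookup.
So a very simple command-line program to get a movie's rating would be: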
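program imdbrating;
{$apptype console}
uses
  IdHTTP, IdURI; // Indy stands in for the non-standard htmlutils unit (HttpGet/UrlEncode)

// Naive lookup of a quoted JSON value, e.g. "Rating":"8.7", -> 8.7
// Assumes the value is a quoted string followed by a comma.
function ExtractJsonParm(const Parm, H: string): string;
var
  R: Integer;
begin
  R := Pos('"' + Parm + '":', H);
  if R <> 0 then
    // Skip past "Parm":" to the first character of the value
    Result := Copy(H, R + Length(Parm) + 4,
      Pos(',', Copy(H, R + Length(Parm) + 4, Length(H))) - 2)
  else
    Result := 'N/A';
end;

var
  Http: TIdHTTP;
  H: string;
begin
  Http := TIdHTTP.Create(nil);
  try
    H := Http.Get('http://www.imdbapi.com/?t=' + TIdURI.ParamsEncode(ParamStr(1)));
  finally
    Http.Free;
  end;
  WriteLn(ExtractJsonParm('Rating', H));
end.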
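ExtractJsonParm above only works when the value is a quoted string followed by a comma; for the last field in an object, for example, it returns an empty string. A sketch of the same lookup with a real JSON parser, assuming Delphi XE6 or later, where TJSONObject lives in System.JSON (older versions ship it in Data.DBXJSON). ExtractJsonField and the sample reply are hypothetical, and the real service's field names may differ.

```pascal
program ImdbRatingJson;
{$APPTYPE CONSOLE}

uses
  System.JSON; // Delphi XE6+; older versions use Data.DBXJSON

// Parses a JSON reply and returns the named top-level field as a string,
// or 'N/A' if the reply is not a JSON object or the field is missing.
function ExtractJsonField(const Field, Json: string): string;
var
  Root, V: TJSONValue;
begin
  Result := 'N/A';
  Root := TJSONObject.ParseJSONValue(Json); // returns nil on invalid JSON
  if Root = nil then
    Exit;
  try
    if Root is TJSONObject then
    begin
      V := TJSONObject(Root).GetValue(Field);
      if V <> nil then
        Result := V.Value;
    end;
  finally
    Root.Free; // frees the whole parsed tree, including V
  end;
end;

begin
  // Hypothetical reply; here "Rating" is the last field, which the
  // Pos/Copy approach cannot handle
  WriteLn(ExtractJsonField('Rating', '{"Title":"Some Movie","Rating":"7.5"}'));
end.
```

The HTTP request itself is unchanged; only the parsing step is swapped out.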