由于Gmail don t 提供了获取这一信息的任何信息,因此它像你想要做些什么。
Web scraping (also called Web
harvesting or Web data extraction) is
a computer software technique of
extracting information from websites
如前文所述,有多种方式这样做:
Human copy-and-paste: Sometimes even
the best Web-scraping technology can
not replace human’s manual examination
and copy-and-paste, and sometimes this
may be the only workable solution when
the websites for scraping explicitly
setup barriers to prevent machine
automation.
Text grepping and regular expression
matching: A simple yet powerful
approach to extract information from
Web pages can be based on the UNIX
grep command or regular expression
matching facilities of programming
languages (for instance Perl or
Python).
HTTP programming: Static and dynamic
Web pages can be retrieved by posting
HTTP requests to the remote Web server
using socket programming.
DOM parsing: By embedding a
full-fledged Web browser, such as the
Internet Explorer or the Mozilla Web
browser control, programs can retrieve
the dynamic contents generated by
client side scripts. These Web browser
controls also parse Web pages into a
DOM tree, based on which programs can
retrieve parts of the Web pages.
HTML parsers: Some semi-structured
data query languages, such as the XML
query language (XQL) and the
hyper-text query language (HTQL), can
be used to parse HTML pages and to
retrieve and transform Web content.
Web-scraping software: There are many
Web-scraping software available that
can be used to customize Web-scraping
solutions. These software may provide
a Web recording interface that removes
the necessity to manually write
Web-scraping codes, or some scripting
functions that can be used to extract
and transform Web content, and
database interfaces that can store the
scraped data in local databases.
Semantic annotation recognizing: The
Web pages may embrace metadata or
semantic markups/annotations which can
be made use of to locate specific data
snippets. If the annotations are
embedded in the pages, as Microformat
does, this technique can be viewed as
a special case of DOM parsing. In
another case, the annotations,
organized into a semantic layer2,
are stored and managed separated to
the Web pages, so the Web scrapers can
retrieve data schema and instructions
from this layer before scraping the
pages.
在我继续发言之前,请铭记 所涉法律问题。 我不知道它是否遵守电子邮件的规定,我建议先检查这些词语,然后才能向前推进。 您也可以最终成为黑名单,或遇到类似其他问题。
尽管如此,我还是说,就你而言,你需要某种类型的间谍和多功能 par子,以便把你想要的数据输入电子邮件。 这一工具的选择将取决于您的技术层面。
作为一个废墟,我要使用、Mechanize和。 借助PHP,你可以研究如下解决办法:。