The number of images, using "len()"

I need to count the number of images (in this case, 1 image). Presumably by using "len()"?

This is the HTML:

<div class="detail-headline">
    Fotogal&#233;ria
        </div>
<div class="detail-indent">
    <table id="ctl00_ctl00_ctl00_containerHolder_mainContentHolder_innnerContentHolder_ZakazkaControl_ZakazkaObrazky1_ObrazkyDataList" cellspacing="0" border="0" style="width:100%;border-collapse:collapse;">
    <tr>
        <td align="center" style="width:25%;">
            <div id="ctl00_ctl00_ctl00_containerHolder_mainContentHolder_innnerContentHolder_ZakazkaControl_ZakazkaObrazky1_ObrazkyDataList_ctl02_PictureContainer">
                <a title="1-izb. Kaspická" class="highslide detail-img-link" onclick="return hs.expand(this);" href="/imgcache/cache231/3186-000393~8621457~640x480.jpg"><img src="/imgcache/cache231/3186-000393~8621457~120x120.jpg" class="detail-img" width="89" height="120" alt="1-izb. Kaspická" /></a>
            </div>
        </td><td></td>
    </tr>
</table>
</div>

I previously used HTMLParser, and the number of images must be added to "self.srcData". Previous code:

def handle_starttag(self, tag, attrs):  
    if (tag == 'div' and len(attrs) > 1 and attrs[1][0] == 'class' and attrs[1][1] == 'detail-headline'
      and self.srcData[self.getpos()[0]].strip() == u'Realitn&#225; kancel&#225;ria'):
      self.status = 2

    if self.status == 2 and tag == 'div' and len(attrs) > 0 and attrs[0][0] == 'class' and attrs[0][1] == 'name':
      self.record[-1] = decode(self.srcData[self.getpos()[0]].strip())
      self.status = 0

So (checking the start tag).. something like this?

if (tag == 'div' and len(attrs) > 0 and attrs[0][0] == 'class' and attrs[0][1] == 'detail-headline'
      and self.srcData[self.getpos()[0]].strip() == 'Fotogal&#233;ria'):
      self.status = 3

Is that right? And what comes next? Thanks.


import urllib
import urllib2
import HTMLParser
import codecs
import time
from BeautifulSoup import BeautifulSoup

# decode string
def decode(istr):
  ostr = u''
  idx = 0
  while idx < len(istr):
    add = True
    if istr[idx] == '&' and len(istr) > idx + 1 and istr[idx + 1] == '#':
      iend = istr.find(';', idx)
      if iend > idx:
        ostr += unichr(int(istr[idx + 2:iend]))
        idx = iend
        add = False
    if add:
      ostr += istr[idx]
    idx += 1
  return ostr

# parser 1
class FlatDetailParser (HTMLParser.HTMLParser):
  def __init__ (self):
    HTMLParser.HTMLParser.__init__(self)

  def loadDetails(self, link):
    self.record = (len(self.characts) + 1) * ['']
    self.status = 0
    self.index = -1
    self.reset()
    request = urllib2.Request(link)
    data = urllib2.urlopen(request)  # URL obtained from the next class
    self.srcData = []
    for line in data:
      line = line.decode('utf8')
      self.srcData.append(line)
    for line in self.srcData:
      self.feed(line)
    self.close()
    return self.record


  def handle_starttag(self, tag, attrs):
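    # getpos()[0] is the 1-based line number of the tag, so indexing the
    # 0-based srcData list with it reads the line right after the tag.
    if (tag == 'div' and len(attrs) > 1 and attrs[1][0] == 'class' and attrs[1][1] == 'detail-headline'
      and self.srcData[self.getpos()[0]].strip() == u'Realitn&#225; kancel&#225;ria'):
      self.status = 2

    if (self.status == 2 and tag == 'div' and len(attrs) > 0 and attrs[0][0] == 'class'
      and attrs[0][1] == 'name'):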
      self.record[-1] = decode(self.srcData[self.getpos()[0]].strip())
      self.status = 0

Next comes another parser class, and the data is added to a txt file.

When I use BeautifulSoup, what goes in soup = BeautifulSoup(???)? How can I add the result to srcData? Can the two be combined, and how?

Answers

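Just for fun, I tried a pyparsing approach. Pyparsing includes some helpers for building HTML tag matching patterns, which cope with attributes, unexpected whitespace, single or double quotes, and other hard-to-predict quirks of HTML markup. Here is a pyparsing solution (assuming your HTML source has been read into a string variable html):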

from pyparsing import makeHTMLTags

# makeHTMLTags returns patterns for both opening and closing 
# tags, we just want the opening ones
aTag = makeHTMLTags("A")[0]
imgTag = makeHTMLTags("IMG")[0]

# find the matching tags
tagMatches = (aTag|imgTag).searchString(html)

# yes, use len() to see how many there are
print len(tagMatches)

# get the actual image names
for t in tagMatches:
    if t.startA:
        print t.href
    if t.startImg:
        print t.src

Prints:

2
/imgcache/cache231/3186-000393~8621457~640x480.jpg
/imgcache/cache231/3186-000393~8621457~120x120.jpg
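
The count above is 2 because the pattern matches both the <a> and the <img> tag. If only the <img> tags should be counted, a minimal sketch with the standard-library parser might look like the following. It is written for Python 3, where the module is html.parser (HTMLParser in the Python 2 code above); the class name ImageCounter, its sources attribute, and the shortened sample HTML are assumptions made for illustration, not part of the original code.

```python
from html.parser import HTMLParser


class ImageCounter(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []  # hypothetical attribute holding the image URLs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs. Looking attributes up by
        # name avoids depending on their order, unlike attrs[0] / attrs[1].
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


# Shortened copy of the gallery markup from the question (assumed sample).
html = """<div class="detail-indent">
<a class="highslide detail-img-link" href="/imgcache/cache231/3186-000393~8621457~640x480.jpg"><img src="/imgcache/cache231/3186-000393~8621457~120x120.jpg" class="detail-img" /></a>
</div>"""

counter = ImageCounter()
counter.feed(html)
counter.close()

# len() on the collected list gives the number of images.
print(len(counter.sources))
```

Self-closing tags such as <img ... /> are still routed through handle_starttag by the default handle_startendtag implementation, so no extra handler is needed for them.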



