Beginner Python Webscraping
  • Time: 2022-02-07 10:31:45
  • Tags: python

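I am a beginner in Python and I am working on a web scraping project. In this project, I want to get the meanings and parts of speech (POS) of some words from the Cambridge dictionary and then export them to Excel.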

Here is my code:

    pip install bs4
    pip install requests
    from bs4 import BeautifulSoup
    import requests
    headers = {"User-Agent": "xxxxxxx"}
    r = requests.get("https://dictionary.cambridge.org/dictionary/english/happy", headers=headers)
    soup = BeautifulSoup(r.text, "html.parser")
    POS = soup.find_all("span", class_="pos dpos")
    print(POS)

Result: [<span class="pos dpos" title="A word that describes a noun or pronoun.">adjective</span>, <span class="pos dpos" title="A word that describes a noun or pronoun.">adjective</span>]

From this result I only want to get the word "adjective", but I don't know how to do that. Can anyone help me? Thanks.

Answers

First off: remove the pip install commands from your script. A library only needs to be installed once. After that you can use it by importing it, as you did in lines 3 and 4.

You have already used the attribute you're looking for in your code: it is .text. Store your span in a variable and then call varname.text on it.
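A minimal sketch of that suggestion, assuming a small inline HTML string in place of the downloaded page so it runs without a network request. The variable names html and first_pos are illustrative, not from the question:

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for the real page (an assumption for this sketch).
html = '<span class="pos dpos" title="A word that describes a noun or pronoun.">adjective</span>'

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
first_pos = soup.find("span", class_="pos dpos")

if first_pos is not None:
    print(first_pos.text)  # text between the opening and closing tags
```

The None check matters on a live page, where the markup may change and the span may be missing.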

I agree with the other answer that you should remove these two lines:

     pip install bs4
     pip install requests

Your problem is that the variable POS is a list containing two "span" tags. You can loop over the list and print the text of each element, like this:

    for div in POS: 
        print(div.text) 

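This prints "adjective" twice, once for each element. If you only want to print it for one specific element, access that element by index and then call ".text" on it to get the text.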

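The reason you get a list back is that "find_all" called with a class name returns every matching element, because a class name is not unique to a single HTML element.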
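As a rough illustration of the index access described above, the sketch below uses a made-up inline HTML snippet with two matching spans instead of the live page. The names html and tags are assumptions for this example:

```python
from bs4 import BeautifulSoup

# Two spans sharing the same class, mimicking the result in the question.
html = """
<div><span class="pos dpos">adjective</span></div>
<div><span class="pos dpos">adjective</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like collection of every matching tag.
tags = soup.find_all("span", class_="pos dpos")

print(len(tags))     # how many elements matched
print(tags[0].text)  # text of the first match only
```

Indexing raises IndexError when the list is empty, so checking len(tags) first is a reasonable precaution on real pages.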

Hope this helps :)

You can try accessing the text content of the span elements, then filtering out unwanted entries with a list comprehension or a loop:

    from bs4 import BeautifulSoup
    import requests

    headers = {"User-Agent": "xxxxxxx"}
    r = requests.get("https://dictionary.cambridge.org/dictionary/english/happy",
                     headers=headers)
    soup = BeautifulSoup(r.text, "html.parser")
    POS = soup.find_all("span", class_="pos dpos")

    # Extract the text of each tag and trim surrounding whitespace.
    pos_list = [tag.text.strip() for tag in POS]
    # Keep only entries made up entirely of letters.
    pos_list = [pos for pos in pos_list if pos.isalpha()]

    print(pos_list)
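If the goal is a single "adjective" rather than one entry per span, the repeated labels can be collapsed afterwards. This is only a sketch: the sample pos_list values are hard-coded here so the snippet runs on its own, and unique_pos is a hypothetical name. dict.fromkeys is used because it drops duplicates while keeping first-seen order:

```python
# Sample data standing in for the scraped labels (an assumption).
pos_list = ["adjective", "adjective"]

# dict keys are unique and preserve insertion order, so this removes
# duplicates without reordering the remaining labels.
unique_pos = list(dict.fromkeys(pos_list))

print(unique_pos)
```

A set would also remove duplicates, but it does not guarantee the original order, which may matter for words that have several parts of speech.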
