Extract links from a dynamically constructed web page in Python

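I am trying to extract specific content from a particular dynamically constructed website. Although I am not fluent in web technologies, the page source suggests that the content is generated dynamically.

As I understand it, this calls for a web driver such as Selenium rather than a plain request (which only gets me the code that builds the page, not the actual result I see as a user).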

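However, I am not clear on exactly how to use Selenium in this case. I assumed I could search by class name. The smallest element I need is a table element with the classes "table table-bordered table-hover"; under it is a series of tr rows, each split into td cells, and I can get the file names from the onclick action of the buttons in those cells.

I see that the table I need carries the class value "table table-bordered table-hover", so I tried:

driver.find_element(By.CLASS_NAME, "table table-bordered table-hover")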

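But I only get an error saying the element was not found. Since the element clearly exists, I assume I am misusing the find_element command. How do I use it correctly?

Note: I noticed that in this specific case I can go to MainIO_Hok.aspx and get what I need, but I would like to know how to work directly from the web page I am looking at.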

Answers

The data you see in the table is loaded from an external URL via JavaScript. To load it yourself, you can use this example:

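import requests
import pandas as pd

# JSON endpoint that the page's JavaScript calls to fill the table
data_url = "https://www.kingstore.co.il/Food_Law/MainIO_Hok.aspx?_=1690053691921&WStore=&WDate=&WFileType=0"
data = requests.get(data_url).json()
df = pd.DataFrame(data)

df.pop("PathLogo")  # drop the logo column, which is not needed here
# Build the download link for each file from its name
df["url"] = "https://www.kingstore.co.il/Food_Law/Download/" + df["FileNm"]
print(df)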

Prints:

                                   FileNm Company                                         Store TypeFile TypeExpFile          DateFile                                                                                   url
0  Price7290058108879-338-202307222101.gz       1  338 דוכאן חי אלוורוד                           מחירים          gz  21:01 22/07/2023  https://www.kingstore.co.il/Food_Law/Download/Price7290058108879-338-202307222101.gz
1  Price7290058108879-337-202307222101.gz       1  337 דוכאן אעבלין                               מחירים          gz  21:01 22/07/2023  https://www.kingstore.co.il/Food_Law/Download/Price7290058108879-337-202307222101.gz
2  Price7290058108879-336-202307222101.gz       1  336 דוכאן קלנסווה                              מחירים          gz  21:01 22/07/2023  https://www.kingstore.co.il/Food_Law/Download/Price7290058108879-336-202307222101.gz
3  Price7290058108879-335-202307222101.gz       1  335 דוכאן כפר ברא                              מחירים          gz  21:01 22/07/2023  https://www.kingstore.co.il/Food_Law/Download/Price7290058108879-335-202307222101.gz
4  Price7290058108879-334-202307222101.gz       1  334 דיר חנא זכיינות                            מחירים          gz  21:01 22/07/2023  https://www.kingstore.co.il/Food_Law/Download/Price7290058108879-334-202307222101.gz

...
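The query string in that URL can also be assembled rather than hard-coded. The following is a minimal sketch, not the site's documented API: the helper name `build_data_url` is hypothetical, the `_` parameter is assumed to be a millisecond timestamp used as a cache-buster (its value in the example is consistent with the date of the data), and the values accepted by `WStore`, `WDate` and `WFileType` are not known from this page, so they are simply passed through.

```python
import time
from typing import Optional
from urllib.parse import urlencode

# Endpoint taken from the answer above
BASE_URL = "https://www.kingstore.co.il/Food_Law/MainIO_Hok.aspx"


def build_data_url(store: str = "", date: str = "", file_type: int = 0,
                   stamp: Optional[int] = None) -> str:
    """Build the data URL; `stamp` defaults to the current time in milliseconds.

    Assumption: `_` is only a cache-busting timestamp, so any recent value works.
    """
    if stamp is None:
        stamp = int(time.time() * 1000)
    query = urlencode({"_": stamp, "WStore": store, "WDate": date, "WFileType": file_type})
    return f"{BASE_URL}?{query}"


# Reproduces the URL used in the answer when given the same timestamp
print(build_data_url(stamp=1690053691921))
```

The resulting string can be passed to `requests.get` exactly as `data_url` is in the answer.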

To scrape the table from the website https://www.kingstore.co.il/Food_Law/Main.aspx (see also http://stackoverflow.com/a/59130336/7429447 and http://stackoverflow.com/a/70733548/7429447), you can use the following locator strategy:

Code:

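import time
from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Assumes Chrome; the original snippet did not show how the driver was created
driver = webdriver.Chrome()
driver.get("https://www.kingstore.co.il/Food_Law/Main.aspx")
time.sleep(5)  # fixed pause kept from the original snippet
# Wait until the table is visible, then take its full HTML
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.table.table-bordered.table-hover#myTable"))).get_attribute("outerHTML")
# read_html returns a list of DataFrames; recent pandas versions expect a file-like object
df = pd.read_html(StringIO(data))
print(df)
driver.quit()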

Console Output:

[                                    שם קובץ                  סניף     סוג סיומת             תאריך      Unnamed: 5
0    Price7290058108879-338-202307230001.gz  338 דוכאן חי אלוורוד  מחירים    gz  00:01 23/07/2023  להורדה לחץ כאן
1    Price7290058108879-337-202307230001.gz      337 דוכאן אעבלין  מחירים    gz  00:01 23/07/2023  להורדה לחץ כאן
2    Price7290058108879-336-202307230001.gz     336 דוכאן קלנסווה  מחירים    gz  00:01 23/07/2023  להורדה לחץ כאן
3    Price7290058108879-335-202307230001.gz     335 דוכאן כפר ברא  מחירים    gz  00:01 23/07/2023  להורדה לחץ כאן
4    Price7290058108879-334-202307230001.gz   334 דיר חנא זכיינות  מחירים    gz  00:01 23/07/2023  להורדה לחץ כאן
..                                      ...                   ...     ...   ...               ...             ...
995  Price7290058108879-012-202307210401.gz               12 נצרת  מחירים    gz  04:01 21/07/2023  להורדה לחץ כאן
996  Price7290058108879-010-202307210401.gz      10 דליית אל כרמל  מחירים    gz  04:01 21/07/2023  להורדה לחץ כאן
997  Price7290058108879-008-202307210401.gz             8 באר שבע  מחירים    gz  04:01 21/07/2023  להורדה לחץ כאן
998  Price7290058108879-007-202307210401.gz               7 סכנין  מחירים    gz  04:01 21/07/2023  להורדה לחץ כאן
999  Price7290058108879-006-202307210401.gz               6 שפרעם  מחירים    gz  04:01 21/07/2023  להורדה לחץ כאן

[1000 rows x 6 columns]
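As for why the `By.CLASS_NAME` attempt in the question finds nothing: that locator expects a single class name, so a space-separated value such as "table table-bordered table-hover" does not match the element. A CSS selector that chains the classes with dots, as in the code above, is the usual alternative. The sketch below shows the conversion; `classes_to_css` and `find_by_classes` are hypothetical helper names, and the element still has to be present on the page when the lookup runs.

```python
def classes_to_css(tag: str, class_attr: str) -> str:
    """Turn a tag name plus a space-separated class attribute into a CSS selector."""
    class_names = class_attr.split()
    return tag + "".join("." + name for name in class_names)


def find_by_classes(driver, tag: str, class_attr: str):
    """Return the first element with the given tag that carries all the classes."""
    # Imported here so classes_to_css can be used without Selenium installed
    from selenium.webdriver.common.by import By

    return driver.find_element(By.CSS_SELECTOR, classes_to_css(tag, class_attr))


selector = classes_to_css("table", "table table-bordered table-hover")
print(selector)  # table.table.table-bordered.table-hover
```

With a running driver, `find_by_classes(driver, "table", "table table-bordered table-hover")` would replace the failing call from the question.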

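When a web page is generated dynamically, its content is loaded from somewhere. You can observe this process with the developer tools of your web browser, usually opened with a keyboard shortcut. In the Network tab you can see the requests made by the page. After analyzing these requests, you will notice that one of them returns all the data needed to fill the table, i.e. it queries an API. The URL for this request is as follows: https://www.kingstore.co.il/Food_Law/MainIO_Hok.aspx?_=1690053691921&WStore=&WDate=&WFileType=0

Therefore, to get all the data, you only need to request this API directly. A simple request is enough to retrieve the required data.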

import requests

# API endpoint found in the Network tab
website = "https://www.kingstore.co.il/Food_Law/MainIO_Hok.aspx?_=1690053691921&WStore=&WDate=&WFileType=0"
response = requests.get(website)

# The response is a JSON list with one record per file
for item in response.json():
    print("FileNm: %s" % item["FileNm"])
    print("Company: %s" % item["Company"])
    print("Store: %s" % item["Store"])
    print("TypeFile: %s" % item["TypeFile"])
    print("TypeExpFile: %s" % item["TypeExpFile"])
    print("DateFile: %s\n" % item["DateFile"])

Output:

FileNm: Price7290058108879-200-202307221601.gz
Company: 1
Store: 200 ירושליים                                
TypeFile: מחירים
TypeExpFile: gz
DateFile: 16:01 22/07/2023

FileNm: Price7290058108879-019-202307221601.gz
Company: 1
Store: 19 רמלה                                    
TypeFile: מחירים
TypeExpFile: gz
DateFile: 16:01 22/07/2023

FileNm: Price7290058108879-018-202307221601.gz
Company: 1
Store: 18 יפת - יפו תל אביב                       
TypeFile: מחירים
TypeExpFile: gz
DateFile: 16:01 22/07/2023

FileNm: Price7290058108879-017-202307221601.gz
Company: 1
Store: 17 יפיע                                    
TypeFile: מחירים
TypeExpFile: gz
DateFile: 16:01 22/07/2023

...
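Since the goal of the question is the links themselves, the records can be turned into download URLs in plain Python. This is a minimal sketch: `to_download_rows` is a hypothetical helper, the download prefix is the one used in the first answer, and the sample record is copied from the printed output above (including the trailing padding in `Store`, which the helper strips).

```python
# Download prefix as used in the first answer
DOWNLOAD_BASE = "https://www.kingstore.co.il/Food_Law/Download/"


def to_download_rows(records):
    """Keep the fields of interest and add a download URL for each record."""
    rows = []
    for record in records:
        rows.append({
            "file": record["FileNm"],
            "store": record["Store"].strip(),  # values arrive padded with spaces
            "date": record["DateFile"],
            "url": DOWNLOAD_BASE + record["FileNm"],
        })
    return rows


# Sample record copied from the output above; in practice pass response.json()
sample = [{
    "FileNm": "Price7290058108879-200-202307221601.gz",
    "Store": "200 ירושליים                                ",
    "TypeFile": "מחירים",
    "TypeExpFile": "gz",
    "DateFile": "16:01 22/07/2023",
}]

for row in to_download_rows(sample):
    print(row["url"])
```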

I suggest using the browser's developer tools, as they have been very helpful in my work. They are useful for obtaining selectors for the elements I want to extract when web scraping or, as in this case, for identifying the source from which content is loaded on dynamic web pages.




