Web scrape all items in a page with Selenium

I am trying to extract all items from a web page. For instance, I would like to get the first entry, "Hillebrand Boudewynsz. van der Aa (1661 - 1717)", and then the other 49 entries on the page.

My code is below. I am using Selenium and trying to select the items via XPath or CSS, but I am not sure of the right path; either option is welcome. This is the relevant line from the code:

#Finding element
object<-remDr$findElement(using="xpath","/html/body/div[2]/div/ul/li[1]/a")

The website is https://www.vondel.humanities.uva.nl/ecartico/persons/index.php?subtask=browse

rm(list = ls())
library(tidyverse)
# install.packages("robotstxt")
library(robotstxt)
# install.packages("RSelenium")
library(rvest)
library(RSelenium)
# install.packages("netstat")
library(netstat)
library(wdman)
selenium()

# see path
selenium_object <- selenium(retcommand = TRUE, check = FALSE)

# binman::list_versions("chromedriver")

# start the server
remote_driver <- rsDriver(
  browser = "chrome",
  chromever = "113.0.5672.63",
  verbose = FALSE,
  port = free_port()
)

# create a client object
remDr <- remote_driver$client

# open a browser
remDr$open()

# maximize window size
remDr$maxWindowSize()

# navigate to the website
remDr$navigate("https://www.vondel.humanities.uva.nl/ecartico/persons/index.php?subtask=browse")

# find the element
object <- remDr$findElement(using = "xpath", "/html/body/div[2]/div/ul/li[1]/a")
#---------------------------------------------------------------------
Answer

This Python version should work for you:

import csv
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

url = "https://www.vondel.humanities.uva.nl/ecartico/persons/index.php?subtask=browse"
all_data = []

# Selenium Configuration
chrome_options = Options()
chrome_options.add_argument("--headless")  # Running in headless mode
driver = webdriver.Chrome(options=chrome_options)
wait = WebDriverWait(driver, 10)

try:
    # Access to page
    driver.get(url)

    # Wait for the list items to load
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#setwidth > ul > li > a")))

    # Get the HTML content of the page
    html = driver.page_source
finally:
    driver.quit()  # Close the browser even if an exception occurs

# Extract the necessary data using BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
products = soup.select("#setwidth > ul > li > a")

# Collect the text of every matched link, one row per item
for title in products:
    title_text = title.get_text(strip=True)
    all_data.append([title_text])

# Write data to a CSV file
with open("vondel.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Title"])
    writer.writerows(all_data)
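
As for why the XPath in the question returns a single item: the `li[1]` step selects only the first list item, so dropping the `[1]` predicate should match every entry. The sketch below illustrates this with the standard library only. The `SAMPLE` markup is a hypothetical, simplified stand-in for the page (the real HTML may differ), and the second and third names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page structure (an assumption,
# not the real markup). Only the first name comes from the question.
SAMPLE = """
<html>
  <body>
    <div>header</div>
    <div>
      <div id="setwidth">
        <ul>
          <li><a href="#1">Hillebrand Boudewynsz. van der Aa (1661 - 1717)</a></li>
          <li><a href="#2">Example Person B (1600 - 1650)</a></li>
          <li><a href="#3">Example Person C (1700 - 1750)</a></li>
        </ul>
      </div>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(SAMPLE)  # root is the <html> element

# Same shape as the XPath in the question: li[1] keeps only the first <li>.
first_only = root.findall("./body/div[2]/div/ul/li[1]/a")

# Without the [1] predicate, every <li> under the list is matched.
all_items = root.findall("./body/div[2]/div/ul/li/a")

names = [a.text for a in all_items]
print(len(first_only), len(all_items))
print(names)
```

In RSelenium the equivalent change would presumably be to call `findElements` (plural) with the XPath `/html/body/div[2]/div/ul/li/a` and then read each element's text, rather than calling `findElement` with `li[1]`.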



