Web Scrape Numbers in R?

In R, I am trying to scrape all the working paper numbers (e.g., 31424, 31481, etc.) from the following webpage:

https://www.nber.org/papers?facet=topics%3AFinancial%20Economics&page=1&perPage=50&sortBy=public_date

I am running the following code to get them:

library(rvest)
url <- "https://www.nber.org/papers?facet=topics%3AFinancial%20Economics&page=1&perPage=50&sortBy=public_date"
page <- read_html(url)
name <- page %>% html_nodes(".paper-card__paper_number") %>% html_text()

However, this code returns character(0) instead of the working paper numbers. Is there any way I can modify this code to get them?

Answer

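The paper list on that page appears to be generated by JavaScript after the initial load, so read_html() only sees the static HTML, which does not yet contain any .paper-card__paper_number elements; that would explain the character(0). To scrape dynamically generated content, you can use a browser automation tool like RSelenium, which lets you control a real web browser programmatically. Here's how you can modify your code to achieve this: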

1. First, make sure you have RSelenium and rvest installed:

install.packages("RSelenium")
install.packages("rvest")

2. Load the required libraries:

library(RSelenium)
library(rvest)

3. Start a Selenium server and open a browser:

driver <- rsDriver(browser="chrome", chromever="latest", port=4567L)
remDr <- driver[["client"]]

4. Navigate to the desired URL:

url <- "https://www.nber.org/papers?facet=topics%3AFinancial%20Economics&page=1&perPage=50&sortBy=public_date"
remDr$navigate(url)

5. Get the working paper numbers:

page_source <- remDr$getPageSource()[[1]]
page <- read_html(page_source)
name <- page %>% html_nodes(".paper-card__paper_number") %>% html_text()

6. Stop the Selenium server and close the browser:

remDr$close()
driver$server$stop()
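
One caveat with the steps above: the page source may be read before the JavaScript has finished rendering, in which case you would still get character(0). Below is a minimal sketch of one way to handle that, using only base R. The helper names wait_for_nonempty() and extract_numbers() are hypothetical (not part of RSelenium or rvest), the timeout values are guesses, and the "Working Paper 31424" text format is an assumption about what html_text() returns for this page. A fake fetch function stands in for the browser so the sketch runs without Selenium.

```r
# Hypothetical helper: call `fetch` repeatedly until it returns a non-empty
# vector or `timeout` seconds have elapsed, pausing `interval` seconds
# between attempts. Returns whatever the last call produced.
wait_for_nonempty <- function(fetch, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    result <- fetch()
    if (length(result) > 0 || Sys.time() >= deadline) {
      return(result)
    }
    Sys.sleep(interval)
  }
}

# Hypothetical helper: keep the first run of digits in each string.
# Assumes text such as "Working Paper 31424"; strings without digits give NA.
extract_numbers <- function(x) {
  out <- rep(NA_character_, length(x))
  hit <- regexpr("[0-9]+", x)
  ok <- !is.na(hit) & hit > 0
  out[ok] <- regmatches(x, hit)
  out
}

# Stand-in for the browser: returns nothing on the first two calls, then
# the (assumed) text of two paper cards.
attempts <- 0
fake_fetch <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) character(0) else c("Working Paper 31424", "Working Paper 31481")
}

raw <- wait_for_nonempty(fake_fetch, timeout = 5, interval = 0.1)
paper_numbers <- extract_numbers(raw)
print(paper_numbers)

# With the real session from step 5, the fetch function could look like this
# (not run here, since it needs a live Selenium session):
# name <- wait_for_nonempty(function() {
#   read_html(remDr$getPageSource()[[1]]) %>%
#     html_nodes(".paper-card__paper_number") %>%
#     html_text()
# })
# paper_numbers <- extract_numbers(name)
```

Polling like this avoids picking a single fixed Sys.sleep() delay that is either too short on a slow connection or needlessly long on a fast one.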



