Question

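Let's say I'm trying to scrape the content under an h2 element that has both the id "transcript" and the text "Transcript". If I'm not mistaken, the p elements sit "below" the h2 rather than inside it, so neither of the following two approaches worked: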

# using rvest

t %>% 
  html_elements("#transcript") %>% 
  html_children()

t %>% 
  html_elements("#transcript p")

So how can I get these p elements?

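I tried searching for previous SO wisdom and only found a similar question asked by a BeautifulSoup user. However, this seems like a basic question, so maybe I am further off base than I think.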

Answer 1

Does this work for you? See the explanatory comments.

library(rvest)
library(xml2)

#read the page
url <- "https://80000hours.org/podcast/episodes/kevin-esvelt-stealth-wildfire-pandemics/"
page <- read_html(url)

#find the h2 elements
h2_elements <- page %>% html_elements("h2")
h2_text <- h2_elements %>% html_text()

#select the node with the word "Transcript"
desired_h2 <- h2_elements[grep("Transcript", h2_text)]

#find the parent node of the desired h2
parent <- xml_parent(desired_h2)

#find all of the child "p" nodes under the parent
answer <- parent %>% html_elements("p") %>% html_text()

head(answer, 5)

[1] "Table of Contents"                                                                                                                                                                                                                                                                                                                                                            
[2] "Kevin Esvelt: So scientists correctly appreciate that, when there is controversy, you can get a paper in Nature, Science, or Cell — the top journals which are the best for your career."                                                                                                                                                                                     
[3] "Therefore, the incentives favour scientists identifying pandemic-capable viruses and determining whether posited cataclysmically destructive viruses and other forms of attack would actually function."                                                                                                                                                                      
[4] "And I have not seen any appreciable counter-incentives that could be anywhere near as powerful as the ones favouring our desire to know. Because almost all the time, it is better for us to know."                                                                                                                                                                           
[5] "So I don’t see many plausible futures in which we do not learn how to build agents that would bring down civilisation today. We just know that in the limit, if you get good enough at programming biology, we can do anything t
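
Note that the first result, "Table of Contents", shows that selecting every p under the parent can also pick up paragraphs that are not part of the transcript. A possible refinement, sketched below on a hypothetical miniature page rather than the real one, is to select only the p nodes that follow the h2 as siblings, using the XPath following-sibling axis. The inline HTML and the variable names are assumptions for illustration; if the real page wraps its paragraphs in extra containers, the expression would need adjusting.

```r
library(rvest)

# Hypothetical miniature page: the <p> nodes are siblings of the h2, not children
doc <- minimal_html('
  <div>
    <p>Table of Contents</p>
    <h2 id="transcript">Transcript</h2>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
')

# A descendant selector finds nothing, because the h2 itself contains no <p>
inside_h2 <- doc %>% html_elements("#transcript p")
length(inside_h2)

# The following-sibling axis keeps only the <p> nodes that come after the h2
after_h2 <- doc %>%
  html_elements(xpath = "//h2[@id='transcript']/following-sibling::p") %>%
  html_text2()
after_h2
```

On this toy document the descendant selector returns an empty set, which matches the behaviour described in the question, while the sibling query skips the paragraph that precedes the heading.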
