The page I'm looking at contains:
<div id="1"> <p> text 1 <h1> text 2 </h1> text 3 <p> text 4 </p> </p> </div>
I want to get all the text in the div except the text inside the heading elements (<h1>, <h2>, ...).
(Here I want to get "text 1", "text 3" and "text 4".)
There may be several heading elements, or none at all.
There may also be several <p> elements, possibly nested one inside another, or none.
I thought of doing this by getting the HTML source of the div and using a regex to remove the heading elements. But selenium.get_text does not return the HTML, just the text (all of it!).
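For reference, the regex idea could be sketched like this in Python. It assumes the div's HTML is already available as a string and that heading elements are never nested inside one another; `text_without_headings` is a hypothetical helper, not part of Selenium. Regexes are fragile on arbitrary HTML, so this is only an illustration of the approach.

```python
import re

# Matches a whole heading element <h1>...</h1> through <h6>...</h6>, content included.
# The backreference \1 makes the closing tag match the opening level.
# Assumption: headings are never nested inside other headings.
HEADING_RE = re.compile(r"<(h[1-6])\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)

# Matches any remaining tag, e.g. <p>, </p>, <div id="1">.
TAG_RE = re.compile(r"<[^>]+>")


def text_without_headings(html: str) -> list[str]:
    """Return the text chunks in `html`, skipping text inside <h1>..<h6>."""
    # Replace each heading with a harmless tag so the text before and
    # after it stays in separate chunks instead of being glued together.
    without_headings = HEADING_RE.sub("<br>", html)
    # Split on every remaining tag and keep only the non-blank pieces.
    parts = TAG_RE.split(without_headings)
    return [part.strip() for part in parts if part.strip()]
```

With the markup from the question, `text_without_headings('<div id="1"> <p> text 1 <h1> text 2 </h1> text 3 <p> text 4 </p> </p> </div>')` returns `['text 1', 'text 3', 'text 4']`.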
I know I can use selenium.get_html_source and then find the element I need with a regex, but that seems wasteful, since Selenium already knows how to find the element.
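A less fragile way to search the full page source than a regex is Python's standard `html.parser` module. The sketch below assumes the page source is a string (for example, what `selenium.get_html_source` returns) and that the target is a `<div>` identified by its `id`; `TextOutsideHeadings` and `div_text_without_headings` are hypothetical names. It only counts `<div>` nesting to decide where the target ends, so unclosed or oddly nested `<p>` tags do not throw it off.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TextOutsideHeadings(HTMLParser):
    """Collect text inside the element with a given id, skipping <h1>..<h6>."""

    def __init__(self, target_id: str, target_tag: str = "div"):
        super().__init__()
        self.target_id = target_id
        self.target_tag = target_tag
        self.depth = 0          # nesting depth of target_tag inside the target (0 = outside)
        self.heading_depth = 0  # > 0 while inside a heading within the target
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag:
            if self.depth:
                self.depth += 1  # a nested <div> inside the target
            elif dict(attrs).get("id") == self.target_id:
                self.depth = 1   # entering the target element
        elif self.depth and tag in HEADINGS:
            self.heading_depth += 1

    def handle_endtag(self, tag):
        if tag == self.target_tag and self.depth:
            self.depth -= 1
        elif self.depth and tag in HEADINGS and self.heading_depth:
            self.heading_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is inside the target but not inside a heading.
        if self.depth and not self.heading_depth and data.strip():
            self.chunks.append(data.strip())


def div_text_without_headings(page_source: str, element_id: str) -> list[str]:
    """Return the text chunks of <div id=element_id>, excluding heading text."""
    parser = TextOutsideHeadings(element_id)
    parser.feed(page_source)
    parser.close()
    return parser.chunks
```

Usage would be something like `div_text_without_headings(selenium.get_html_source(), "1")`; for the div in the question this gives `['text 1', 'text 3', 'text 4']`, and an unknown id gives an empty list.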
Does anyone have a better solution? Thanks :)