Question

I'm developing a tool that needs to download a web page from a third-party server, execute it as a browser would, and then parse the HTML. My difficulty is that the tool needs to parse the HTML after all JavaScript has executed and the DOM has been modified. I'm trying to use PhantomJS for this, and it works on small snippets (a tiny HTML document with external JavaScript that adds some nodes to the DOM), but when I do the same with a real site (http://www.dba.dk/) I don't get the final HTML with all the modifications made by the JS code.

I really need help on this as I have been stuck with it for more than a week.

My PhantomJS code is simple:

// phantom.state is empty on the first run, before any page has been opened.
if (phantom.state.length === 0) {
    if (phantom.args.length === 0) {
        console.log('Usage: test.js <some URL>');
        phantom.exit();
    } else {
        var address = phantom.args[0];
        // Remember the start time so the second run can compute the load time.
        phantom.state = Date.now().toString();
        phantom.viewportSize = { width: 1280, height: 800 };
        phantom.open(address);
    }
} else {
    // Second run: the script is now executing in the context of the loaded page.
    var elapsed = Date.now() - Number(phantom.state);
    if (phantom.loadStatus === 'success') {
        if (!first_time) {
            var first_time = true;
            if (!document.addEventListener) {
                console.log('Not SUPPORTED!');
            }
            phantom.render('result.png');
            // Dump the DOM as it stands at this moment.
            var markup = document.documentElement.innerHTML;
            console.log(markup);
            phantom.exit();
        }
    } else {
        console.log('FAIL to load the address');
        phantom.exit();
    }
}

The HTML dumped to the console doesn't contain the dynamically generated content.

Answer 1

The problem was with the Flash plugin. The pages were detecting its absence. Once the plugin was loaded correctly, the problem was gone.
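For anyone hitting the same symptom, one way to check whether the page can see a Flash plugin is to inspect navigator.plugins from inside the page. The following is only a rough sketch, not the code I actually used: detectFlash is a hypothetical helper name, the "Shockwave Flash" plugin name is an assumption about how the plugin identifies itself, and the PhantomJS part assumes a newer release that provides the webpage and system modules rather than the older phantom.open() style shown in the question.

```javascript
// Hypothetical helper (not from the original post): returns true when a
// navigator-like object lists a plugin whose name contains "Shockwave Flash".
// With no argument it falls back to the global `navigator`, so the function
// can be handed to page.evaluate() and run inside the loaded page.
function detectFlash(nav) {
    nav = nav || navigator;
    var plugins = nav.plugins || [];
    for (var i = 0; i < plugins.length; i++) {
        var name = plugins[i].name || '';
        if (name.indexOf('Shockwave Flash') !== -1) {
            return true;
        }
    }
    return false;
}

// This part only runs under PhantomJS (assumed: webpage and system modules).
if (typeof phantom !== 'undefined') {
    var system = require('system');
    if (system.args.length < 2) {
        console.log('Usage: flashcheck.js <some URL>');
        phantom.exit();
    } else {
        var page = require('webpage').create();
        page.open(system.args[1], function (status) {
            if (status !== 'success') {
                console.log('FAIL to load the address');
            } else {
                // page.evaluate() runs detectFlash in the page's own context.
                console.log('Flash visible to page: ' + page.evaluate(detectFlash));
            }
            phantom.exit();
        });
    }
}
```

If this reports false, the site's scripts probably take their "no Flash" code path, which may explain why the dumped HTML lacks the dynamic content.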
