Web Scraping with Jsoup only functioning half the time

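A while ago I started working with the Java Jsoup library to get a better understanding of web scraping (extracting data from websites). However, it seems I have only managed to get it partially working. Is the problem in my code, or is it possible that some websites take measures to block web scraping?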

Here is the class that does all the work:

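import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HTMLParser {

    private Document d;
    private String url;
    private String content;

    public HTMLParser(String url) {
        this.url = url;
        connect();
        parse();
        display();
    }

    // Fetch and parse the page. An empty catch block here would hide every
    // failure (timeouts, HTTP errors, blocked requests) and leave d == null.
    private void connect() {
        try {
            d = Jsoup.connect(url).get();
        } catch (IOException e) {
            System.err.println("Could not fetch " + url + ": " + e);
        }
    }

    // Extract the visible text of <body>, if the fetch succeeded.
    private void parse() {
        content = (d != null) ? d.body().text() : "";
    }

    private void display() {
        System.out.println(content);
    }
}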
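To tell a bug in the code apart from a site that refuses the request, it can help to look at the HTTP status the server actually returns. The sketch below swaps Jsoup for the JDK's `java.net.http` client (Java 11+) purely so the status code is visible. The class name `FetchCheck`, the User-Agent string, the 10-second timeouts, and the choice to treat 403/429 as "possibly blocked" are illustrative assumptions, not Jsoup behavior.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class FetchCheck {

    // Rough diagnosis of a status code (a heuristic assumed for this sketch).
    static String diagnose(int status) {
        if (status >= 200 && status < 300) return "ok";
        if (status == 403 || status == 429) return "possibly blocked";
        return "http error";
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder URL; pass the real page as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";

        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();

        // Hypothetical User-Agent; some sites reject requests without a browser-like one.
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0 (compatible; ExampleScraper/0.1)")
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();

        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            int status = response.statusCode();
            System.out.println(status + " " + diagnose(status));
        } catch (IOException e) {
            // Network failures are reported instead of silently ignored.
            System.out.println("request failed: " + e);
        }
    }
}
```

A 403 or 429 often, though not always, means the site is turning away automated clients. Jsoup reports non-2xx responses as an `HttpStatusException`, a subclass of `IOException`, so in `connect()` they end up in the same catch block as network errors.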
Answers

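You will also run into problems if the site loads its data dynamically, especially nowadays. Jsoup only parses the HTML the server sends back and does not run JavaScript, so anything a script adds later never appears in the Document. Also check whether the site is set up to turn away robots (automated clients).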
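One standard way a site states its policy toward robots is its `robots.txt` file. The sketch below is a deliberately simplified check, assuming you have already downloaded that file as a string: it collects `Disallow` rules from `User-agent: *` groups and matches paths by prefix. It ignores `Allow` lines, wildcards, and agent-specific groups, so it is not a full implementation of the robots exclusion standard; `RobotsCheck` and `isDisallowed` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RobotsCheck {

    // Simplified: only "User-agent: *" groups and prefix-matched Disallow rules.
    static boolean isDisallowed(String robotsTxt, String path) {
        List<String> rules = new ArrayList<>();
        boolean inStarGroup = false;
        boolean lastWasAgent = false;

        for (String rawLine : robotsTxt.split("\\R")) {
            String line = rawLine.replaceAll("#.*", "").trim(); // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                // Consecutive User-agent lines belong to the same group.
                if (!lastWasAgent) inStarGroup = false;
                if (value.equals("*")) inStarGroup = true;
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means nothing is disallowed.
                if (inStarGroup && field.equals("disallow") && !value.isEmpty()) {
                    rules.add(value);
                }
            }
        }

        for (String rule : rules) {
            if (path.startsWith(rule)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n\nUser-agent: BadBot\nDisallow: /\n";
        System.out.println(isDisallowed(robots, "/private/page.html"));
        System.out.println(isDisallowed(robots, "/public/page.html"));
    }
}
```

With the sample file in `main`, the first call prints `true` and the second `false`; the `BadBot` group does not affect other clients.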

Ideally, you would render the page first and then scrape it.

This software apparently renders web pages: http://lobobrowser.org/java-browser.jsp. It certainly has an API, which might let you inspect the web page's structure.

You can also do web scraping without Jsoup:

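import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.Scanner;

public class Trick {
    public static void main(String[] args) throws IOException {
        // Placeholders: replace with the page URL and the text markers around the data you want.
        String url = "https://example.com/";
        String startMarker = "NAME OF CLASS OR ID";
        int skip = 0; // how many characters from the start of the marker to ignore
        String endMarker = "WHERE YOU WANT IT TO END OR STOP SCRAPING";

        // openConnection()/getInputStream() can throw IOException: catch it or declare it, as here.
        URLConnection con = URI.create(url).toURL().openConnection();
        String str;
        // Any delimiter works; "\\A" (start of input) makes next() return the whole page.
        try (Scanner scanner = new Scanner(con.getInputStream(), "UTF-8")) {
            scanner.useDelimiter("\\A");
            str = scanner.hasNext() ? scanner.next() : "";
        }

        int start = str.indexOf(startMarker);
        if (start < 0) {
            System.out.println("Start marker not found");
            return;
        }
        // Drop everything before the start marker (plus the chosen offset).
        str = str.substring(start + skip);

        int end = str.indexOf(endMarker);
        if (end < 0) {
            System.out.println("End marker not found");
            return;
        }
        String wow = str.substring(0, end);
        System.out.println(wow);

        // Keep the remainder, starting at the end marker, for further scraping.
        str = str.substring(end);
    }
}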



