How to delete duplicate phrases in a text file?

Okay so I have about 1000 duplicated phrases in this file, so doing this manually is not an option. Note that these are PHRASES, not lines or words, and each "phrase" is about 10 lines long.

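I'm trying to delete the duplicate phrases, but the only thing that makes an "item" (or phrase) a duplicate is its position array. Example: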

    class Item0
    {
        position[]={4347.6001,0,3214.6399};
        azimut=128.81599;
        special="NONE";
        id=1;
        side="EMPTY";
        vehicle="Land_fortified_nest_small";
        lock="UNLOCKED";
        skill=0.2;
        init="this setPos [4347.6, 3214.64, 0]; this setDir 128.816;";
    };
    class Item1
    {
        position[]={4347.6001,0,3214.6399};
        azimut=128.81599;
        special="NONE";
        id=2;
        side="EMPTY";
        vehicle="Land_fortified_nest_small";
        lock="UNLOCKED";
        skill=0.2;
        init="this setPos [4347.6, 3214.64, 0]; this setDir 128.816;";
    };

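Now, the first two phrases are duplicates, but their id and Item# differ, so the only way to identify a duplicate phrase is by the `position[]={}` parameter. When two phrases have the same position, they are duplicates regardless of their id or Item number.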
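To make that criterion concrete, here is a minimal Java sketch that pulls the contents of `position[]={...}` out of a phrase with `java.util.regex`, so two phrases can be compared while ignoring their id and Item#. The names `PositionKey` and `positionKey` are made up for illustration, and it assumes each phrase contains exactly one `position[]` entry written as in the sample (the numbers are compared as text, not as doubles).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PositionKey {
    // Matches e.g. position[]={4347.6001,0,3214.6399} and captures the numbers
    private static final Pattern POSITION =
            Pattern.compile("position\\[\\]=\\{([^}]*)\\}");

    /** Returns the text inside position[]={...}, or null if the phrase has none. */
    public static String positionKey(String phrase) {
        Matcher m = POSITION.matcher(phrase);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String a = "class Item0\n{\n    position[]={4347.6001,0,3214.6399};\n    id=1;\n};";
        String b = "class Item1\n{\n    position[]={4347.6001,0,3214.6399};\n    id=2;\n};";
        // Different Item# and id but the same key, so the phrases are duplicates
        System.out.println(positionKey(a).equals(positionKey(b))); // true
    }
}
```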

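So my goal is to use some kind of code, script, program, or regular expression to delete all duplicate phrases while keeping the first occurrence. For example, if there are three duplicates, one phrase stays and the other two are deleted. How can I do this?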


An example of the desired input/output:

Input:

    class Item0
    {
        position[]={4347.6001,0,3214.6399};
        azimut=128.81599;
        special="NONE";
        id=1;
        side="EMPTY";
        vehicle="Land_fortified_nest_small";
        lock="UNLOCKED";
        skill=0.2;
        init="this setPos [4347.6, 3214.64, 0]; this setDir 128.816;";
    };
    class Item1
    {
        position[]={4682.6001,0,3847.6399};
        azimut=128.81599;
        special="NONE";
        id=2;
        side="EMPTY";
        vehicle="Land_fortified_nest_small";
        lock="UNLOCKED";
        skill=0.2;
        init="this setPos [4682.6, 3847.64, 0]; this setDir 128.816;";
    };
    class Item2
    {
        position[]={4347.6001,0,3214.6399};
        azimut=128.81599;
        special="NONE";
        id=3;
        side="EMPTY";
        vehicle="Land_fortified_nest_small";
        lock="UNLOCKED";
        skill=0.2;
        init="this setPos [4347.6, 3214.64, 0]; this setDir 128.816;";
    };

Output:

    class Item0
    {
        position[]={4347.6001,0,3214.6399};
        azimut=128.81599;
        special="NONE";
        id=1;
        side="EMPTY";
        vehicle="Land_fortified_nest_small";
        lock="UNLOCKED";
        skill=0.2;
        init="this setPos [4347.6, 3214.64, 0]; this setDir 128.816;";
    };
    class Item1
    {
        position[]={4682.6001,0,3847.6399};
        azimut=128.81599;
        special="NONE";
        id=2;
        side="EMPTY";
        vehicle="Land_fortified_nest_small";
        lock="UNLOCKED";
        skill=0.2;
        init="this setPos [4682.6, 3847.64, 0]; this setDir 128.816;";
    };
Answers

My initial approach would be:

  1. Create an array to store the unique positions.
  2. Parse the file: if a phrase's position is already in the array, skip the phrase; otherwise, write it to the output file and store its position in the array.
  3. Loop until EOF.

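This will get you going, but it isn't the optimal solution. Think about how you store the first occurrence of an item and how you will check against it later (scanning the whole array for every phrase gets slow).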
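The three steps above might look like this in Java. This is a sketch rather than the answerer's code: it swaps the array for a `HashSet`, so checking a position costs constant expected time instead of a scan. It assumes each phrase ends with a line whose trimmed text starts with `};` and contains a single `position[]=` line, as in the sample; the class name `DedupeByPosition` and the command-line usage are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupeByPosition {
    /**
     * Keeps the first phrase for each position and drops later ones.
     * Assumes every phrase ends with a "};" line and has one "position[]=" line.
     */
    public static List<String> dedupe(List<String> lines) {
        List<String> output = new ArrayList<>();
        Set<String> seenPositions = new HashSet<>(); // plays the role of the "array" in step 1
        List<String> phrase = new ArrayList<>();     // lines of the phrase currently being read
        String position = null;
        for (String line : lines) {
            phrase.add(line);
            String trimmed = line.trim();
            if (trimmed.startsWith("position[]=")) {
                position = trimmed;
            }
            if (trimmed.startsWith("};")) {
                // Set.add returns false when the position was stored earlier
                if (position == null || seenPositions.add(position)) {
                    output.addAll(phrase);
                }
                phrase.clear();
                position = null;
            }
        }
        output.addAll(phrase); // keep trailing lines that don't belong to a phrase
        return output;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DedupeByPosition file.txt > deduped.txt
        if (args.length == 0) {
            System.err.println("usage: java DedupeByPosition <file>");
            return;
        }
        for (String line : dedupe(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(line);
        }
    }
}
```

Because each phrase is buffered until its `};` line, the first phrase with a given position is written out intact and later ones are dropped as a whole.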

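I would generate a hash value for each phrase and store it in a map, keeping new phrases and ignoring any whose key already exists. Map keys are unique, so this gets rid of the duplicates. Because the id and Item# differ between duplicates, the key should be derived from the position rather than from the whole phrase.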
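A sketch of the map idea in Java, assuming the file has already been split into one string per phrase. Instead of storing a separately computed hash, it uses the `position[]=` line itself as the key of a `LinkedHashMap`: the map hashes the key internally and falls back to `equals` on hash collisions, and `LinkedHashMap` keeps phrases in file order. `DedupeWithMap` and `positionOf` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupeWithMap {
    /** Returns the trimmed position[] line of a phrase, used as the map key. */
    static String positionOf(String phrase) {
        for (String line : phrase.split("\n")) {
            if (line.trim().startsWith("position[]=")) return line.trim();
        }
        return phrase; // no position line: fall back to the whole phrase text
    }

    /** Keeps the first phrase per position; LinkedHashMap preserves file order. */
    public static List<String> dedupe(List<String> phrases) {
        Map<String, String> firstByPosition = new LinkedHashMap<>();
        for (String phrase : phrases) {
            // putIfAbsent leaves an existing entry alone, so later duplicates are ignored
            firstByPosition.putIfAbsent(positionOf(phrase), phrase);
        }
        return new ArrayList<>(firstByPosition.values());
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList(
                "class Item0\n{\n    position[]={4347.6001,0,3214.6399};\n    id=1;\n};",
                "class Item1\n{\n    position[]={4682.6001,0,3847.6399};\n    id=2;\n};",
                "class Item2\n{\n    position[]={4347.6001,0,3214.6399};\n    id=3;\n};");
        // prints "class Item0" then "class Item1"
        dedupe(phrases).forEach(p -> System.out.println(p.split("\n")[0]));
    }
}
```

`putIfAbsent` (Java 8+) is what implements "keep new phrases, ignore existing ones" in a single call.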

If each item is an object of some class, you can consider using a `Set` and adding the items to it.

    Set<Item> itemSet = new HashSet<Item>();
    itemSet.add(new Item());

Once all the items have been added, only unique items are left.

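You can avoid the array and detect duplicates by checking whether each item was actually inserted (`Set.add` returns `false` for a duplicate). If `equals` and `hashCode` compare every field, the differing ids will make duplicates look distinct, so use a new class that has the same data members except the id.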
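For `Set.add` to reject duplicates, the class has to override `equals` and `hashCode`; Java's default versions compare object identity, so two separately created objects are never equal. A minimal sketch of such a class, assuming hypothetical fields `id`, `position` and `vehicle` that mirror the sample, where only `position` and `vehicle` take part in equality:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

public class Item {
    final int id;            // differs between duplicates, so equals/hashCode ignore it
    final double[] position; // e.g. {4347.6001, 0, 3214.6399}
    final String vehicle;

    Item(int id, double[] position, String vehicle) {
        this.id = id;
        this.position = position;
        this.vehicle = vehicle;
    }

    // Two items are equal when everything except the id matches
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Item)) return false;
        Item other = (Item) o;
        return Arrays.equals(position, other.position)
                && Objects.equals(vehicle, other.vehicle);
    }

    // Must use the same fields as equals(), otherwise a HashSet can miss duplicates
    @Override
    public int hashCode() {
        return 31 * Arrays.hashCode(position) + Objects.hashCode(vehicle);
    }

    public static void main(String[] args) {
        Set<Item> itemSet = new LinkedHashSet<>(); // keeps insertion order
        String v = "Land_fortified_nest_small";
        System.out.println(itemSet.add(new Item(1, new double[]{4347.6001, 0, 3214.6399}, v))); // true
        System.out.println(itemSet.add(new Item(2, new double[]{4682.6001, 0, 3847.6399}, v))); // true
        System.out.println(itemSet.add(new Item(3, new double[]{4347.6001, 0, 3214.6399}, v))); // false
        System.out.println(itemSet.size()); // 2
    }
}
```

If, as in the question, the position alone decides whether two phrases are duplicates, drop `vehicle` from both `equals` and `hashCode`.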

I used a different sample (easier to build); hope it helps:

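    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class DuplicateIds {
        public static void main(String[] args) {
            // Sample data: each int stands in for one item (made-up values)
            int[] item = {4, 7, 4, 9, 7};
            int counter = 0;
            // 1-based positions of items that had already been seen
            List<Integer> duplicateIds = new ArrayList<Integer>();
            Set<Integer> afterDups = new HashSet<Integer>();
            for (int i : item) {
                counter++;
                // you can create a new class excluding the id and use it here instead of int
                if (!afterDups.add(i))
                    duplicateIds.add(counter);
            }
            System.out.println(duplicateIds); // [3, 5]
        }
    }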

EDIT:

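Okay, I missed the part about reading from a file, so I'm adding to this answer. You can check each line. Given what your file looks like, you don't want to compare `class Item0` and `id=1;`. You can read the file line by line into a string (leaving those lines out) and put that string into a set. Once a class is complete (it starts with `class` and ends with `};`), you also keep a separate string with the class's full text. Using a separator, you can later split the strings again and rewrite the file.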

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.regex.Pattern;

    public class RemoveDuplicateClasses {
        public static void main(String[] args) {
            String separator = "$$";
            // $$-separated data lines of the current class (everything except the class name, id and closing brace)
            StringBuilder currentClassText = new StringBuilder();
            // $$-separated full text of the current class, kept so the file can be rebuilt
            StringBuilder currentClassFull = new StringBuilder();
            Set<String> texts = new HashSet<String>();
            List<String> uniqueClasses = new ArrayList<String>();
            try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
                String strLine;
                while ((strLine = br.readLine()) != null) {
                    currentClassFull.append(strLine).append(separator);
                    // the class name and id differ between duplicates, so leave them out of the comparison
                    if (!(strLine.contains("id=") || strLine.contains("class") || strLine.contains("};")))
                        currentClassText.append(strLine).append(separator);

                    // check if the class has completed
                    if (strLine.contains("};")) {
                        // add() returns true only if this text has not been seen before
                        if (texts.add(currentClassText.toString()))
                            uniqueClasses.add(currentClassFull.toString());
                        // set everything back to empty for the next round
                        currentClassText.setLength(0);
                        currentClassFull.setLength(0);
                    }
                }
            } catch (IOException e) {
                System.err.println("Error: " + e.getMessage());
            }
            // split on the separator again (quoted, since "$" is special in a regex) to print the de-duplicated file
            for (String cls : uniqueClasses)
                for (String line : cls.split(Pattern.quote(separator)))
                    System.out.println(line);
        }
    }



