English 中文(简体)
Java: Equalator? (removing duplicates from a collection of objects)
原标题:

I have a bunch of objects of a class Puzzle. I have overridden equals() and hashCode(). When it comes time to present the solutions to the user, I d like to filter out all the Puzzles that are "similar" (by the standard I have defined), so the user only sees one of each.

Similarity is transitive.

Example:

Result of computations:
A    (similar to A)
B    (similar to C)
C
D

In this case, only A or D and B or C would be presented to the user - but not two similar Puzzles. Two similar puzzles are equally valid. It is only important that they are not both shown to the user.

To accomplish this, I wanted to use an ADT that prohibits duplicates. However, I don t want to change the equals() and hashCode() methods to return a value about similarity instead. Is there some Equalator, like Comparator, that I can use in this case? Or is there another way I should be doing this?

The class I m working on is a Puzzle that maintains a grid of letters. (Like Scrabble.) If a Puzzle contains the same words, but is in a different orientation, it is considered to be similar. So the following to puzzle:

                                    (2, 2): A           
                                    (2, 1): C           
                                    (2, 0): T

Would be similar to:

                    (1, 2): A           
                    (1, 1): C           
                    (1, 0): T      
问题回答

Okay you have a way of measuring similarity between objects. That means they form a Metric Space.

The question is, is your space also a Euclidean space like normal three dimensional space, or integers or something like that? If it is, then you could use a binary space partition in however many dimensions you ve got.

(The question is, basically: is there a homomorphism between your objects and an n-dimensional real number vector? If so, then you can use techniques for measuring closeness of points in n-dimensional space.)

Now, if it s not a euclidean space then you ve got a bigger problem. An example of a non-euclidean space that programers might be most familiar with would be the Levenshtein Distance between to strings.

If your problem is similar to seeing how similar a string is to a list of already existing strings then I don t know of any algorithms that would do that without O(n2) time. Maybe there are some out there.


But another important question is: how much time do you have? How many objects? If you have time or if your data set is small enough that an O(n2) algorithm is practical, then you just have to iterate through your list of objects to see if it s below a certain threshold. If so, reject it.

Just overload AbstractCollection and replace the Add function. Use an ArrayList or whatever. Your code would look kind of like this

class SimilarityRejector<T> extends AbstractCollection<T>{
     ArrayList<T> base;
     double threshold;

    public SimilarityRejector(double threshold){
        base = new ArrayList<T>();
        this.threshold = threshold;
    }

    public void add(T t){
       boolean failed = false;
       for(T compare : base){
          if(similarityComparison(t,compare) < threshold) faled = true;
       }
       if(!failed) base.add(t);
     }

    public Iterator<T> iterator() {
        return base.iterator();
    }

    public int size() {
        return base.size();
    }
}

etc. Obviously T would need to be a subclass of some class that you can perform a comparison on. If you have a euclidean metric, then you can use a space partition, rather then going through every other item.

I d use a wrapper class that overrides equals and hashCode accordingly.

private static class Wrapper {
    public static final Puzzle puzzle;
    public Wrapper(Puzzle puzzle) { 
        this.puzzle = puzzle; 
    }
    @Override 
    public boolean equals(Object object) {
        // ...
    }
    @Override 
    public int hashCode() {
        // ...
    }
}

and then you wrap all your puzzles, put them in a map, and get them out again…

public Collection<Collection<Puzzle>> method(Collection<Puzzles> puzzles) {
    Map<Wrapper,<Collection<Puzzle>> map = new HashMap<Wrapper,<Collection<Puzzle>>();
    for (Puzzle each: puzzles) {
        Wrapper wrapper = new Wrapper(each);
        Collection<Puzzle> coll = map.get(wrapper);
        if (coll == null) map.put(wrapper, coll = new ArrayList<Puzzle>());
        coll.add(puzzle);
    }
    return map.values();
}
  1. Create a TreeSet using your Comparator
  2. Adds all elements into the set
  3. All duplicates are stripped out

Normally "similarity" is not a transitive relationship. So the first step would be to think of this in terms of equivalence rather than similarity. Equivalence is reflexive, symmetric and transitive.

Easy approach here is to define a puzzle wrapper whose equals() and hashCode() methods are implemented according to the equivalence relation in question.

Once you have that, drop the wrapped objects into a java.util.Set and that filters out duplicates.

IMHO, most elegant way was described by Gili (TreeSet with custom Comparator).

But if you like to make it by yourself, seems this easiest and clearest solution:

/**
 * Distinct input list values (cuts duplications)
 * @param items items to process
 * @param comparator comparator to recognize equal items
 * @return new collection with unique values
 */
public static <T> Collection<T> distinctItems(List<T> items, Comparator<T> comparator) {
    List<T> result = new ArrayList<>();

    for (int i = 0; i < items.size(); i++) {
        T item = items.get(i);

        boolean exists = false;
        for (int j = 0; j < result.size(); j++) {
            if (comparator.compare(result.get(j), item) == 0) {
                exists = true;
                break;
            }
        }

        if (!exists) {
            result.add(item);
        }
    }

    return result;
}




相关问题
Spring Properties File

Hi have this j2ee web application developed using spring framework. I have a problem with rendering mnessages in nihongo characters from the properties file. I tried converting the file to ascii using ...

Logging a global ID in multiple components

I have a system which contains multiple applications connected together using JMS and Spring Integration. Messages get sent along a chain of applications. [App A] -> [App B] -> [App C] We set a ...

Java Library Size

If I m given two Java Libraries in Jar format, 1 having no bells and whistles, and the other having lots of them that will mostly go unused.... my question is: How will the larger, mostly unused ...

How to get the Array Class for a given Class in Java?

I have a Class variable that holds a certain type and I need to get a variable that holds the corresponding array class. The best I could come up with is this: Class arrayOfFooClass = java.lang....

SQLite , Derby vs file system

I m working on a Java desktop application that reads and writes from/to different files. I think a better solution would be to replace the file system by a SQLite database. How hard is it to migrate ...

热门标签