LINQ query to detect duplicate properties in a list of objects

I have a list of objects. These objects are made up of a custom class that basically contains two string fields String1 and String2.

What I need to know is whether any of these strings are duplicated in that list. So I want to know if objectA.String1 == objectB.String1, or objectA.String2 == objectB.String2, or objectA.String1 == objectB.String2, or objectA.String2 == objectB.String1.

Also, I want to mark each object that contains a duplicate string as having a duplicate string (with a bool HasDuplicate on the object).

So when the duplication detection has run I want to simply foreach over the list like so:

foreach (var item in duplicationList)
    if (item.HasDuplicate)
        Console.WriteLine("Duplicate detected!");

This seemed like a nice problem to solve with LINQ, but I cannot for the life of me figure out a good query. So I've solved it using good old foreach, but I'm still interested in a LINQ version.
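
The loop-based version is not shown in the question. As a point of reference for the answers, here is a minimal sketch of what such a pairwise check might look like. The names `Item`, `NestedLoopDetector` and `MarkDuplicates` are assumptions made for illustration, and index-based `for` loops are used instead of `foreach` so that each pair is compared only once.

```csharp
using System.Collections.Generic;

// Hypothetical item type matching the description in the question.
public class Item
{
    public string String1 { get; set; }
    public string String2 { get; set; }
    public bool HasDuplicate { get; set; }
}

public static class NestedLoopDetector
{
    // Compares every pair of items once and flags both sides of a match.
    public static void MarkDuplicates(IList<Item> items)
    {
        for (int i = 0; i < items.Count; i++)
        {
            for (int j = i + 1; j < items.Count; j++)
            {
                Item a = items[i], b = items[j];

                // The four comparisons listed in the question.
                if (a.String1 == b.String1 || a.String2 == b.String2 ||
                    a.String1 == b.String2 || a.String2 == b.String1)
                {
                    a.HasDuplicate = true;
                    b.HasDuplicate = true;
                }
            }
        }
    }
}
```

After calling `NestedLoopDetector.MarkDuplicates(duplicationList)`, the foreach loop above can report the flagged items.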

Best answer

Here's a complete code sample which should work for your case.

class A
{
    public string Foo   { get; set; }
    public string Bar   { get; set; }
    public bool HasDupe { get; set; }
}

var list = new List<A> 
          { 
              new A{ Foo="abc", Bar="xyz"}, 
              new A{ Foo="def", Bar="ghi"}, 
              new A{ Foo="123", Bar="abc"}  
          };

var dupes = list.Where(a => list
          .Except(new List<A>{a})
          .Any(x => x.Foo == a.Foo || x.Bar == a.Bar || x.Foo == a.Bar || x.Bar == a.Foo))
          .ToList();

dupes.ForEach(a => a.HasDupe = true);
Answers

This should work:

public class Foo
{
    public string Bar;
    public string Baz;
    public bool HasDuplicates;
}

public static void SetHasDuplicate(IEnumerable<Foo> foos)
{
    var dupes = foos
        .SelectMany(f => new[] { new { Foo = f, Str = f.Bar }, new { Foo = f, Str = f.Baz } })
        .Distinct() // Eliminates double entries where Foo.Bar == Foo.Baz
        .GroupBy(x => x.Str)
        .Where(g => g.Count() > 1)
        .SelectMany(g => g.Select(x => x.Foo))
        .Distinct()
        .ToList();

    dupes.ForEach(d => d.HasDuplicates = true);    
}

What you are basically doing is

  1. SelectMany : create a list of all the strings, with their accompanying Foo
  2. Distinct : Remove double entries for the same instance of Foo (Foo.Bar == Foo.Baz)
  3. GroupBy : Group by string
  4. Where : Filter the groups with more than one item in them. These contain the duplicates.
  5. SelectMany : Get the foos back from the groups.
  6. Distinct : Remove double occurrences of foo from the list.
  7. ForEach : Set the HasDuplicates property.

Some advantages of this solution over Winston Smith's solution are:

  1. Easier to extend to more string properties. Suppose there were 5 properties. In his solution, you would have to write 25 comparisons (every property against every property) in the Any clause to check for duplicates. In this solution, it's just a matter of adding the property in the first SelectMany call.
  2. Performance should be much better for large lists. Winston's solution iterates over the list for each item in the list, while this solution only iterates over it once. (Winston's solution is O(n²) while this one is O(n)).
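
To illustrate the first point, here is a minimal sketch of the same pipeline with a third string field. The type `Foo3`, its field `Qux` and the class `DuplicateMarker` are hypothetical names introduced only for this example. It also applies Distinct to each item's own strings before pairing them with the item, which is a small variation on the code above with the same intent.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of Foo with a third string field (Qux is an assumption).
public class Foo3
{
    public string Bar;
    public string Baz;
    public string Qux;
    public bool HasDuplicates;
}

public static class DuplicateMarker
{
    public static void SetHasDuplicate(IEnumerable<Foo3> foos)
    {
        var dupes = foos
            // One (owner, string) pair per distinct string of an item.
            // Supporting another field means adding one entry to this array.
            .SelectMany(f => new[] { f.Bar, f.Baz, f.Qux }
                .Distinct()
                .Select(s => new { Foo = f, Str = s }))
            .GroupBy(x => x.Str)
            // A group with more than one entry means two different items share the string.
            .Where(g => g.Count() > 1)
            .SelectMany(g => g.Select(x => x.Foo))
            .Distinct()
            .ToList();

        dupes.ForEach(d => d.HasDuplicates = true);
    }
}
```

An item whose own fields repeat a value (for example Bar == Baz) is not flagged by this sketch unless another item also contains that value.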

First, if your object doesn't have the HasDuplicate property yet, declare an extension method that implements HasDuplicateProperties:

public static bool HasDuplicateProperties<T>(this T instance)
    where T : SomeClass 
    // where is optional, but might be useful when you want to enforce
    // a base class/interface
{
    // use reflection or something else to determine whether this instance
    // has duplicate properties
    return false;
}

You can use that extension method in queries:

var itemsWithDuplicates = from item in duplicationList
                          where item.HasDuplicateProperties()
                          select item;

Same works with the normal property:

var itemsWithDuplicates = from item in duplicationList
                          where item.HasDuplicate
                          select item;

or

var itemsWithDuplicates = duplicationList.Where(x => x.HasDuplicateProperties());
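
The body of HasDuplicateProperties above is only a placeholder. One possible way to fill it in with reflection is sketched below, under the assumption that "duplicate properties" means two public string properties of the same instance holding equal values. `HasDuplicateStringProperties` and `Sample` are hypothetical names. Note that this looks at a single instance only; finding strings shared between different items in the list still needs one of the approaches from the other answers.

```csharp
using System.Linq;
using System.Reflection;

// Hypothetical sample type used to exercise the extension method.
public class Sample
{
    public string String1 { get; set; }
    public string String2 { get; set; }
}

public static class DuplicatePropertyExtensions
{
    // Returns true when at least two public string properties of the
    // instance hold equal values (two null properties also count as equal).
    public static bool HasDuplicateStringProperties<T>(this T instance)
        where T : class
    {
        var values = instance.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            // Only readable, non-indexed string properties are considered.
            .Where(p => p.PropertyType == typeof(string)
                        && p.CanRead
                        && p.GetIndexParameters().Length == 0)
            .Select(p => (string)p.GetValue(instance))
            .ToList();

        // Fewer distinct values than values means something repeated.
        return values.Distinct().Count() < values.Count;
    }
}
```

It can be used in a query the same way, for example `duplicationList.Where(x => x.HasDuplicateStringProperties())`.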

Hat tip to https://stackoverflow.com/a/807816/492

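var duplicates = duplicationList
                .GroupBy(l => l)   // relies on the item type's Equals/GetHashCode
                .Where(g => g.Count() > 1)
                .Select(g => {foreach (var x in g)
                                 {x.HasDuplicate = true;}
                             return g;
                })
                .ToList();         // forces the deferred query to run so the flags actually get set

duplicates is a throwaway, but it gets you there in fewer enumerations.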

var dups = duplicationList.GroupBy(x => x).Where(y => y.Count() > 1).Select(y => y.Key);

foreach (var d in dups)
    Console.WriteLine(d);
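
Grouping by the item itself only finds duplicates when the item type defines value equality; for a plain class it compares references. A minimal sketch of a variant that groups the strings instead, and so lists which values are shared between items, might look like this. `DuplicateStrings.Find` and its `strings` selector parameter are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateStrings
{
    // Returns every string that occurs in more than one item.
    // 'strings' selects the string values to compare for a given item.
    public static List<string> Find<T>(
        IEnumerable<T> items,
        Func<T, IEnumerable<string>> strings)
    {
        return items
            // Distinct per item, so one item repeating its own value is not counted twice.
            .SelectMany(item => strings(item).Distinct())
            .GroupBy(s => s)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

For the class in the question, the call would be along the lines of `DuplicateStrings.Find(duplicationList, x => new[] { x.String1, x.String2 })`.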



