Optimize Enumerable.Distinct<T>() for HashSet<T> #112630
-
SummaryThis proposal suggests an optimization for the While this proposed optimization cannot be generally applied to all types that implement Current ImplementationThe existing implementation of public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource>? comparer)
{
if (source == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
}
return new DistinctIterator<TSource>(source, comparer);
} Proposed ChangeModify Enumerable.Distinct() to recognize when source implements HashSet and doesn't supply a non-default IEqualityComparer, and return the argument directly, avoiding redundant operations when a custom comparer is not provided: public static IEnumerable<T> Distinct<T>(this IEnumerable<T> values, IEqualityComparer<T> comparer = null)
{
if(values == null)
throw new ArgumentNullException(nameof(values));
if(values is ISet<T>)
{
if(values is HashSet<T> hashset &&
hashset.Comparer == (comparer ?? HashSetComparer<T>.Default))
{
return hashset;
}
if(values is ImmutableHashSet<T> hashSet2 &&
hashSet2.KeyComparer == (comparer ?? ImmutableHashSet<T>.Empty.KeyComparer))
{
return hashSet2;
}
/// Anything else?
}
return new DistinctIterator<TSource>(source, comparer);
}
// Avoid redundant allocations:
static class HashSetComparer<T>
{
// HashSet<T> deviates when T is string:
public static readonly IEqualityComparer<T> Default = new HashSet<T>().Comparer;
} Benefits
Potential ConsiderationsIf a custom IEqualityComparer is provided and it is not the same comparer used by the HashSet, the method should still apply the distinct operation instead of returning the HashSet as-is. ConclusionThis small change enhances performance and efficiency in scenarios where Enumerable.Distinct() is called on an existing HashSet without a custom comparer. By preventing redundant work, it aligns with .NET’s performance-oriented principles and provides a better developer experience. We welcome feedback on the feasibility and potential edge cases of this enhancement. |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 13 replies
-
It can't do that. The ISet itself may be using a non-default notion of equality, which means there could be duplicates according to var set = new HashSet<int>(EqualityComparer<int>.Create((x, y) => false, x => 0))
{
1, 1, 1, 1, 1
};
foreach (int i in set.Distinct())
{
Console.WriteLine(i);
} but with the proposed change, it would print out five |
Beta Was this translation helpful? Give feedback.
-
That's true since public static IEnumerable<T> Distinct<T>(this IEnumerable<T> values, IEqualityComparer<T> comparer = null)
{
if(values == null)
throw new ArgumentNullException(nameof(values));
if(values is HashSet<T> hashset &&
hashset.Comparer == (comparer ?? HashSetComparer<T>.Default))
{
return hashset;
}
return new DistinctIterator<TSource>(source, comparer);
}
static class HashSetComparer<T>
{
public static readonly IEqualityComparer<T> Default = new HashSet<T>().Comparer;
} |
Beta Was this translation helpful? Give feedback.
They certainly can be. Many of them avoid significant amounts of work, and even ones that are smaller matter much more because they're much, much, much more common.
Demonstrate that this special-casing would be valuable en mass, and we can consider it. In the meantime, you can do it yourself in your own impls for your own use.