Introduction
In .NET, collections are fundamental to data storage and manipulation. While List<T> is a go-to choice for many developers, HashSet<T> is a powerful alternative offering unique performance benefits. This blog post explores HashSet<T> in .NET and illustrates its usage with C# examples. It also compares its performance with List<T>.
Understanding HashSet<T>
A HashSet<T> is a collection that stores unique elements and provides high-performance set operations. It is implemented as a hash table: each element is placed in a bucket chosen from its hash code, so the set can usually find an element without scanning the whole collection.
Key Features of HashSet<T>
- Uniqueness: Automatically ensures all elements in the set are unique.
- High Performance: Offers fast lookups, additions, and deletions (O(1) on average).
- No Indexing: Unlike List<T>, HashSet<T> does not support indexing.
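To make the uniqueness and no-indexing points concrete, here is a minimal sketch. The tags array is made-up sample data, and the choice of StringComparer.OrdinalIgnoreCase is just one way to show that the comparer decides what counts as a duplicate.

```csharp
using System;
using System.Collections.Generic;

// Top-level statements (C# 9 or later); wrap in a Main method on older versions.
string[] tags = { "dotnet", "DotNet", "csharp", "DOTNET" }; // hypothetical sample data

// Default comparer: strings that differ only by case are distinct elements.
var caseSensitive = new HashSet<string>(tags);

// StringComparer.OrdinalIgnoreCase changes what counts as "the same" element.
var caseInsensitive = new HashSet<string>(tags, StringComparer.OrdinalIgnoreCase);

Console.WriteLine(caseSensitive.Count);   // 4
Console.WriteLine(caseInsensitive.Count); // 2

// There is no indexer, so caseInsensitive[0] would not compile. Enumerate instead.
foreach (string tag in caseInsensitive)
{
    Console.WriteLine(tag);
}
```

The enumeration order of a HashSet<T> is not guaranteed, so it is best not to rely on the order in which the loop prints the tags.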
Using HashSet<T> in C#
Here’s how you can use HashSet<T> in various scenarios:
Basic Operations
HashSet<int> numbers = new HashSet<int>();
numbers.Add(1); // Adds an element
numbers.Add(2);
bool added = numbers.Add(1); // Returns false, as 1 is already in the set
numbers.Remove(2); // Removes an element
Checking for Existence
if (numbers.Contains(1))
{
    Console.WriteLine("1 is in the set");
}
Union and Intersection
HashSet<int> set1 = new HashSet<int> { 1, 2, 3 };
HashSet<int> set2 = new HashSet<int> { 3, 4, 5 };
set1.UnionWith(set2); // set1 is now { 1, 2, 3, 4, 5 }
set1.IntersectWith(set2); // set1 is now { 3, 4, 5 }, because UnionWith already modified it
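UnionWith and IntersectWith modify the set they are called on, so calling them one after another on the same set compounds the results. If you want to keep the original sets intact, one option is to copy first. The sketch below assumes the same two sets as above and adds ExceptWith for the set difference.

```csharp
using System;
using System.Collections.Generic;

// Top-level statements (C# 9 or later); wrap in a Main method on older versions.
var set1 = new HashSet<int> { 1, 2, 3 };
var set2 = new HashSet<int> { 3, 4, 5 };

// Copy-construct first so that set1 and set2 are left untouched.
var union = new HashSet<int>(set1);
union.UnionWith(set2);            // contains 1, 2, 3, 4, 5

var intersection = new HashSet<int>(set1);
intersection.IntersectWith(set2); // contains 3

var difference = new HashSet<int>(set1);
difference.ExceptWith(set2);      // contains 1, 2

Console.WriteLine(union.Count);        // 5
Console.WriteLine(intersection.Count); // 1
Console.WriteLine(difference.Count);   // 2
Console.WriteLine(set1.Count);         // 3, the original is unchanged
```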
Performance Comparison: HashSet<T> vs List<T>
When it comes to performance, HashSet<T> and List<T> serve different purposes and excel in different scenarios.
Lookup Performance
- HashSet<T>: Provides O(1) average time complexity for lookups, thanks to its hash-based implementation.
- List<T>: Has O(n) time complexity for lookups, as it requires iterating through the list to find an element.
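If you want to see the lookup difference on your own machine, a rough timing sketch along these lines can help. The sizes are arbitrary assumptions, and Stopwatch timings like this are only indicative; a dedicated benchmarking tool would give more trustworthy numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Top-level statements (C# 9 or later); wrap in a Main method on older versions.
const int size = 100_000;  // assumed collection size
const int lookups = 1_000; // assumed number of lookups

List<int> list = Enumerable.Range(0, size).ToList();
HashSet<int> hashSet = new HashSet<int>(list);

// Unfavourable case for the list: the values searched for sit near the end.
int[] targets = Enumerable.Range(size - lookups, lookups).ToArray();

var stopwatch = Stopwatch.StartNew();
int foundInList = targets.Count(t => list.Contains(t));
stopwatch.Stop();
Console.WriteLine($"List<int>.Contains:    {stopwatch.ElapsedMilliseconds} ms");

stopwatch.Restart();
int foundInSet = targets.Count(t => hashSet.Contains(t));
stopwatch.Stop();
Console.WriteLine($"HashSet<int>.Contains: {stopwatch.ElapsedMilliseconds} ms");
```

Both loops find every target. Only the time taken should differ, and the gap tends to widen as size grows.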
Insertion and Deletion
- HashSet<T>: Also offers O(1) average time complexity for insertions and deletions.
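- List<T>: Appending with Add is amortized O(1), but inserting or deleting anywhere else can be O(n) because the elements that follow need to be shifted. Remove(item) also has to find the item first, which is an O(n) search.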
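As a small illustration of those costs, the sketch below runs similar operations on both collections. The values are arbitrary, and the complexity notes in the comments describe the typical behaviour rather than measured results.

```csharp
using System;
using System.Collections.Generic;

// Top-level statements (C# 9 or later); wrap in a Main method on older versions.
var list = new List<int> { 10, 20, 30 };
list.Add(40);      // appends at the end: amortized O(1)
list.Insert(0, 5); // inserts at the front: every element shifts, O(n)
list.Remove(20);   // searches for 20, then shifts the elements after it, O(n)

var hashSet = new HashSet<int> { 10, 20, 30 };
hashSet.Add(40);                   // O(1) on average
bool removed = hashSet.Remove(20); // O(1) on average; true because 20 was present

Console.WriteLine(string.Join(", ", list)); // 5, 10, 30, 40
Console.WriteLine(removed);                 // True
```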
Memory Usage
- HashSet<T>: Generally uses more memory than List<T>, due to the structure required to store hash codes and handle collisions.
- List<T>: More memory-efficient for a small number of elements.
Use Case Suitability
- HashSet<T>: Ideal for scenarios where you must ensure uniqueness and perform frequent lookups, insertions, and deletions.
- List<T>: Better suited for ordered collections where indexing is required, and you need to allow duplicate elements.
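The two collections also combine well. When you need unique items but still care about order and indexing, one common pattern is to let a HashSet<T> do the duplicate check while a List<T> holds the result. The sketch below uses made-up input values.

```csharp
using System;
using System.Collections.Generic;

// Top-level statements (C# 9 or later); wrap in a Main method on older versions.
int[] input = { 3, 1, 3, 2, 1, 4 }; // hypothetical sample data

var seen = new HashSet<int>(); // answers "have we met this value before?"
var ordered = new List<int>(); // keeps first-occurrence order and supports indexing

foreach (int value in input)
{
    // Add returns false for duplicates, so each value is appended only once.
    if (seen.Add(value))
    {
        ordered.Add(value);
    }
}

Console.WriteLine(string.Join(", ", ordered)); // 3, 1, 2, 4
Console.WriteLine(ordered[0]);                 // 3
```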
Comparison
Operation | List<T> | HashSet<T> |
Allows duplicate items | Yes | No |
Add | O(1) amortized (Insert at an index is O(n)) | O(1) average |
Remove | O(n) | O(1) average |
Search | O(n) | O(1) average |
Iterate | O(n) | O(n) |
Retrieve in sorted order | O(n log n) | O(n log n) |
This shows that searching and removing in a list get slower as the number of items increases, while the equivalent HashSet<T> operations stay roughly constant on average.
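For the sorted-order row, neither collection keeps its elements sorted for you, so the usual approach is to sort on demand. A minimal sketch with LINQ's OrderBy, using arbitrary sample scores:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Top-level statements (C# 9 or later); wrap in a Main method on older versions.
var scores = new HashSet<int> { 42, 7, 19, 3 }; // hypothetical sample data

// HashSet<T> makes no promise about enumeration order, so sort explicitly.
List<int> sorted = scores.OrderBy(s => s).ToList();

Console.WriteLine(string.Join(", ", sorted)); // 3, 7, 19, 42
```

If you need the elements in sorted order all the time, SortedSet<T> may be worth a look. It keeps unique elements ordered, at the cost of O(log n) rather than O(1) additions and lookups.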
Conclusion
HashSet<T> in .NET is a robust and high-performance alternative to List<T>, particularly when dealing with unique elements and when performance in lookups, additions, and deletions is critical.
While it consumes more memory and lacks indexing capabilities, its efficiency in set operations makes it an invaluable tool in a developer’s toolkit.
Understanding when to use HashSet<T> over List<T> can significantly optimise the performance and scalability of your applications.
A couple of other useful posts in a similar vein include The foreach Loop Best Practices and Common Pitfalls and Enabling Foreach Functionality in C# Classes.
There is also a great post about the performance differences here:
https://theburningmonk.com/2011/03/hashset-vs-list-vs-dictionary/