HashSet<T> v List<T>

Introduction

In .NET, collections are fundamental to data storage and manipulation. While List<T> is a go-to choice for many developers, HashSet<T> is a powerful alternative offering unique performance benefits. This blog post explores HashSet<T> in .NET and illustrates its usage with C# examples. It also compares its performance with List<T>.

Understanding HashSet<T>

A HashSet<T> is a collection that stores unique elements and provides high-performance set operations. It is built on a hash table: each element's hash code (from GetHashCode) selects a bucket, and Equals then confirms whether a matching element is already stored there.
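To see how hash codes and equality decide what counts as "the same element", here is a minimal sketch. The variable names are made up for illustration; the behaviour shown relies only on the built-in equality of value tuples, arrays, and StringComparer.

```csharp
using System;
using System.Collections.Generic;

// Value tuples implement value-based Equals and GetHashCode,
// so two tuples with the same contents count as one element
var tuples = new HashSet<(int X, int Y)>();
tuples.Add((1, 2));
bool tupleAddedAgain = tuples.Add((1, 2)); // false

// Arrays inherit reference-based equality from object,
// so two arrays with the same contents are distinct elements
var arrays = new HashSet<int[]>();
arrays.Add(new[] { 1, 2 });
bool arrayAddedAgain = arrays.Add(new[] { 1, 2 }); // true

// A comparer passed to the constructor replaces the default hashing and equality
var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
names.Add("Alice");
bool nameAddedAgain = names.Add("ALICE"); // false

Console.WriteLine($"{tuples.Count} {arrays.Count} {names.Count}"); // 1 2 1
```

The same rule applies to your own classes: if they should be unique by value, they need to override both Equals and GetHashCode, or you can supply an IEqualityComparer<T>.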

Key Features of HashSet<T>

  1. Uniqueness: Automatically ensures all elements in the set are unique.
  2. High Performance: Offers average O(1) lookups, additions, and deletions.
  3. No Indexing: Unlike List<T>, HashSet<T> does not support indexing and does not guarantee any particular element order.
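The features above can be sketched in a few lines. This is an illustrative example with made-up variable names, showing de-duplication through the constructor and one way to work around the missing indexer.

```csharp
using System;
using System.Collections.Generic;

var withDuplicates = new List<int> { 3, 1, 3, 2, 1 };

// Copying into a HashSet<T> silently drops the repeated values
var unique = new HashSet<int>(withDuplicates);
Console.WriteLine(unique.Count); // 3

// unique[0] would not compile: HashSet<T> has no indexer.
// Enumerate it instead, without relying on any particular order
foreach (int value in unique)
{
    Console.WriteLine(value);
}

// Copy back to a List<T> when positional access is needed
var indexable = new List<int>(unique);
int first = indexable[0];
```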

Using HashSet<T> in C#

Here’s how you can use HashSet<T> in various scenarios:

Basic Operations

HashSet<int> numbers = new HashSet<int>();
numbers.Add(1); // Adds an element
numbers.Add(2);
bool added = numbers.Add(1); // Returns false, as 1 is already in the set
numbers.Remove(2); // Removes an element

Checking for Existence

if (numbers.Contains(1))
{
    Console.WriteLine("1 is in the set");
}

Union and Intersection

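HashSet<int> set1 = new HashSet<int> { 1, 2, 3 };
HashSet<int> set2 = new HashSet<int> { 3, 4, 5 };

// Both methods modify the set they are called on, so work on copies
HashSet<int> union = new HashSet<int>(set1);
union.UnionWith(set2); // union = { 1, 2, 3, 4, 5 }

HashSet<int> intersection = new HashSet<int>(set1);
intersection.IntersectWith(set2); // intersection = { 3 }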
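Union and intersection are not the only set operations available. As a rough sketch, using made-up variable names, the other common HashSet<T> methods look like this:

```csharp
using System;
using System.Collections.Generic;

var left = new HashSet<int> { 1, 2, 3 };
var right = new HashSet<int> { 3, 4, 5 };

// ExceptWith removes every element that also appears in the other set
var difference = new HashSet<int>(left);
difference.ExceptWith(right); // { 1, 2 }

// SymmetricExceptWith keeps elements that are in exactly one of the two sets
var symmetric = new HashSet<int>(left);
symmetric.SymmetricExceptWith(right); // { 1, 2, 4, 5 }

// The query methods return a bool and leave both sets unchanged
bool overlaps = left.Overlaps(right);                  // true, both contain 3
bool isSubset = difference.IsSubsetOf(left);           // true
bool sameElements = left.SetEquals(new[] { 3, 2, 1 }); // true, order is irrelevant
```

The methods ending in "With" change the set they are called on, which is why the sketch copies left before calling them.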

Performance Comparison: HashSet<T> vs List<T>

When it comes to performance, HashSet<T> and List<T> serve different purposes and excel in different scenarios.

Lookup Performance

  • HashSet<T>: Provides O(1) average time complexity for lookups, thanks to its hash-based implementation.
  • List<T>: Has O(n) time complexity for lookups, as it requires iterating through the list to find an element.
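If you want to see the difference on your own machine, a rough timing sketch like the one below is enough to get a feel for it. The collection size and lookup count are arbitrary values chosen for illustration, and a Stopwatch loop is not a rigorous benchmark; a tool such as BenchmarkDotNet is a better choice for numbers you intend to rely on.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

const int itemCount = 100_000; // arbitrary size chosen for illustration
const int lookupCount = 1_000;

var list = Enumerable.Range(0, itemCount).ToList();
var set = new HashSet<int>(list);

// Look up values near the end of the list, the worst case for a linear scan
int listHits = 0;
var listTimer = Stopwatch.StartNew();
for (int i = 0; i < lookupCount; i++)
{
    if (list.Contains(itemCount - 1 - i)) listHits++;
}
listTimer.Stop();

int setHits = 0;
var setTimer = Stopwatch.StartNew();
for (int i = 0; i < lookupCount; i++)
{
    if (set.Contains(itemCount - 1 - i)) setHits++;
}
setTimer.Stop();

Console.WriteLine($"List<T>:    {listHits} hits in {listTimer.Elapsed.TotalMilliseconds} ms");
Console.WriteLine($"HashSet<T>: {setHits} hits in {setTimer.Elapsed.TotalMilliseconds} ms");
```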

Insertion and Deletion

  • HashSet<T>: Also offers O(1) average time complexity for insertions and deletions.
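  • List<T>: Appending with Add is amortised O(1), but inserting or removing anywhere other than the end is O(n) because the elements after that position have to be shifted. Remove(item) also has to search for the item first.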
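The difference in behaviour can be sketched as follows. The values are made up for illustration; the comments describe what each call has to do internally.

```csharp
using System;
using System.Collections.Generic;

var list = new List<string> { "b", "c", "d" };

list.Add("e");       // appends at the end, no existing element moves
list.Insert(0, "a"); // every existing element shifts one position right
list.Remove("c");    // linear search, then later elements shift left
Console.WriteLine(string.Join(", ", list)); // a, b, d, e

var set = new HashSet<string> { "b", "c", "d" };

set.Add("e");                        // hashed straight to its bucket
bool removed = set.Remove("c");      // true, located by hash code
bool removedAgain = set.Remove("c"); // false, already gone
Console.WriteLine(set.Count); // 3
```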

Memory Usage

  • HashSet<T>: Generally uses more memory than List<T>, due to the structure required to store hash codes and handle collisions.
  • List<T>: More memory-efficient, as it only stores the elements themselves in a single backing array.
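One way to get a rough idea of the memory difference is to compare how many bytes each collection allocates while it is filled. This is only a sketch: the element count is arbitrary, the figures vary by runtime and platform, and it assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread and the HashSet<T> capacity constructor.

```csharp
using System;
using System.Collections.Generic;

const int itemCount = 100_000; // arbitrary size chosen for illustration

// Pre-size both collections so that resizing does not distort the numbers
long before = GC.GetAllocatedBytesForCurrentThread();
var list = new List<int>(itemCount);
for (int i = 0; i < itemCount; i++) list.Add(i);
long listBytes = GC.GetAllocatedBytesForCurrentThread() - before;

before = GC.GetAllocatedBytesForCurrentThread();
var set = new HashSet<int>(itemCount);
for (int i = 0; i < itemCount; i++) set.Add(i);
long setBytes = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"List<int>:    {listBytes:N0} bytes allocated");
Console.WriteLine($"HashSet<int>: {setBytes:N0} bytes allocated");
```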

Use Case Suitability

  • HashSet<T>: Ideal for scenarios where you must ensure uniqueness and perform frequent lookups, insertions, and deletions.
  • List<T>: Better suited for ordered collections where indexing is required, and you need to allow duplicate elements.
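The two collections also work well together. As an illustrative sketch with made-up data, a HashSet<T> can track what has already been seen while a List<T> keeps the results in order:

```csharp
using System;
using System.Collections.Generic;

var visits = new List<string> { "home", "about", "home", "contact", "about" };

var seen = new HashSet<string>();     // answers "have we met this already?"
var firstVisits = new List<string>(); // keeps order and allows indexing

foreach (string page in visits)
{
    // Add returns false for a value that is already in the set
    if (seen.Add(page))
    {
        firstVisits.Add(page);
    }
}

Console.WriteLine(string.Join(", ", firstVisits)); // home, about, contact
```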

Comparison

Operation                  List              HashSet
Allows duplicate items     Yes               No
Add                        O(1) amortised    O(1)
Insert at a position       O(n)              n/a
Remove                     O(n)              O(1)
Search                     O(n)              O(1)
Iterate                    O(n)              O(n)
Retrieve in sorted order   O(n log n)        O(n log n)

This shows that searching and removing in a list gets slower as the number of items in it increases, while the equivalent HashSet operations take roughly the same time regardless of size.
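Retrieving items in sorted order costs the same for both because neither collection keeps its elements sorted. A minimal sketch, using made-up values:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 5, 1, 4, 2, 3 };
var set = new HashSet<int> { 5, 1, 4, 2, 3 };

// List<T>.Sort sorts in place
list.Sort();

// HashSet<T> has no Sort method; OrderBy produces a sorted copy
List<int> sortedFromSet = set.OrderBy(n => n).ToList();

Console.WriteLine(string.Join(", ", list));          // 1, 2, 3, 4, 5
Console.WriteLine(string.Join(", ", sortedFromSet)); // 1, 2, 3, 4, 5
```

If you need unique elements that are always kept in order, SortedSet<T> may be worth a look instead.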

Conclusion

HashSet<T> in .NET is a robust and high-performance alternative to List<T>, particularly when dealing with unique elements and when performance in lookups, additions, and deletions is critical.

While it consumes more memory and lacks indexing capabilities, its efficiency in set operations makes it an invaluable tool in a developer’s toolkit.

Understanding when to use HashSet<T> over List<T> can significantly optimise the performance and scalability of your applications.

A couple of other useful posts in a similar vein include The foreach Loop Best Practices and Common Pitfalls and Enabling Foreach Functionality in C# Classes.

There is a great post about the performance differences here:

https://theburningmonk.com/2011/03/hashset-vs-list-vs-dictionary/

Stephen

Hi, my name is Stephen Finchett. I have been a software engineer for over 30 years and have worked on complex, business-critical, multi-user systems for all of my career. For the last 15 years, I have been concentrating on web-based solutions using the Microsoft stack, including ASP.NET, C#, TypeScript and SQL Server, and running everything at scale within Kubernetes.