DSATime-Space TradeoffsFrequency Counting

Frequency Counting

Time-Space Tradeoffs

Tell me, did you see her before? Once? Twice? Enough to truly know her?

How can you check if two strings are scrambled versions of each other? Here is a classic problem. Before reading the rest of the chapter, try to solve it on your own.

Valid Anagram (LeetCode Link) : Given two strings s and t, return true if t is an anagram of s, and false otherwise. 💡 Hint💡 Hint

If you found the sorting approach, you likely ended up with an O(NlogN)O(N logN) solution. To do better, we can use a Frequency Counting pattern. This is a classic time-space trade-off: use extra space (a hash map, set, or array) to solve the problem in linear time.

The goal is to answer one of the two questions during a single pass over a collection of data:

  1. "How many of item X have I seen?" (Frequency Counting)
  2. "Have I seen item X before?" (Uniqueness/Existence Check)

By pre-processing the data into a frequency map, you can answer these questions in O(1)O(1) time.

Implementation Techniques

The right tool depends on the constraints of the input data.

1. Using a Hash Map (Dictionary)

This is the most flexible approach. A hash map stores key-value pairs, making it perfect for counting occurrences of any hashable data type (numbers, strings etc).

  • When to use: When the set of possible items is large, unknown, or non-integer (eg., counting word frequencies).
  • Example:
frequency_map = {}
for item in items:
    frequency_map[item] = frequency_map.get(item, 0) + 1

2. Using a Hash Set

A hash set is used to track unique items. Think of it as a hash map where you only care about the keys, not the values.

  • When to use: When you only need to know if an item exists or not, without counting.
  • Example:
seen = set()
for item in items:
    if item in seen:
        # Item already seen
        pass
    else:
        seen.add(item)
        # First time seeing this item

3. Using an Fixed-Size Array

This is a highly efficient optimization for a specific scenario. If the items you are counting are integers within a small, known range (e.g., lower case letters 'a' to 'z'), you can use an array where the index represents the item.

  • When to use: For character counts (e.g., 'a'-'z'), use an array of size 26. For counting ASCII characters, use an array of size 128 or 256.
  • Example:
frequency_array = [0] * 26  # For 'a' to 'z'
for char in string:
    frequency_array[ord(char) - ord('a')] += 1

4. Using a Bit Vector

This is a bit more advanced and often not expected in interviews, but it's a powerful technique for specific cases. A bit vector uses bits to represent the presence or absence of items. Each bit in an integer can represent whether a specific item exists. So a single 4-byte integer can represent the presence of 32 items.

  • When to use: When the range of items is small and known (e.g., lowercase letters 'a' to 'z').
  • Example:
bit_vector = 0  # Initialize a bit vector
for char in string:
    index = ord(char) - ord('a')
    bit_vector |= (1 << index)  # Set the bit at the index

Walkthrough: Solving "Valid Anagram"

Let's apply the Frequency Counting pattern to solve the Valid Anagram problem.

String 's'
a
n
a
g
r
a
m
String ’t’
n
a
g
a
r
a
m
Frequency Map
.
  1. First, we check if the strings have different lengths. If they do, they can't be anagrams, so we return false.
  2. We create an integer array counts of size 26, initialized to all zeros (You may use a hash map instead).
  3. We loop through the first string s, incrementing the count for each character. For example, when we see 'c', we increment counts[2].
  4. We then loop through the second string t, decrementing the count for each character.
  5. During the second loop, if we ever find that a character's count in our array is already zero before we decrement (meaning t has more of that character than s), we can immediately return false.
  6. Finally, if the counts array has all zeros, we return true, indicating that the two strings are anagrams.
def isAnagram(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False

    counts = [0] * 26  # Frequency array for characters 'a' to 'z'
    for char in s:
        counts[ord(char) - ord('a')] += 1
    for char in t:
        index = ord(char) - ord('a')
        counts[index] -= 1
        if counts[index] < 0:
            return False
    return all(count == 0 for count in counts)

Note: The last loop in the above example is not strictly necessary. If we reach the end of both strings without returning false, we can safely assume they are anagrams. (Think about why this is true!)

Complexity Analysis

Time Complexity: O(N)O(N) where N is the length of the strings. We iterate through each string once. This is a significant improvement over the O(NlogN)O(N logN) sorting approach.

Space Complexity: O(1)O(1) While we do use extra space, the array's size is constant (26). It does not grow with the size of the input strings, so the space complexity is constant. If we had used a hash map, the space would be O(k)O(k) where k is the number of unique characters.

More Problems & Variations

Now that you've mastered the pattern, apply it to these other problems. Notice how a single core idea can solve many different-looking challenges.

  1. Contains Duplicate LeetCode Link 💡 Hint
  2. Two Sum LeetCode Link 💡 Hint
  3. Group Anagrams LeetCode Link 💡 Hint
  4. Top K Frequent Elements LeetCode Link 💡 Hint