Frequency Counting
Time-Space Tradeoffs
Tell me, did you see her before? Once? Twice? Enough to truly know her?
How can you check if two strings are scrambled versions of each other? Here is a classic problem. Before reading the rest of the chapter, try to solve it on your own.
Valid Anagram (LeetCode Link) : Given two strings s
and t
, return true
if t
is an anagram of s
, and false
otherwise. 💡 HintWill sorting help?💡 HintWhat if we could count the frequency of each character instead?
If you found the sorting approach, you likely ended up with an solution. To do better, we can use a Frequency Counting pattern. This is a classic time-space trade-off: use extra space (a hash map, set, or array) to solve the problem in linear time.
The goal is to answer one of the two questions during a single pass over a collection of data:
- "How many of item X have I seen?" (Frequency Counting)
- "Have I seen item X before?" (Uniqueness/Existence Check)
By pre-processing the data into a frequency map, you can answer these questions in time.
Implementation Techniques
The right tool depends on the constraints of the input data.
1. Using a Hash Map (Dictionary)
This is the most flexible approach. A hash map stores key-value pairs, making it perfect for counting occurrences of any hashable data type (numbers, strings etc).
- When to use: When the set of possible items is large, unknown, or non-integer (eg., counting word frequencies).
- Example:
frequency_map = {}
for item in items:
frequency_map[item] = frequency_map.get(item, 0) + 1
Map<Character, Integer> frequencyMap = new HashMap<>();
for (char item : items) {
frequencyMap.put(item, frequencyMap.getOrDefault(item, 0) + 1);
}
std::unordered_map<char, int> frequencyMap;
for (char item : items) {
frequencyMap[item]++;
}
2. Using a Hash Set
A hash set is used to track unique items. Think of it as a hash map where you only care about the keys, not the values.
- When to use: When you only need to know if an item exists or not, without counting.
- Example:
seen = set()
for item in items:
if item in seen:
# Item already seen
pass
else:
seen.add(item)
# First time seeing this item
Set<Character> seen = new HashSet<>();
for (char item : items) {
if (seen.contains(item)) {
// Item already seen
} else {
seen.add(item);
// First time seeing this item
}
}
std::unordered_set<char> seen;
for (char item : items) {
if (seen.count(item)) {
// Item already seen
} else {
seen.insert(item);
// First time seeing this item
}
}
3. Using an Fixed-Size Array
This is a highly efficient optimization for a specific scenario. If the items you are counting are integers within a small, known range (e.g., lower case letters 'a' to 'z'), you can use an array where the index represents the item.
- When to use: For character counts (e.g., 'a'-'z'), use an array of size 26. For counting ASCII characters, use an array of size 128 or 256.
- Example:
frequency_array = [0] * 26 # For 'a' to 'z'
for char in string:
frequency_array[ord(char) - ord('a')] += 1
int[] frequencyArray = new int[26]; // For 'a' to 'z'
for (char ch : s.toCharArray()) {
frequencyArray[ch - 'a']++;
}
int frequencyArray[26] = {0}; // For 'a' to 'z'
for (char ch : s) {
frequencyArray[ch - 'a']++;
}
4. Using a Bit Vector
This is a bit more advanced and often not expected in interviews, but it's a powerful technique for specific cases. A bit vector uses bits to represent the presence or absence of items. Each bit in an integer can represent whether a specific item exists. So a single 4-byte integer can represent the presence of 32 items.
- When to use: When the range of items is small and known (e.g., lowercase letters 'a' to 'z').
- Example:
bit_vector = 0 # Initialize a bit vector
for char in string:
index = ord(char) - ord('a')
bit_vector |= (1 << index) # Set the bit at the index
int bitVector = 0; // Initialize a bit vector
for (char ch : s.toCharArray()) {
int index = ch - 'a';
bitVector |= (1 << index); // Set the bit at the index
}
int bitVector = 0; // Initialize a bit vector
for (char ch : s) {
int index = ch - 'a';
bitVector |= (1 << index); // Set the bit at the index
}
Walkthrough: Solving "Valid Anagram"
Let's apply the Frequency Counting pattern to solve the Valid Anagram problem.
- First, we check if the strings have different lengths. If they do, they can't be anagrams, so we return false.
- We create an integer array
counts
of size 26, initialized to all zeros (You may use a hash map instead). - We loop through the first string s, incrementing the count for each character. For example, when we see 'c', we increment counts[2].
- We then loop through the second string t, decrementing the count for each character.
- During the second loop, if we ever find that a character's count in our array is already zero before we decrement (meaning t has more of that character than s), we can immediately return false.
- Finally, if the counts array has all zeros, we return true, indicating that the two strings are anagrams.
def isAnagram(s: str, t: str) -> bool:
if len(s) != len(t):
return False
counts = [0] * 26 # Frequency array for characters 'a' to 'z'
for char in s:
counts[ord(char) - ord('a')] += 1
for char in t:
index = ord(char) - ord('a')
counts[index] -= 1
if counts[index] < 0:
return False
return all(count == 0 for count in counts)
public boolean isAnagram(String s, String t) {
if (s.length() != t.length()) return false;
int[] counts = new int[26]; // Frequency array for characters 'a' to 'z'
for (char ch : s.toCharArray()) {
counts[ch - 'a']++;
}
for (char ch : t.toCharArray()) {
if (--counts[ch - 'a'] < 0) return false;
}
for (int count : counts) {
if (count != 0) return false; // Check if all counts are zero
}
return true;
}
bool isAnagram(string s, string t) {
if (s.length() != t.length()) return false;
int counts[26] = {0}; // Frequency array for characters 'a' to 'z'
for (char ch : s) {
counts[ch - 'a']++;
}
for (char ch : t) {
if (--counts[ch - 'a'] < 0) return false;
}
for (int count : counts) {
if (count != 0) return false; // Check if all counts are zero
}
return true;
}
Note: The last loop in the above example is not strictly necessary. If we reach the end of both strings without returning false, we can safely assume they are anagrams. (Think about why this is true!)
Complexity Analysis
Time Complexity: where N is the length of the strings. We iterate through each string once. This is a significant improvement over the sorting approach.
Space Complexity: While we do use extra space, the array's size is constant (26). It does not grow with the size of the input strings, so the space complexity is constant. If we had used a hash map, the space would be where k is the number of unique characters.
More Problems & Variations
Now that you've mastered the pattern, apply it to these other problems. Notice how a single core idea can solve many different-looking challenges.
- Contains Duplicate LeetCode Link 💡 HintThis is an existence problem. Which data structure is best when you only need to track if you've seen something before?
- Two Sum LeetCode Link 💡 HintAs you iterate, for each number, what value are you looking for to complete the sum? A hash map can store values and their indices for a fast lookup.
- Group Anagrams LeetCode Link 💡 HintHow can you create a unique "key" for each group of anagrams? The key could be a sorted string or a string representation of the character frequency map.
- Top K Frequent Elements LeetCode Link 💡 HintThis is a two-step problem. First, use frequency counting to get all counts. Then, how can you efficiently find the K elements with the highest counts? A min-heap is a classic tool for "Top K" problems, we will look at this in detail in an upcoming section.