DSAHeaps (Priority Queues)Finding Top 'K' Elements

Finding Top 'K' Elements

Heaps (Priority Queues)

In life and in data, not everything deserves your attention. Focus only on the top K that matter.

Imagine you have a massive stream of real-time data, like user scores in an online game, and you need to display a leaderboard of the top 10 players. Sorting the entire dataset every second is incredibly inefficient. How can you find just the "Top K" elements without all the extra work?

Problem: Kth Largest Element in an Array (LeetCode Link): Given an integer array nums and an integer k, return the kth largest element in the array. 💡 Hint

The Brute-Force Approach

The simplest solution is to sort the entire array in descending order and then pick the element at index k-1. This works, but it has a time complexity of O(NlogN)O(N log N), which is overkill. We are doing much more work than necessary by sorting the entire array.

The Heap (Priority Queue) Solution

A heap is the perfect data structure for this job. It allows us to maintain a collection of the "Top K" elements we've seen so far in just O(logK)O(log K) time per element.

Note: In this chapter, we will focus on how to use a heap (in-built data structure). How heaps works is something that you can explore later. You can think of heap as a black box that always keeps the smallest (or largest) element at the top.

The core idea is to use a min-heap of size k.

  1. Iterate: Go through each number in the input array.
  2. Build the Heap:
    • If the heap currently has fewer than k elements, just add the number.
    • If the heap is full (size k), compare the current number to the smallest element in the heap (which is always at the top of a min-heap).
  3. Compare and Replace: If the current number is larger than the smallest number in the heap, it means the current number has a right to be in the "Top K". So, we remove the smallest element and add the current one.
  4. Final Result: After iterating through all the numbers, the min-heap will contain the k largest elements from the entire array. The kth largest is the smallest among them, which is the root of the min-heap.

This approach is far more efficient, with a time complexity of O(NlogK)O(N log K).

Walkthrough: Finding the Kth Largest Element

import heapq

def find_kth_largest(nums: list[int], k: int) -> int:
    # Python's heapq is a min-heap by default.
    min_heap = []
    for num in nums:
        if len(min_heap) < k:
            heapq.heappush(min_heap, num)
        elif num > min_heap[0]:
            # If the current number is larger than the smallest in the heap,
            # it deserves a spot in the top k.
            heapq.heapreplace(min_heap, num)
    
    # The root of the min-heap is the kth largest element.
    return min_heap[0]

More Problems & Variations

This core "Top K" pattern can be adapted to solve many related problems.

  1. Top K Frequent Elements (LeetCode Link): Find the k most frequent elements. 💡 Hint

  2. K Closest Points to Origin (LeetCode Link): Find the k points closest to the origin (0, 0). 💡 Hint

  3. Sort Characters By Frequency (LeetCode Link): Sort a string in decreasing order based on the frequency of characters. 💡 Hint

Bonus Point: Quickselect

For finding the "Kth Largest Element" specifically, there is an even more optimized algorithm called Quickselect. It's a modification of the QuickSort algorithm. Instead of recursing into both sides of the pivot, it only recurses into the side that contains the kth element. This gives it an average-case time complexity of O(N)O(N), which is better than the heap solution's O(NlogK)O(N log K). However, its worst-case complexity is O(N2)O(N^2), whereas the heap solution is a more reliable O(NlogK)O(N log K). In an interview, mentioning both and discussing the trade-offs is a sign of a strong candidate.