Published on

How to Approach Finding Top K Frequent Elements in Python

Authors
  • avatar
    Name
    Ina Lee
    Twitter

What does “Top K Frequent” mean?

Given an array of integers nums and an integer k,
the goal is to find the k elements that appear most frequently in the array.

For example:

nums = [1, 1, 1, 2, 2, 3]
k = 2

The result should be:

[1, 2]

Because 1 appears 3 times and 2 appears 2 times.

First thought: count everything

No matter which approach we use, it's unavoidable to count how many times each number appeared in nums. Python's dictionary is perfect for this.

from collections import defaultdict

freq = defaultdict(int)
for num in nums:
    freq[num] += 1

Now, freq should have each number as key and its frequency as value. It looks like this: Now freq looks like:

{
    1: 3,
    2: 2,
    3: 1
}

Solution 1: Sort by frequency (O(n log n))

The most straightforward idea is:

  1. Count frequencies
  2. Sort numbers by frequency
  3. Take the first k
from collections import defaultdict
from typing import List

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        freq = defaultdict(int)
        for num in nums:
            freq[num] += 1

        arr = sorted(freq.items(), key=lambda item: item[1], reverse=True)

        result = []
        for i in range(k):
            result.append(arr[i][0])
        return result

Why this works

  • freq.items() produces (number, frequency) pairs
  • key=lambda item: item[1] sorts by frequency
  • reverse=True sorts in descending order

Complexity

  • Time: O(n log n) due to sorting
  • Space: O(n) for the dictionary and array

Solution 2: Using a Heap (O(n + k log n))

Sorting everything is sometimes unnecessary.
We only care about the top k, not the full order.

This is where a heap helps.

Python’s heapq is a min-heap, so we store negative frequencies to simulate a max-heap.

import heapq
from collections import defaultdict
from typing import List

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        freq = defaultdict(int)
        for num in nums:
            freq[num] += 1

        heap = []
        for num, count in freq.items():
            heap.append((-count, num))

        heapq.heapify(heap)

        result = []
        for _ in range(k):
            result.append(heapq.heappop(heap)[1])

        return result

Key idea

  • Each heap element is (-frequency, number)
  • The smallest element in the heap actually represents the largest frequency
  • heappop() removes the most frequent element each time

Complexity

  • Time: O(n + k log n)
  • Space: O(n)

Solution 3: Bucket Sort (O(n))

There’s an even faster approach if we notice this:

The maximum possible frequency of any number is len(nums).

So we can create an array where:

  • index = frequency
  • value = list of numbers with that frequency
from collections import defaultdict
from typing import List

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        freq = defaultdict(int)
        for num in nums:
            freq[num] += 1

        buckets = [[] for _ in range(len(nums) + 1)]
        for num, count in freq.items():
            buckets[count].append(num)

        result = []
        for i in range(len(buckets) - 1, 0, -1):
            for num in buckets[i]:
                result.append(num)
                if len(result) == k:
                    return result

Why this works

  • Frequencies are bounded → perfect for bucket indexing
  • We iterate from highest frequency down
  • No sorting or heap needed

Complexity

  • Time: O(n)
  • Space: O(n)

Python syntax notes used in this problem

defaultdict(int)

  • Automatically initializes missing keys with 0
  • Ideal for frequency counting

sorted(freq.items(), key=lambda item: item[1], reverse=True)

  • freq.items() produces (number, frequency) tuples
  • lambda item: item[1] extracts the frequency from each tuple and uses it as the sorting key
  • reverse=True sorts the result in descending order

(-value, key) in heap

  • Python heap is a min-heap
  • Negating values simulates a max-heap

List initialization with comprehension

[[] for _ in range(n)]
  • Creates n independent empty lists, because a new list is created on each iteration
  • Modifying one sublist does not affect the others
  • Avoids the common bug from [[]] * n, where all elements reference the same list object

Backward iteration

for i in range(n - 1, 0, -1):
  • Iterates from large to small
  • Useful when higher indices has higher priority

Comparing the approaches

MethodTime ComplexitySpaceNotes
SortingO(n log n)O(n)Simple and readable
HeapO(n + k log n)O(n)Efficient when k is small
Bucket SortO(n)O(n)Fastest, but less intuitive

Final thoughts

  • Hashmaps are essential whenever counting is involved
  • Heaps are great when you need top / min / max repeatedly
  • Bucket sort is helpful when value ranges are limited
  • Understanding multiple solutions matters more than memorizing one!