
@tivrfoa
Created February 23, 2025 10:37
Prompting ChatGPT: Create a Java interview question where the solution needs to use a HashMap

Java Interview Question: Using HashMap for Frequency Counting

Question:

You are given a list of words, and your task is to find the top N most frequent words in the list. Implement a function that efficiently counts word occurrences and returns the top N words sorted by frequency.

Requirements:

  1. Efficiently count the occurrences of each word.
  2. Retrieve the top N words sorted by frequency.
  3. If two words have the same frequency, they should be sorted alphabetically.
  4. Assume the input is a large dataset, so efficiency matters.
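
For example, given the words ["apple", "banana", "apple", "cherry", "banana", "apple"] and N = 2, the result should be [apple, banana] ("apple" appears 3 times, "banana" twice, "cherry" once).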

Follow-up Questions:

  1. Why is HashMap a good choice for counting word occurrences?
  2. How would you sort the words efficiently based on frequency?
  3. What is the time complexity of your approach?
  4. How would you handle case sensitivity (e.g., "Java" and "java")?
  5. What if the input stream is too large to fit into memory? How would you modify your solution?

Would you like a sample implementation for this? 🚀


prompt: yes

ChatGPT:

Here's a Java implementation that efficiently counts word occurrences using a HashMap and retrieves the top N most frequent words sorted by frequency and alphabetically when frequencies match.


Implementation:

import java.util.*;

public class WordFrequencyCounter {
    
    public static List<String> getTopNFrequentWords(String[] words, int N) {
        // Step 1: Count word occurrences using HashMap
        Map<String, Integer> wordCount = new HashMap<>();
        for (String word : words) {
            word = word.toLowerCase(); // Normalize case so "Java" and "java" count together
            wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
        }

        // Step 2: Use a PriorityQueue (Min-Heap) to keep track of the top N words.
        // The root must hold the "worst" entry: lowest frequency, and on ties the
        // alphabetically greatest word, so that it is evicted first.
        PriorityQueue<Map.Entry<String, Integer>> minHeap = new PriorityQueue<>(
            (a, b) -> a.getValue().equals(b.getValue()) ?
                      b.getKey().compareTo(a.getKey()) : // Ties: alphabetically greatest word at the root
                      Integer.compare(a.getValue(), b.getValue()) // Sort by frequency (ascending order)
        );

        for (Map.Entry<String, Integer> entry : wordCount.entrySet()) {
            minHeap.offer(entry);
            if (minHeap.size() > N) {
                minHeap.poll(); // Remove the least frequent element
            }
        }

        // Step 3: Extract words from the heap (sorted order)
        List<String> result = new ArrayList<>();
        while (!minHeap.isEmpty()) {
            result.add(minHeap.poll().getKey());
        }
        Collections.reverse(result); // Convert to descending order

        return result;
    }

    public static void main(String[] args) {
        String[] words = {"java", "python", "java", "c", "python", "java", "c", "c++", "c++", "c++", "python"};
        int N = 3; // Get top 3 frequent words

        List<String> topWords = getTopNFrequentWords(words, N);
        System.out.println("Top " + N + " most frequent words: " + topWords);
    }
}

Explanation:

  1. Count occurrences using a HashMap<String, Integer>, where the key is the word, and the value is the frequency.
  2. Use a Min-Heap (PriorityQueue) to store the top N frequent words efficiently.
    • If the heap size exceeds N, we remove the least frequent element.
    • If two words have the same frequency, the alphabetically greatest one sits at the heap root and is evicted first, which leaves the final output sorted alphabetically among ties.
  3. Extract words from the heap and reverse the list to get them in descending order of frequency.

Example Output:

Top 3 most frequent words: [c++, java, python]

(Here, "java", "python", and "c++" each appear 3 times, so all three are sorted alphabetically; "c" appears only twice and is dropped.)


Time Complexity Analysis:

  1. Counting words (HashMap) → O(W), where W is the total number of words.
  2. Maintaining a Min-Heap of size N → O(M log N), where M is the number of unique words.
  3. Extracting results from the heap → O(N log N).

Overall, the approach runs in O(W + M log N) time, making it efficient for large datasets.
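
For comparison (touching on follow-up question 2), here is a minimal sort-based alternative; this is an illustrative sketch rather than part of the answer above, and the class name WordFrequencySorter is just a placeholder. Sorting all M unique entries costs O(M log M), which is simpler but slower than the heap approach when N is much smaller than M:

import java.util.*;
import java.util.stream.Collectors;

public class WordFrequencySorter {

    // Count with a HashMap, then sort every unique entry.
    // O(W) counting + O(M log M) sorting, where W = total words, M = unique words.
    public static List<String> getTopNFrequentWords(String[] words, int N) {
        Map<String, Integer> wordCount = new HashMap<>();
        for (String word : words) {
            wordCount.merge(word.toLowerCase(), 1, Integer::sum);
        }

        return wordCount.entrySet().stream()
                // Descending frequency, then ascending alphabetical on ties
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(N)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}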


Follow-up Enhancements:

✅ Handle streaming data (use a sliding window approach; a rough sketch follows below).
✅ Optimize space (use a Trie if dealing with prefix-based searches).
✅ Distributed processing (for very large datasets, use tools like Apache Spark).
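
To make the first point concrete, here is a minimal sliding-window sketch (the class name and windowSize parameter are illustrative, not part of the answer above). It keeps exact counts for only the most recent windowSize words, so memory is bounded by the window rather than by the whole stream:

import java.util.*;

public class SlidingWindowWordCounter {

    private final int windowSize;
    private final Deque<String> window = new ArrayDeque<>();
    private final Map<String, Integer> counts = new HashMap<>();

    public SlidingWindowWordCounter(int windowSize) {
        this.windowSize = windowSize;
    }

    // Consume one word from the stream, evicting the oldest word
    // once the window is full.
    public void add(String word) {
        word = word.toLowerCase();
        window.addLast(word);
        counts.merge(word, 1, Integer::sum);
        if (window.size() > windowSize) {
            String evicted = window.removeFirst();
            // Returning null from merge removes the entry, keeping the map small
            counts.merge(evicted, -1, (old, dec) -> old + dec == 0 ? null : old + dec);
        }
    }

    public Map<String, Integer> currentCounts() {
        return Collections.unmodifiableMap(counts);
    }
}

The counts map can be fed into the same top-N selection at any point; if even the set of unique words is too large for memory, approximate structures such as a Count-Min Sketch are the usual next step.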

Would you like me to tweak this for a specific use case? 😊
