LeetCode-Problems / 0295. Find Median from Data Stream / 0295. Find Median from Data Stream.py
0295. Find Median from Data Stream.py
Raw
# 11-12-2022 Leetcode 0295. Find Median from Data Stream
# https://leetcode.com/problems/find-median-from-data-stream/description/

# First instinct is just to use a minheap to keep it sorted
# and use DFS to return a sorted list each time
# we need to find the median. I dont think
# we can do much better than minheap for organizing...


# Hmm. Python docs even say for "large N" it is faster to just use
# sorted() to return the list of n largest or smallest elements. In
# our case we'll want all of them. Is that different?
# From stackoverflow: " it's impossible to print sorted sequence out of
# the heap faster than O(nlogn), because it'd mean it's possible to sort
# any sequence faster than O(nlogn)". As I expected.

# Ugh, we could go for a doubly linked list build and just traverse and
# insert. Insertions are much cheaper than reordering. For some reason I find
# myself always annoyed by linked lists. n insertions could be O(n^2) though right?

# ok... what about this. Two pointers, either spliting center or pointing
# to the center index of a deque. When we push a value we compare it to
# the two pointers. If smaller than both, appendleft. If smaller than left
# BUT NOT RIGHT, appendleft and sort just the left half. If larger than
# left BUT NOT RIGHT appendright and sort the right half. If larger than both
# arbitrarily append right. We only resort when we need to? No, it might go
# JUST left or right, and adding it shifts the window to include it. Bust.

# Ok. Heaps are out. They are great for inputting and mainting a min or
# max, but not an entire order. Its the same big o to sort them as if
# they werent mostly sorted in the first place. My brain next goes to
# "insertion sort? Thats a thing right? What was that thing". Beyond the
# name I barely remember. As a guess, its a DS that keeps a sorted list...
# Hey, thats what we want here. Off to research.

# https://www.geeksforgeeks.org/insertion-sort/
# Ok, inesertion sort is pretty basic. Just above bubble sort
# and the same principal, but as its sorting TO a sorted array
# it can stop once it finds the correct position, knowing its right
# without haveing to compare every element.

# Straioghtforward and easy to implement. Aaaaaand it doesnt work.
# TLE still. What the heck. Ok, we can improve this. We need to keep
# track of middle points anyway. Theres no reason to insert the new
# num at the FAR RIGHT and then insertionsort check all remaining vals
# Why not USE the known vals of the middle to speed up insertion?
# Is this a common technique? You could keep a couple of pointers
# and start insertion sort closer to its expected position. Three
# pointers would cut the list into quareters so youd only need to
# search 25% of the space to do an insertion, for the cost of a few
# extra ifs? Wait, I've just reinvented binary search.

# so the article is wrong? It says it uses insertion sort, then instead
# uses binary search to locate the intended potion, then shifts the right
# array over to insert the new num. Thats not insertion sort, thats just
# an insertion? I guess its *kind of* an insertion sort as its using the
# fact that the array is already sorted, but the way articles teach insertion
# sort is a linear scan through the unsorted portion...


# Follow up answers: If the numbers are strictly 1-100 we can just use a 2d
# list to store answers in buckets of their index. Or rather, the index could
# simply contain the number of times we've seen that value. We keep track of the
# number of elements pushed so far. When we call for median we just sequentially
# sum our buckets until we either exceed the median index mid-bucket, or need to
# split sum beteen two buckets for a pair of even numbered median indexes.

# For the follow up follow up, we do the same, but buckets [0] and [101] contain
# sorted lists of pushed values that exceed 1-100. We dont need to keep them sorted
# UNLESS the median is to be found within eitehr extreme bucket. At that point we will
# need to sort them and find the correct index within, summing two pairs if necessary.
# This may be ever so slightly complicated by a split median spanning [0]-[1] of [100]-[101]

# import heapq

# class MedianFinder:

#     def __init__(self):
#         self.num_list = []

#     def addNum(self, num: int) -> None:
#         #Idea 1: Heap? No good. O(nlogn) sorting
#         # heapq.heappush(self.heapy, num)

#         #Idea 2: Insertion sort? TLE still. Takes too long to search
#         #and shift entire space to add new nums once n is large
#         # self.num_list.append(num)
        j = len(self.num_list) - 2
        while j >= 0 and num < self.num_list[j] :
                self.num_list[j + 1] = self.num_list[j]
                j -= 1
        self.num_list[j + 1] = num

#         #idea 3: Binary search to find intended position, use concatentation
#         #of slices to reorder new list


#         #IDEA 4 AFTER THE FACT! TWO HEAPS!
#         #A min heap AND a max heap.
#         #The median would ALWAYS be the tops of the heap!
#         #If its larger than the top of the minHeap (right),
#         #heapPush to the right. If smaller than the top of
#         #the maxHeap (left) HeapPush Left. We can simply keep
#         #a seperate tracker, or sum the lengths of either to know
#         #if there are an odd or even number of elements. If even, its
#         #the mean of the top two heaps. If Odd we have two choices:
#         #Either push a copy of each new num to BOTH heaps (thus both
#         #tops would have the same number) and always just use the mean,
#         #making sure to pop off the duplicate when adding the NEXT number
#         #to the same heap. OR we could track seperately how many nums are
#         #in each pile and just grab the top of the larger pile. Oh wait...
#         #this requires the piles to always have the same size. We need to rebalance
#         #as we go.

#         #ok, lets just have an overflow "middle" variable, that holds the odd man out
#         #when we have an odd number of elements in our combined heaps. This way the
#         #heaps are always even numbered. IFF there is a middle value, and we are adding
#         #a new number to one heap, push the middle value to the other, emptying the middle
#         #value. If there ISNT a middle value, first pop the first element of the heap we are
#         #GOING to put the new value into, and store the popped value in the middle.


#         # local_inspect = self.num_list
#         l, r = 0, len(self.num_list)
#         while l < r:
#             m = (l + r) // 2
#             if self.num_list[m] < num:
#                 l = m + 1
#             else:
#                 r = m

#         #G'damn the time limit must be tight. Concatenating slices like this takes too long
#         #Where the insert fuction squeaks by. Interesting to note insert is faster
#         # self.num_list = self.num_list[:l] + [num] + self.num_list[l:]
#         self.num_list.insert(l, num)

#     def findMedian(self) -> float:
#         #As expected, simple converting to list doesnt sort. Is there a point in storing as a heap then?
#         # local_inspect = self.num_list
#         left = self.num_list[(len(self.num_list)-1)//2]/2
#         right = self.num_list[math.ceil((len(self.num_list)-1)/2)]/2
#         sumlr = left + right
#         return sumlr

# # Your MedianFinder object will be instantiated and called as such:
# # obj = MedianFinder()
# # obj.addNum(num)
# # param_2 = obj.findMedian()


######################### ATTEMPT 2: 2 Heaps #################################################
# HECK YEAH!
import heapq


class MedianFinder:
    def __init__(self):
        self.minHeap = []
        # NOTE: THERE IS NO MAXHEAP! This is just a minheap, but you negate all the values first.
        # AND LATER when pulling them back out. Simple, but obnoxious. Would love if theyd just
        # duplicate it, but I guess "tHaTS nOT pYthOnIC!"
        self.maxHeap = []
        self.middle = None

    def addNum(self, num: int) -> None:
        left, right, mid = self.maxHeap, self.minHeap, self.middle

        # initial two pushes
        if not self.maxHeap and not self.minHeap:
            # first push will be middle
            if self.middle == None:
                self.middle = num
            # seocnd push is going somewhere
            else:
                if num >= self.middle:
                    heapq.heappush(self.minHeap, num)
                    heapq.heappush(self.maxHeap, -self.middle)
                else:
                    heapq.heappush(self.maxHeap, -num)
                    heapq.heappush(self.minHeap, self.middle)
                self.middle = None
            return

        # SPECIAL CASE: If there is a middle, and the num num is the same as the middle, this
        # moves the middle into one pile, and the new num into the other. Doesnt matter
        # which goes where as they are the same number.
        # Otherwise, move the middle into the appropriate heap and the new num to the other
        if self.middle != None:
            if num >= self.middle:
                heapq.heappush(self.maxHeap, -self.middle)
                heapq.heappush(self.minHeap, num)
            else:
                heapq.heappush(self.minHeap, self.middle)
                heapq.heappush(self.maxHeap, -num)
            self.middle = None
            return

        # if there is no middle, and this new num should be it, do so
        if self.middle == None and num >= -self.maxHeap[0] and num <= self.minHeap[0]:
            self.middle = num

        # num is equal or larger than the min on the right, dump in right, popping a new middle
        elif self.minHeap and num >= self.minHeap[0]:
            self.middle = heapq.heapreplace(self.minHeap, num)

        # num is equal or smaller than the max on the left, dump in left, popping a new middle
        else:
            self.middle = -heapq.heapreplace(self.maxHeap, -num)

    def findMedian(self) -> float:
        return (
            self.middle
            if self.middle != None
            else (self.minHeap[0] - self.maxHeap[0]) / 2
        )