docs/source/Morris.rst · Bouncer

Morris
--------

The Morris counter is one of the first streaming algorithms ever discovered.
For a stream of :math:`n` tokens, the Morris counter seeks to approximate the number of items in the stream 
in only :math:`O(\log n)` bits of memory. Because the Morris counter acts as an unbiased estimator, we can approximate :math:`n` up to value
:math:`2^{256}`, using just one byte of memory for the counter. 
The algorithm works strictly on a vanilla model, meaning it can only record event insertions and does not allow for deletions.

.. class:: bouncer.counters.Morris(version, epsilon, delta)

    **Parameters**

        **version : {"Morris", "Morris+", "Morris++"}, default="Morris"**
            Specify the type of Morris counter.   

            * ``"Morris"``: One 8-bit Morris counter.

            * "Morris+": :math:`s = \lceil \frac{1}{\epsilon^2 \delta} \rceil` "Morris" counters. 

            * "Morris++": :math:`t = \lceil \frac{18}{\log \delta} \rceil` "Morris+" counters, where :math:`s =\lceil \frac{3}{ 2 \epsilon^2} \rceil`.
            
        **epsilon : float, range=(0,1)**
            Accepted error range of true count. Only implemented for 'Morris+', 'Morris++'. 

        **delta : float, range=(0,1)**
            Probability of exceeding epsilon error range. Only implemented for 'Morris+', 'Morris++'.

    **Attributes**

        **version : str**
            Type of Morris object. 

        **epsilon : float**
           Epsilon of Morris object. 

        **delta : float**
            Delta of Morris object. 

        **raw_counts : list**
            List with count(s) of the Morris object. 

        **shape : tuple (n_morris+, n_morris)**
            Shape of raw_counts. For "Morris++", returns :math:`(t,s)`, for "Morris+",  :math:`(1,s)`, and 
            for "Morris", :math:`(1,)`.

    **Methods**

        .. function:: update()

            Updates count(s) :math:`x` w.p. :math:`\frac{1}{2^x}`.
        
        .. function:: output()
            
            * ``"Morris"``:  Returns :math:`2^x - 1`.

            * "Morris+": Returns the mean of :math:`s = \lceil \frac{1}{\epsilon^2 \delta} \rceil` "Morris" outputs. 

            * "Morris++": Returns median of :math:`t = \lceil \frac{18}{\log \delta} \rceil` "Morris+" outputs, where :math:`s =\lceil \frac{3}{ 2 \epsilon^2} \rceil`.

    **Examples** ::

        >> from bouncer.counters import Morris
        >> 
        >> m = Morris(version="Morris")
        >> for i in range(10):
        ...   m.update()
        ...
        >>
        >> m.output()
        >> 15
        >> m.shape
        >> (1,)
        >> m.raw_counts
        >> [4]
        >>
        >> m1 = Morris(version="Morris+", epsilon = .2, delta = .5)
        >> for i in range(10):
        ...    m1.update()
        ...
        >>
        >> m1.output()
        >> 7.639999866485596
        >> m1.shape
        >> (1, 25)
        >> m1.raw_counts
        >> [2, 2, 3, 3, 3, 2, 3, 4, 3, 2, 3, 3, 2, 3, 3, 3, 3, 3, 2, 4, 3, 3, 4, 4, 4]
        >>
        >> m2 = Morris(version="Morris++", epsilon = .2, delta = .5)
        >> for i in range(10):
        ...    m2.update()
        ...
        >>
        >> m2.output()
        >> 10.894737243652344
        >> m2.shape
        >> (13, 38)
        
.. note::
    This implementation requires 26 bytes of overhead per instantiation. 
    This is due to the overhead of PyObject_HEAD and other python fields required for all python object allocations.

Analysis and proof of correctness of the bounds can be found in Jelani Nelson's lecture 
notes under chapter 2.1.2: https://www.sketchingbigdata.org/fall20/lec/notes.pdf