Morris -------- The Morris counter is one of the first streaming algorithms ever discovered. For a stream of :math:`n` tokens, the Morris counter seeks to approximate the number of items in the stream in only :math:`O(\log n)` bits of memory. Because the Morris counter acts as an unbiased estimator, we can approximate :math:`n` up to value :math:`2^{256}`, using just one byte of memory for the counter. The algorithm works strictly on a vanilla model, meaning it can only record event insertions and does not allow for deletions. .. class:: bouncer.counters.Morris(version, epsilon, delta) **Parameters** **version : {"Morris", "Morris+", "Morris++"}, default="Morris"** Specify the type of Morris counter. * ``"Morris"``: One 8-bit Morris counter. * "Morris+": :math:`s = \lceil \frac{1}{\epsilon^2 \delta} \rceil` "Morris" counters. * "Morris++": :math:`t = \lceil \frac{18}{\log \delta} \rceil` "Morris+" counters, where :math:`s =\lceil \frac{3}{ 2 \epsilon^2} \rceil`. **epsilon : float, range=(0,1)** Accepted error range of true count. Only implemented for 'Morris+', 'Morris++'. **delta : float, range=(0,1)** Probability of exceeding epsilon error range. Only implemented for 'Morris+', 'Morris++'. **Attributes** **version : str** Type of Morris object. **epsilon : float** Epsilon of Morris object. **delta : float** Delta of Morris object. **raw_counts : list** List with count(s) of the Morris object. **shape : tuple (n_morris+, n_morris)** Shape of raw_counts. For "Morris++", returns :math:`(t,s)`, for "Morris+", :math:`(1,s)`, and for "Morris", :math:`(1,)`. **Methods** .. function:: update() Updates count(s) :math:`x` w.p. :math:`\frac{1}{2^x}`. .. function:: output() * ``"Morris"``: Returns :math:`2^x - 1`. * "Morris+": Returns the mean of :math:`s = \lceil \frac{1}{\epsilon^2 \delta} \rceil` "Morris" outputs. * "Morris++": Returns median of :math:`t = \lceil \frac{18}{\log \delta} \rceil` "Morris+" outputs, where :math:`s =\lceil \frac{3}{ 2 \epsilon^2} \rceil`. **Examples** :: >> from bouncer.counters import Morris >> >> m = Morris(version="Morris") >> for i in range(10): ... m.update() ... >> >> m.output() >> 15 >> m.shape >> (1,) >> m.raw_counts >> [4] >> >> m1 = Morris(version="Morris+", epsilon = .2, delta = .5) >> for i in range(10): ... m1.update() ... >> >> m1.output() >> 7.639999866485596 >> m1.shape >> (1, 25) >> m1.raw_counts >> [2, 2, 3, 3, 3, 2, 3, 4, 3, 2, 3, 3, 2, 3, 3, 3, 3, 3, 2, 4, 3, 3, 4, 4, 4] >> >> m2 = Morris(version="Morris++", epsilon = .2, delta = .5) >> for i in range(10): ... m2.update() ... >> >> m2.output() >> 10.894737243652344 >> m2.shape >> (13, 38) .. note:: This implementation requires 26 bytes of overhead per instantiation. This is due to the overhead of PyObject_HEAD and other python fields required for all python object allocations. Analysis and proof of correctness of the bounds can be found in Jelani Nelson's lecture notes under chapter 2.1.2: https://www.sketchingbigdata.org/fall20/lec/notes.pdf