Morris

The Morris counter is one of the first streaming algorithms ever discovered. For a stream of \(n\) tokens, the Morris counter seeks to approximate the number of items in the stream in only \(O(\log n)\) bits of memory. Because the Morris counter acts as an unbiased estimator, we can approximate \(n\) up to value \(2^{256}\), using just one byte of memory for the counter. The algorithm works strictly on a vanilla model, meaning it can only record event insertions and does not allow for deletions.

class bouncer.counters.Morris(version, epsilon, delta)

Parameters

version : {“Morris”, “Morris+”, “Morris++”}, default=”Morris”

Specify the type of Morris counter.

  • "Morris": One 8-bit Morris counter.

  • “Morris+”: \(s = \lceil \frac{1}{\epsilon^2 \delta} \rceil\) “Morris” counters.

  • “Morris++”: \(t = \lceil \frac{18}{\log \delta} \rceil\) “Morris+” counters, where \(s =\lceil \frac{3}{ 2 \epsilon^2} \rceil\).

epsilon : float, range=(0,1)

Accepted error range of true count. Only implemented for ‘Morris+’, ‘Morris++’.

delta : float, range=(0,1)

Probability of exceeding epsilon error range. Only implemented for ‘Morris+’, ‘Morris++’.

Attributes

version : str

Type of Morris object.

epsilon : float

Epsilon of Morris object.

delta : float

Delta of Morris object.

raw_counts : list

List with count(s) of the Morris object.

shape : tuple (n_morris+, n_morris)

Shape of raw_counts. For “Morris++”, returns \((t,s)\), for “Morris+”, \((1,s)\), and for “Morris”, \((1,)\).

Methods

update()

Updates count(s) \(x\) w.p. \(\frac{1}{2^x}\).

output()
  • "Morris": Returns \(2^x - 1\).

  • “Morris+”: Returns the mean of \(s = \lceil \frac{1}{\epsilon^2 \delta} \rceil\) “Morris” outputs.

  • “Morris++”: Returns median of \(t = \lceil \frac{18}{\log \delta} \rceil\) “Morris+” outputs, where \(s =\lceil \frac{3}{ 2 \epsilon^2} \rceil\).

Examples

>> from bouncer.counters import Morris
>>
>> m = Morris(version="Morris")
>> for i in range(10):
...   m.update()
...
>>
>> m.output()
>> 15
>> m.shape
>> (1,)
>> m.raw_counts
>> [4]
>>
>> m1 = Morris(version="Morris+", epsilon = .2, delta = .5)
>> for i in range(10):
...    m1.update()
...
>>
>> m1.output()
>> 7.639999866485596
>> m1.shape
>> (1, 25)
>> m1.raw_counts
>> [2, 2, 3, 3, 3, 2, 3, 4, 3, 2, 3, 3, 2, 3, 3, 3, 3, 3, 2, 4, 3, 3, 4, 4, 4]
>>
>> m2 = Morris(version="Morris++", epsilon = .2, delta = .5)
>> for i in range(10):
...    m2.update()
...
>>
>> m2.output()
>> 10.894737243652344
>> m2.shape
>> (13, 38)

Note

This implementation requires 26 bytes of overhead per instantiation. This is due to the overhead of PyObject_HEAD and other python fields required for all python object allocations.

Analysis and proof of correctness of the bounds can be found in Jelani Nelson’s lecture notes under chapter 2.1.2: https://www.sketchingbigdata.org/fall20/lec/notes.pdf