Morris
The Morris counter is one of the first streaming algorithms ever discovered. For a stream of \(n\) tokens, the Morris counter seeks to approximate the number of items in the stream in only \(O(\log n)\) bits of memory. Because the Morris counter acts as an unbiased estimator, we can approximate \(n\) up to value \(2^{256}\), using just one byte of memory for the counter. The algorithm works strictly on a vanilla model, meaning it can only record event insertions and does not allow for deletions.
- class bouncer.counters.Morris(version, epsilon, delta)
Parameters
- version : {“Morris”, “Morris+”, “Morris++”}, default=”Morris”
Specify the type of Morris counter.
"Morris"
: One 8-bit Morris counter.“Morris+”: \(s = \lceil \frac{1}{\epsilon^2 \delta} \rceil\) “Morris” counters.
“Morris++”: \(t = \lceil \frac{18}{\log \delta} \rceil\) “Morris+” counters, where \(s =\lceil \frac{3}{ 2 \epsilon^2} \rceil\).
- epsilon : float, range=(0,1)
Accepted error range of true count. Only implemented for ‘Morris+’, ‘Morris++’.
- delta : float, range=(0,1)
Probability of exceeding epsilon error range. Only implemented for ‘Morris+’, ‘Morris++’.
Attributes
- version : str
Type of Morris object.
- epsilon : float
Epsilon of Morris object.
- delta : float
Delta of Morris object.
- raw_counts : list
List with count(s) of the Morris object.
- shape : tuple (n_morris+, n_morris)
Shape of raw_counts. For “Morris++”, returns \((t,s)\), for “Morris+”, \((1,s)\), and for “Morris”, \((1,)\).
Methods
- update()
Updates count(s) \(x\) w.p. \(\frac{1}{2^x}\).
- output()
"Morris"
: Returns \(2^x - 1\).“Morris+”: Returns the mean of \(s = \lceil \frac{1}{\epsilon^2 \delta} \rceil\) “Morris” outputs.
“Morris++”: Returns median of \(t = \lceil \frac{18}{\log \delta} \rceil\) “Morris+” outputs, where \(s =\lceil \frac{3}{ 2 \epsilon^2} \rceil\).
Examples
>> from bouncer.counters import Morris >> >> m = Morris(version="Morris") >> for i in range(10): ... m.update() ... >> >> m.output() >> 15 >> m.shape >> (1,) >> m.raw_counts >> [4] >> >> m1 = Morris(version="Morris+", epsilon = .2, delta = .5) >> for i in range(10): ... m1.update() ... >> >> m1.output() >> 7.639999866485596 >> m1.shape >> (1, 25) >> m1.raw_counts >> [2, 2, 3, 3, 3, 2, 3, 4, 3, 2, 3, 3, 2, 3, 3, 3, 3, 3, 2, 4, 3, 3, 4, 4, 4] >> >> m2 = Morris(version="Morris++", epsilon = .2, delta = .5) >> for i in range(10): ... m2.update() ... >> >> m2.output() >> 10.894737243652344 >> m2.shape >> (13, 38)
Note
This implementation requires 26 bytes of overhead per instantiation. This is due to the overhead of PyObject_HEAD and other python fields required for all python object allocations.
Analysis and proof of correctness of the bounds can be found in Jelani Nelson’s lecture notes under chapter 2.1.2: https://www.sketchingbigdata.org/fall20/lec/notes.pdf