Selected References on (Special Cases of) Theta Sketches

  • Z. Bar-Yossef, T. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In RANDOM, pages 1–10, 2002.
  • E. Cohen. All-distances sketches, revisited: HIP estimators for massive graphs analysis. In PODS, pages 88–99, 2014.
  • E. Cohen and H. Kaplan. Leveraging discarded samples for tighter estimation of multiple-set aggregates. In SIGMETRICS, pages 251–262, 2009.
  • P. Flajolet. On adaptive sampling. Computing, 43(4):391–400, 1990.
  • P. Flajolet, E. Fusy, O. Gandouet, and F. Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. DMTCS Proceedings, 0(1), 2008.
  • P. B. Gibbons and S. Tirthapura. Estimating simple functions on the union of data streams. In SPAA, pages 281–291, 2001.
  • S. Heule, M. Nunkesser, and A. Hall. HyperLogLog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm. In EDBT, pages 683–692, 2013.
  • D. Ting. Streamed approximate counting of distinct elements: Beating optimal batch methods. In KDD, 2014.
  • A. Dasgupta, K. J. Lang, L. Rhodes, and J. Thaler. A framework for estimating stream expression cardinalities. In ICDT, 2016. Invited to ACM Transactions on Database Systems (Special Issue for ICDT 2016). Best Newcomer Award.