Apache Pinot has built-in support for most major sketch families from Apache DataSketches, exposed as aggregation and transformation functions in its SQL dialect.
Example:
select distinctCountThetaSketch(
  sketchCol,
  'nominalEntries=1024',
  'country = ''USA'' AND state = ''CA''',
  'device = ''mobile''',
  'SET_INTERSECT($1, $2)'
)
from table
where country = 'USA' or device = 'mobile...'
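To make the SET_INTERSECT($1, $2) step in the query above more concrete, the following is a minimal toy sketch of how two theta-style sketches might be intersected. It is not the DataSketches or Pinot implementation: the class ToyThetaIntersect, its nested Sketch type, and the mix hash are hypothetical names introduced only for illustration, and the hash is a simple SplitMix64-style mixer used as a stand-in.

```java
import java.util.TreeSet;

public class ToyThetaIntersect {
    /** Toy theta sketch: retains at most k hashes, all strictly below theta. */
    public static final class Sketch {
        final int k;
        long theta = -1L; // unsigned maximum: nothing has been discarded yet
        final TreeSet<Long> hashes = new TreeSet<>(Long::compareUnsigned);

        public Sketch(int k) { this.k = k; }

        public void update(long item) {
            long h = mix(item);
            if (Long.compareUnsigned(h, theta) >= 0) return; // outside the sampled range
            hashes.add(h);
            // Over capacity: drop the largest hash and make it the new threshold.
            if (hashes.size() > k) theta = hashes.pollLast();
        }

        public double estimate() {
            // Fraction of the 64-bit hash space that lies below theta.
            double fraction = (theta == -1L) ? 1.0 : (theta >>> 11) * 0x1.0p-53;
            return hashes.size() / fraction;
        }
    }

    /** SplitMix64-style finalizer: a bijective 64-bit mixer used as a stand-in hash. */
    static long mix(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    /** Intersection: take the smaller theta and keep hashes present in both inputs. */
    public static Sketch intersect(Sketch a, Sketch b) {
        Sketch r = new Sketch(Math.min(a.k, b.k));
        r.theta = Long.compareUnsigned(a.theta, b.theta) <= 0 ? a.theta : b.theta;
        for (long h : a.hashes) {
            if (Long.compareUnsigned(h, r.theta) < 0 && b.hashes.contains(h)) {
                r.hashes.add(h);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Sketch usaCa = new Sketch(1024);  // stands in for rows matching the first filter
        Sketch mobile = new Sketch(1024); // stands in for rows matching the second filter
        for (long user = 0; user < 50_000; user++) usaCa.update(user);
        for (long user = 30_000; user < 90_000; user++) mobile.update(user);
        System.out.println("Estimated overlap: " + intersect(usaCa, mobile).estimate());
    }
}
```

In this toy setup the true overlap is 20,000 users. The printed value is only an estimate, and because an intersection keeps fewer hashes than either input, its relative error is typically larger than that of the individual sketches.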
The supported functions have ‘raw’ variants that return the binary representation of the sketch, rather than the final estimate, so that it can be processed further outside Pinot.
Example:
select percentileRawKll(ArrDelayMinutes, 90) as sketch
from airlineStats
This returns a Base64-encoded string: BQEPC...
The output can be decoded and processed in Java as follows:
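import java.util.Arrays;
import java.util.Base64;
import org.apache.datasketches.kll.KllDoublesSketch;
import org.apache.datasketches.memory.Memory;
byte[] decodedBytes = Base64.getDecoder().decode(encoded); // encoded: the Base64 string returned by the query
KllDoublesSketch sketch = KllDoublesSketch.wrap(Memory.wrap(decodedBytes)); // read-only view over the bytes
System.out.println("Min, Median, Max values:");
System.out.println(Arrays.toString(sketch.getQuantiles(new double[]{0, 0.5, 1})));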
Apache Pinot can also ingest pre-built sketch objects, either via Kafka (real-time) or Spark (batch), and merge them at aggregation time.
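The reason pre-built sketches can be merged at query time is that sketches are mergeable summaries: combining the sketches of two partitions gives the same result as sketching all the data at once. The following is a minimal illustration of that property using a toy K-minimum-values sketch with a Base64 round trip. It is an assumption-laden stand-in, not the DataSketches serialization format or Pinot's ingestion path: ToyKmvSketch, its methods, and its byte layout are hypothetical and exist only for this example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.TreeSet;

public class ToyKmvSketch {
    private final int k;
    // The k smallest hashes seen so far, ordered as unsigned 64-bit values.
    private final TreeSet<Long> minHashes = new TreeSet<>(Long::compareUnsigned);

    public ToyKmvSketch(int k) { this.k = k; }

    /** SplitMix64-style finalizer: a bijective 64-bit mixer used as a stand-in hash. */
    static long mix(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    public void update(long item) { insert(mix(item)); }

    private void insert(long h) {
        minHashes.add(h);
        if (minHashes.size() > k) minHashes.pollLast(); // keep only the k smallest
    }

    /** Merge: the k smallest hashes of the union come from the two retained sets. */
    public void merge(ToyKmvSketch other) {
        for (long h : other.minHashes) insert(h);
    }

    public double estimate() {
        if (minHashes.size() < k) return minHashes.size(); // fewer than k distinct items: exact
        double kthFraction = (minHashes.last() >>> 11) * 0x1.0p-53; // k-th smallest hash in [0, 1)
        return (k - 1) / kthFraction;
    }

    public List<Long> retained() { return new ArrayList<>(minHashes); }

    /** Toy wire format (hypothetical): 4-byte k followed by the retained hashes. */
    public String toBase64() {
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 * minHashes.size());
        buf.putInt(k);
        for (long h : minHashes) buf.putLong(h);
        return Base64.getEncoder().encodeToString(buf.array());
    }

    public static ToyKmvSketch fromBase64(String encoded) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getDecoder().decode(encoded));
        ToyKmvSketch sketch = new ToyKmvSketch(buf.getInt());
        while (buf.hasRemaining()) sketch.insert(buf.getLong());
        return sketch;
    }

    public static void main(String[] args) {
        // Two producers each build a sketch over their own partition of the data.
        ToyKmvSketch partitionA = new ToyKmvSketch(512);
        ToyKmvSketch partitionB = new ToyKmvSketch(512);
        for (long id = 0; id < 60_000; id++) partitionA.update(id);
        for (long id = 40_000; id < 100_000; id++) partitionB.update(id);

        // The consumer receives serialized sketches and merges them.
        ToyKmvSketch merged = fromBase64(partitionA.toBase64());
        merged.merge(fromBase64(partitionB.toBase64()));
        System.out.println("Estimated distinct ids: " + merged.estimate());
    }
}
```

The two partitions here cover 100,000 distinct ids in total. The merged sketch never sees the raw ids, only the two serialized summaries, which is the same shape of workflow as ingesting pre-built sketches and merging them during aggregation.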