Redis bloom filters in batch processing
Redis is an in-memory data store primarily used as a key-value cache. Apart from simple string values, Redis supports many other data structures; its core data types include lists, sets, sorted sets, and hashes.
Along with these core types, additional data structures are supported through modules. Bloom filters are provided by the RedisBloom module. If we look at the design of Redis, nearly all of its data structures support multi operations, which act on several elements in a single command, and the Bloom filter is no exception. These multi operations are essential for efficient batch processing because they reduce the number of calls the application makes to Redis. It's relatively easy to use these batch operators in new designs, but introducing them into an existing code base can be tricky. If the existing code is designed to process the elements of a batch one by one, a lot of refactoring might be required before the batch operators can be used.
Consider a simple batch processing pipeline, which generally performs a set of operations on a set of elements (a batch). Let's assume that a batch may contain elements that have already been processed (duplicates) and that reprocessing them is not required. We can use a Bloom filter to quickly identify the duplicates in a given batch and drop them from the pipeline. A negative answer from the filter is definitive, but because a Bloom filter is probabilistic it can return false positives, so when the filter reports a duplicate we need to verify it against some kind of persistent store. Once processing is complete, we need to add the processed elements to the Bloom filter.
BF.MADD – adds one or more entries to the bloom filter
BF.MADD filter-name item1 [item2 ...]
BF.MEXISTS – verifies if the given entries are present in the bloom filter
BF.MEXISTS filter-name item1 [item2 ...]
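The pipeline described above can be sketched roughly as follows, using only the JDK. This is a minimal illustration rather than a definitive implementation: `BloomClient`, `ProcessedStore`, and the in-memory stand-ins are hypothetical names introduced here, with `mexists` and `madd` standing in for one BF.MEXISTS and one BF.MADD call per batch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BatchDedupSketch {

    /** Hypothetical wrapper around BF.MEXISTS and BF.MADD; one call per batch. */
    interface BloomClient {
        List<Boolean> mexists(String filter, List<String> items);
        List<Boolean> madd(String filter, List<String> items);
    }

    /** Hypothetical persistent store holding the authoritative set of processed items. */
    interface ProcessedStore {
        boolean contains(String item);
        void add(String item);
    }

    /** Returns the items of the batch that still need processing. */
    static List<String> dropDuplicates(BloomClient bloom, ProcessedStore store,
                                       String filter, List<String> batch) {
        List<String> fresh = new ArrayList<>();
        if (batch.isEmpty()) {
            return fresh; // BF.MEXISTS needs at least one item
        }
        List<Boolean> maybeSeen = bloom.mexists(filter, batch); // single round trip
        for (int i = 0; i < batch.size(); i++) {
            String item = batch.get(i);
            // "false" is definitive; "true" may be a false positive, so confirm it.
            boolean duplicate = maybeSeen.get(i) && store.contains(item);
            if (!duplicate) {
                fresh.add(item);
            }
        }
        return fresh;
    }

    /** Records processed items in the store and in the filter once processing is done. */
    static void markProcessed(BloomClient bloom, ProcessedStore store,
                              String filter, List<String> processed) {
        if (processed.isEmpty()) {
            return; // BF.MADD needs at least one item
        }
        processed.forEach(store::add);
        bloom.madd(filter, processed); // single round trip
    }

    /** In-memory stand-in for demos; unlike a real Bloom filter it is exact. */
    static class InMemoryBloom implements BloomClient {
        private final Set<String> items = new HashSet<>();

        public List<Boolean> mexists(String filter, List<String> batch) {
            List<Boolean> out = new ArrayList<>();
            for (String s : batch) out.add(items.contains(s));
            return out;
        }

        public List<Boolean> madd(String filter, List<String> batch) {
            List<Boolean> out = new ArrayList<>();
            for (String s : batch) out.add(items.add(s)); // true if newly added
            return out;
        }
    }

    /** In-memory stand-in for the persistent store. */
    static class InMemoryStore implements ProcessedStore {
        private final Set<String> items = new HashSet<>();
        public boolean contains(String item) { return items.contains(item); }
        public void add(String item) { items.add(item); }
    }

    public static void main(String[] args) {
        BloomClient bloom = new InMemoryBloom();
        ProcessedStore store = new InMemoryStore();
        markProcessed(bloom, store, "processed", List.of("a", "b"));
        // "a" and "b" were processed earlier, so only "c" and "d" remain.
        System.out.println(dropDuplicates(bloom, store, "processed", List.of("a", "c", "b", "d")));
    }
}
```

The store is consulted only for items the filter reports as present, so in the common case of a mostly new batch the check costs a single Redis call.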
See the RedisBloom command documentation for all available Bloom filter commands.
The following Java code snippet demonstrates how to use these commands with the Jedis driver (Bloom filter support requires at least version 4.2.0).
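A minimal sketch of such a snippet is shown here, under several assumptions: a Redis server with the RedisBloom module loaded is reachable on `localhost:6379`, the filter name and batch contents are made up for illustration, and the Jedis version in use exposes `bfMExists` and `bfMAdd` on `JedisPooled`. It is worth checking these method names against the Javadoc of your driver version.

```java
import java.util.ArrayList;
import java.util.List;
import redis.clients.jedis.JedisPooled;

public class JedisBloomBatch {
    public static void main(String[] args) {
        // Assumptions: hypothetical filter name and batch, local Redis with RedisBloom.
        String filter = "processed-items";
        String[] batch = {"order-1", "order-2", "order-3"};

        try (JedisPooled redis = new JedisPooled("localhost", 6379)) {
            // BF.MEXISTS: one reply per item, in the order the items were sent.
            List<Boolean> maybeSeen = redis.bfMExists(filter, batch);

            List<String> toProcess = new ArrayList<>();
            for (int i = 0; i < batch.length; i++) {
                if (!maybeSeen.get(i)) {
                    toProcess.add(batch[i]); // definitely not seen before
                }
                // A "true" reply may be a false positive and should be
                // verified against a persistent store before dropping the item.
            }

            // ... process the items in toProcess here ...

            if (!toProcess.isEmpty()) {
                // BF.MADD: adds all processed items in a single call.
                redis.bfMAdd(filter, toProcess.toArray(new String[0]));
            }
        }
    }
}
```

Each batch costs two round trips to Redis, one for the check and one for the update, regardless of how many items it contains.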
If you want to use Spring's RedisTemplate, you can use the following code. RedisTemplate doesn't have native Bloom filter support, so the command has to be run on the raw connection.