Three derivative data structures of distributed Redis

3.derivative data structures of distributed Redis


Speaking of redisdata structures, you may be familiar with the five basic data types: String, Hash, List, Set, Sorted Set. Then addition, there are three major derived data structure, we usually are little contact, namely: bitmaps, hyperloglog, geo Also, I think, these three data structures, can only be icing on the cake. Really in the project, I really haven't used it. Let's take a look at the definition and use of these three data structures



Speaking of this bitmaps, it is actually String, but it can operate on Stringthe bits. And then, this bit operations, has its own command, and so the operation Stringof the rediscommand not much different! Can be understood like this

Bitmaps is an array in units of bits, each unit of the array can only store 0 and 1

Here is an example, for example, if we want to do a set operation, keyfor w, valuefor h, then execute the following command> set w h
OK> get w

Then hthe ASCII is 0110 1000 Next, you can use the bit command getbitcommand to fetch, fetch the content of each bit.> getbit w 0 # getbit w 0 
(integer) 0> getbit w 1 # getbit w 1 
(integer) 1> getbit w 2 # getbit w 2 
(integer) 1> getbit w 3 # getbit w 3 
(integer) 0


According to online rumors, this structure is used to count the number of active users in a certain period of time, and the structure used bitmapis less setspace- saving than the traditional structure. However, this use has great limitations, as I will talk about later. Let me talk about the online statement. Suppose there are 30 users, of which 5 users 2018-10-04log in on this day. Suppose the userid of these 5 users is 2, 4, 8, 11, 12. Then, we assume that the key is users:2018-10-04, and its valuevalue is used to record user login information. So in order to record that the above 5 users have logged in, we valueset the 2nd, 4th, 8th, 11th, and 12th bits of the value to 1, that is, execute the following command> setbit users:2018-10-04 2 1
(integer) 0> setbit users:2018-10-04 4 1
(integer) 0> setbit users:2018-10-04 8 1
(integer) 0> setbit users:2018-10-04 11 1
(integer) 0> setbit users:2018-10-04 12 1
(integer) 0

At this time, for example, if you want to determine the user with userid=11, on 2018-10-04, if you have logged in, execute the following command> getbit users:2018-10-04 11
(integer) 1

The result is 1, which means the user has logged in. If the returned result is 0, it means that the user has never logged in. If you want to count, 2018-10-04, the number of users logged in on this day, you can execute the following command> bitcount users:2018-10-04
(integer) 5

The above command can be counted, 2018-10-04, 5 users logged in on this day. Ok, you won't be able to understand much if you find it here. Let me talk about it first, here userid=2 4 8 11 12, it can be understood as an offset. For example, if the userid in the actual project is 1000002, then the offset is 2. Everyone can be flexible in the project. However, this approach has a limitation. In our actual project, if the userid is generated using uuid, then how do you generate the offset based on these userid? Could it be that you have to find a hash function to generate an offset? Even if the corresponding hash function is found, can you ensure that the hash collision will not occur, resulting in the same offset? So, everyone can understand.



HyperLogLogIt is not a data structure, but an algorithm that can use a very small memory space to complete the statistics of independent totals . In fact, everyone may be relatively unfamiliar with this algorithm. We javahave a library called stream-lib, which also implements HyperLogLogalgorithms. I probably talk about the principle of the algorithm . I don't want to go to a long-formed math paper. Everyone looks boring. This Hyperrefers to the super meaning. Its previous life is the LogLog algorithm. Here, I m just a little bit stunned, so that everyone can understand the essence. Suppose there is the following dialogue

Me: "Small song, suppose, I tossed 5 coins in a round, and after many rounds, I found that in these rounds, there were at most 2 consecutive negatives and 1 positive. Can you guess how many rounds I lost? "Small tune: "It shouldn't be a few rounds, at most seven or eight rounds." Me: "Fuck, so witty, how do you count?" Small tune: "It's very simple, the pros and cons are both 1/2, even Focusing on the second negative side and the first positive side. Isn t it 1/2 1/2 1/2!" I: "What if there are 4 negative sides and 1 positive side at most?" Xiao tune: "It should be many rounds, right? !" Me: "It's really witty!"

The above chat is between myself and my colleague, and they play each other every day! Any similarity is purely coincidental!

Well, the principle is over! It's just that his estimation algorithm is more complicated! It's not that simple! And with this estimate, the error is still relatively large! The pseudo code of the algorithm is given below.

     max = 0
               hashCode = hash( )
               num = hashCode 0 
               if num > max:
                   max = num
      2 (max + 1) 

It should be noted

hashCode = hash( )

It is to map your input element into a long binary string. After mapping it into a long binary string, it can be compared to the result of the coin toss I mentioned first. As for why the final result is used (max+1), you can check the literature. After all, this article is talking about redis, not about this algorithm. Moreover, this algorithm has undergone a series of evolutions. For example, the input parameter set is divided into m parts, and then mthe results of each part are averaged (avg), and finally the (avg + 1)independent total is estimated by the power of 2 ! Those readers who are interested can make inquiries by themselves!


This structure can save memory to count various counts, such as the IPnumber of registrations and the number of daily visits IP. Of course, there are errors! The official number given by Redis is 0.81% error rate. The usage is also very simple as shown below> pfadd ips:2018-10-04 "" "" "" "" 
(integer) 1> pfcount ips:2018-10-04
(integer) 4

The above is the demonstration. On 2018-10-04, about 4 ips logged into the system! There is a comparison chart of the occupied space with the traditional collection structure on the Internet, post it for everyone to see

Attention, once again, there are errors in using this structure! For example, if you pfaddenter a million pieces of data, pfcountthe result may be 999,756!



Geo can be used to store latitude and longitude, calculate the distance between two places, range calculations, etc. The underlying implementation is zset.


There are mainly the following six groups of commands

  • geoadd: Add the coordinates of a certain geographic location.
  • geopos: Get the coordinates of a certain geographic location.
  • geodist: Get the distance between two geographic locations.
  • georadius: Obtain a set of geographic locations within a specified range according to the given geographic location coordinates.
  • georadiusbymember: Get a collection of geographic locations within a specified range according to a given geographic location
  • geohash: Get the geohash value of a certain geographic location.

I ll post an example of the official website document directly here. If you are interested, you can check it yourself. 1. add two coordinates to the key.

redis> GEOADD Sicily 13.361389 38.115556 "Palermo" 15.087269 37.502669 "Catania"
(integer) 2

2. calculate an example between two coordinates

redis> GEODIST Sicily Palermo Catania

Finally, calculate the distance from the latitude and longitude (15,37)distance 100kmand 200kmthe coordinates within the range

redis> GEORADIUS Sicily 15 37 100 km
1) "Catania"

redis> GEORADIUS Sicily 15 37 200 km
1) "Palermo"
2) "Catania"