Fernando Doglio, May 9, 2022

Using Memurai as a message bus, edge aggregator, and low-latency response layer


The Internet of Things (IoT) promises to connect virtually every aspect of our lives, opening up very interesting analytical use cases. However, the "technological price" we have to pay to achieve it is one that not every team can meet. The number of devices sending data for us to take advantage of is constantly growing, and so is the amount of data they generate. The miniaturization of sensors allows new devices to produce more data points per second than ever before.

That growth creates a significant challenge on the receiving end: how do we capture and analyze all that incoming data, with currently available technology, fast enough to provide a seamless experience? To answer this question, we first need to understand that IoT is divided into three major realms: Edge, Fog, and Cloud.

You can think of these concepts as layers; the closer they are to the devices, the more unprocessed and schema-less data they have to deal with. And the closer they are to the cloud, the more uniform and aggregated everything becomes. The point of the edge layer is to be as close to the device as possible, connecting directly with it and exchanging data. By contrast, the fog layer is meant to be between the edge and the cloud, filtering the data that is sent "up" and providing some early processing to alleviate the potential latency problems derived from sending data back and forth to the cloud.

Dealing with millions of inputs in real time

The main technical challenge in any IoT project, whether you are building it for a car factory or a smart city or tracking the location of livestock with GPS chips, is the amount of data generated. Small, cheap devices are very easy to install everywhere, but the amount of data they generate is hard to process for two main reasons:

1- Latency. For an IoT platform to be useful, the insights it generates, the responses it triggers, and even the reporting built around it all need to happen in near real time. Low latency is crucial, but how can you process potentially millions of signals fast enough?
2- Schema. Every device generates its data payloads following its own nonstandard schema. This is normal: the devices can all come from different providers, and even when they do not, the sensors measure very different aspects of their environment. A temperature sensor sending data every 20 seconds cannot be expected to share a schema with a camera sending pictures every 10 seconds or with a GPS chip sending telemetry every 500 milliseconds.
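To make the schema problem concrete, here is a sketch of what payloads from three such devices might look like; every device ID and field name below is invented for illustration, not taken from any real vendor:

```python
# Hypothetical payloads from three different device types; all field
# names and IDs here are illustrative, not from any specific vendor.
temperature_payload = {"device_id": "t-001", "celsius": 21.7}       # every 20 s
camera_payload = {"cam_id": "c-042", "jpeg": b"\xff\xd8\xff\xe0"}   # every 10 s
gps_payload = {"chip_id": "g-117", "lat": "25,00,0.00N",
               "long": "-71,00,0.00W"}                              # every 500 ms

# No two devices share field names or value types, so a single fixed
# relational schema cannot hold all three without constant migrations.
payloads = [temperature_payload, camera_payload, gps_payload]
print(len({tuple(sorted(p)) for p in payloads}))  # 3 distinct key sets
```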

Together, these two constraints create a complex problem to solve. The ideal product should be able to capture and store data quickly while accommodating any schema it receives. The second point rules out relational databases, because they rely on fixed schemas. The first points toward an in-memory database, which currently offers the lowest latency for both reads and writes.

With that said, writing directly into such a database would still be a problem, since the number of devices can grow faster than any storage solution can dynamically scale. So a "buffer" needs to be put in place at the edge layer, allowing data to be processed and stored as fast as technically possible without fear of data loss.

Thus, we can attempt to solve this problem with an in-memory database and a message queue combo.

Capturing real-time IoT data with Redis Streams

Capturing sensor data with Redis meets all the points mentioned before. It is a fast and reliable in-memory database capable of blazingly fast data writes and reads without requiring a schema on the data received. Not only that, but we can also check the message queue box with Redis Streams, the ideal solution to implement the buffer we require.

You can think of IoT data as a stream of time-based data points sent constantly by different devices and sensors, and Redis Streams are lists of time-indexed entries. By default, a stream generates a unique ID for each data point, indexed with millisecond precision, and appends a sequence number to that ID in case of collisions (multiple devices sending data at the exact same time).

On top of that, the schema-less support from Redis is present here and optimized even further. Schema-less databases such as MongoDB store the field names alongside the values in every single record. For instance, if you had a GPS chip sending data, you would have to save each reading like this:

{
    "timestamp": 1639371637,
    "lat": "25,00,0.00N",
    "long": "-71,00,0.00W"
}

Now, the extra bytes used to store the "timestamp," "lat," and "long" strings might not seem like much here, but repeated over millions of records they add up, and memory that could have held more readings ends up holding the schema. Redis, however, can optimize this structure internally if you keep the field names consistent. Of course, some variation between devices will break the pattern, but as long as you distribute different schemas into different streams, Redis can optimize each one and save a considerable amount of memory by not repeating redundant data.
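To see why this matters at scale, here is a rough back-of-the-envelope estimate; it counts only the raw bytes of the field-name strings themselves and ignores any per-entry encoding overhead, so treat the numbers as illustrative:

```python
# Rough estimate of how repeated field names add up across many entries.
field_names = ["timestamp", "lat", "long"]
bytes_per_entry = sum(len(name) for name in field_names)  # 9 + 3 + 4 = 16 bytes

entries = 10_000_000  # ten million GPS readings
total_mb = bytes_per_entry * entries / 1_000_000
print(f"{total_mb:.0f} MB spent on field names alone")  # 160 MB
```

Sixteen bytes per reading sounds trivial, but across ten million readings that is 160 MB of memory holding nothing but repeated field names.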

Storing this data in a stream requires only a simple XADD command:

XADD gpscoords * timestamp 1639371637 lat "25,00,0.00N" long "-71,00,0.00W"
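The `*` in that command asks Redis to auto-generate the entry ID: the millisecond timestamp plus a sequence number to break ties, as described earlier. That scheme can be sketched in plain Python (a toy model for illustration, not actual Redis internals):

```python
import time

class MiniStream:
    """Toy model of Redis Stream ID generation: <ms-timestamp>-<sequence>."""

    def __init__(self):
        self.entries = []        # list of (id, fields) tuples, in arrival order
        self._last_ms = -1
        self._last_seq = 0

    def xadd(self, fields, now_ms=None):
        ms = int(time.time() * 1000) if now_ms is None else now_ms
        if ms == self._last_ms:
            self._last_seq += 1  # collision: same millisecond, bump the sequence
        else:
            self._last_ms, self._last_seq = ms, 0
        entry_id = f"{ms}-{self._last_seq}"
        self.entries.append((entry_id, fields))
        return entry_id

stream = MiniStream()
# Two devices writing in the same millisecond get distinct, ordered IDs.
print(stream.xadd({"lat": "25,00,0.00N"}, now_ms=1639371637000))  # 1639371637000-0
print(stream.xadd({"lat": "26,00,0.00N"}, now_ms=1639371637000))  # 1639371637000-1
```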

Redis Streams as edge aggregators

Thus, through the use of Streams, you can aggregate the data from multiple devices into a single, coherent pipeline that processes, cleans, and stores the data until it is needed. Because we are dealing with near real-time latency, this has to happen immediately; even so, the edge layer can take a moment to preprocess the data and perform some quick actions on it before sending it on to the next layer. Adding metadata, aggregating several data points before moving forward, or performing some validation and dropping corrupted reads are among the most common use cases.
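As a sketch of that kind of edge-side preprocessing, the following hypothetical `preprocess` function validates a raw reading, drops corrupted ones, and attaches edge-layer metadata; the function name and the `site_id` field are invented for this example:

```python
def preprocess(entry, site_id="edge-01"):
    """Validate a raw reading and enrich it; return None for corrupted reads."""
    required = {"timestamp", "lat", "long"}
    if not required <= entry.keys():
        return None                   # drop corrupted/incomplete readings
    enriched = dict(entry)
    enriched["site_id"] = site_id     # add edge-layer metadata before forwarding
    return enriched

raw = [
    {"timestamp": 1639371637, "lat": "25,00,0.00N", "long": "-71,00,0.00W"},
    {"timestamp": 1639371638},        # corrupted: missing coordinates
]
clean = [e for e in (preprocess(r) for r in raw) if e is not None]
print(len(clean))  # 1
```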

Reactively reading data from Streams

Another benefit of using Streams is that your processes can then reactively listen for new events to arrive on the streams. This increases performance and optimizes resource utilization by only executing queries against the streams when there is new data to process. Not only that, but a single process can potentially listen for updates on multiple streams, thus simplifying your process ecosystem and orchestration. Instead of having a single process dealing with a single stream, you can have related data streams be processed by the same code.
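The "only query when there is new data" behavior boils down to remembering the last ID you processed per stream and asking each stream for anything newer. A simplified, non-blocking Python model of that idea (for illustration only, not the real XREAD implementation):

```python
def read_new(streams, last_ids):
    """Return entries newer than last_ids for each stream, XREAD-style.

    `streams` maps stream name -> list of (id, fields) tuples;
    `last_ids` maps stream name -> last ID already processed
    ("0-0" means 'from the very beginning').
    """
    fresh = {}
    for name, entries in streams.items():
        cursor = last_ids.get(name, "0-0")
        new = [(eid, fields) for eid, fields in entries if eid > cursor]
        if new:
            fresh[name] = new
            last_ids[name] = new[-1][0]   # advance the cursor past what we saw
    return fresh

streams = {
    "gpscoords": [("1639371637000-0", {"lat": "25,00,0.00N"})],
    "locationtemperaturereadings": [("1639371637001-0", {"temp": "30.4"})],
}
last_ids = {}
print(sorted(read_new(streams, last_ids)))  # both streams have new entries
print(read_new(streams, last_ids))          # {} -- nothing new on the second call
```

In real Redis this bookkeeping happens server-side, and XREAD can even block until new entries arrive instead of polling.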

XREAD COUNT 2 BLOCK 0 STREAMS gpscoords locationtemperaturereadings $ $

The above command listens for new records stored in the "gpscoords" stream and in the "locationtemperaturereadings" stream, which could house the readings of a temperature sensor on your device. If both the GPS chip and the temperature sensor send data at similar intervals, you can simply match the readings across both streams through their IDs (knowing the precision of each sensor beforehand, of course). Once you have the information at the edge level, you can do whatever you need with it before sending it over to the fog layer. For instance, imagine your sensor sending readouts every second while your GPS chip sends a signal every 10 seconds. You could use this data to aggregate temperature reads per location like this:

{
    "timestamp": 1639371637,
    "lat": "25,00,0.00N",
    "long": "-71,00,0.00W",
    "temp_reads": [
        30.4,
        30.5,
        30.2,
        .... 
    ]
}

thus using Redis (and Redis Streams) as an edge aggregator for your IoT platform.
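That aggregation step can be sketched in Python; the bucketing-by-time-window logic is illustrative, and the field names follow the JSON example above:

```python
def aggregate(gps_fixes, temp_reads, window_s=10):
    """Attach each temperature read to the GPS fix whose time window contains it."""
    out = []
    for fix in gps_fixes:
        start = fix["timestamp"]
        reads = [t["value"] for t in temp_reads
                 if start <= t["timestamp"] < start + window_s]
        out.append({**fix, "temp_reads": reads})
    return out

gps_fixes = [{"timestamp": 1639371637,
              "lat": "25,00,0.00N", "long": "-71,00,0.00W"}]
temp_reads = [{"timestamp": 1639371637 + i, "value": v}
              for i, v in enumerate([30.4, 30.5, 30.2])]
print(aggregate(gps_fixes, temp_reads)[0]["temp_reads"])  # [30.4, 30.5, 30.2]
```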

Implementing your edge layer on Windows

Memurai, a Windows-native alternative to standard Redis, provides every feature we have discussed here out of the box. With a very straightforward installation on your systems, you get access to the exact same blazingly fast in-memory database, including Streams (among many other features).

And on top of that, you will have a very developer-friendly product that provides a host of powerful features to play with.

Memurai lets you build your solution with the tools you already use: because it is fully compatible with the Redis API, Redis client libraries in every supported programming language work with it unchanged.