Kafka, a storage-based message broker, is used as a data hub in many services, and administrators adjust the number of Kafka brokers according to traffic and capacity.
In this process, migrating a partition (the logical unit in which messages are stored) is a costly operation in Kafka, because it can complete only after all of the partition's data has been replicated to the new broker.
This research starts from two observations: the role of a storage-based message broker can be reduced to a cache and a storage interface, and SDS (Software-Defined Storage) such as Ceph is already widely used.
Kafka stores messages on a block device through a VFS (Virtual File System). In contrast, this research stores message data directly in the SDS, while the broker provides only the interface and the cache layer.
To do this, the system requires a method for storing messages in the SDS, coordination between brokers, and partition management within each broker.
In this research, we propose a message storage method that accesses messages in constant time using fixed-size index objects.
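The constant-time property follows from the fixed entry size: the location of the index entry for any message offset can be computed arithmetically, with no search. The sketch below illustrates this idea only; the object naming scheme, entry size, and entries-per-object count are assumptions, not the paper's actual layout.

```python
# Sketch of O(1) index lookup with fixed-size index entries.
# ENTRY_SIZE and ENTRIES_PER_OBJECT are illustrative assumptions.

ENTRY_SIZE = 16            # bytes per index entry (assumed)
ENTRIES_PER_OBJECT = 4096  # fixed number of entries per index object (assumed)

def locate_index_entry(partition: str, msg_offset: int) -> tuple[str, int]:
    """Return (index object name, byte offset within it) for a message offset.

    Because every entry has the same size, both values are pure arithmetic
    on the message offset -- constant time regardless of partition size.
    """
    obj_no = msg_offset // ENTRIES_PER_OBJECT
    byte_off = (msg_offset % ENTRIES_PER_OBJECT) * ENTRY_SIZE
    return f"{partition}.index.{obj_no}", byte_off

# Example: the entry for message 10000 of partition "p0" lands in the
# third index object, at a byte offset computed without any scan.
obj, off = locate_index_entry("p0", 10000)
```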
It also discusses how brokers coordinate among themselves through the SDS without a separate coordination solution, as well as how partitions are managed and I/O is optimized.
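One way such self-coordination can work is a lease object stored in the SDS and claimed atomically, so that exactly one broker owns a partition at a time. The sketch below is purely illustrative: it assumes the SDS exposes an atomic compare-and-swap on objects (Ceph RADOS offers comparable primitives such as object locks), and every name in it is invented, not taken from the paper's design.

```python
# Hypothetical broker self-coordination through the SDS.
# SdsClient is a stand-in; a real system would use e.g. librados.
import time

class SdsClient:
    """In-memory stand-in for an SDS exposing atomic compare-and-swap."""
    def __init__(self):
        self._objects = {}

    def compare_and_swap(self, name, expected, new):
        # Atomically replace the object's contents if they match `expected`.
        if self._objects.get(name) == expected:
            self._objects[name] = new
            return True
        return False

def try_acquire_partition(sds, partition, broker_id, lease_secs=10):
    """Claim leadership of a partition by creating a lease object in the SDS.

    Only one broker's compare-and-swap from "no lease" succeeds, so no
    external coordination service (e.g. ZooKeeper) is required.
    """
    lease = f"{broker_id}:{time.time() + lease_secs}"
    return sds.compare_and_swap(f"{partition}.lease", None, lease)

sds = SdsClient()
assert try_acquire_partition(sds, "p0", "broker-1")      # first claim wins
assert not try_acquire_partition(sds, "p0", "broker-2")  # second is rejected
```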
With this approach, a provider with large-scale SDS resources (e.g., a cloud provider that allocates part of its SDS to tenants) could offer tenants a message broker service capable of fast migration.
In addition, because migration is fast, computing resources can be saved by migrating partitions to other brokers and shutting down idle brokers when traffic is low, which lowers the TCO (Total Cost of Ownership).
In addition, when Kafka brokers run in virtual machines with SDS used as a block device, replication happens twice: once at the broker application level and again inside the SDS, so more replicas of each message are created than necessary. This research delegates message replication to the SDS's own replication, reducing both the broker application's network traffic and the SDS storage capacity consumed (a 66% reduction compared with a broker-level replication factor of 3).
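The 66% figure follows from simple arithmetic, assuming the broker-level replication factor of 3 stated in the text: the broker writes three copies into the SDS (each then multiplied by the SDS's own replication), whereas delegating to SDS replication requires writing only one.

```python
# Worked arithmetic behind the claimed ~66% reduction.
broker_rf = 3  # Kafka broker-level replica count (from the text)

# Copies the broker writes into the SDS: broker_rf with broker-level
# replication, 1 when replication is delegated to the SDS itself.
# The SDS's internal replication multiplies both cases equally,
# so it cancels out of the ratio.
reduction = 1 - 1 / broker_rf
print(f"{reduction:.1%}")  # → 66.7%
```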
In addition, this research shows 20-30% better performance than Kafka. If the work can be developed further to product level, we will release the project as open source and continue to evolve it.