Data centers play an essential role in the functioning of modern society. However, failures are unavoidable in data center networks (DCNs) and have a negative impact on all applications. Researchers are therefore interested in the rapid detection and localization of failures in DCNs.
In this paper, we present a theoretical model for analyzing end-to-end failure detection methods in data center networks.
Our numerical results verify that the proposed theoretical model is accurate. In addition, we propose an algorithm that constructs probing matrices based on an enhanced probing path selection indicator. We also apply deep reinforcement learning (DRL) to the problem and propose a DRL-based probing matrix construction algorithm. Our experimental results show that both proposed algorithms achieve higher detection accuracy than existing methods. Finally, we discuss the scenarios in which each algorithm is applicable and can improve detection accuracy or construction speed.
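
For readers unfamiliar with the term, a probing matrix can be pictured as a binary path-by-link matrix in which entry (i, j) is 1 when probe path i traverses link j. The following is a minimal illustrative sketch only, not the construction algorithms proposed in this paper: the toy topology, the function names `probe_outcomes` and `candidate_single_failures`, and the single-failure assumption are all hypothetical and chosen purely for illustration. It shows how end-to-end probe results could, under those assumptions, be matched against matrix columns to localize one failed link.

```python
import numpy as np

# Hypothetical toy example: 3 probe paths (rows) over 4 links (columns).
# probing_matrix[i, j] == 1 means probe path i traverses link j.
probing_matrix = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
])


def probe_outcomes(matrix, failed_links):
    """Return a 0/1 vector: 1 if the path crosses at least one failed link."""
    failed = np.zeros(matrix.shape[1], dtype=int)
    for j in failed_links:
        failed[j] = 1
    return (matrix @ failed > 0).astype(int)


def candidate_single_failures(matrix, outcomes):
    """Assuming exactly one failed link, return the links whose column
    equals the observed outcome vector."""
    return [j for j in range(matrix.shape[1])
            if np.array_equal(matrix[:, j], outcomes)]


# Simulate a failure on link 1 and try to localize it from probe results.
outcomes = probe_outcomes(probing_matrix, {1})
suspects = candidate_single_failures(probing_matrix, outcomes)
print(outcomes.tolist(), suspects)
```

In this toy matrix every column is distinct, so any single link failure produces a unique outcome vector. A matrix with two identical columns would leave those two links indistinguishable, which is one reason the choice of probing paths matters for detection accuracy.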