Mathematical models of performance, reliability and operating costs of a system for storing deduplicated data on SSD disks
Abstract
Incoming article date: 10.08.2019
The article develops a set of mathematical models describing the operation of a data storage system built on solid-state drives (SSDs) with deduplication. The user-application model loads the system with a stream of requests whose sizes follow the Pareto distribution and which arrive at random time intervals. Requests enter the storage system through the network service model, pass to the VDO deduplication model, then to the software RAID model and, finally, to the solid-state drive model, which performs the read or write operation. Because of the way SSDs operate, system performance is modeled separately for reading and for writing, taking into account the different speed characteristics of RAID-5, RAID-6 and RAID-10 arrays. The reliability model of each RAID array is based on the Kolmogorov-Chapman system of equations, which yields the stationary probabilities for a discrete Markov chain describing transitions between array states. System durability is determined by a model that estimates the exhaustion of the write endurance of the solid-state drives. The storage cost model includes the costs of equipment, resources and maintenance over the entire operating life of the system. The final result is a mathematical formulation of the optimal design problem for a data storage system, which allows selecting the system architecture and parameters that are optimal with respect to a combination of factors: reliability, speed and cost of data storage.
Keywords: mathematical modeling, data storage system, performance, reliability, solid state drives, RAID array, optimization
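The request stream described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the abstract states only that request sizes follow the Pareto law and that the intervals between requests are random. The exponential distribution of intervals, the shape parameter `alpha`, the minimum size `min_size`, the mean interval `mean_interval`, and the names `Request` and `generate_requests` are all assumptions introduced here for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Request:
    arrival_time: float  # seconds since the start of the simulated stream
    size_bytes: int      # request size, Pareto-distributed


def generate_requests(n, alpha=1.5, min_size=4096, mean_interval=0.01, seed=None):
    """Generate n requests with Pareto sizes and random inter-arrival times.

    Assumptions (not taken from the article): inter-arrival times are
    exponential with mean `mean_interval`; sizes are Pareto with shape
    `alpha` and scale (minimum size) `min_size`.
    """
    rng = random.Random(seed)
    t = 0.0
    requests = []
    for _ in range(n):
        # Exponential interval between consecutive requests (assumed law).
        t += rng.expovariate(1.0 / mean_interval)
        # paretovariate returns values >= 1, so scaling by min_size
        # gives a Pareto-distributed size with minimum min_size.
        size = int(min_size * rng.paretovariate(alpha))
        requests.append(Request(arrival_time=t, size_bytes=size))
    return requests


if __name__ == "__main__":
    for req in generate_requests(5, seed=42):
        print(f"t={req.arrival_time:.4f} s  size={req.size_bytes} B")
```

In a fuller simulation, a stream of this kind would be passed in turn to the network service, deduplication, RAID and drive models; a fixed `seed` makes runs reproducible when comparing array configurations.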