The article examines the transition of universities from data warehouses to data lakes, revealing their potential in processing big data. The introduction highlights the main differences between warehouses and lakes, focusing on their underlying data-management philosophies: data warehouses are designed for structured data with a relational architecture, while data lakes store data in its raw form, supporting flexibility and scalability. The section "Data Sources Used by the University" describes how universities manage data collected from various departments, including ERP systems and cloud databases. The discussion of data lakes and data warehouses highlights their key differences in data processing and management methods, along with their advantages and disadvantages. The article examines in detail the problems and challenges of the transition to data lakes, including security, scale, and implementation costs. Architectural models of data lakes such as the "Raw Data Lake" and the "Data Lakehouse" are presented, describing various approaches to managing the data lifecycle and aligning with business goals. Big data processing methods in lakes cover the use of the Apache Hadoop platform and current storage formats. Processing technologies are described, including Apache Spark and machine learning tools. Practical examples of data processing and the application of machine learning coordinated through Spark are provided. In conclusion, the relevance of the transition to data lakes for universities is emphasized, security and governance challenges are noted, and the use of cloud technologies is recommended to reduce costs and increase productivity in data management.
Keywords: data warehouse, data lake, big data, cloud storage, unstructured data, semi-structured data
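As a small illustration of the processing layer mentioned above, the following PySpark sketch reads a raw file from a hypothetical lake path and computes a simple aggregate. The path, column names, and schema-on-read setup are invented for the example, and a working Spark installation is assumed.

```python
# Minimal PySpark sketch: reading raw files from a data lake and aggregating.
# The path and column names are hypothetical; assumes pyspark is installed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("university-lake-demo").getOrCreate()

# Raw zone: files are kept as-is, with the schema applied on read, not on write.
enrollments = spark.read.option("header", True).csv("/lake/raw/erp/enrollments.csv")

# A simple curated-zone aggregate: students per faculty.
per_faculty = enrollments.groupBy("faculty").agg(F.count("*").alias("students"))
per_faculty.show()
```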
The paper presents a method for quantitative assessment of zigzag trajectories of vehicles, which makes it possible to identify potentially dangerous driver behavior. The algorithm analyzes changes in direction between trajectory segments and includes data preprocessing steps: merging of closely spaced points and trajectory simplification using a modified Ramer-Douglas-Peucker algorithm. Experiments on a balanced data set (20 trajectories) confirmed the effectiveness of the method: accuracy of 0.8, recall of 1.0, and an F1-score of 0.833. The developed approach can be applied in traffic monitoring, accident prevention, and hazardous driving detection systems. Further research is aimed at improving the accuracy and adapting the method to real-world conditions.
Keywords: trajectory, trajectory analysis, zigzag, trajectory simplification, Ramer-Douglas-Peucker algorithm, YOLO, object detection
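Below is a minimal sketch of the two ingredients the abstract names: trajectory simplification followed by turn-angle analysis. It uses the textbook Ramer-Douglas-Peucker routine rather than the authors' modified variant, and the angle and count thresholds are illustrative assumptions.

```python
import math

def rdp(points, eps):
    """Textbook Ramer-Douglas-Peucker polyline simplification."""
    if len(points) < 3:
        return points
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1e-12
    dmax, imax = 0.0, 0
    for i, (x, y) in enumerate(points[1:-1], 1):
        d = abs(dy * (x - x1) - dx * (y - y1)) / norm   # distance to the chord
        if d > dmax:
            dmax, imax = d, i
    if dmax <= eps:
        return [points[0], points[-1]]
    left, right = rdp(points[:imax + 1], eps), rdp(points[imax:], eps)
    return left[:-1] + right

def turn_angles(points):
    """Absolute heading change (radians) at every interior vertex."""
    angles = []
    for (ax, ay), (bx, by), (cx, cy) in zip(points, points[1:], points[2:]):
        d = abs(math.atan2(cy - by, cx - bx) - math.atan2(by - ay, bx - ax))
        angles.append(min(d, 2 * math.pi - d))
    return angles

def looks_zigzag(points, eps=0.5, min_turn=math.radians(45), min_turns=3):
    """Flag a trajectory whose simplified form keeps >= min_turns sharp turns."""
    sharp = [a for a in turn_angles(rdp(points, eps)) if a >= min_turn]
    return len(sharp) >= min_turns

track = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0), (5, 1)]
print(looks_zigzag(track, eps=0.1))   # True: repeated sharp direction changes
```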
In this paper, a new model of an open multichannel queuing system with mutual assistance between channels and limited waiting time for a request in a queue is proposed. General mathematical dependencies for the probabilistic characteristics of such a system are presented.
Keywords: queuing system, queue, service device, mutual assistance between channels
Currently, key aspects of software development include the security and efficiency of the applications being created. Special attention is given to data security and operations involving databases. This article discusses methods and techniques for developing secure applications through the integration of the Rust programming language and the PostgreSQL database management system (DBMS). Rust is a general-purpose programming language that prioritizes safety as its primary objective. The article examines key concepts of Rust, such as strict typing, the RAII (Resource Acquisition Is Initialization) programming idiom, macro definitions, and immutability, and how these features contribute to the development of reliable and high-performance applications when interfacing with databases. The integration with PostgreSQL, which has proved to be both straightforward and robust, is analyzed, highlighting its capacity for efficient data management while maintaining a high level of security, thereby mitigating common errors and vulnerabilities. Rust is currently used less widely than popular languages such as JavaScript, Python, and Java, in part because of its steep learning curve. However, major companies see its potential: Rust modules are being integrated into operating system kernels (Linux, Windows, Android), Mozilla is developing features for Firefox's Gecko engine in Rust, and Stack Overflow surveys show rising usage of Rust. A practical example involving the dispatch of information related to class schedules and video content illustrates the advantages of using Rust in conjunction with PostgreSQL to create a scheduling management system that ensures data integrity and security.
Keywords: Rust programming language, memory safety, RAII, metaprogramming, DBMS, PostgreSQL
The railway transport industry demonstrates significant achievements in various fields of activity through the introduction of predictive analytics. Predictive analytics systems use data from a variety of sources, such as sensor networks, historical data, weather conditions, etc. The article discusses the key areas of application of predictive analytics in railway transport, as well as the advantages, challenges and prospects for further development of this technology in the railway infrastructure.
Keywords: predictive analytics in railway transport, passenger traffic forecasting, freight optimization, maintenance optimization, inventory and supply management, personnel management, financial planning, big data analysis
A Simulink model is considered that allows calculating transient processes of objects described by a step response for any type of input action. An algorithm for the operation of the S-function that performs calculations using the Duhamel integral is described. It is shown that, owing to the features of the S-function, it can store the values of the previous step of the Simulink model calculation. This allows the input signal to be decomposed into step components, storing the time of occurrence of each step and its value. For each increment of the input signal, the S-function calculates the response by scaling the step response. Then, at each calculation step, the sum of such reactions is found. The S-function provides a procedure for freeing memory when the end point of the step response is reached at each step. Thus, the amount of memory required for the calculation does not grow beyond a certain limit and, in general, does not depend on the length of the model time. For calculations, the S-function uses matrix operations rather than loops, so the model computes quite quickly. The article presents the results of calculations. Recommendations are given for setting the parameters of the model. A conclusion is formulated on the possibility of using the model for calculating dynamic modes.
Keywords: simulation modeling, Simulink, step response, step function, S-function, Duhamel integral.
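The superposition idea behind the S-function can be sketched in a few lines of NumPy: the input is split into step increments, and each increment contributes a scaled, time-shifted copy of the step response. Unlike the S-function described above, this sketch uses an explicit loop over the increments, and the test signals are invented.

```python
import numpy as np

def duhamel_response(u, h):
    """Response of a linear system via Duhamel superposition.

    u : sampled input signal
    h : sampled step response on the same uniform time grid
    The input is decomposed into step increments; each increment launches
    a scaled, shifted copy of the step response, and the copies are summed.
    """
    du = np.diff(u, prepend=0.0)          # step components of the input
    n = len(u)
    y = np.zeros(n)
    for k in np.nonzero(du)[0]:           # only where the input actually steps
        y[k:] += du[k] * h[:n - k]        # scaled, shifted step response
    return y

# Example: first-order lag (step response 1 - exp(-t)) driven by a saturating ramp.
t = np.linspace(0, 10, 1001)
h = 1 - np.exp(-t)
u = np.clip(t / 5.0, 0, 1)
y = duhamel_response(u, h)
print(y[-1])                              # settles near the final input value
```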
The article substantiates the hypothesis that changes in the destructive ability of genetic algorithm (GA) operators can influence the trajectory of population movement in the solution space directly during the operation of the evolutionary procedure for labor-intensive tasks. To address this, it is proposed to use a control superstructure based on an artificial neural network (ANN) or the random forest algorithm. The study presents results obtained with calculations on CPU and on CPU + GPGPU for a resource-intensive task of synthesizing dynamic simulation models of business processes using the mathematical apparatus of Petri net (PN) theory, comparing a GA without a control superstructure, a GA with an ANN superstructure of the RNN class, and a GA with a random forest superstructure. To model the operation of the GA, the ANN, the random forest algorithm, and the business process models, a graph representation using various extensions of PN is proposed, and examples of modeling the selected methods with this mathematical apparatus are given. For the ANN and the random forest algorithm to recognize the state of the GA population, a number of rules are proposed that allow the solution synthesis process to be managed. Based on the computational experiments and their analysis, the strengths and weaknesses of using the proposed machine learning algorithms as a control superstructure are shown. The proposed hypothesis was confirmed by the results of the computational experiments.
Keywords: "Petri net, decision tree, random forest, machine learning, Petri net theory, bipartite directed graph, intelligent systems, evolutionary algorithms, decision support systems, mathematical modeling, graph theory, simulation modeling
The article describes the mathematical foundations of time-frequency analysis of signals using the Empirical Mode Decomposition (EMD), Intrinsic Time-Scale Decomposition (ITD), and Variational Mode Decomposition (VMD) algorithms. Synthetic and real signals distorted by additive white Gaussian noise with different signal-to-noise ratios are considered. A comprehensive comparison of the EMD, ITD, and VMD algorithms has been performed. The possibility of using these algorithms in signal denoising and spectral analysis tasks is investigated. Execution time and computational stability of the algorithms are estimated.
Keywords: time-frequency analysis, denoising, decomposition, mode, Hilbert-Huang transform, Empirical Mode Decomposition, Intrinsic Time-Scale Decomposition, Variational Mode Decomposition
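A minimal EMD-based denoising sketch in the spirit of the comparison above might look as follows. It assumes the PyEMD package (pip install EMD-signal), and the drop-the-first-IMF rule is a deliberately crude stand-in for the thresholding strategies studied in such work.

```python
# EMD denoising sketch; assumes the PyEMD package (pip install EMD-signal).
import numpy as np
from PyEMD import EMD

t = np.linspace(0, 1, 1000)
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)
noisy = clean + 0.3 * np.random.randn(t.size)

imfs = EMD().emd(noisy)                 # IMFs, ordered from fastest to slowest
# Crude denoising: drop the first (highest-frequency, noise-dominated) IMF.
denoised = imfs[1:].sum(axis=0)
print("MSE before/after:",
      np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```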
The paper proposes an approach to improving the efficiency of machine learning models used in monitoring tasks by means of metric spaces. To this end, a method is proposed for assessing the quality of monitoring systems based on interval estimates of the response zones to a possible incident. This approach extends the classical metrics for evaluating machine learning models to take into account the specific requirements of monitoring tasks. The calculation of interval boundaries is based on probabilities derived from a classifier trained on historical data to detect dangerous states of the system. By combining the probability of an incident with the normalized distance to incidents in the training sample, it is possible to simultaneously improve all the considered quality metrics for monitoring: accuracy, completeness, and timeliness. One way to improve the results is to use the scalar product of the normalized components of the metric space and their importances as features in a machine learning model. The permutation feature importance method is used for this purpose, as it does not depend on the chosen machine learning algorithm. Numerical experiments have shown that using distances in a metric space to incident points from the training sample can improve the early detection of dangerous situations by up to a factor of two. The proposed approach is versatile and can be applied to various classification algorithms and distance calculation methods.
Keywords: monitoring, machine learning, state classification, incident prediction, lead time, anomaly detection
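The feature-weighting step described above can be sketched with scikit-learn's model-agnostic permutation importance. The data, the model choice, and the way the scalar product is appended as a feature are illustrative assumptions, not the article's exact procedure.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                      # stand-in monitoring features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)

# Model-agnostic importances: usable with any classifier, as the abstract notes.
imp = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)
w = np.clip(imp.importances_mean, 0.0, None)
weights = w / w.sum()

# Scalar product of normalized components and importances as an extra feature.
Xn = (X - X.mean(axis=0)) / X.std(axis=0)
X_aug = np.hstack([X, (Xn @ weights).reshape(-1, 1)])
print(weights, X_aug.shape)
```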
The article discusses the problems of wear of the feeding machine rollers associated with speed mismatch in the material tracking mode. Existing methods of dealing with wear address the effect of the problem rather than its cause. One way to reduce the intensity of wear of roller barrels is to develop a method of controlling the speed of the feeding machine that reduces the mismatch between the speeds of the rollers and the rolled product without violating the known technological requirements for creating pulling and braking forces. An algorithm is disclosed for calculating the speed adjustment based on metal tension, which compensates for roller wear and reduces the friction force. Modeling of the system with the developed algorithm showed that the speed mismatch during material tracking is eliminated, which will reduce the intensity of roller wear.
Keywords: speed correction system, feeding machine, roller wear, metal tension, control system, speed mismatch, friction force reduction
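A toy closed-loop sketch of the idea follows, assuming a deliberately simplified linear plant in which tension is proportional to the roller-to-strip speed mismatch. The gains and signal values are illustrative and are not the article's tuned algorithm.

```python
# Toy PI correction of roller speed from the metal tension error.
v_strip = 1.00          # strip speed, m/s (assumed constant)
v_roll = 1.10           # roller speed with an initial wear-induced mismatch
t_ref = 2.0             # required tension, kN
k_plant = 40.0          # toy plant: kN of tension per m/s of mismatch
kp, ki, integ = 0.004, 0.0008, 0.0

for step in range(300):
    tension = k_plant * (v_roll - v_strip)      # toy plant: tension ~ mismatch
    err = t_ref - tension
    integ += err
    v_roll += kp * err + ki * integ             # speed correction step

# The mismatch settles at the minimum needed to hold the required tension
# in this toy plant; the initial excess mismatch is driven out.
print(f"mismatch: {v_roll - v_strip:.4f} m/s, "
      f"tension: {k_plant * (v_roll - v_strip):.2f} kN")
```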
PHP Data Objects (PDO) represents a significant advancement in PHP application development by providing a universal approach to interacting with database management systems (DBMSs). This article opens with an introduction describing the need for PDO, available as of PHP 5.1, which allows PHP developers to interact with different databases through a single interface, minimising the effort involved in portability and code maintenance. It discusses how PDO improves security by supporting prepared queries, which are a defence against SQL injection. The main part of the paper analyses the key advantages of PDO, such as its versatility in connecting to multiple databases (e.g. MySQL, PostgreSQL, SQLite), the ability to use prepared queries to enhance security, improved error handling through exceptions, transactional support for data integrity, and the ease of learning the PDO API even for beginners. Practical examples are provided, including preparing and executing SQL queries, setting attributes via the setAttribute method, and performing operations in transactions, emphasising the flexibility and robustness of PDO. In addition, the paper discusses best practices for using PDO in complex and high-volume projects, such as using prepared queries for bulk data insertion, query optimisation and stream processing for efficient handling of large amounts of data. The conclusion characterises PDO as the preferred tool for modern web applications, offering a combination of security, performance and code quality enhancement. The authors also suggest directions for future research regarding security test automation and the impact of different data models on application performance.
Keywords: PHP, PDO, databases, DBMS, security, prepared queries, transactions, programming
The article presents the main stages of and recommendations for developing an information and analytical system (IAS) based on geographic information systems (GIS) for rational forest resource management, providing for the processing, storage, and presentation of information on forest wood resources, together with specific examples of the implementation of its individual components and digital technologies. The following stages of IAS development are considered: collecting and structuring data on forest wood resources; justifying the type of software implementation of the IAS; selecting equipment; developing the data analysis and processing unit; developing the architecture of interaction between IAS blocks; developing the IAS application interface; and testing the IAS. It is proposed to implement the interaction between the client and server parts based on Asynchronous JavaScript and XML (AJAX) technology. The open-source Leaflet library is recommended for geodata visualization. To store large amounts of data on the server, the SQLite database management system is proposed. The proposed approaches can find application in creating an IAS for forming management decisions in the field of rational management of forest wood resources.
Keywords: geographic information systems, forest resources, methodology, web application, AJAX technology, SQLite, Leaflet, information processing
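The server-side storage choice can be illustrated with a few lines using Python's built-in sqlite3 module; the table layout and the sample record are hypothetical.

```python
# Server-side sketch: storing forest-stand records in SQLite, as recommended
# above for the IAS server; the schema and values are hypothetical.
import sqlite3

con = sqlite3.connect("forest.db")
con.execute("""CREATE TABLE IF NOT EXISTS stands (
    id INTEGER PRIMARY KEY,
    lat REAL, lon REAL,            -- displayed via Leaflet on the client
    species TEXT, stock_m3 REAL)""")
con.execute("INSERT INTO stands (lat, lon, species, stock_m3) VALUES (?, ?, ?, ?)",
            (56.01, 92.87, "pine", 210.5))
con.commit()
rows = con.execute("SELECT species, stock_m3 FROM stands WHERE stock_m3 > ?",
                   (100,)).fetchall()
print(rows)
```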
With the digitalisation of the construction industry and import substitution, more attention is being paid to the transition to domestic software. At each stage of construction, specialised software products are needed, including CAD and BIM systems. The paper considers the experience of integrating Russian-made systems for information modeling of transport infrastructure and road construction. Within the framework of the work, the Vitro-CAD common data environment (CDE) and the Topomatic Robur software system were integrated. Joint work of the construction project participants was organized in a single information space. The project participants' efficiency was improved by freeing them from routine operations. The integration experience has shown that the combination of Vitro-CAD and Topomatic Robur makes it possible to manage project data efficiently, store files with version tracking, and coordinate documentation and issue comments on it.
Keywords: common data environment, information space, information model, digital ecosystem, computer-aided design, building information modeling, automation, integration, import substitution, software complex, platform, design documentation, road construction
When evaluating student work, the analysis of written assignments, particularly of source code, is especially relevant. This article discusses an approach for evaluating the dynamics of feature changes in students' source code. Various source code metrics are analyzed and key metrics are identified, including quantitative metrics, program control flow complexity metrics, and the TIOBE quality indicator. A text data set containing program source codes from a website dedicated to practical programming was used to determine threshold values for each metric and categorize them. The obtained results were used to analyze students' source code with a developed service that allows evaluating work based on key features, observing the dynamics of code indicators, and understanding a student's position within the group based on the obtained values.
Keywords: machine learning, text data analysis, program code analysis, digital footprint, data visualization
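A small sketch of extracting quantitative and control-flow metrics from student code with Python's standard ast module is shown below. The metric set is a simplified proxy for the ones named above; the TIOBE indicator and the article's exact thresholds are not reproduced.

```python
# Illustrative source-code metrics via the standard ast module.
import ast

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)

def code_metrics(source: str) -> dict:
    tree = ast.parse(source)
    lines = [l for l in source.splitlines() if l.strip()]
    branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(tree))
    funcs = sum(isinstance(n, ast.FunctionDef) for n in ast.walk(tree))
    return {
        "loc": len(lines),                  # non-blank lines of code
        "functions": funcs,
        "cyclomatic_proxy": branches + 1,   # 1 + decision points (rough proxy)
    }

sample = """
def grade(score):
    if score >= 90:
        return "A"
    for _ in range(3):
        score += 1
    return "B"
"""
print(code_metrics(sample))
```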
This article discusses two of the most popular algorithms for constructing dominator trees in the context of static code analysis for the Solidity programming language: the iterative algorithm of Cooper, Harvey, and Kennedy and the Lengauer-Tarjan algorithm. Both are considered effective and are widely used in practice. The article compares these algorithms, evaluates their complexity, and selects the most preferable option in the context of Solidity. Execution time and memory usage were used as comparison criteria. The Cooper-Harvey-Kennedy iterative algorithm showed higher performance on small projects, while the Lengauer-Tarjan algorithm performed better when analyzing larger projects. Overall, however, the iterative algorithm proved preferable in the context of Solidity, showing higher efficiency and accuracy when analyzing smart contracts in this language. This article may be useful for developers and researchers involved in static code analysis for Solidity, who can use the results and conclusions of this study in their work.
Keywords: dominator tree, Solidity, algorithm comparison
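For reference, a compact Python rendering of the Cooper-Harvey-Kennedy iterative algorithm ("A Simple, Fast Dominance Algorithm") on a toy control-flow graph; the graph is invented, and nothing here is Solidity-specific.

```python
def dominators(succ, entry):
    """Immediate dominators via the Cooper-Harvey-Kennedy iteration."""
    order, seen = [], set()
    def dfs(n):                              # reverse postorder numbering
        seen.add(n)
        for s in succ.get(n, ()):
            if s not in seen:
                dfs(s)
        order.append(n)
    dfs(entry)
    rpo = order[::-1]
    num = {n: i for i, n in enumerate(rpo)}
    pred = {n: [] for n in rpo}
    for n in rpo:
        for s in succ.get(n, ()):
            pred[s].append(n)

    idom = {entry: entry}
    def intersect(a, b):                     # walk up until the walks meet
        while a != b:
            while num[a] > num[b]:
                a = idom[a]
            while num[b] > num[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for n in rpo:
            if n == entry:
                continue
            ps = [p for p in pred[n] if p in idom]   # processed predecessors
            new = ps[0]
            for p in ps[1:]:
                new = intersect(new, p)
            if idom.get(n) != new:
                idom[n] = new
                changed = True
    return idom

cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dominators(cfg, "A"))   # {'A': 'A', 'B': 'A', 'C': 'A', 'D': 'A'}
```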
This article explores the probabilistic characteristics of closed queuing systems, with a particular focus on the differences between "patient" and "impatient" demands. These categories of requests play a crucial role in understanding the dynamics of service, as patient demands wait in line, while impatient ones may be rejected if their waiting time exceeds a certain threshold. The uniqueness of this work lies in the analysis of a system with a three-component structure of incoming flow, which allows for a more detailed examination of the behavior of requests and the influence of various factors on service efficiency. The article derives key analytical expressions for determining probabilistic characteristics such as average queue length, rejection probability, and other critical metrics. These expressions enable not only the assessment of the current state of the system but also the prediction of its behavior under various load scenarios. The results of this research may be useful for both theoretical exploration of queuing systems and practical application in fields such as telecommunications, transportation, and service industries. The findings will assist specialists in developing more effective strategies for managing request flows, thereby improving service quality and reducing costs.
Keywords: waiting, queue, service, Markov process, queuing system with constraints, flow of requests, simulation modeling, mathematical model
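The effect of impatience can be illustrated with a deliberately simplified simulation: an open single-server FIFO queue in which a request is lost if its waiting time would exceed a fixed patience threshold. This is far simpler than the closed three-component system analyzed above, and the parameters are illustrative.

```python
# Toy simulation of "impatient" requests (reneging at a patience threshold).
import numpy as np

rng = np.random.default_rng(2)

def rejection_probability(lam=1.0, mu=1.2, patience=2.0, n=50_000):
    t_arr = np.cumsum(rng.exponential(1 / lam, n))   # arrival instants
    service = rng.exponential(1 / mu, n)             # service durations
    free_at, rejected = 0.0, 0
    for a, s in zip(t_arr, service):
        wait = max(0.0, free_at - a)
        # With a deterministic patience and FIFO service, the wait is known
        # at arrival, so an impatient request can be counted as lost at once.
        if wait > patience:
            rejected += 1
        else:
            free_at = max(free_at, a) + s
    return rejected / n

print("rejection probability ~", rejection_probability())
```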
Oil spills require timely measures to eliminate their causes and neutralize their consequences. Case-based reasoning is promising for developing specific technological solutions to eliminate oil spills. It is therefore important to structure the description of possible situations and to form a representation of solutions. This paper presents the results of these tasks. A structure based on a situation tree is proposed for representing oil product spill situations, an algorithm for situational decision-making using this structure is described, and parameters for describing oil product spill situations and presenting solutions are proposed. The situation tree makes it possible to form a representation of situations based on the analysis of various source information. This approach makes it possible to quickly refine the parameters and select similar situations from the knowledge base whose solutions can be used in the current undesirable situation.
Keywords: case-based reasoning, decision making, oil spill, oil spill response, decision support, situation tree
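The retrieval step can be sketched as nearest-case matching over normalized situation parameters. This flat lookup stands in for the article's situation-tree traversal, and the parameters, cases, and response plans are invented.

```python
import numpy as np

# Each case: situation parameters (spill volume m3, distance to water km,
# wind m/s) and an associated response plan; all values are invented.
cases = [
    (np.array([5.0, 0.2, 3.0]), "booms + skimmers"),
    (np.array([50.0, 2.0, 8.0]), "dikes + sorbents"),
    (np.array([1.0, 5.0, 1.0]), "local sorbent treatment"),
]

def retrieve(current, cases):
    feats = np.array([c[0] for c in cases])
    # Normalize each parameter to [0, 1] before measuring similarity.
    lo, hi = feats.min(axis=0), feats.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    norm, cur = (feats - lo) / span, (current - lo) / span
    return cases[int(np.argmin(np.linalg.norm(norm - cur, axis=1)))][1]

print(retrieve(np.array([8.0, 0.5, 4.0]), cases))   # -> "booms + skimmers"
```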
The article considers the possibility of modeling the random forest machine learning algorithm using the mathematical apparatus of Petri net theory. The proposed approach is based on the use of three types of Petri net extensions: classical, colored nets, and nested nets. For this purpose, the paper considers the general structure of decision trees and the rules for constructing models based on a bipartite directed graph with a subsequent transition to the random forest machine learning algorithm. The article provides examples of modeling this algorithm using Petri nets with the formation of a tree of reachable markings, which corresponds to the operation of both decision trees and a random forest.
Keywords: Petri net, decision tree, random forest, machine learning, Petri net theory, bipartite directed graph, intelligent systems, evolutionary algorithms, decision support systems, mathematical modeling, graph theory, simulation modeling
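A toy classical Petri net modeling a single decision-tree split is sketched below: places hold a sample token, and transition guards encode the branch conditions. This is far simpler than the colored and nested nets the article employs.

```python
# Toy Petri net: one decision-tree split as places, transitions, and guards.
class PetriNet:
    def __init__(self, places, transitions):
        self.marking = dict(places)          # place -> token count
        self.transitions = transitions       # name -> (inputs, outputs, guard)

    def enabled(self, name, x):
        inputs, _, guard = self.transitions[name]
        return all(self.marking[p] > 0 for p in inputs) and guard(x)

    def fire(self, name, x):
        inputs, outputs, _ = self.transitions[name]
        assert self.enabled(name, x)
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] += 1

# Decision "x < 5": the token moves from 'root' to one of the leaf places.
net = PetriNet(
    {"root": 1, "leaf_yes": 0, "leaf_no": 0},
    {"go_yes": (["root"], ["leaf_yes"], lambda x: x < 5),
     "go_no":  (["root"], ["leaf_no"],  lambda x: x >= 5)},
)
for t in net.transitions:
    if net.enabled(t, x=3):
        net.fire(t, x=3)
print(net.marking)   # {'root': 0, 'leaf_yes': 1, 'leaf_no': 0}
```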
Many modern information processing and control systems in various fields are based on software and hardware for image processing and analysis. At the same time, it is often necessary to ensure the storage and transmission of large data sets, including image collections. Data compression technologies are used to reduce the amount of memory required and to increase the speed of information transmission. To date, approaches based on discrete wavelet transforms have been developed and applied. The advantage of these transforms is their ability to localize points of brightness change in images. The detail coefficients corresponding to such points make a significant contribution to the energy of the image. This contribution can be quantified in the form of weights, whose analysis determines the method of quantizing the wavelet transform coefficients in the proposed lossy compression method. The approach described in the paper follows the general image compression scheme, with transformation, quantization, and encoding stages. It provides good compression performance and can be used in information processing and control systems.
Keywords: image processing, image compression, redundancy in images, general image compression scheme, wavelet transform, compression based on wavelet transform, weight model, significance of detail coefficients, quantization, entropy coding
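The transformation and quantization stages can be sketched with PyWavelets. Simple magnitude thresholding stands in here for the weight-based quantization proposed in the article, and a random array stands in for a real image.

```python
# Wavelet compression sketch with PyWavelets (pip install PyWavelets).
import numpy as np
import pywt

img = np.random.rand(64, 64)                      # stand-in for a real image
coeffs = pywt.wavedec2(img, "db2", level=3)       # transformation stage

# Simplified quantization stage: zero out small coefficients by magnitude.
arr, slices = pywt.coeffs_to_array(coeffs)
thr = np.percentile(np.abs(arr), 90)              # keep ~10% largest
arr_q = np.where(np.abs(arr) >= thr, arr, 0.0)

restored = pywt.waverec2(
    pywt.array_to_coeffs(arr_q, slices, output_format="wavedec2"), "db2")[:64, :64]
print("kept:", np.count_nonzero(arr_q), "/", arr_q.size,
      " mean abs error:", float(np.abs(restored - img).mean()))
```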
The work is devoted to the development and analysis of computer vision algorithms designed to recognize objects in conditions of limited visibility, such as fog, rain or poor lighting. In the context of modern requirements for safety and automation, the task of identifying objects becomes especially relevant. The theoretical foundations of computer vision methods and their application in difficult conditions are considered. An analysis of image processing algorithms is carried out, including machine learning and deep learning methods that are adapted to work in conditions of poor visibility. The results of experiments demonstrating the effectiveness of the proposed approaches are presented, as well as a comparison with existing recognition systems. The results of the study can be useful in the development of autonomous vehicles and video surveillance systems.
Keywords: computer vision, mathematical modeling, software package, machine learning methods, autonomous transport systems
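One common preprocessing step for such conditions is contrast-limited adaptive histogram equalization (CLAHE), sketched below with OpenCV. The file name is hypothetical, and CLAHE is offered as a generic illustration of visibility enhancement rather than the article's specific pipeline.

```python
# Enhancing a low-visibility frame with CLAHE before running a detector.
# Assumes opencv-python is installed; the file name is hypothetical.
import cv2

frame = cv2.imread("foggy_frame.png", cv2.IMREAD_GRAYSCALE)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(frame)
cv2.imwrite("foggy_frame_clahe.png", enhanced)
```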
In systems for monitoring, diagnostics, and state recognition of various types of objects, an important aspect is reducing the volume of measured signal data for transmission or accumulation in information bases, with the ability to restore the signal without significant distortion. A special type of signal in this case is the packet signal, which consists of harmonics at multiples of a base frequency and is truly periodic with a clearly distinguishable period. Signals of this type are typical for mechanical and electromechanical systems with rotating elements: reducers, gearboxes, electric motors, internal combustion engines, etc. The article considers a number of models for reducing these signals and the cases in which each of them is preferable. In particular, the following are highlighted: the discrete Fourier transform model with a modified formula for restoring a continuous signal, the proposed model based on decomposition by bordering functions, and the discrete cosine transform model. The first two models provide, in the ideal case, absolutely accurate signal restoration after reduction, while the last is a lossy reduction model. The main criteria for evaluating the models are the computational complexity of the implemented transformations, the degree of signal reduction achieved, and the error in restoring the signal from the reduced data. It was found that each of the listed models can be applied to packet signals, the choice being determined by the priority indicators of the reduction assessment. The considered reduction models can be applied in information-measuring systems for condition monitoring, diagnostics, and control of the above-mentioned objects.
Keywords: reduction model, measured packet signal, discrete cosine transform, decomposition into bordering functions, reduction quality assessment, information-measuring system
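The lossy DCT model can be sketched as follows: transform, keep only the largest-magnitude coefficients, and restore. The signal and the retention level are illustrative.

```python
# DCT reduction/restoration sketch for a packet-like signal.
import numpy as np
from scipy.fft import dct, idct

t = np.linspace(0, 1, 1024, endpoint=False)
# A "packet" signal: harmonics at multiples of a base frequency.
x = (np.sin(2 * np.pi * 10 * t)
     + 0.5 * np.sin(2 * np.pi * 20 * t)
     + 0.25 * np.sin(2 * np.pi * 30 * t))

c = dct(x, norm="ortho")
keep = 64                                   # store only 64 of 1024 values
c_red = np.zeros_like(c)
idx = np.argsort(np.abs(c))[-keep:]         # largest-magnitude coefficients
c_red[idx] = c[idx]

x_rest = idct(c_red, norm="ortho")
err = np.linalg.norm(x - x_rest) / np.linalg.norm(x)
print(f"reduction {x.size // keep}x, relative restoration error {err:.3%}")
```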
The current situation in the practice of designing complex technical systems with metrological support is characterized by the following important features: a) the initial information that can actually be collected and prepared at the early stages of design for solving probabilistic problems turns out, as a rule, to be incomplete, inaccurate and, to a high degree, uncertain; b) the form of specifying the initial information (in the form of constraints) in problems can be very diverse: average and dispersion characteristics or functions of them, measurement errors or functions of them, characteristics specified by a probability measure, etc. These circumstances necessitate the formulation and study of new mathematical problems of characterizing distribution laws and developing methods and algorithms for solving them, taking into account the constraints on the value and nature of change of the determining parameter (random variable) of a complex technical system. As a generalized integral characteristic of the determining parameter, the law of its distribution is chosen, which, as is commonly believed, fully characterizes the random variable under study. The purpose of this work is to develop a method that allows constructing distribution laws of the determining parameter of a complex technical system using the minimum amount of available information based on the application of Chebyshev inequalities. A method for characterizing the distribution law by the property of maximum entropy is presented, designed to model the determining parameter of complex technical systems with metrological support. Unlike the classical characterization method, the proposed method is based on the use of Chebyshev inequalities instead of restrictions on statistical moments. An algorithm for constructing the distribution function of the determining parameter is described. A comparison is given of the results of constructing distribution laws using the developed method and using the classical variational calculus.
Keywords: Chebyshev inequalities, complex technical system, design, determining parameter, characterization of distribution law
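For reference, the standard forms of the two ingredients named above, in the usual notation (the article's exact constraint set may differ): the Chebyshev bound that replaces the restrictions on statistical moments, and the entropy functional maximized over densities satisfying such bounds.

```latex
P\{\,|X - \mu| \ge \varepsilon\,\} \le \frac{\sigma^2}{\varepsilon^2},
\qquad
H[f] = -\int f(x)\,\ln f(x)\,dx \;\to\; \max_{f},
\quad f \ge 0,\ \int f(x)\,dx = 1 .
```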
In operational diagnostics and recognition of states of complex technical systems, an important task is to identify small time-determined changes in complex measured diagnostic signals of the controlled object. For these purposes, the signal is transformed into a small-sized image in the diagnostic feature space, moving along trajectories of different shapes, depending on the nature and magnitude of the changes. It is important to identify stable and deterministic patterns of changes in these complex-shaped diagnostic signals. Identification of such patterns largely depends on the principles of constructing a small-sized feature space. In the article, the space of decomposition coefficients of the measured signal in the adaptive orthonormal basis of canonical transformations is considered as such a space. In this case, the basis is constructed based on a representative sample of realizations of the controlled signal for various states of the system using the proposed algorithm. The identified shapes of the trajectories of the images correspond to specific types of deterministic changes in the signal. Analytical functional dependencies were discovered linking a specific type of signal change with the shape of the trajectory of the image in the feature space. The proposed approach, when used, simplifies modeling, operational diagnostics and condition monitoring during the implementation of, for example, low-frequency diagnostics and defectoscopy of structures, vibration diagnostics, monitoring of the stress state of an object by analyzing the time characteristics of response functions to impact.
Keywords: modeling, functional dependencies, state recognition, diagnostic image, image movement trajectories, small changes in diagnostic signals, canonical decomposition basis, analytical description of image trajectory
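The construction of an adaptive orthonormal basis from a representative sample can be sketched as a Karhunen-Loeve-style decomposition via SVD. The signals, the states, and the three-dimensional feature space below are synthetic assumptions.

```python
# Building an adaptive orthonormal basis from sample realizations (SVD) and
# mapping a measured signal to a low-dimensional "image" point.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 256)
# Representative sample: realizations for different (synthetic) system states.
sample = np.stack([np.sin(2 * np.pi * (5 + k) * t) + 0.05 * rng.normal(size=t.size)
                   for k in np.linspace(0, 1, 40)])
mean = sample.mean(axis=0)

# Orthonormal basis adapted to the sample (rows of Vt are basis vectors).
_, _, Vt = np.linalg.svd(sample - mean, full_matrices=False)
basis = Vt[:3]                                   # small-sized feature space

def to_image(signal):
    """Decomposition coefficients of a signal in the adaptive basis."""
    return basis @ (signal - mean)

# A slowly changing signal traces a trajectory in the feature space.
trajectory = np.array([to_image(np.sin(2 * np.pi * (5 + d) * t))
                       for d in np.linspace(0, 1, 10)])
print(trajectory.shape)   # (10, 3): ten image points in a 3-D feature space
```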
The article solves the problem of automated generation of user roles using machine learning methods. To solve the problem, cluster data analysis methods implemented in Python in the Google Colab development environment are used. Based on the results obtained, a method for generating user roles was developed and tested, which allows reducing the time for generating a role-based access control model.
Keywords: machine learning, role-based access control model, clustering, k-means method, hierarchical clustering, DBSCAN method
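The clustering step can be sketched with scikit-learn's k-means over a binary user-permission matrix. The matrix, the number of roles, and the majority rule for extracting a role's permissions are illustrative.

```python
# Role-mining sketch: cluster users by permission vectors, then read off a
# candidate role as the permissions held by most users of each cluster.
import numpy as np
from sklearn.cluster import KMeans

# Rows: users, columns: permissions (1 = granted); values are invented.
perms = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(perms)
for role in range(km.n_clusters):
    members = np.where(km.labels_ == role)[0]
    core = np.where(perms[members].mean(axis=0) >= 0.5)[0]
    print(f"role {role}: users {members.tolist()}, permissions {core.tolist()}")
```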
In order to optimize the operation of the dust-settling chambers of steelmaking furnace emission purification systems and increase the overall efficiency of the cleaning system, the movement of gas-air flows and of dust particles of different diameters inside dust-collecting chambers was studied using SolidWorks with the Flow Simulation application. This made it possible to investigate the influence of a number of factors, such as fractional composition and the condition of the working surfaces of the chambers, on the movement of the gas-air flow.
Keywords: steelmaking furnace, gas-air flow, dust-settling chamber, cleaning efficiency, dust, dispersed composition, modeling