Traditional cyberattacks were directed primarily at individual users, but in recent years sophisticated targeted attacks against specific organizations and critical infrastructure have become increasingly common. A representative case is the ransomware attack that forced the U.S. pipeline company Colonial Pipeline to suspend operations. The incident shows that malware can paralyze industrial and social infrastructure, well beyond affecting individual users. As attacks evolve in this way, it is increasingly important not only to block them but also to accurately identify the malware involved and to determine who the attacker is and what characterizes the attack. Executable file analysis is a core technology in cybersecurity and software engineering, supporting malware identification and classification, code similarity analysis, and vulnerability analysis. Malware identification and classification in particular rely on features extracted through traditional static and dynamic analysis. Attackers therefore employ advanced concealment and evasion techniques to avoid detection and analysis, which makes feature extraction with conventional analysis increasingly difficult and ultimately degrades the classification performance of machine learning and deep learning models. To overcome these limitations, this dissertation proposes two complementary visualization-based analysis frameworks for executable malware. Both frameworks transform features extracted from executable files into images so that convolutional neural networks (CNNs) can effectively learn discriminative patterns. Although CNNs have demonstrated excellent performance in general image classification tasks, conventional methods that simply convert packed or obfuscated executables into grayscale images do not reliably classify malware under realistic conditions.
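As a point of reference, the grayscale baseline mentioned above is commonly implemented by treating each byte of the file as one pixel intensity and wrapping the byte stream into rows of a fixed width. The sketch below is a minimal illustration under that assumption; the function name `bytes_to_grayscale`, the default width, and the zero padding of the final row are illustrative choices, not details taken from this dissertation.

```python
def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Wrap a raw byte stream into rows of `width` pixels (values 0..255).

    Hypothetical helper: the width and the zero padding are assumptions.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = -(-len(data) // width)  # ceiling division
    # Pad the last row with zero bytes so every row has the same length.
    padded = data + bytes(rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]


# Ten bytes wrapped at width 4 give three rows, the last one zero-padded.
image = bytes_to_grayscale(bytes(range(10)), width=4)
```

Because every pixel is a literal byte, packing or encrypting the file changes the whole image, which is consistent with the weakness of this baseline described above.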
Hyperparameter tuning and the design of new deep learning architectures are also important research directions for improving performance, but this work focuses on a more fundamental challenge: designing robust visualization-based feature representations in which the inherent characteristics of malware remain visible even under various analysis-evasion techniques. Accordingly, the CNN architecture and training configuration are fixed to a standard setup, and only the input representations and visualization schemes are varied in order to systematically analyze their impact on classification accuracy and generalization. First, in the dynamic-analysis domain, we present PerfSight, a behavioral performance visualization framework. PerfSight collects the usage of system resources such as CPU, memory, and I/O as time-series data and converts them into images, thereby extracting features that are robust against analysis-evasion and concealment techniques. Experiments on real-world ransomware show that, even with a simple CNN model, PerfSight achieves a classification accuracy of at least 98.94%, which is sufficient for ransomware classification. Second, in the static-analysis domain, we introduce BinSight, a kernel density estimation (KDE)-based visualization framework. BinSight addresses the limitation that grayscale image–based visualization cannot adequately express code structure and data distribution. It converts various structural features extracted from executable files into two-dimensional density images via KDE, thereby preserving structural characteristics while providing inputs that are well suited for CNNs. Experiments on Windows PE executables under a rigorous, leakage-controlled protocol show that BinSight achieves a macro F1-score of 97.59% on a challenging code structure–based dataset, compared to 24.90% for the grayscale baseline, an improvement of 72.69 percentage points.
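To make the idea of a KDE-based density image concrete, the following is a minimal sketch, not BinSight's actual implementation. It assumes each structural feature has already been reduced to a point (x, y) in the unit square, uses a Gaussian kernel with a fixed bandwidth, and scales the result so the densest cell equals 1.0. The function name `kde_density_image`, the grid size, and the bandwidth are all hypothetical.

```python
import math


def kde_density_image(points, size=32, bandwidth=0.05):
    """Evaluate a 2-D Gaussian KDE on a size x size grid over [0, 1]^2.

    Hypothetical helper: feature choice, grid size, and bandwidth are
    assumptions made for illustration only.
    """
    points = list(points)
    image = [[0.0] * size for _ in range(size)]
    if not points:
        return image
    centers = [(i + 0.5) / size for i in range(size)]  # pixel centers
    inv = 1.0 / (2.0 * bandwidth ** 2)
    for x, y in points:
        # The 2-D Gaussian is separable, so compute one 1-D kernel per axis.
        kx = [math.exp(-((c - x) ** 2) * inv) for c in centers]
        ky = [math.exp(-((c - y) ** 2) * inv) for c in centers]
        for r in range(size):
            for c in range(size):
                image[r][c] += ky[r] * kx[c]
    peak = max(max(row) for row in image)
    if peak == 0.0:
        return image
    # Scale to [0, 1] so the image can be fed to a CNN as a single channel.
    return [[v / peak for v in row] for row in image]


# Two points at one location and one point elsewhere: the first location
# ends up roughly twice as bright as the second.
img = kde_density_image(
    [(0.1875, 0.1875), (0.1875, 0.1875), (0.8125, 0.8125)],
    size=8,
    bandwidth=0.1,
)
```

Unlike a byte-per-pixel image, the output depends on where feature values concentrate rather than on their order in the file, which is the property the density representation is meant to preserve.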
On the byte-based dataset, BinSight also yields a consistent macro F1-score improvement of 2.57%. PerfSight and BinSight each overcome limitations of existing visualization techniques in their respective domains. This dissertation experimentally demonstrates that the performance and stability of visualization-based malware classification depend strongly on how effectively the intrinsic characteristics of executable files are captured in the input representation, and it points toward more robust feature extraction and visualization methods for increasingly sophisticated malware. Furthermore, the two frameworks can be combined in a complementary manner to form the basis of an integrated analysis pipeline for large-scale automated classification of executable malware and for practical deployment in real-world environments.
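On the dynamic-analysis side of such a pipeline, the conversion of resource-usage time series into an image can be sketched as follows. This is an illustrative assumption rather than PerfSight's actual encoding: each metric becomes one image row, each sample one column, and every row is min-max scaled to 0..255 independently. The function name `series_to_image` and the metric names are hypothetical.

```python
def series_to_image(series: dict[str, list[float]]) -> list[list[int]]:
    """Stack equally long metric series into rows of 8-bit pixel values.

    Hypothetical helper: the row-per-metric layout and per-row min-max
    scaling are assumptions made for illustration only.
    """
    lengths = {len(values) for values in series.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("all series must be non-empty and equally long")
    image = []
    for name in sorted(series):  # fixed row order: alphabetical by metric
        values = series[name]
        low, high = min(values), max(values)
        span = high - low
        # A constant series carries no variation, so map it to all zeros.
        row = [0 if span == 0 else round(255 * (v - low) / span) for v in values]
        image.append(row)
    return image


# Three metrics sampled at three time steps give a 3 x 3 image.
image = series_to_image({
    "cpu": [0.0, 20.0, 100.0],
    "io": [5.0, 5.0, 5.0],
    "mem": [0.0, 100.0, 60.0],
})
```

Scaling each row independently keeps metrics with very different units comparable, at the cost of discarding absolute magnitudes; a real system might prefer fixed per-metric ranges instead.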