Dataset condensation techniques aim to create smaller, representative subsets of larger datasets while preserving essential information. Random noise or randomly sampled real images are commonly used to initialize the synthetic set, but their effectiveness is limited. In this study, we explore alternative initialization strategies for dataset condensation: GAN-generated images, diffusion model-generated images, and K-Center-selected images. We compare these strategies against random real initialization in a comprehensive analysis on benchmark datasets. Our evaluation focuses on how initialization affects condensation performance, measured as the test accuracy of models trained from scratch on the condensed data. The findings highlight the importance of informed initialization and offer insights for optimizing dataset condensation techniques. Notably, K-Center initialization yields the best performance, while initialization with images generated by pre-trained GANs or diffusion models also outperforms random real or random noise initialization.
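As a rough illustration of the K-Center strategy mentioned above, the sketch below implements greedy K-Center selection over feature embeddings: it repeatedly picks the point farthest from the current set of centers, yielding a diverse subset to initialize the synthetic images. The embedding function `embed` and the per-class usage shown in the comments are assumptions for illustration, not the paper's stated implementation.

```python
import numpy as np

def k_center_greedy(features: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Greedy K-Center selection: repeatedly add the point farthest
    (in feature space) from the already-selected centers."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    selected = [int(rng.integers(n))]  # start from a random point
    # Distance of every point to its nearest selected center so far.
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dists))    # farthest point becomes the next center
        selected.append(idx)
        new_d = np.linalg.norm(features - features[idx], axis=1)
        dists = np.minimum(dists, new_d)  # update nearest-center distances
    return np.array(selected)

# Hypothetical usage: pick k real images per class as the starting point
# for the synthetic set. `embed` stands in for any feature extractor
# (e.g., a pre-trained encoder); it is not defined in the source.
# feats = embed(images_of_one_class)          # shape (n, d)
# init_idx = k_center_greedy(feats, k=10)
# synthetic_init = images_of_one_class[init_idx]
```

Because each new center maximizes the minimum distance to the existing centers, the selected images cover the feature space more evenly than a uniform random sample, which is one plausible reason this initialization performs well.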