This study explored the effectiveness of combining max pooling and average pooling to improve the performance of acoustic event recognition systems. In experiments on the DESED dataset, the model that used max pooling in the initial layer showed significantly lower recognition performance than the baseline model. This finding suggests that the most suitable pooling technique may differ from layer to layer within the model. Future work will focus on developing and applying layer-specific pooling techniques to further improve performance.
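To make the idea of layer-specific pooling concrete, the following is a minimal sketch and not the architecture evaluated in this study. It assumes 1-D, non-overlapping pooling over a toy sequence and omits convolutions entirely. The names `max_pool`, `avg_pool`, `mixed_pool`, `apply_layers`, and the weighting parameter `alpha` are all hypothetical, and the weighted sum in `mixed_pool` is only one possible way to combine the two operators.

```python
from typing import Callable, List, Sequence

PoolFn = Callable[[Sequence[float], int], List[float]]


def max_pool(x: Sequence[float], size: int) -> List[float]:
    """Non-overlapping 1-D max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def avg_pool(x: Sequence[float], size: int) -> List[float]:
    """Non-overlapping 1-D average pooling; a trailing partial window is dropped."""
    return [sum(x[i:i + size]) / size for i in range(0, len(x) - size + 1, size)]


def mixed_pool(x: Sequence[float], size: int, alpha: float = 0.5) -> List[float]:
    """Assumed combination: alpha * max pooling + (1 - alpha) * average pooling."""
    pairs = zip(max_pool(x, size), avg_pool(x, size))
    return [alpha * m + (1 - alpha) * a for m, a in pairs]


def apply_layers(x: Sequence[float], layer_pools: Sequence[PoolFn], size: int = 2) -> List[float]:
    """Apply one pooling function per layer, in order (convolutions omitted)."""
    out = list(x)
    for pool in layer_pools:
        out = pool(out, size)
    return out


# Toy input standing in for a short sequence of frame-level features.
frame_energies = [0.0, 4.0, 2.0, 2.0, 8.0, 0.0, 1.0, 3.0]

# Two layer-wise configurations that differ only in which operator comes first.
avg_first = apply_layers(frame_energies, [avg_pool, max_pool])
max_first = apply_layers(frame_energies, [max_pool, avg_pool])

print(avg_first)
print(max_first)
```

The two configurations use the same operators and differ only in their order, yet they produce different outputs. This illustrates why the choice of pooling operator can matter on a per-layer basis.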