This study set out to find ways to keep speech synthesis technology from inducing the uncanny valley effect in users in the field of speech therapy. Previous research explored the microdiagnosis of segmental sounds and corresponding solutions. Here, we systematically present the diagnostic items and diagnostic methods for the microdiagnosis of suprasegmental sounds. The diagnostic items were categorized along three dimensions: the length, pitch, and intensity of the sound. We show that pause duration and speech rate should serve as diagnostic criteria for 'length'; pitch range, pitch level, and pitch variability for 'pitch'; and sound pressure level, peak amplitude, mean square amplitude, and other factors for 'intensity.' All of these measurements bear not only on accurate speech but also, depending on their degree and level, on the nuanced aspects of emotional expression. We therefore argue that the characteristics of natural human speech should be examined closely for each of these elements, and that carefully incorporating the elements that current speech synthesis technology cannot yet implement would allow the technology to reach a more complete stage in which users no longer experience the uncanny valley. Finally, we emphasize that the speech therapy AI discussed here is only one example of AI that requires delicate communication; the scope of application proposed in this study can be extended to psychological counseling AI, educational AI, and beyond.
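As an illustration only, the three diagnostic dimensions named above could be computed along the following lines. This is a minimal sketch and not the study's own procedure: the function names, the input formats (syllable intervals in seconds, a frame-wise F0 track in Hz with 0 marking unvoiced frames, normalized waveform samples), the 0.25 s pause threshold, and the reference amplitude are all assumptions made for the example. A true sound pressure level would need calibrated pressure values; the level below is only relative to the assumed reference.

```python
import math

def length_measures(syllables, min_pause=0.25):
    """Pause durations and speech rate from (start, end) syllable intervals.

    `syllables` and `min_pause` (seconds) are hypothetical inputs.
    """
    # A pause is a gap between consecutive syllables of at least min_pause.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(syllables, syllables[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    total = syllables[-1][1] - syllables[0][0]
    return {"pauses": pauses, "speech_rate": len(syllables) / total}

def pitch_measures(f0_hz):
    """Pitch level, range, and variability from a frame-wise F0 track."""
    voiced = [f for f in f0_hz if f > 0]  # assumption: 0 marks unvoiced frames
    level = sum(voiced) / len(voiced)
    variance = sum((f - level) ** 2 for f in voiced) / len(voiced)
    return {
        "level": level,                      # mean F0
        "range": max(voiced) - min(voiced),  # max minus min F0
        "variability": math.sqrt(variance),  # population standard deviation
    }

def intensity_measures(samples, ref=1.0):
    """Peak amplitude, mean square amplitude, and level in dB relative to `ref`."""
    peak = max(abs(s) for s in samples)
    mean_square = sum(s * s for s in samples) / len(samples)
    rms = math.sqrt(mean_square)
    level_db = 20 * math.log10(rms / ref) if rms > 0 else float("-inf")
    return {"peak": peak, "mean_square": mean_square, "level_db": level_db}
```

Comparing such values between a natural recording and a synthetic rendering of the same utterance is one way the diagnostic criteria could be put into practice; which thresholds matter perceptually is a question for the study itself.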