Currently, paper and patent search sites simply let users browse published papers and patents, or collect research results registered under an author's name. However, when different people who share the same name are not distinguished, it is cumbersome to attribute research results to the correct person. Various studies have therefore addressed this ambiguity, a problem known as name disambiguation. In this paper, we design and propose a name disambiguation scheme based on research results such as papers and patents. The proposed scheme uses Spark, a big data processing platform, to process large amounts of data. Unlike existing schemes, it can reflect newly added research results, because new data are collected and stored by a real-time collection system. We also use graph convolutional networks (GCN) and hierarchical agglomerative clustering (HAC), which are widely used in existing name disambiguation studies. Experiments show that when a total of 7,104,000 paper and patent records are learned by the GCN algorithm, the proposed scheme runs about 1.21 times faster in a distributed environment than in a single-server environment. As the amount of data increases, the benefit of distributed processing through Spark clusters becomes more prominent.
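As a rough illustration of the HAC step named above, the following sketch groups records that share one author name by merging the closest clusters until no pair is closer than a threshold. It is not the implementation used in this paper: the toy two-dimensional `embeddings`, the cosine distance, the average-linkage rule, and the `threshold` value are all assumptions made for the example. In practice the vectors would be learned representations of papers and patents, for instance GCN outputs.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hac(vectors, threshold):
    """Agglomerative clustering: merge the closest pair of clusters
    until the closest pair is farther apart than `threshold`."""
    clusters = [[i] for i in range(len(vectors))]  # start with singletons
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are treated as distinct authors
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical embeddings of four records published under the same name.
embeddings = [
    [1.0, 0.1],
    [0.9, 0.2],
    [0.1, 1.0],
    [0.2, 0.9],
]
clusters = hac(embeddings, threshold=0.3)
print(clusters)
```

Each resulting cluster is interpreted as one real-world author. The threshold decides how many authors are recovered, so it would need to be tuned on labeled data. This naive loop is roughly cubic in the number of records and is meant only to show the idea, not to scale.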