Unlike conventional deep learning, which collects data in one place before training, Federated Learning trains a model through parameter sharing between a central server and local clients while the data remains distributed on the local devices. This can improve the performance of the deep learning model and offers advantages such as protecting personal information and improving communication efficiency.
However, many problems remain to be solved, such as real-time communication management, the non-IID data problem, statistical heterogeneity of data, and system heterogeneity across different environments. To address these problems, research on improving the aggregation algorithm is being actively conducted.
Existing research on aggregation algorithms verifies performance improvement by comparing indicators such as Accuracy, F1-score, and AUROC against FedAVG, the baseline algorithm in Federated Learning. However, unlike a conventional deep learning model, a Federated Learning model that actually communicates while training differs greatly in training time, number of communications, communication volume, and amount of computation depending on the aggregation algorithm. These factors should therefore be considered, together with the performance indicators, at the aggregation-algorithm design step, which is the first choice to be made when applying Federated Learning.
Therefore, we implemented a Federated Learning experiment environment running the FedAVG, FedSGD, and TopKAVG algorithms on three desktop computers and a laptop, using a Django-based Federated Learning experiment framework in which actual communication is performed, and compared the algorithms in terms of Accuracy, Federated Learning time, and number of communications.
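As background for the comparison, the core server-side step of FedAVG is a dataset-size-weighted average of the client model parameters (McMahan et al.). The following is a minimal NumPy sketch of that averaging rule, not the implementation used in our Django-based framework; the function name and list-of-layers representation are illustrative assumptions.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client parameters (FedAVG server step).

    client_weights: one list of per-layer np.ndarrays per client.
    client_sizes:   number of local training samples per client.
    Each layer is weighted by n_k / n, the client's share of the data.
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    averaged = []
    for layer in range(num_layers):
        agg = sum((n / total) * w[layer]
                  for w, n in zip(client_weights, client_sizes))
        averaged.append(agg)
    return averaged
```

FedSGD follows the same weighted-average form, but the clients send a single gradient step per round instead of locally trained weights, which is consistent with the much larger per-round times we observed for FedSGD.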
The FedAVG algorithm took 6.1 seconds per round on the MNIST classification model with an Accuracy of 0.915, and 23.1 seconds per round on the ECG classification model with an Accuracy of 0.986, showing the highest performance with a reasonable Federated Learning time. With the FedSGD algorithm, the MNIST classification model averaged 91.8 seconds per round with an Accuracy of 0.926, while the ECG classification model showed the longest training time, averaging 49.8 seconds per round with an Accuracy of 0.981; there was no noticeable performance difference. With TopKAVG, Federated Learning with six clients did not proceed properly because a high proportion of the data was not reflected in the model. However, despite a larger number of communications than FedAVG, the MNIST classification model showed the fastest time, averaging 5.14 seconds per round, and the ECG classification model averaged 13.0 seconds per round.
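The TopKAVG behavior above follows from top-k sparsification: each client transmits only the k largest-magnitude entries of its update, which shrinks communication volume but discards the remaining entries entirely, so with few clients much of the update is never reflected in the global model. A minimal sketch of the sparsification step (the function name and flat-vector representation are illustrative assumptions, not our framework's code):

```python
import numpy as np

def topk_sparsify(update, k):
    """Keep only the k largest-magnitude entries of an update tensor.

    All other entries are zeroed and, in TopK-style schemes, are simply
    not communicated to the server, which is why a small k can leave a
    large fraction of the client's information out of the global model.
    """
    flat = update.ravel()
    if k >= flat.size:
        return update.copy()
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(update.shape)
```

This trade-off matches our measurements: the per-round time drops because each message is small, but accuracy suffers when too little of each update survives sparsification.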
When selecting an aggregation algorithm for the various situations in which Federated Learning is used, we expect that more efficient Federated Learning can be achieved by considering training time together with model performance.