RISS 학술연구정보서비스 (Academic Research Information Service)

      Towards Image Semantic Segmentation and Classification using Bracket-style Convolutional Neural Network and Its Variants


      https://www.riss.kr/link?id=T16087093

부가정보 (Additional Information)

      다국어 초록 (Multilingual Abstract)

      Nowadays, thanks to exponential advances in computational resources along with a massive surge in image quantity and quality, deep learning, a branch of Artificial Intelligence, achieves extraordinary performance in various computer vision tasks, including image classification and semantic segmentation. Moreover, in the current era of Industry 4.0, vision-oriented applications have become vastly significant in everyday life, smart healthcare, and industrial manufacturing, to name a few. Accordingly, a tremendous body of research has introduced deep learning architectures in the form of convolutional neural networks (CNNs) to tackle semantic image understanding for such applications. However, since limitations remain in related work on semantic image segmentation and image classification in several specialized domains, this thesis presents a Bracket-style CNN and its variants to address those issues.
      Firstly, regarding semantic image segmentation, which amounts to pixel-level classification of an image, the key mechanism a deep learning model requires is the ability to coordinate global contextual information with local fine details of the input image when generating the segmentation map. Nonetheless, existing work has not exhaustively exploited the middle-level features of a CNN, which carry a reasonable balance between fine-grained and semantic information, to improve this procedure. Hence, a Bracket-shaped CNN is proposed that exploits middle-level feature maps in a tournament fashion, exhaustively pairing adjacent maps through attention-embedded combination modules. This routine repeats round by round until a prediction map with densely enriched semantic context is finalized. Notably, each pair of neighboring feature maps of different resolutions is combined by a cross-attentional fusion (CAF) module, whose objective is to properly fuse the semantically rich information of the lower-resolution input with the finely patterned features of the higher-resolution one. The proposed segmentation model is trained and evaluated on three well-known datasets, attaining competitive performance in terms of mean Intersection over Union compared with recent methods in the literature: PASCAL VOC 2012 [20] (83.6%), CamVid [9] (76.4%), and Cityscapes [18] (78.3%). Furthermore, through round-wise feature aggregation the architecture can be flexibly adapted to perform per-pixel labeling efficiently on heavily class-imbalanced datasets such as DRIVE [80] for retinal blood vessel segmentation, in comparison with the state of the art. Specifically, it achieves a Sensitivity of 79.32%, Specificity of 97.41%, Accuracy of 95.11%, and Area Under the Receiver Operating Characteristic curve of 97.32%.
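      The round-by-round pairing described above can be sketched in a few lines. This is a toy illustration with hypothetical helper names (`upsample2x`, `fuse`, `bracket_decode`), written in plain NumPy with a simple average standing in for the thesis's cross-attentional CAF module:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of an (H, W, C) feature map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse(high_res, low_res):
    # Toy stand-in for the CAF module: upsample the semantically richer
    # low-resolution map and average it with the finely patterned
    # high-resolution one (the real module uses cross-attention).
    return 0.5 * (high_res + upsample2x(low_res))

def bracket_decode(feature_maps):
    # Tournament-style decoding: each round fuses every pair of adjacent
    # maps, so n maps become n - 1, until one prediction map remains.
    # `feature_maps` is ordered from highest to lowest resolution,
    # each level half the size of the previous one.
    maps = list(feature_maps)
    while len(maps) > 1:
        maps = [fuse(maps[i], maps[i + 1]) for i in range(len(maps) - 1)]
    return maps[0]
```

      With four backbone maps at strides 1, 2, 4, and 8, three rounds of pairing reduce them to a single map at the resolution of the finest input.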
      Secondly, the proposed Bracket-style concept can be extended into variants for effectively classifying images in specialized domains such as Diabetic Retinopathy (DR) grading and facial expression recognition (FER). Concretely, in this kind of model, channel-wise attentional features carrying semantically rich (high-level) information are integrated into finely patterned (low-level) details in a feedback-like manner, yielding the single-mode Bracket-structured network (sCAB-Net). Feature maps of different scales can thereby be amalgamated so that spatially rich representations are extensively involved in the final predictions. Evaluation shows impressive benchmark results in the aforementioned areas, where spatially rich factors play an important role in deciding the image label. For DR recognition, the proposed architecture reaches a remarkable quadratic weighted kappa of 85.6% on the Kaggle DR Detection dataset [47]; for FER, it attains a mean class accuracy of 79.3% on the RAF-DB dataset [58].
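      For reference, the quadratic weighted kappa reported for DR grading can be computed from a confusion matrix as below. This is a sketch of the standard definition, independent of the thesis's own evaluation code:

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    # Observed confusion matrix O[t, p].
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic penalty: w[i, j] = (i - j)^2 / (n - 1)^2.
    idx = np.arange(n_classes)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Expected matrix under chance agreement (outer product of marginals).
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (W * O).sum() / (W * E).sum()
```

      A perfect prediction yields 1.0, chance-level agreement yields roughly 0.0, and disagreements between distant grades (e.g. grade 0 versus grade 4) are penalized most heavily, which is why the metric suits ordinal DR grading.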
      Overall, the operational characteristics and experimental results above demonstrate the promising capability of the proposed Bracket-style network toward complete image understanding, whether by semantic segmentation (pixel-level labeling) or classification (image-level labeling), for practical computer perception-based applications.

      목차 (Table of Contents)

      • Abstract
      • Acknowledgements
      • List of Figures
      • List of Tables
      • 1 Introduction
      • 1.1 Overview of Deep Learning
      • 1.2 Image Classification and Semantic Segmentation using Deep Learning
      • 1.3 Problem Statement
      • 1.4 Objectives
      • 1.5 Major Contributions
      • 1.6 Thesis Organization
      • 2 Related Work
      • 2.1 Symmetrically-structured Networks
      • 2.2 Asymmetrically-structured Networks
      • 3 Proposed Methodology
      • 3.1 Preliminaries
      • 3.1.1 Overview of Convolutional Neural Network
      • Convolutional layer
      • Non-linear Activation layer
      • Pooling layer
      • Fully Connected layer
      • Softmax (Classification) layer
      • 3.1.2 Modeling of Convolutional Neural Network
      • 3.1.3 Configurations and Hyperparameter Settings for Training Process
      • 3.2 Bracket-shaped Convolutional Neural Network
      • 3.3 Cross-Attentional Fusion Module
      • 4 Experiments on Natural Image Segmentation
      • 4.1 Benchmark Datasets
      • 4.1.1 PASCAL VOC 2012 [20]
      • 4.1.2 CamVid [9]
      • 4.1.3 Cityscapes [18]
      • 4.1.4 MS-COCO [64]
      • 4.2 Training Configurations
      • 4.3 Ablation Study
      • 4.3.1 The contribution of backbone CNN to final performance
      • 4.3.2 The effectiveness of Bracket-style decoding network over the Ladder/U-shaped counterpart for leveraging middle-level features
      • 4.3.3 The coordination between Bracket-shaped Network and CAF-based Connections for leveraging middle-level features
      • 4.3.4 Representation of feature maps with respect to different attentional schemes
      • 4.4 Comparison with State-of-the-art Methods
      • 4.4.1 PASCAL VOC 2012
      • 4.4.2 CamVid
      • 4.4.3 Cityscapes
      • 4.4.4 MS-COCO
      • 4.4.5 Computational Complexity
      • 5 Bracket-style Network Variant for Medical Image Segmentation
      • 5.1 Domain Overview
      • 5.2 Descriptions of Bracket-style Network Variant for Medical Image Segmentation
      • 5.2.1 Bracket-shaped Convolutional Neural Networks
      • 5.2.2 Round-wise Features Aggregation
      • 5.3 Experiments
      • 5.3.1 Benchmark Dataset: DRIVE [80]
      • 5.3.2 Training Configurations
      • 5.3.3 Experimental Results and Analyses
      • 6 Bracket-style Network Variant for Image Classification
      • 6.1 Domain Overview
      • 6.1.1 Diabetic Retinopathy Detection
      • 6.1.2 Facial Expression Recognition
      • 6.1.3 Common Problem Statement and Proposed Solution
      • 6.2 Descriptions of Bracket-style Network Variant for Image Classification
      • 6.2.1 Backbone CNN
      • 6.2.2 Channel-wisely Cross-Attentional (CCA) Stream
      • 6.3 Experiments on Diabetic Retinopathy Detection
      • 6.3.1 Benchmark Dataset: Kaggle DR Detection [47]
      • 6.3.2 Training Configurations
      • 6.3.3 Ablation Study
      • 6.3.4 Comparisons with State-of-the-arts
      • 6.4 Experiments on Facial Expression Recognition
      • 6.4.1 Benchmark Dataset: RAF-DB [58]
      • 6.4.2 Training Configurations
      • 6.4.3 Ablation Study
      • 6.4.4 Comparison with State-of-the-art Methods
      • 7 Conclusions and Future Direction
      • 7.1 Conclusions
      • 7.2 Future Direction
      • Bibliography
      • A List of Publications

참고문헌 (References)

      1 V. Gulshan, "Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs", pp. 2402–2410. ISSN: 0098-7484, 2016

      2 C. Szegedy, "Going deeper with convolutions", pp. 1–9. DOI: 10.1109/CVPR.2015.7298594, 2015

      3 K.K. Maninis, "Deep Retinal Image Understanding", 2016

      4 Adam Paszke, "Automatic Differentiation in PyTorch", 2017

      5 G. Huang, "Densely Connected Convolutional Networks", pp. 2261–2269. DOI: 10.1109/CVPR.2017.243, 2017

      6 F. Pedregosa, "Scikit-learn: Machine Learning in Python", 12, pp. 2825–2830, 2011

      7 Tsung-Yi Lin, "Microsoft COCO: Common Objects in Context", 2014

      8 K. He, "Deep Residual Learning for Image Recognition", pp. 770–778, 2016

      9 B. Hariharan, "Semantic contours from inverse detectors", pp. 991–998, 2011

      10 Jun Fu, "Dual Attention Network for Scene Segmentation", pp. 3146–3154, 2019


      11 T. Y. Lin, "Feature Pyramid Networks for Object Detection", pp. 936–944, 2017

      12 X. Zhang, "Fast Semantic Segmentation for Scene Perception", pp. 1183–1192, 2019

      13 Hanchao Li, "Pyramid Attention Network for Semantic Segmentation", p. 285, 2018

      14 Y. Lecun, "Gradient-based learning applied to document recognition", pp. 2278–2324. DOI: 10.1109/5.726791, 1998

      15 C. Szegedy, "Rethinking the Inception Architecture for Computer Vision", pp. 2818–2826, 2016

      16 M. A. Islam, "Gated Feedback Refinement Network for Dense Image Labeling", pp. 4877–4885, 2017

      17 J. Staal, "Ridge-based vessel segmentation in color images of the retina", pp. 501–509. ISSN: 0278-0062. DOI: 10.1109/TMI.2004.825627, 2004

      18 M. Cordts, "The Cityscapes Dataset for Semantic Urban Scene Understanding", pp. 3213–3223, 2016

      19 F. Chollet, "Xception: Deep learning with depthwise separable convolutions", pp. 1800–1807, 2017

      20 G. Lin, "RefineNet: Multi-Path Refinement Networks for Dense Prediction", pp. 1–1, 2019

      21 M. Everingham, "The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results", 2012

      22 Martin Abadi, "TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems", 2015

      23 Z. Wang, "Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection", pp. 267–275. ISBN: 978-3-319-66179-7, 2017

      24 Y. Chen, "Diabetic Retinopathy Detection Based on Deep Convolutional Neural Networks", pp. 1030–1034. DOI: 10.1109/ICASSP.2018.8461427, 2018

      25 Wang Zhe, "Learnable Histogram: Statistical Context Features for Deep Neural Networks", pp. 246–262, 2016

      26 Changqian Yu, "BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation", pp. 334–349. ISBN: 978-3-030-01261-8, 2018

      27 S. Wang, "Localizing Microaneurysms in Fundus Images Through Singular Spectrum Analysis", pp. 990–1002. ISSN: 1558-2531, 2017

      28 Y. Li, "Occlusion Aware Facial Expression Recognition Using CNN With Attention Mechanism", pp. 2439–2450. ISSN: 1057-7149. DOI: 10.1109/TIP.2018.2886767, 2019

      29 Liang-Chieh Chen, "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation", pp. 833–851. ISBN: 978-3-030-01234-2, 2018

      30 F. Lin, "Facial Expression Recognition with Data Augmentation and Compact Feature Learning", pp. 1957–1961. DOI: 10.1109/ICIP.2018.8451039, 2018

      31 Shuangling Wang, "Hierarchical retinal blood vessel segmentation based on feature and ensemble learning", 149, pp. 708–717. ISSN: 0925-2312. DOI: 10.1016/j.neucom.2014.07.059, 2015

      32 J. Sahlsten, "Deep Learning Fundus Image Analysis for Diabetic Retinopathy and Macular Edema Grading", p. 10750. ISSN: 2045-2322. DOI: 10.1038/s41598-019-47181-w, 2019

      33 Sara Moccia, "Blood vessel segmentation algorithms – Review of methods, datasets and evaluation metrics", 158, pp. 71–91. ISSN: 0169-2607. DOI: 10.1016/j.cmpb.2018.02.001, 2018

      34 Zhexin Jiang, "Retinal blood vessel segmentation using fully convolutional network with transfer learning", 68, pp. 1–15. ISSN: 0895-6111. DOI: 10.1016/j.compmedimag.2018.04.005, 2018

      35 Q. He, "Multi-Label Classification Scheme Based on Local Regression for Retinal Vessel Segmentation", pp. 2765–2769. DOI: 10.1109/ICIP.2018.8451415, 2018

      36 T. H. N. Le, "Reformulating Level Sets as Deep Recurrent Neural Network Approach to Semantic Segmentation", pp. 2393–2407, 2018

      37 Chensi Cao, "Deep Learning and Its Applications in Biomedicine". In: Genomics, Proteomics & Bioinformatics 16.1, pp. 17–32. ISSN: 1672-0229, 2018

      38 L. Zhou, "Deep multiple instance learning for automatic detection of diabetic retinopathy in retinal images", pp. 563–571. ISSN: 1751-9659, 2018

      39 M. Wu, "Weight-Adapted Convolution Neural Network for Facial Expression Recognition in Human-Robot Interaction", pp. 1–12. ISSN: 2168-2216. DOI: 10.1109/TSMC.2019.2897330, 2019

      40 Zhi Tian, "Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation", pp. 3126–3135, 2019

      41 Marin Orsic, "In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images", pp. 12607–12616, 2019

      42 P. Lucey, "The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression", pp. 94–101. DOI: 10.1109/CVPRW.2010.5543262, 2010

      43 H. Zhang, "Context Encoding for Semantic Segmentation". In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7151–7160, 2018

      44 P. Junjun, "Diabetic Retinopathy Detection Based on Deep Convolutional Neural Networks for Localization of Discriminative Regions", pp. 46–52. DOI: 10.1109/ICVRV.2018.00016, 2018

      45 Cam-Hao Hua, "Bimodal Learning via Trilogy of Skip-connection Deep Networks for Diabetic Retinopathy Risk Progression Identification". ISSN: 1386-5056. DOI: 10.1016/j.ijmedinf.2019.07.005, 2019

      46 M. Yang, "DenseASPP for Semantic Segmentation in Street Scenes". In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3684–3692, 2018

      47 P. Wang, "Understanding Convolution for Semantic Segmentation". In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1451–1460, 2018

      48 Y. He, "Segmenting Diabetic Retinopathy Lesions in Multispectral Images Using Low-Dimensional Spatial-Spectral Matrix Representation", pp. 493–502, 2020

      49 Stela Vujosevic, "Early Microvascular and Neural Changes in Patients with Type 1 and Type 2 Diabetes Mellitus without Clinical Signs of Diabetic Retinopathy", pp. 435–445, 2019

      50 C. Yu, "Learning a Discriminative Feature Network for Semantic Segmentation". In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1857–1866, 2018

      51 D. Acharya, "Covariance Pooling for Facial Expression Recognition". In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 480–4807. DOI: 10.1109/CVPRW.2018.00077, 2018

      52 T. Wu, "Tree-Structured Kronecker Convolutional Network for Semantic Segmentation". In: 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 940–945, 2019

      53 L. Chen, "SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning". In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6298–6306, 2017

      54 L. C. Chen, "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 40.4, pp. 834–848, 2018