RISS Academic Research Information Service

      • Neural Network Verification for Nonlinear Systems

        Sidrane, Chelsea Rose · ProQuest Dissertations & Theses · Stanford University · 2022 · overseas doctoral dissertation (DDOD)

        Machine learning has proven useful in a wide variety of domains, from computer vision to control of autonomous systems. However, if we want to use neural networks in safety-critical systems such as vehicles and aircraft, we need reliability guarantees. We turn to formal methods to verify that neural networks do not have unexpected behavior, such as misclassifying an image after a small amount of random noise is added. Within formal methods, there is a small but growing body of work focused on neural network verification. However, most of this work reasons about neural networks in isolation, when in reality neural networks are often used within large, complex systems. We build on this literature to verify neural networks operating within nonlinear systems.

        Our first contribution is to enable the use of mixed-integer linear programming (MILP) for verification of systems containing both ReLU neural networks and smooth nonlinear functions. MILP is a common tool for verifying neural networks with ReLU activation functions, but, while effective, it does not natively permit the use of nonlinear functions. We introduce an algorithm to overapproximate arbitrary nonlinear functions using piecewise linear constraints. These piecewise linear constraints can be encoded into a mixed-integer linear program, allowing verification of systems containing both ReLU neural networks and nonlinear functions. Because the approximation is an overapproximation, verifying the overapproximate system allows us to make sound claims about the original nonlinear system.

        The next two contributions of this thesis apply the overapproximation algorithm to two different neural network verification settings: verifying inverse model neural networks and verifying neural network control policies. Inverse problems, which appear in a variety of domains from medical imaging to state estimation, involve reconstructing an underlying state from observations. The model mapping states to observations can be nonlinear and stochastic, making the inverse problem difficult. Neural networks are ideal candidates for solving inverse problems because they are very flexible and can be trained from data. However, inverse model neural networks lack built-in accuracy guarantees. We introduce a method to solve for verified upper bounds on the error of an inverse model neural network.

        The next verification setting we address is verifying neural network control policies for nonlinear dynamical systems. A control policy directs a dynamical system to perform a desired task such as moving to a target location. When a dynamical system is highly nonlinear and difficult to control, traditional control approaches may become computationally intractable. In contrast, neural network control policies are fast to execute. However, they lack the stability, safety, and convergence guarantees often available to more traditional control approaches. To assess the safety and performance of neural network control policies, we introduce a method to perform finite-time reachability analysis. Reachability analysis reasons about the set of states reachable by the dynamical system over time, and whether that set of states is unsafe or is guaranteed to reach a goal.

        The final contribution of this thesis is the release of three open source software packages implementing the methods described herein. The field of formal verification for neural networks is small, and releasing open source software will allow it to grow more quickly by making iteration upon prior work easier. Overall, this thesis contributes ideas, methods, and tools to build confidence in deep learning systems, an area that will continue to grow in importance as deep learning finds new applications.
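        As a rough illustration of the overapproximation idea described above (a minimal sketch, not the dissertation's actual algorithm), the code below bounds a scalar nonlinear function between piecewise-linear envelopes; each linear piece could then be encoded as MILP constraints with one binary variable per segment. The segment count, the sampling-based deviation estimate, and the choice of sin are illustrative assumptions; a sound tool would bound the deviation analytically, e.g., via derivative bounds.

```python
import numpy as np

def pwl_envelope(f, a, b, n_segments=8, n_samples=200):
    """Build piecewise-linear lower/upper envelopes of f on [a, b].

    Each segment uses the secant line shifted by the extreme deviation of f
    from that secant (estimated here by dense sampling; a sound implementation
    would bound the deviation analytically). Returns lists of
    (slope, intercept, x_lo, x_hi) tuples for both envelopes.
    """
    knots = np.linspace(a, b, n_segments + 1)
    lower, upper = [], []
    for x0, x1 in zip(knots[:-1], knots[1:]):
        xs = np.linspace(x0, x1, n_samples)
        slope = (f(x1) - f(x0)) / (x1 - x0)
        intercept = f(x0) - slope * x0
        dev = f(xs) - (slope * xs + intercept)   # deviation from the secant
        upper.append((slope, intercept + dev.max(), x0, x1))
        lower.append((slope, intercept + dev.min(), x0, x1))
    return lower, upper

# Example: envelopes of sin(x) on [0, pi]; every segment is a pair of linear
# constraints that a MILP solver can reason about soundly.
lo, up = pwl_envelope(np.sin, 0.0, np.pi)
```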

      • Efficient Simulation and Implementation of Neural Networks on Resource-Constrained Platforms

        Pan, Lei · University of Maryland, College Park · ProQuest Dissertations & Theses · 2023 · overseas doctoral dissertation (DDOD)

        Neural networks have been widely adopted in signal processing applications and systems. Due to the application scenarios enabled by the portability and increasing computational capabilities of embedded processing platforms, such platforms are of increasing interest for deploying signal processing systems with neural networks. However, unlike high-performance computing platforms, which are suitable for computationally intensive neural-network-equipped systems, embedded platforms are often characterized by tight resource constraints. These constraints necessitate new types of optimization and trade-off analysis in the complex design spaces associated with neural network implementation. Resource constraints are also a major concern in the simulation of spiking neural networks (SNNs) on commodity off-the-shelf (COTS) desktop or laptop computing platforms. Such simulation capability opens up much greater access to accurate SNN simulation, which is conventionally carried out on supercomputers or specialized hardware. This thesis focuses on developing novel models and methods for efficient simulation and implementation of neural networks on resource-constrained platforms.

        First, we present a novel approach for simulating SNNs that is based on timed dataflow graphs. Whereas conventional SNN simulators compute changes in spiking neuron variables at each time step, the proposed approach focuses on evaluating spike timings. This focus on evaluating when a dataflow actor (spiking neuron) reaches a new spike makes spike evaluation an event-driven computation. The resulting event-driven simulation approach avoids unnecessary computations at time steps that lie between spiking events, while also avoiding the large lookup-table overheads incurred by existing event-driven approaches. Our results show spiking behavior identical to that of a conventional (time-based) simulator while providing significant improvement in execution time. Furthermore, the event-driven simulation runs on a low-cost COTS computer, whereas most SNN simulators have targeted supercomputer-scale platforms or specialized hardware, as described above.

        Second, this thesis investigates the implementation of deep neural networks (deep convolutional neural networks in particular) on resource-constrained platforms. This study is carried out in the context of hyperspectral image processing, which has attracted increasing research interest in recent years, due in part to the high spectral resolution of hyperspectral images together with the emergence of deep neural networks (DNNs) as a promising class of methods for their analysis. An important challenge in realizing the full potential of hyperspectral imaging technology is deploying image analysis capabilities on resource-constrained platforms, such as unmanned aerial vehicles (UAVs) and mobile computing platforms. We develop a novel approach for designing DNNs for hyperspectral image processing that are targeted to resource-constrained platforms. Our approach involves optimizing the design of a single DNN for operation across a variable number of spectral bands. DNNs developed in this way can then be adapted dynamically based on the availability of resources and real-time performance constraints. The proposed approach supports the Dynamic Data Driven Application Systems (DDDAS) paradigm as an integrated part of the design and training process to enable dynamic data-driven adaptation of the DNN structure, that is, the set of computational modules and connections that are active when the DNN operates. We demonstrate the effectiveness of the proposed class of adaptive and scalable DNNs through experiments using publicly available remote sensing datasets.

        Third, this thesis addresses retargetable implementation of convolutional neural networks (CNNs), a particularly popular class of DNNs. Many machine learning (ML) frameworks have evolved for the design and training of CNN models, and a wide variety of target platforms, ranging from mobile and resource-constrained platforms to desktop and more powerful platforms, are used to deploy CNN-equipped applications. To help designers navigate the complex design spaces involved in deploying CNN models derived from ML frameworks on alternative processing platforms, retargetable methods for implementing CNN models are of increasing interest. We present a novel software tool, called the Lightweight-dataflow-based CNN Inference Package (LCIP), for retargetable, optimized CNN inference on different hardware platforms (e.g., x86 and ARM CPUs, and GPUs). In LCIP, source code for CNN operators (convolution, pooling, etc.) derived from ML frameworks is wrapped within dataflow actors. The resulting coarse-grain dataflow models are then optimized using the retargetable LCIP runtime engine, which employs higher-level dataflow analysis and orchestration complementary to the intra-operator performance optimizations provided by the ML framework and the back-end development tools of the target platform. Additionally, LCIP enables heterogeneous and distributed edge inference of CNNs by offloading part of the CNN to additional devices, such as an onboard GPU or network devices. Our experimental results show that LCIP provides significant improvements in inference throughput on commonly used CNN architectures, and the improvement is consistent across desktop and resource-constrained platforms.

        Lastly, image classification is an essential challenge for many types of autonomous and smart systems. With advances in CNNs, the accuracy of image classification systems has improved dramatically. However, due to the escalating complexity of state-of-the-art CNN solutions, significant challenges arise in implementing real-time image classification on resource-constrained platforms. The framework of elastic neural networks has been proposed to address trade-offs between classification accuracy and real-time performance by leveraging intermediate early exits placed in deep CNNs, allowing systems to switch among multiple candidate outputs while switching off inference layers that are not used by the selected output. We propose a novel approach for configuring early-exit points when converting a deep CNN into an elastic neural network. The proposed approach systematically optimizes the quality and diversity of the alternative CNN operating points provided by the derived elastic networks. We demonstrate the utility of the proposed elastic neural network approach on the CIFAR-100 dataset.
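        A toy illustration of the event-driven idea from the first contribution: for a leaky integrate-and-fire neuron with constant input, the next threshold crossing has a closed form, so a simulator can jump from spike to spike through a priority queue instead of stepping through time. The constants and single-neuron setup below are illustrative assumptions, not the dissertation's dataflow-based simulator.

```python
import heapq, math

TAU, V_TH, V_RESET = 20.0, 1.0, 0.0   # membrane time constant (ms), threshold, reset

def next_spike_time(v, i_in, now):
    """Closed-form time at which a LIF neuron with dV/dt = (-V + i_in)/TAU
    crosses V_TH, or None if the asymptote stays below threshold."""
    if i_in <= V_TH:
        return None
    return now + TAU * math.log((i_in - v) / (i_in - V_TH))

# Event queue of (spike_time, neuron_id). A full simulator would also update
# post-synaptic neurons lazily, decaying their state only when an event arrives.
events = []
v = {0: 0.0}
i_in = {0: 1.5}
t0 = next_spike_time(v[0], i_in[0], 0.0)
if t0 is not None:
    heapq.heappush(events, (t0, 0))
while events:
    t, n = heapq.heappop(events)
    print(f"neuron {n} spikes at t = {t:.2f} ms")
    v[n] = V_RESET                      # reset, then schedule the next spike
    t_next = next_spike_time(v[n], i_in[n], t)
    if t_next is not None and t < 100.0:
        heapq.heappush(events, (t_next, n))
```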

      • Learning with Neural Networks in High Dimensions

        Misiakiewicz, Theodor · Stanford University · ProQuest Dissertations & Theses · 2023 · overseas doctoral dissertation (DDOD)

        A central challenge of modern statistics concerns the curse of dimensionality, which refers to the difficulty of learning in high dimensions due to an exponential increase in degrees of freedom. Over the past half-century, researchers have developed various methods to overcome this curse, including clever feature engineering and prior assumptions on the data structure. In recent years, however, deep learning has emerged as a standard and powerful approach to exploiting massive high-dimensional datasets, vastly outperforming previous handcrafted and domain-specific methods. Despite the black-box nature of neural networks, they appear to avoid the curse of dimensionality by automatically learning relevant features and adapting to inherent structures in the data. However, understanding precisely the mechanism behind this capability remains an outstanding challenge. For example, it is unclear what relevant features neural networks can learn efficiently, how gradient-type algorithms construct these features dynamically, or how different resources (sample size, network width, and training time) should scale for solving different tasks. Historically, neural networks' approximation and statistical properties have been investigated separately from computational aspects. This gap leaves open the study of the subset of neural networks that are of practical interest, namely those efficiently constructed by gradient-based algorithms.

        In this thesis, we consider three models, corresponding to two-layer neural networks trained in three different optimization regimes, and investigate their properties in high dimensions. In particular, our aim is to precisely characterize in each setting how the performance of neural networks depends on the number of samples, neurons, and gradient descent iterations used in their training.

        • Neural networks in the linear regime: Neural networks trained in this regime effectively behave as linear methods. Specifically, we study random feature and kernel regression, which correspond respectively to finite-width and infinite-width models for linearized neural networks. We derive sharp asymptotics for the test error in a new polynomial high-dimensional scaling, under certain abstract conditions on the underlying kernel, and show that these conditions are verified by some classical high-dimensional examples. These results allow us to precisely characterize the performance of neural networks in the linear regime, including the optimal width and the impact of network architecture. In addition, they explain phenomena such as multiple descents in the risk curve and benign overfitting.
        • Neural networks in the mean-field regime: Training in this regime is non-linear and allows neural networks to perform feature learning, i.e., to construct features that are adapted to the data. As an instructive example, we consider the problem of learning multi-index functions on Boolean or isotropic Gaussian data. These functions depend only on a latent low-dimensional subspace and are not learnable efficiently by kernel methods. In contrast, we show that in the mean-field regime, training sequentially learns the function support with a saddle-to-saddle dynamic. The overall time complexity now depends on the target function's leap, which measures how "hierarchical" the function is, and not on the target function's smoothness as in the linear regime. In particular, this illustrates how non-linear training of neural networks can vastly outperform kernel methods by exploiting low-dimensional and hierarchical structures in the data to construct good features.
        • Convex neural networks: We introduce a family of convex problems over the space of infinite-width two-layer neural networks. These can be seen as generalizations of kernel methods, whereby the RKHS norm (a weighted ℓ2 norm over the second-layer weights) is replaced by a weighted functional ℓp norm, 1 ≤ p ≤ 2. We first show that, for p > 1, these problems can be solved in polynomial time using a random feature approximation. This implies that convex neural networks are tractable methods that break the curse of dimensionality on an increasing set of functions as p decreases to 1. On the other hand, we argue that for p = 1, which achieves near-optimal statistical rates on multi-index functions, learning becomes computationally hard under some standard hardness assumptions.

        The picture that emerges from this study is that neural networks can realize a rich range of trade-offs between statistical and computational complexity. In particular, neural networks trained in the linear, mean-field, and convex regimes, which can be seen as implementing three different statistical learning paradigms (fixed features, feature learning, and feature selection, respectively), suffer very differently from the curse of dimensionality. While linearized neural networks are simple computational models, they are statistically inefficient in high dimensions, except for very smooth target functions. On the other hand, convex neural networks can achieve near-optimal statistical rates on multi-index functions, but do not admit general tractable algorithms. Finally, non-linear training of neural networks can be seen as a tractable middle ground between these two extremes. In particular, it outperforms fixed-feature methods by allowing efficient construction of relevant features for data with low-dimensional and favorable hierarchical structure.
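        The "linear regime" bullet above can be made concrete with a random-feature sketch: the first layer is frozen at random initialization and only the second layer is trained, here by ridge regression. The synthetic single-index target and all dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, width, lam = 20, 1000, 512, 1e-3

# Synthetic data: a target depending on one direction of a d-dimensional input,
# a simple instance of the multi-index functions discussed in the abstract.
X = rng.standard_normal((n, d))
y = np.tanh(X[:, 0]) + 0.1 * rng.standard_normal(n)

# Random-feature model: frozen random first layer (the "linear regime");
# only the second-layer coefficients are trained, via ridge regression.
W = rng.standard_normal((d, width)) / np.sqrt(d)
Phi = np.maximum(X @ W, 0.0)                      # ReLU random features
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(width), Phi.T @ y)

X_test = rng.standard_normal((500, d))
pred = np.maximum(X_test @ W, 0.0) @ a
print("test MSE:", np.mean((pred - np.tanh(X_test[:, 0])) ** 2))
```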

      • Characterization and Optimization of Quantized Deep Neural Networks

        부윤호 · Seoul National University Graduate School · 2020 · domestic doctoral dissertation

        Deep neural networks (DNNs) have achieved impressive performance on various machine learning tasks. However, performance improvements are usually accompanied by increased network complexity, incurring vast numbers of arithmetic operations and memory accesses. In addition, the recent increase in demand for deploying DNNs on resource-limited devices has led to a plethora of explorations in model compression and acceleration. Among them, network quantization is one of the most cost-efficient implementation methods for DNNs. Network quantization converts the precision of parameters and signals from 32-bit floating point to 8-, 4-, or 2-bit fixed point. Weight quantization directly compresses DNNs by reducing the representation levels of the parameters; activation outputs can also be quantized to reduce computational costs and the working memory footprint. However, severe quantization degrades the performance of the network. Many previous studies focused on developing optimization methods for the quantization of given models without considering the effects of quantization on DNNs; as a result, extensive simulation is required to find a quantization precision that maintains performance on different models or datasets.

        In this dissertation, we attempt to measure the per-parameter capacity of DNN models and interpret the results to obtain insights into the optimum quantization of parameters. Uniform random vectors are sampled and used for training generic forms of fully connected DNNs, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). We conduct memorization and classification tests to study the effects of the number and precision of parameters on performance. The model and per-parameter capacities are assessed by measuring the mutual information between the input and the classified output. To gain insight into parameter quantization on real tasks, the training and test performances are compared. In addition, we analyze and demonstrate that weight and activation quantization noise behave differently at inference time. Synthesized data are designed to visualize the effects of weight and activation quantization. The results indicate that deeper models are more sensitive to activation quantization, while wider models are more resilient to both weight and activation quantization.

        Considering the characteristics of the quantization errors, we propose a holistic approach to the optimization of quantized DNNs (QDNNs), which comprises QDNN training methods as well as quantization-friendly architecture design. Based on the observation that activation quantization induces noisy predictions, we propose Stochastic Precision Ensemble training for QDNNs (SPEQ). SPEQ is a teacher-student learning scheme in which the teacher and the student share the same model parameters. We obtain the teacher's soft labels by stochastically changing the bit precision of the activations at each layer of the forward-pass computation, and the student model is trained with these soft labels to reduce activation quantization noise. Instead of the KL divergence, a cosine-distance loss is employed for the knowledge distillation (KD) training. Since the teacher model changes continuously through random bit-precision assignment, the method exploits the effect of stochastic ensemble KD. SPEQ achieves strong performance on various tasks, such as image classification, question answering, and transfer learning, without requiring cumbersome teacher networks.
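        The basic fixed-point conversion the abstract refers to can be sketched as a symmetric uniform quantizer; watching the reconstruction error grow as the word length shrinks mirrors the degradation discussed above. The per-tensor max scaling is an illustrative assumption, not the dissertation's exact scheme.

```python
import numpy as np

def quantize_uniform(x, n_bits):
    """Symmetric uniform quantizer mapping floats onto a signed n_bits grid.

    Real QDNN pipelines fold the scale into inference and retrain the model
    to recover the accuracy lost at low precision; this shows only the
    round-to-grid step.
    """
    n_levels = 2 ** (n_bits - 1) - 1          # symmetric signed range
    scale = np.max(np.abs(x)) / n_levels
    return np.round(x / scale) * scale

w = np.random.default_rng(0).standard_normal(1000) * 0.1
for bits in (8, 4, 2):
    w_q = quantize_uniform(w, bits)
    print(bits, "bits -> quantization MSE:", np.mean((w - w_q) ** 2))
```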

      • Quantization of deep neural networks for improving the generalization capability

        신성호 · Seoul National University Graduate School · 2020 · domestic doctoral dissertation

        Deep neural networks (DNNs) achieve state-of-the-art performance in various applications such as image recognition and speech synthesis. However, their implementation in embedded systems is difficult owing to the large number of associated parameters and high computational costs. DNNs may operate well with low-precision parameters because they mimic the operation of human neurons, and quantization of DNNs exploits this property. In many applications, a word length of 8 bits or more leads to DNN performance comparable to that of a full-precision model; however, shorter word lengths such as 1 or 2 bits can result in significant performance degradation. To alleviate this problem, complex quantization methods implemented via asymmetric or adaptive quantizers have been employed in previous works. In contrast, in this study, we propose a different approach: we focus on improving the generalization capability of quantized DNNs (QDNNs) instead of employing complex quantizers.

        To this end, we first analyze the performance characteristics of QDNNs using a retraining algorithm, employing layer-wise sensitivity analysis to investigate the quantization characteristics of each layer. We also analyze the differences in QDNN performance for different quantized network sizes. Based on these analyses, two simple quantization training techniques, namely adaptive step size retraining and gradual quantization, are proposed.

        Furthermore, a new training scheme for QDNNs is proposed, referred to as the high-low-high-low-precision (HLHLp) training scheme, which allows the network to reach flat minima on its loss surface with the aid of quantization noise. As the name suggests, the proposed method alternates between high and low precision during network training, and the learning rate is changed abruptly at each stage. Our analysis shows that the proposed training technique yields considerable performance improvements for QDNNs compared with previously reported fine-tuning-based quantization schemes.

        Moreover, the knowledge distillation (KD) technique, which utilizes a pre-trained teacher model for training a student network, is exploited for the optimization of QDNNs. We explore the effect of teacher network selection and investigate the effect of different hyperparameters on the quantization of DNNs using KD. In particular, we use several large floating-point and quantized models as teacher networks. Our experiments indicate that, for effective KD training, the softmax distribution produced by a teacher network is more important than its performance. Because the softmax distribution of a teacher network can be controlled using KD hyperparameters, we analyze the interrelationship of the KD components for QDNN training and show that even a small teacher model can achieve the same distillation performance as a larger one. We also propose the gradual soft loss reducing (GSLR) technique for robust KD-based QDNN optimization, wherein the mixing ratio of hard and soft losses is controlled during training.

        In addition, we present a new QDNN optimization approach, namely stochastic quantized weight averaging (SQWA), to design low-precision DNNs with good generalization capability using model averaging. The proposed approach includes (1) floating-point model training, (2) direct quantization of the weights, (3) capture of multiple low-precision models during retraining with a cyclical learning rate, saving a model whenever the rate reaches its lowest point in a cycle, (4) averaging of the captured models, and (5) re-quantization of the averaged model and fine-tuning with a low learning rate. Additionally, we present a loss-visualization technique for the quantized weight domain that can display multiple quantized models on a single loss surface; it shows that a QDNN optimized with our approach lies near the center of a flat minimum on the loss surface.
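        The SQWA recipe enumerated above, steps (3) to (5), can be sketched in a few lines. Here the "checkpoints" are stand-in perturbed copies of one weight vector rather than models actually captured with a cyclical learning rate, and the 2-bit symmetric quantizer is an illustrative assumption.

```python
import numpy as np

def quantize(w, n_bits=2):
    """Symmetric uniform quantizer (illustrative stand-in)."""
    n_levels = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w)) / n_levels
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w_star = rng.standard_normal(100)

# Step (3): low-precision "checkpoints" -- here, perturbed copies of w_star.
checkpoints = [quantize(w_star + 0.05 * rng.standard_normal(100))
               for _ in range(5)]

w_avg = np.mean(checkpoints, axis=0)   # step (4): average in full precision
w_final = quantize(w_avg)              # step (5): re-quantize (then fine-tune)
```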

      • Text Categorization Based on Artificial Neural Networks (ANN)

        리청화 (Li Chenghua) · Chonbuk National University Graduate School · 2006 · domestic master's thesis

        Text categorization is an important application of machine learning to the field of document information retrieval. This thesis describes two kinds of neural networks for text categorization: multi-output perceptron learning (MOPL) and the back-propagation neural network (BPNN). BPNN has been widely used in classification and pattern recognition; however, it has some generally acknowledged defects, which usually stem from so-called morbidity neurons. This thesis proposes a novel adaptive learning approach for text categorization using an improved back-propagation neural network, which can overcome shortcomings of the traditional BPNN such as slow training speed and a tendency to become trapped in local minima. We compare the training time and performance of the three methods on the standard Reuters-21578 collection. The results show that the proposed algorithm achieves high categorization effectiveness as measured by precision, recall, and F-measure.
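        A minimal multi-output perceptron of the kind the thesis compares against BPNN might look as follows, assuming a toy bag-of-words dataset; the error-driven update (reinforce the true class, penalize the predicted one) is the classic perceptron rule, not necessarily the thesis's exact MOPL formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, n_classes, n_docs = 50, 3, 300

# Toy bag-of-words data: each class favors a different slice of the vocabulary.
labels = rng.integers(0, n_classes, n_docs)
X = rng.poisson(0.1, (n_docs, vocab)).astype(float)
for c in range(n_classes):
    X[labels == c, c * 15:(c + 1) * 15] += rng.poisson(1.0, (np.sum(labels == c), 15))

# Multi-output perceptron: one weight vector per category, error-driven updates.
W = np.zeros((n_classes, vocab))
for epoch in range(10):
    for x, y in zip(X, labels):
        pred = np.argmax(W @ x)
        if pred != y:            # classic perceptron rule with learning rate 1
            W[y] += x
            W[pred] -= x

print("train accuracy:", np.mean(np.argmax(X @ W.T, axis=1) == labels))
```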

      • Exploring Optimized Spiking Neural Network Architectures for Classification Tasks on Embedded Platforms

        Syed Tehreem · Inha University Graduate School · 2021 · domestic master's thesis

        In recent times, the use of modern neuromorphic hardware for brain-inspired spiking neural networks (SNNs) has grown exponentially. For sparse input data, SNNs offer low power consumption on event-based neuromorphic hardware, particularly in the deeper layers. However, using deep artificial neural networks (ANNs) to train spiking models is still considered a tedious task. Various ANN-to-SNN conversion methods have been proposed in the literature to train deep SNN models; nevertheless, these methods require hundreds to thousands of time steps for training and still cannot attain good SNN performance. This work proposes customized model architectures (VGG, ResNet) for training deep convolutional spiking neural networks. Training is carried out using deep convolutional SNNs with surrogate-gradient-descent backpropagation in a customized layer architecture similar to that of deep ANNs. Moreover, this work proposes using fewer time steps for training SNNs with surrogate gradient descent. Overfitting problems were encountered during training with surrogate-gradient-descent backpropagation; to overcome them, this work adapts the SNN-based dropout technique to surrogate gradient descent. The proposed customized SNN models achieve good classification results on both private and public datasets. Several experiments were carried out on an embedded platform (NVIDIA Jetson TX2 board), where the deployment of the customized SNN models was extensively evaluated. Performance was validated in terms of processing time and inference accuracy on both a PC and the embedded platform, showing that the proposed customized models and training techniques achieve good performance on various datasets such as CIFAR-10, MNIST, SVHN, and private KITTI and Korean license plate datasets.
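        Surrogate-gradient backpropagation, the training method this abstract relies on, replaces the non-differentiable spike with a smooth derivative on the backward pass. A minimal sketch, assuming PyTorch is available and using a fast-sigmoid surrogate (one common choice; the thesis's exact surrogate may differ):

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; a smooth fast-sigmoid derivative
    of slope k in the backward pass so gradients can flow through spikes."""

    @staticmethod
    def forward(ctx, v, k=10.0):
        ctx.save_for_backward(v)
        ctx.k = k
        return (v > 0).float()          # binary spike

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        surrogate = 1.0 / (1.0 + ctx.k * v.abs()) ** 2
        return grad_out * surrogate, None   # no gradient for k

spike = SurrogateSpike.apply
v = torch.randn(8, requires_grad=True)      # membrane potentials minus threshold
spike(v).sum().backward()
print(v.grad)   # nonzero even though the forward pass is a step function
```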

      • Pattern Classification and Clustering Algorithms with Supervised and Unsupervised Neural Networks in Financial Applications

        이기동 · Kent State University · 2001 · overseas doctoral dissertation

        Due to the development of network technologies, business information today is more easily accessed, captured, and transferred over the information highway. This transformation requires quick and accurate interpretation of information, and to facilitate business decision-making processes, decision support systems in emerging markets should provide users with accurate, flexible, and timely information. This dissertation focuses on the accuracy dimension in key financial applications, using artificial neural networks (ANNs). ANN models are often classified into two distinctive training types, supervised or unsupervised. Previous pattern classification research in business has mostly used back-propagation (BP) networks. In this dissertation, the BP network (supervised) and the Kohonen self-organizing feature map (unsupervised) are examined together for their effectiveness and desirability in financial classification tasks. Bankruptcy prediction (two-group) and bond rating (multi-group) are selected as testbeds. Statistical classification techniques, logistic regression and discriminant analysis, are also included as performance benchmarks for the neural network classifiers. The findings of this study first confirm that the BP network outperformed all the other classification techniques examined. In addition, the study shows that as the training sample size increases, a more complex BP model may be applied, and the performance of the BP network improves accordingly. Second, Lowe and Webb's (1991) reciprocally weighted target coding scheme was empirically tested against two other target coding and threshold schemes; the Lowe and Webb scheme did not perform well. Third, the study identifies a few key conditions for using the Kohonen self-organizing feature map in pattern classification settings. Provided these conditions are met, the Kohonen self-organizing feature map may be used as an alternative for pattern classification tasks.
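        A minimal Kohonen self-organizing feature map of the kind examined in this dissertation might look as follows, with random stand-in data in place of real financial ratios; for classification, nodes would afterwards be mapped to classes, e.g., by a majority vote of the training samples that select each node.

```python
import numpy as np

rng = np.random.default_rng(0)
grid, d, n_iter = 8, 5, 2000               # 8x8 map over 5 features
W = rng.standard_normal((grid, grid, d))   # node weight vectors
X = rng.standard_normal((500, d))          # stand-in for firm feature vectors

rows, cols = np.indices((grid, grid))
for t in range(n_iter):
    x = X[rng.integers(len(X))]
    # Best-matching unit: the node whose weights are closest to the input.
    bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), (grid, grid))
    lr = 0.5 * (1 - t / n_iter)                       # decaying learning rate
    sigma = max(grid / 2 * (1 - t / n_iter), 0.5)     # shrinking neighborhood
    dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    h = np.exp(-dist2 / (2 * sigma ** 2))[..., None]  # neighborhood weights
    W += lr * h * (x - W)                             # pull neighborhood toward x
```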

      • Implementing Adaptive Fuzzy Logic Controllers with Neural Networks

        김흥만 · University of Arizona · 1995 · overseas doctoral dissertation

        The goal of intelligent control is to achieve control objectives for complex systems where it is impossible or infeasible to develop a mathematical system model, but expert skills and heuristic knowledge from human experience are available for control purposes. To this end, an intelligent control system must have the essential characteristics of human control experience: linguistic knowledge representation, which facilitates the process of knowledge acquisition and transfer, and adaptive knowledge evolution or learning, which leads to improvements in system performance and knowledge. This dissertation presents an efficient approach that combines fuzzy logic and neural networks to capture these two important features required for an intelligent control system. A design method for adaptive neuro-fuzzy controllers is proposed using structured neuro-fuzzy networks. The structured neuro-fuzzy networks consist of three types of sub-networks for pattern recognition, fuzzy reasoning, and control synthesis, respectively. Each sub-network is constructed directly from the decision-making procedure of fuzzy-logic-based control systems. In this way, a one-to-one mapping between a fuzzy-logic-based control system and a structured neuro-fuzzy network is established. This mapping enables us to create a knowledge structure within neural networks based on fuzzy logic, and to give fuzzy controllers a learning ability through neural networks. From the perspective of neural networks, the proposed design method offers a mechanism to construct networks from heuristic knowledge instead of numerical training pairs, which are much more difficult to obtain; to build decision structures into networks, which divide a network into several functional regions so that the network is no longer just a black-box function approximator; and to conduct network learning in a distributed fashion, i.e., each sub-network of a different functional region can learn its own function independently. From the perspective of fuzzy logic, the proposed design method provides a tool to refine membership functions, inference procedures, and defuzzification algorithms of fuzzy control systems; to generate new fuzzy control rules so that fuzzy control systems can adapt to gradual changes in the environment; and to implement parallel execution of rule matching, firing, and defuzzification. Several simulation studies were conducted to demonstrate the use of the structured neuro-fuzzy networks. The effectiveness of the proposed design method is clearly shown by the results of these studies, which also indicate that fuzzy logic and neural networks are complementary and that their combination is well suited to achieving the goal of intelligent control.
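        The fuzzy-reasoning step the dissertation wires into a network can be illustrated with a two-rule controller: triangular membership functions give firing strengths, and a weighted average defuzzifies them. The rule base and membership parameters below are illustrative assumptions; in the structured neuro-fuzzy network these quantities become trainable weights.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaking at b with support [a, c]."""
    return float(np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0))

def fuzzy_controller(error):
    """Two-rule sketch: IF error is Negative THEN u = +1; IF Positive THEN u = -1.
    Firing strengths come from the membership functions; the crisp output is
    their weighted average (a common defuzzification scheme)."""
    w_neg = tri(error, -2.0, -1.0, 0.0)
    w_pos = tri(error, 0.0, 1.0, 2.0)
    if w_neg + w_pos == 0.0:
        return 0.0
    return (w_neg * 1.0 + w_pos * -1.0) / (w_neg + w_pos)

print(fuzzy_controller(-0.5), fuzzy_controller(0.5))   # corrective outputs
```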

      • Design of Polynomial-Based RBF Neural Networks and Type-2 Fuzzy Neural Networks for Pattern Recognition

        김길성 · University of Suwon Graduate School · 2009 · domestic master's thesis

        In this study, we propose Polynomial-based Radial Basis Function Neural Networks (P-RBF NNs) and Type-2 Fuzzy Neural Networks (T2 FNNs). These networks can be applied, respectively, to high-dimensional pattern recognition problems, recognition of noise-corrupted data, and pattern recognition with small training datasets. The first proposed model, the P-RBF NN, uses structural findings about the training data expressed in terms of the partition matrix resulting from Fuzzy C-Means (FCM) clustering. The network is of a functional nature, as the weights between the hidden layer and the output are treated as polynomials; the use of polynomial weights is essential in reflecting the nonlinear nature of data encountered in regression and classification problems. The architecture embraces three functional modules reflecting the three phases of input-output mapping realized in rule-based architectures: condition formation, conclusion creation, and aggregation. The condition part partitions the input space using FCM clustering, the conclusion part represents each partitioned local region with a polynomial function, and the network's final output is produced by fuzzy inference in the aggregation part. Owing to the fuzzy inference over a polynomial-based structure, a nonlinear discriminant function is generated in the output space, which improves classification performance. The second proposed model, the T2 FNN, builds on the Type-2 fuzzy theory developed by Mendel and Karnik. It has four layers: an input layer, a fuzzification layer, an inference layer, and a defuzzification layer. In the fuzzification layer, interval Type-2 fuzzy sets created by Type-2 FCM clustering serve as activation functions, and the connection weights are interval Type-1 fuzzy sets representing the centers of the Type-2 fuzzy sets in the rule consequents, which makes the network robust to noise. The units in the inference layer create interval Type-1 fuzzy sets through type reduction of the interval Type-2 fuzzy sets, and the defuzzification layer generates a crisp output. From the perspective of linguistic interpretation, both networks can be expressed as collections of "if-then" fuzzy rules driven by a fuzzy inference mechanism. The essential design parameters of the networks, including the learning rate, momentum coefficient, and fuzzification coefficient of FCM clustering, are optimized by means of Particle Swarm Optimization. The proposed P-RBF NNs and T2 FNNs are applied to synthetic datasets, machine learning datasets, a partial discharge dataset, and face image datasets, and the results are compared with those of previous studies.
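        A minimal sketch of the P-RBF NN idea, with FCM-style memberships as activations and linear (degree-1 polynomial) consequent weights fit by least squares. The fixed cluster centers, the 1-D toy target, and the joint least-squares fit are simplifying assumptions, since the thesis runs full FCM and optimizes design parameters with Particle Swarm Optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (300, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(300)

def fcm_membership(X, centers, m=2.0):
    """FCM partition-matrix memberships for fixed centers (one FCM step)."""
    d = np.linalg.norm(X[:, None, :] - centers[None], axis=-1) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

centers = np.linspace(-2, 2, 5)[:, None]   # fixed prototypes for brevity
U = fcm_membership(X, centers)             # memberships act as activations

# Linear consequents per rule: the output is the membership-weighted sum of
# local models, so all coefficients can be fit jointly by least squares.
Phi = np.concatenate([U, U * X], axis=1)   # columns [u_k, u_k * x]
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("train MSE:", np.mean((Phi @ coef - y) ** 2))
```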
