Development of a Spatial-Information-Based Extreme Climate Event Analysis Tool (EEAT) Using Climate Change Scenarios
한국진,이명진 대한원격탐사학회 2020 대한원격탐사학회지 Vol.36 No.3
Climate change scenarios are the basis of research on responding to climate change and consist of large-volume spatio-temporal data. From a data point of view, a single scenario occupies about 83 gigabytes or more, and its semi-structured format constrains search, extraction, storage, and analysis. In this study, a spatial-information-based tool for analyzing extreme climate events was developed to make large-volume, multi-period climate change scenarios easier to use. The tool was then applied to the RCP8.5 climate change scenario in a pilot analysis of when and where heavy-rain thresholds observed in the past could occur in the future. According to the analysis, days with a three-day cumulative rainfall of 587.6 mm or more would occur about 76 times in the 2080s, and the heavy rain would be localized. The tool was designed so that the entire process, from initial setup through to deriving analysis results, runs on a single platform. It also produces results in several formats without requiring commercial software: web document (HTML), image (PNG), climate change scenario (ESR), and statistics (XLS). The tool is therefore expected to be useful for future climate change projections and vulnerability assessments, and to be used in developing analysis tools for the climate change scenarios that accompany future climate change reports.
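The threshold count reported in the abstract above (days with a three-day cumulative rainfall of 587.6 mm or more) can be illustrated with a minimal sketch. This is not the EEAT implementation: the function name `count_exceedances`, the trailing-window convention, and the sample rainfall values are assumptions made for illustration only.

```python
from typing import List


def count_exceedances(
    daily_mm: List[float], window: int = 3, threshold_mm: float = 587.6
) -> int:
    """Count days whose trailing `window`-day rainfall total meets the threshold."""
    count = 0
    # The first complete window ends at index window - 1.
    for end in range(window - 1, len(daily_mm)):
        total = sum(daily_mm[end - window + 1 : end + 1])
        if total >= threshold_mm:  # "587.6 mm or more"
            count += 1
    return count


# Made-up daily rainfall (mm) for one grid cell; not data from the study.
sample = [0.0, 120.0, 300.0, 200.0, 90.0, 10.0, 0.0]
exceed_days = count_exceedances(sample)
```

In a gridded scenario the same count would be repeated for every cell and decade, which is what allows the result to be mapped in space as well as in time.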
A Study on Methods for Collecting Big Data in the Environmental Field: Focusing on Air Quality Data
한국진 ( Kj Han ),강성원,김도연,김영인 한국환경연구원 2017 한국환경정책평가연구원 기초연구보고서 Vol.2017 No.-
The purpose of this study is to identify big data that can be used for environmental research, based on an understanding of big data as the foundation of the intelligent information society, and to develop a procedure and framework for environmental big data. For big data to become central to future research paradigms, it must be understood and actively applied. In addition, environmental data need to be identified and measures for handling them prepared. As a case study, the study analyzed the air quality data and services of Airkorea, described the process of scraping and storing big data through a service analysis process, and presented a framework for the scraping method.
박종철,한국진,채여라 한국지리학회 2019 한국지리학회지 Vol.8 No.3
Social big data can be a source of information for the early detection of disasters, and it opens new possibilities for understanding their spatial distribution. This first requires an understanding of how information collected from news big data relates to actual events. The purpose of this study is to improve that understanding by comparing the results of a news big data analysis with livestock mortality caused by heat waves. In the temperature range where livestock mortality increased, news about livestock damage more than doubled compared with other periods. However, the number of news items peaked about six days after livestock mortality peaked. In the temperature range where livestock mortality increased, the main keyword of the news was 'mortality'. Before mid-July the main keywords were 'response' and 'prevention'; from mid-July to mid-August the main keyword was 'mortality'; and after mid-August it was 'price'. Although social issues can raise the frequency of particular keywords, the keyword 'mortality' appeared mostly in the temperature ranges and periods in which actual mortality was concentrated.
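The roughly six-day delay between the mortality peak and the news peak can be expressed as a simple peak-lag calculation. The sketch below is an illustration only, not the authors' method: the function name `peak_lag_days` and both daily series are hypothetical, and it assumes the two series are aligned day by day.

```python
from typing import Sequence


def peak_lag_days(events: Sequence[float], news: Sequence[float]) -> int:
    """Return how many days the news peak trails the event peak (negative if it leads)."""
    if len(events) != len(news) or len(events) == 0:
        raise ValueError("series must be non-empty and the same length")
    event_peak = max(range(len(events)), key=lambda i: events[i])
    news_peak = max(range(len(news)), key=lambda i: news[i])
    return news_peak - event_peak


# Hypothetical daily counts, aligned by day; not data from the study.
mortality = [1, 3, 9, 4, 2, 1, 1, 1, 1, 1]
articles = [0, 1, 2, 3, 4, 5, 6, 7, 12, 5]
lag = peak_lag_days(mortality, articles)
```

A positive lag means news coverage trails the event, which matters when news volume is considered as an early-detection signal.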
A Study on Better Reflecting Demand for Everyday Environmental Issues: Focusing on Big Data Analysis of Civil Complaints
진대용,강성원,한국진,김진형,김도연,강선아 한국환경연구원 2019 수시연구보고서 Vol.2019 No.-
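This study examines how to better reflect demand for everyday environmental issues through big data analysis. As citizens' awareness of environmental problems has grown, a range of problems such as fine dust, waste and garbage, noise, and odor have emerged as issues. However, there is a gap between the environmental problems that citizens actually want solved and the response of environmental policy. This study therefore defines all environmental problems raised in civil complaints, which are closely tied to citizens' daily lives, as 'everyday environmental issues' and proposes ways to reflect the demand for them. First, the overall environmental issues appearing in complaints were analyzed using the Ministry of Environment's similar complaints (public complaints disclosed on the e-People portal (국민신문고)). LDA topic modeling yielded seven topics: 'living environment', 'construction and livestock waste', 'environmental impact assessment', 'hazardous chemical substances', 'air pollutants and emission facilities', 'wastewater', and 'medical and industrial waste'. Overall, complaints related to 'living environment' issues, which include noise, garbage, and fine dust, showed a relative upward trend. Within 'living environment', until 2015 most complaints demanded solutions to noise problems such as construction noise, noise between floors, traffic noise, and factory noise, but from 2016, as the fine dust issue emerged, fine dust showed the highest frequency. For fine dust in particular, many complaints voiced concern about children's health and demanded countermeasures. In 'construction and livestock waste' and 'medical and industrial waste', much of the content concerned treatment, separate collection, and recycling, and a need was identified to improve awareness of 'recycled aggregate' from high-value-added construction waste. In 'environmental impact assessment', demand for 'small-scale environmental impact assessment' rose sharply in 2018. In 'wastewater', complaints about wastewater (discharge facilities) and water quality appeared steadily, and content about 'groundwater' affected by livestock wastewater and similar sources showed an increasing trend. In 'hazardous chemical substances', much of the content concerned installation inspection, safety inspection, business permits, handling facilities, and reporting requirements, and in 'air pollutants and emission facilities' it concerned air emission facilities, permissible emission standards, prevention facilities, self-measurement, and whether odor emission is permitted or applicable. In Sejong Special Self-Governing City, many complaints concerned 'noise' and 'odor'. Given the nature of a new city, many complaints appear to have arisen from the noise and dust of residential and commercial facilities. Policy support is therefore needed to trace the sources of noise and respond in a timely manner, and to install soundproof walls and similar measures against roadside noise. Measures against odor are also needed, because odors from fertilizer, garbage, and livestock sheds occur frequently. In addition, more constructive measures appear necessary for the garbage problems that frequently arise within housing complexes, apartments, shopping areas, and especially at bus stops, and for issues concerning the installation of electric vehicle charging stations and the payment of subsidies. Because the final consumers of environmental policy are the public, it is important to identify through various channels the environmental issues they want solved. Among environmental texts, complaints reflect the view of environmental problems most relevant to citizens' actual lives, and they are judged to offer a good basis for policymaking. Most citizens are currently most sensitive to fine dust among the many environmental problems. At the same time, actual complaints show a high share of demands to resolve other problems such as construction noise, garbage, and odor, which calls for an active response. Fine dust cannot be solved in a short time and requires international cooperation alongside domestic action. By contrast, the damage from noise, garbage, and odor could be reduced if regulation, compensation, and stronger enforcement are put in place after sufficient discussion.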
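The shift described above, from noise as the most frequent complaint keyword until 2015 to fine dust from 2016, rests on counting keyword frequency by year. The sketch below shows only that counting step with the standard library; it is not LDA topic modeling and not the study's pipeline. The function name `top_keyword_by_year`, the keyword list, and the sample complaints are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_keyword_by_year(
    complaints: Iterable[Tuple[int, str]], keywords: List[str]
) -> Dict[int, str]:
    """For each year, return the tracked keyword mentioned in the most complaints."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, text in complaints:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:  # count each complaint at most once per keyword
                counts[year][kw] += 1
    return {year: c.most_common(1)[0][0] for year, c in counts.items()}


# Hypothetical complaint texts; not data from the study.
complaints = [
    (2015, "Construction noise at night"),
    (2015, "Noise between floors"),
    (2015, "Fine dust near the school"),
    (2017, "Fine dust worries for children"),
    (2017, "Request for fine dust measures"),
    (2017, "Traffic noise on the main road"),
]
top = top_keyword_by_year(complaints, ["noise", "fine dust", "odor"])
```

Years in which no tracked keyword appears are simply absent from the result, so a caller should not assume every year has an entry.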
Establishment and Operation of an Environmental Data Hub for Proactive Response to Social and Environmental Issues
진대용,표종철,한국진,김도연,조윤랑 한국환경연구원 2021 사업보고서 Vol.2021 No.-
Ⅰ. Background and Aims of Research
1. Research Background
□ Construction of a 'data dam', a key element of the great social and economic transformation
  ○ A data hub is required for data collection and utilization
    - Public and private data are the key drivers of future industry
    - New value needs to be created for the 'data dam' through data maps, data linkage, and analysis services
    ※ Data Dam: collecting data, standardizing it, and sharing it again
  ○ Difficulty in using data to respond to large-scale social and environmental issues
    - Large-scale social and environmental issues such as COVID-19, fine dust, and humidifier disinfectants have occurred
    - It is difficult to collect and utilize environment-related data to respond to social and environmental issues
□ Presenting a mid- to long-term roadmap for building a data hub to respond to social and environmental issues
  ○ Prepare a plan to build a data hub for the digital transformation of environmental policy research
    - Derive the essential elements for building an environmental data hub through a review of major implementation cases
    - Build a storage-centric data hub pilot based on the Institutional Data Repository (IDR)
  ○ Present a mid- to long-term roadmap for building an efficient data hub
    - Discover data to respond to various social and environmental issues and support data-based decision-making
    - Present a mid- to long-term roadmap that considers scattered data and the use of various data analysis platforms
2. Research Scope and Methods
□ Present a mid- to long-term roadmap for future improvement after the pilot implementation
  ○ Derive essential data hub functions through a review of data hub implementation cases
    - Major functions: data and analysis services, data map, and improved user accessibility
  ○ Define the functions a data hub needs in order to respond to social and environmental issue analysis
    - Accumulate data-based analysis cases of social and environmental issues and review the strengths and limitations of data analysis
  ○ Propose a mid- to long-term roadmap for future improvement after pilot implementation of the environmental data hub
    - Propose a mid- to long-term roadmap after pilot implementation of an environmental data hub based on the IDR system
Ⅱ. Strategies to Build an Environmental Data Hub
1. Overview of building a data hub
□ Data hubs applicable to the environmental field need to be reviewed
  ○ Poor data analysis platform and data hub
    - UK: Support for data-based social problem solving and for research use of administrative data analysis
    - Singapore: Operation of a pan-government platform for national issue analysis
    - U.S.: Establishment and utilization of a smart city data hub based on a cyber-physical system (CPS)
    - Korea: Establishment of a collection-and-storage database by the Ministry of Environment, with restricted linkage and use
2. Key Data Hub Examples
□ Public Data Portal
  ○ Installed and operated under the Public Data Act as the largest data hub in Korea
    - About 40,000 file data, 7,000 open data, and 10,000 standard data
  ○ Provides a national data map from various perspectives
  ○ Provides visualization services such as a public participation map and location information visualization
□ National Statistics Portal
  ○ As the largest statistical data hub in Korea, provides domestic and foreign statistics in accordance with the Statistics Act
  ○ Provides visualizations such as data maps from various viewpoints and e-local indicators
  ○ Provides professional services such as a microdata integration service
□ Big data common-based insight portal
  ○ Pan-government big data analysis platform service
  ○ Provides SNS text mining analysis and visualization, but is generally slow
  ○ Provides a registration and management system for jointly used data
□ Environmental information convergence big data platform (environmental data portal)
  ○ Specialized data collection-and-storage portal for the environmental field
  ○ Provides 4 types of data analysis platform services, but it is slow and inconvenient
  ○ Next-generation upgrade planned after 2022
□ Environmental Business Big Data Platform
  ○ Data distribution platform for the environmental field
  ○ Provides various text mining visualization results and environmental data visualization examples
  ○ A total of 17 public- and private-sector participants
□ Research data repository
  ○ A system for sharing research data
    - Core component of open science: research data
      ㆍ NASA provides satellite data
      ㆍ CERN provides experimental data from the Large Hadron Collider
      ㆍ Genomic data sharing services in the bio field
      ㆍ Nature, Springer, and Elsevier in publishing
  ○ The rise of the concept of open science, which opens and shares research results and processes
      ㆍ OECD: 13 principles including openness, effectiveness, and sustainability
      ㆍ ISC: makes 14 recommendations to promote universal and equal access to public data
      ㆍ U.S.: Implementation of digital data management and collection by federal agencies at the national level, implementation of data management and sharing policies centered on national research institutes, and operation of programs for infrastructure and data sharing
      ㆍ Europe: Establishment of OpenAIRE, a Europe-wide network of national repositories, with management of the research results of funded projects and management of publications and literature
  ○ Overseas research data platforms in operation: Europe, USA, UK, Japan, Australia, etc.
3. Key Features of Data Hub
□ Data Map
  ○ Used to make effective use of vast amounts of data
  ○ Provides various viewpoints by classification, region, keyword, and field
  ○ In the environmental field, a multi-view classification system is required according to the keyword access order
□ Data standardization
  ○ Means processing data so that anyone can use it easily
  ○ International standardization is promoted in consideration of the vertical and horizontal interoperability of big data
  ○ Domestic standardization is being applied only to some elements of big data processing
□ Big data analysis and utilization system
  ○ Refers to a system for checking, analyzing, and visualizing data in connection with the data map
  ○ Supports functions similar to data analysis platform services
□ Support for public data and data-based administrative work
  ○ Data-related laws have recently increased, and so have the related plans and evaluation responses
  ○ DMP-research data registration makes it possible to discover data, understand the current status, and demonstrate performance
  ○ However, environmental data hubs need to be connected to intranet information systems
Ⅲ. Analysis of COVID-19 Issues Centered on Environmental Data Hub
1. Data Status Review
□ Although environmental statistics are highly reliable, they take a long time to compile and have temporal and spatial limitations
□ Credit card data provides consumption big data for analyzing card usage by industry and sector and for analyzing social and environmental issues such as COVID-19 and fine dust
  ○ BC card consumption data related to COVID-19 was secured and analyzed through the 'data voucher business' in '20~'21
□ Text data such as SNS posts and press releases can be collected and used to derive and analyze social and environmental issues
  ○ Environmental issues* that emerged after the COVID-19 crisis were derived through text mining analysis
    * Environmental issues: 1) Increase in garbage (waste, etc.), 2) Decrease in air pollution (air quality), 3) Increase in energy consumption (electricity, gas, etc.)
2. Near-real-time analysis of environmental issues caused by COVID-19
□ Convergence analysis of card data and environmental data for the environmental issues that emerged due to COVID-19 makes it possible to develop timely policies for environmental issues that occur in (quasi) real time
  ○ Analysis of possible environmental issues (increase in waste, decrease in air pollution, increase in energy consumption) through analysis of changes in card-data-based consumption patterns
  ○ As a result of the analysis, when the number of confirmed COVID-19 cases increases, both the amount and the number of delivery app transactions increase, and both the amount and the number of public transportation and gas transactions decrease. It is considered that this is due to the high
3. Analysis of before and after COVID-19 social distancing policy
□ Analyze the effect of government intervention by analyzing the changes in confirmed COVID-19 cases and card use before and after the social distancing policy introduced after the COVID-19 outbreak
  ○ Comparative analysis of data for the 4 weeks (1 month) before and after, based on the social distancing period
    - 4 sections according to the social distancing stage ('20.3.22~'20.4.19, '20.8.30~'20.9.13, '20.9.14~'20.10.11, '20.12.8~'20.12.28)
  ○ Confirm whether differences exist before and after the policy by analyzing the average change in the variables used to calculate the increase or decrease in confirmed COVID-19 cases
  ○ Verification of the trend before and after the policy, and comparative analysis based on the verified trend, confirm that there is a trend change in all 4 sections
4. Additional Requirements for Environment Data Hub
□ Detect social and environmental issues and provide current status analysis
  ○ Data collection from documents, the press, press releases, and portals needs to be automated
  ○ Analysis of relevant and related issues, and the accompanying procedures, are required for early detection of social and environmental issues
□ Secure data for analyzing social and environmental issues and build a base for sharing
  ○ Functions are needed to provide public and private data efficiently
  ○ Review the scope of data for analyzing social and environmental issues, provide data, and establish analysis examples
□ Review of the nature and scope of the data
  ○ Data is used in consideration of circumstances such as its reliability and the need for a prompt response to issues
  ○ Data is designated for common use after reviewing how widely it can be shared
  ○ Research data was selected in consideration of data accessibility and sustainability
□ Review of the use of analysis tools to analyze social and environmental issues
  ○ Not all research data is used as analysis data
  ○ Analysis tools and use cases for analyzing social and environmental issues need to be discovered
□ Establishment of a data-based policy decision support system that can draw policy implications
  ○ Because big data analysis yields implications through simplification, additional decision-making procedures such as expert interpretation and policymaking are absolutely necessary
  ○ It is essential to establish a data-based policy decision support system
Ⅳ. Implementation of a Pilot Environment Data Hub
1. Essentials of Building an Environmental Data Hub
□ Data set
  ○ Measures to secure quality data are needed
    - Survey of demand for data that can be used for environmental policy
    - Automate data collection by collection path
    - Discover data networks, for example by participating in the data working group of the Ministry of Environment
    - Participate in competitions for data set construction and data support projects
    - Improve researcher access and promote work efficiency, etc.
□ Data Repository
  ○ A method is needed that maintains both the convenience and the integrity of metadata operation and management
    - Data submission, update, and search functions and a metadata management function are required
    - Utilization of DMP, authority management, and connection to external data and the data analysis platform
□ Data analysis platform
  ○ A data pipeline for data analysis needs to be built
    - Data loading, pre-processing, analysis, verification, and visualization should be possible
    - Consider the convenience of using code, such as programming languages and libraries
    - Data linkage with data storage, and flexible storage of data analysis results
    - User convenience of major AI and data analysis modules such as numerical prediction and text/image analysis
2. Building an Environmental Data Hub
□ Preliminary considerations
  ○ Research data collection
    - Provide efficient inquiry and search results: whether the data is original, its source, its location, etc.
    - The jointly used data and the year of the task are reflected in the top-level collection
      ㆍ Shared data: climate change, green transition, atmospheric environment, water management, land environment, resource circulation, environmental health, environmental impact assessment, index statistics, other (external), etc.
      ㆍ The collection for each year of task execution contains collections by task type, and the task name collections sit under those
    ※ Collection: a cabinet containing research data and the metadata of research data
    Research data categorization system
  ○ Data citation
    - Creating a virtuous-cycle ecosystem of data utilization through efficient research
      ㆍ Recognition of the contributions of previous researchers
      ㆍ Subsequent researchers can reproduce and utilize the research process and results
      ㆍ Contributes to the spread of research results through their reuse
      ㆍ Enhances the trust and transparency of research results among researchers
    - Four citation formats are supported: KEI format, MLA, APA, and ISO 690
    - DOI publishing function provided
  ○ Data map
    - Efficient data search
      ㆍ Can also be used by users who do not have clear knowledge of the data they want
    ※ Integrated data map: provides approaches by classification, region, keyword, and field
    ※ Public data portal: a treemap and a search function are provided together, which makes it easy to understand the weight of data
  ○ Data management procedure
    - Systematic collection and storage of research data is possible through data construction and data management
      ㆍ Data construction: data classification and data standardization through data verification and review
      ㆍ Data management: classify priorities into important data and general data, and perform data quality management, data disclosure decisions, data supplementation, and life cycle management
      ㆍ Step-by-step life cycle management is required according to DMP-research data synchronization and planning-execution-completion
  ○ Building a framework
    - The KEI-IDR system is used as the research data repository, and DMP-research data is used
    - The research DB uses the intranet system, and research information is linked
    - The big data analysis platform uses the KEI big data analysis platform pilot service
    - External hubs are linked to suit the purpose, such as data, analysis, or infrastructure
    - External data is linked according to purpose: public data portal, national statistics portal, AI data hub, Big Kinds, etc.
  ○ Pilot build
    - Pilot implementation of an environmental data hub based on preliminary reviews, data management procedures, and
      ㆍ Build dynamic data capabilities to collect automatically updated data
      ㆍ Establish a data sharing function among users and a retention period function for data protection
      ㆍ Build an external academic DB search function, a data map, and an external data function
      ㆍ Replace with physical storage NAS
  ○ External data utilization
    - Separation of data collections for common use: data frequently used for research, data with universal classification criteria
      ㆍ Data can be used remotely through OpenAPI, WebDAV, FTP, etc.
    - Data portal and data analysis platform
      ㆍ Use of the environmental big data analysis platform pilot service, the environmental data science conversion research service, and a personal analysis environment
      ㆍ When the use of data is more important, it is advantageous to use an external data analysis platform
      ㆍ MLOps: used by organizations moving their analytics environment online
  ○ Environmental data hub upgrade plan
    - Improvement of the DMP management function: copy template, change order, export to Excel, etc.
    - Improvement of the personal storage function: upload/download, sharing, use of OpenAPI, interworking with programming code, etc.
3. Roadmap for expanding the environmental data hub
□ Presenting a roadmap for the KEI-type environmental data hub
  ○ Presenting a (simplified) KEI-type environmental data hub roadmap in consideration of constraints
    - Constraints
      ㆍ It is impossible to build an environmental data hub that considers the characteristics of all research data
      ㆍ It is not practical to apply a general information system construction methodology
      ㆍ Changes in the task execution period, budget, manpower, and the social and environmental context must be considered
      ㆍ Step-by-step expansion of consumers such as researchers, policy makers, companies with data demand, and the general public
    - Proposals
      ㆍ Establishment of the environmental data hub construction plan: implemented for 8 months from the time the latest update of the 2021 standard IDR is completed
      ㆍ Establishment of environmental data hub infrastructure: consider the linkage between the KEI-IDR system and other systems such as external analysis platform services and external data portals, and reflect a flexible classification system
      ㆍ Environmental data hub upgrade: reflect external service changes, reflect the results of the demand survey, and expand the data map
  ○ Presenting a (simplified) roadmap for expanding the environmental data hub in consideration of constraints
    - Data construction
      ㆍ Stage 1 (2020~2021): Research data registration and internal pilot operation, identification and analysis of the status of environmental data platforms, and establishment of an external data interlocking function
      ㆍ Stage 2 (2022~2024): Expand research data registration to all government subsidy projects, prepare procedures for external disclosure of research data, and build AI data based on the results of a demand survey of environmental experts
      ㆍ Stage 3 (from 2025): Expand the projects subject to research data registration to consignment projects, and expand the scope of research data disclosure
    - Construction of data repository
      ㆍ Stage 1 (2020~2021): Introduction of the standard IDR and establishment of KEI-IDR, interworking with the intranet information system, and establishment of basic data statistics, a data map, and an external data search function
      ㆍ Stage 2 (2022~2024): Stabilization of KEI-IDR, and expansion of data linkage and utilization functions
      ㆍ Stage 3 (from 2025): Completion of data storage construction, and advancement of the data archiving service
    - Introduction of data analysis platform
      ㆍ Stage 1 (2020~2021): No Stage 1 work, because the existing analysis platform service, server, and personal analysis environment are used
      ㆍ Stage 2 (2022~2024): Improve functions so that research data can be connected directly in the analysis environment, and establish an expert-oriented dashboard
      ㆍ Stage 3 (from 2025): Provide data convergence use cases and upgrade the dashboard
    - Success conditions: Operation of a dedicated organization > Securing a budget and improving the system
      ㆍ Data policy improvement: improve information security policy to enable safe and flexible access
      ㆍ Dedicated organization: establish a dedicated organization in accordance with data-related laws, supply data scientists and technicians internally (using professional training, etc.), and strengthen collaboration between the departments for each environmental medium and the dedicated organization
      ㆍ Budget securing: the budget can be adjusted (negotiated) to a level that KEI can implement, but it must be continuously guaranteed
Ⅴ. Conclusion
1. Conclusion
□ Improvement of researcher awareness and establishment of a collaborative ecosystem
  ○ Practical measures are needed to identify and analyze various social and environmental issues and to make policy decisions on them, and a system for responding in advance needs to be prepared
    - Data-based response cases are increasing because social and environmental issues keep occurring
    - Convergence of environmental statistics and social statistics, weakening the boundaries of environmental policy research
  ○ Policy reflection through flexible data utilization for rapid data production
    - Reflects a situation in which all physical elements, such as people and objects, are connected and interact
    - Changes in perspective on data: timely results and determination of the importance of data trust
    - Constraints in environmental policy research: very little data is available for timely issue analysis
  ○ Support for shortening the cycle of statistical compilation and for screening substitute data
    - Review of the scope and limitations of various data in analyzing social and environmental issues
    - Although the amount of medical waste has increased significantly, there are no official statistics on the amount of waste in 2021
□ Establishment of a pilot environment data hub and a foundation for environmental data utilization
    - Derivation of the essential elements of building an environmental data hub: data set, data storage, data analysis platform
    - KEI-type mid- to long-term environmental data hub roadmap presented
□ Suggestion of requirements for an environmental data hub for social and environmental issue analysis
    - It is necessary to secure data for analyzing social and environmental issues, to establish a foundation for data sharing, and to establish analysis tools
    - It is necessary to establish a data-based policy decision support system that can draw policy implications
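Section Ⅲ.3 of this summary compares data from the four weeks before and after each social distancing period. A minimal sketch of such a before-and-after comparison follows. It is not the report's procedure: it assumes daily values keyed by date and a single reference date per period, and the function name `before_after_means` and the card-use values are hypothetical (only the date 2020-03-22 is taken from the first period listed in the summary).

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, Tuple


def before_after_means(
    series: Dict[date, float], start: date, weeks: int = 4
) -> Tuple[float, float]:
    """Mean daily value in the `weeks` before `start` and in the `weeks` from `start` on."""
    span = timedelta(weeks=weeks)
    before = [v for d, v in series.items() if start - span <= d < start]
    after = [v for d, v in series.items() if start <= d < start + span]
    if not before or not after:
        raise ValueError("series does not cover both windows")
    return mean(before), mean(after)


# Hypothetical daily card-use index around the first period's start date.
start = date(2020, 3, 22)
series = {
    start + timedelta(days=k): (100.0 if k < 0 else 80.0) for k in range(-28, 28)
}
before, after = before_after_means(series, start)
```

A difference in means alone does not establish a policy effect; the summary also describes verifying the trend before and after the policy, which this sketch does not attempt.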
Building an AI-Based Environmental Monitoring System to Implement the Environmental Digital New Deal
진대용,표종철,김도연,조윤랑,한국진 한국환경연구원 2021 기본연구보고서 Vol.2021 No.-
Ⅰ. Introduction □ Research background ㅇ Use of AI technology in the environmental (policy) sector can perform an independent role as a bridge between Green New Deal and Digital New Deal, but it fails to sufficiently fulfill its role ㅇ There is a need to establish strategies to systematically and comprehensively use data in the environmental sector with focus on AI technology ㅇ To build an ‘AI-based environmental monitoring system’, it is necessary to first develop cases such as environmental change detection, natural disaster analysis, and pollution occurrence pattern analysis by media type, through which necessary elements must be derived and processes designed □ Research objective ㅇ To develop major cases for automatic AI-based environmental monitoring and response through combined use of AI and XAI and provide strategies to build an “AI-based environmental monitoring system” based on the above Ⅱ. Literature Review □ Expanding the application scope of AI studies in environmental policy research ㅇ Limitations of existing decision-making methodologies can be overcome with AI models comprised of multiple parameters ㅇ Application as environmental studies using AI methodologies is being expanded - Various forms of data such as numbers, images, and videos can be used as variables, allowing prediction, classification, detection, change detection, and impact analysis - AI shows high accuracy in terms of performance, but there is the issue of low explanatory power due to complicated model compositions □ With the emergence of explainable AI (XAI), factors with a huge impact can be predicted as well as validated, which can be used as quantitative data for decision making ㅇ XAI studies are conducted actively to ensure transparency and reliability of AI algorithms in a black box structure - Starting with the explainable AI project XAI announced by the Defense Advanced Research Projects Agency (DARPA) in the U.S. 
in 2017, technological research on explainable AI is being developed ㅇ Studies analyzing XAI are applied to various fields of the environment such as ecosystem in addition to environmental pollution problems such as air pollution, water pollution, and soil pollution - XAI models mostly used include local interpretable model-agnostic explanations (LIME), SHapley Additive exPlanation (SHAP), and Gradient-weighted Class Activation Mapping (Grad-CAM) □ Data can be collected using various applications and devices such as IoT, drones, and unmanned vehicles, thereby accumulating environmental big data and activating studies applying AI ㅇ Image and video data created in the environmental sector are related to various fields such as climate and environmental pollution (air, water quality, soil, noise, etc.) - Studies are actively conducted on AI-based prediction, classification and interpolation of missing values - In addition to prediction research, factors with a huge impact on XAI-based prediction are presented, which can be used as quantitative data for decision making Ⅲ. AI-based Mountain Land Change Detection 1. Overview of research on AI-based mountain land change detection □ Measures are taken using GIS and remote sensing technology such as factual surveys on mountain land changes, derivation of suspicious sites, and other follow-up measures, but there is a need for early response and decrease of damages through early detection of mountain land changes □ Therefore, this study raises the possibility of mountain land change detection using deep learning technology 2. Forest maps in Korea and overseas □ Supply of forest maps in Korea and overseas ㅇ National Geographic Information Platform, (National Geographic Information Institute), Forest Space Portal Service (Korea Forest Service), AI Hub aerial photographs of forest tree species data (National Information society Agency), etc. 
ㅇ UCI Machine Learning Repository (U.S.), Skyscape dataset (German Aerospace Center), Semantic Change detection dataset (Wuhan University in China), etc. 3. AI-based mountain land change detection input data and model composition □ AI model input data ㅇ Aerial photographs of forest tree species are used from AI Hub national land environment data ㅇ Aerial videos are subdivided into 128 x 128, organizing each video with 16 images and normalizing the information of RGB aerial images ㅇ For labeling data, binary annotation is performed to classify into just forests and non-forests, and aerial photographs including illegible labels are excluded ㅇ Total 16,000 images for learning and 16,000 images for validation in the capital area are used as AI model input data ㅇ The same area multi-period test image datasets on Kakao Map are formed to test the performance of mountain land change detection □ Structure of the AI model ㅇ The U-Net deep learning model structure specialized for image segmentation is applied ㅇ The layer composition of trained U-Net deep learning architecture and hyper parameters are fine-tuned to perform mountain land change detection learning 4. Results and application of AI model mountain land change detection □ The training and validation results of the U-Net model well divided forests and non-forests and showed a similar pattern as actual labeling areas □ Mountain land changes are well distinguished when applying the same area multi-period test images on Kakao Map to the trained U-Net model, which proved the applicability of deep learning models in mountain land change detection Ⅳ. Correlation Analysis of AI-based Climate/air Pollution and COVID-19 1. 
Overview of research on AI-based correlation analysis of climate/air pollution and COVID-19 □ There is no evidence that climate change has a direct impact on the spread of COVID-19, but the question continues to be discussed □ A correlation analysis of climate/air pollution and COVID-19 in Seoul was conducted in 2020, and the feasibility of building an AI model that simulates the relationship between climate/air pollution factors and COVID-19 was reviewed 2. Literature review on correlation between climate/air pollution and COVID-19 □ A review of recent research in Korea and overseas shows that results vary among nations and that there is no evidence that climate and air pollution variables have a direct impact on COVID-19 ㅇ Studies on the impact of climate and air pollution have been actively conducted since the start of the COVID-19 pandemic - Infectious diseases such as MERS, SARS, and COVID-19 are reported to show seasonal patterns that can be predicted using temperature and humidity data - NO2 was reported to be a key factor in COVID-19 deaths in Europe, and AOD in India fell to its lowest level in 20 years due to COVID-19 3. 
Correlation analysis of climate/air pollution and COVID-19 and results □ A pilot correlation analysis of climate/air pollution and COVID-19 was conducted focusing on Seoul in 2020 ㅇ Training datasets were built by collecting COVID-19 confirmed cases and deaths together with climate and air pollution data ㅇ Spearman and Kendall correlation analyses were conducted separately on each section of the study period to exclude seasonal factors - The overall results showed that temperature was highly correlated with the number of confirmed COVID-19 cases - However, when examined by section, the correlation coefficient of temperature changed significantly, indicating that the two have little real relevance ㅇ The results revealed the limitations of the analysis and raised the need to add policy and social activity variables in future work - Directly related input variables (policy, population mobility, etc.) that can help estimate the number of confirmed COVID-19 cases should be added - The analysis period should be extended by accumulating data for the full year of 2020 Ⅴ. AI-based Inundation Trace Detection 1. Overview of research on AI-based inundation trace detection □ Research is conducted on building an AI-based urban inundation trace detection system using open data □ The work consists of preprocessing GIS-based spatial data, building AI model input data from the preprocessed data in Python, training machine learning and deep learning models for inundation trace detection, and estimating which of the input data are key factors in inundation detection □ A flood susceptibility map is developed, key factors are identified and analyzed, and future flood-susceptible areas are predicted and analyzed by applying climate change scenario data 2. 
AI-based inundation trace detection input data and model composition □ AI model input data ㅇ Hydrology map, topographic map, climate change scenario data, and GIS data are used on Environment Big Data Platform, Open MET Data Portal, and Environmental Space Information Service ㅇ Input data is formed by unifying, rasterizing, and stacking the spatial scope to the capital area of spatial data obtained ㅇ For random forest model training, 150 points of inundation scope in 2010 are used as training data, and 50 points as validation data □ Structure of the AI model ㅇ Inundation trace detection performance in the capital area is evaluated by composing and learning the random forest model, which is a typical machine learning model using the ensemble learning method ㅇ Variable importance of the random forest model was estimated to analyze the sensitivity of input data in inundation trace detection results 3. AI model inundation trace detection performance and validation □ Performance evaluation of the inundation trace detection using the random forest model ㅇ Similar results were found between the inundation trace scope learned by random forest and the inundation trace scope measured in 2010 ㅇ High flood susceptibility was verified around the waters of Hangang River through the flood susceptibility map of the capital area applied to all capital areas of the trained model 4. Inundation trace prediction through climate change scenario □ Inundation trace change prediction by applying the RCP 8.5 scenario ㅇ Change in the inundation trace range in the capital area is verified by change in precipitation by applying the future RCP scenario to the trained random forest model ㅇ Expected to be used in AI-based urban inundation damage prediction according to climate change scenarios Ⅵ. AI-based Particulate Matter (PM) Occurrence Pattern Analysis: Focusing on High Concentration Cases 1. 
Overview of research on AI-based PM occurrence pattern analysis □ Need for research on AI-based PM occurrence pattern analysis ㅇ PM concentrations in Korea are decreasing overall with establishment and active implementation of related policies ㅇ However, there is an ongoing phenomenon of high concentration PM that still lasts long, and the nation’s anxiety over PM is not yet resolved, and there are more and more related policies and interest due to the expansion of environmental awareness ㅇ Building an AI model and providing application plans for PM occurrence pattern analysis 2. AI-based PM occurrence pattern analysis input data and model composition □ AI model input data ㅇ Air quality and weather/climate data on Air Korea and Open MET Data Portal are used, as well as external factors (air quality in China) ㅇ Research is conducted on Chungnam in 2017-2019, with data restructured based on the air quality monitoring network □ Structure of the AI model ㅇ The XGBoost model, which is a typical machine learning model using the boosting technique, is developed and the PM estimation model is built through learning 3. 
Review of performance and applicability of the AI-based high concentration PM occurrence pattern analysis model □ PM estimation performance test ㅇ When the model's estimates were compared with measured values on the test data, the model followed the trend in most cases ㅇ However, some cases of high concentration PM were not estimated well, which can be improved later by increasing the training data and selecting additional related variables □ PM occurrence pattern analysis results ㅇ Applying PDP and SHAP to the built model showed that the grounds for the model's PM concentration estimates can be derived ㅇ Key factors of PM occurrence patterns are identified, and case analyses of how each input variable contributes to the model output are provided □ Review of the applicability of the AI-based high concentration PM occurrence pattern model ㅇ An AI model estimating PM2.5 can be built using air pollutants, weather/climate factors, and China’s air quality data ㅇ SHAP values are limited in that they depend on the output values of the AI model and are bound to the characteristics of that model ㅇ The output is closer to a systematic description of correlation obtained through pattern analysis of input and output variables, and does not guarantee causal relations ㅇ Nonetheless, the AI model can show the effect of each variable on PM2.5 estimation at the sample level ㅇ Through future discussion with experts, it is necessary to review the consistency of the estimated contributions to PM concentrations and to develop the model into a highly reliable quantitative evaluation model Ⅶ. 
Conclusions and Policy Suggestions (Academic Outcomes) □ Case studies of AI in the environmental sector for the environmental Digital New Deal ㅇ This study presented cases of AI technology used in the environmental sector, such as environmental change detection (mountain land change detection), natural disaster analysis (inundation control and prediction), infectious disease analysis (correlation analysis of climate/air factors and COVID-19), and environmental pollution analysis by media type (PM occurrence pattern analysis) ㅇ All kinds of data such as numbers, images, and geographical information can be used as input variables, and AI can be applied to estimating and predicting variables of interest, analyzing (image) changes, and analyzing variable impact, depending on the research purpose ㅇ The study presents ways to use XAI output as quantitative data for decision making, by identifying the factors with a great impact on the values produced by the built model □ Essential elements and application plan to build an AI-based monitoring system ㅇ Essential elements, basic models, and analysis processes for building an AI-based monitoring system are established from many actual cases of AI application in the environmental sector ㅇ The essential elements of the AI-based monitoring system are building data (collecting or producing data) ⇒ building an AI model ⇒ analyzing and monitoring based on the AI model ⇒ deriving outcomes and securing policy grounds ㅇ Automatic real-time or regular data collection is essential for building a sustainably applicable environmental monitoring system ㅇ It is necessary to build a virtuous cycle in which the data produced by the AI model are used as results and the model is updated for the parts not yet considered ㅇ By securing consistency with expert knowledge in the process of building the model and interpreting the results, the monitoring system will be able to fulfill its role by deriving continuous (automatic) results and providing scientific 
grounds and policy grounds when establishing measures to resolve environmental issues □ Suggestion of follow-up tasks ㅇ For precise and highly practical analysis, it is necessary to build high-resolution temporal and spatial data; thus, this study suggests review of fields that need data building and research on high-resolution data production fit for the purpose by setting the results and application scope of data quality ㅇ There is a need for research that rationally reflects and comparatively analyzes the results of consistency review with experts, physical modeling, and simulation based on building of AI and XAI models such as pollution by media type and natural disaster analysis
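As an illustration of the section-wise Spearman and Kendall analysis described in Chapter Ⅳ above, the following is a minimal sketch in plain Python. The function names, the section labels, and the toy temperature and case counts are all hypothetical and are not taken from the report. The Kendall coefficient here is the simple tau-a form without tie correction, which may differ from the variant the authors used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


def correlations_by_section(sections):
    """Map each section name to its (Spearman, Kendall) pair."""
    return {
        name: (spearman(temp, cases), kendall_tau_a(temp, cases))
        for name, (temp, cases) in sections.items()
    }


# Hypothetical toy data: (temperature, confirmed cases) per section.
sections = {
    "section_1": ([2.0, 5.0, 9.0, 14.0, 18.0], [30, 25, 20, 12, 8]),
    "section_2": ([24.0, 26.0, 27.0, 25.0, 28.0], [15, 40, 55, 30, 90]),
}
result = correlations_by_section(sections)
for name, (rho, tau) in result.items():
    print(name, round(rho, 3), round(tau, 3))
```

In this made-up example the sign of the coefficient flips between the two sections, which is the kind of instability that the report interprets as weak real relevance between temperature and confirmed cases.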
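Development and Application of an Algal Bloom Forecast System Using Artificial Intelligence Deep Learning Technology
홍한움, 조을생, 강선아, 한국진 Korea Environment Institute 2020 Basic Research Report Vol.2020 No.-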
Ⅰ. Background and Aims of Research 1. Research outline □ Research title: Development and application of an algal bloom forecast system using artificial intelligence deep learning technology □ Research period: January 1, 2020 ~ December 31, 2020 2. Necessity and purpose of research □ Limitations of the current algal bloom warning system ㅇ The Ministry of Environment and the National Institute of Environmental Research implemented an algal bloom warning system based on the measured values of harmful blue-green algae and the EFDC model. ㅇ Limitations of physics-based models - They have a solid theoretical background but there is a difficulty in securing the detailed data required by the model. - Since algal blooms are living organisms, the law of conservation of mass does not apply to the number of harmful blue-green algae cells. Therefore, the physics-based model has limitations. - Deep learning-based forecasting can be considered as an alternative and a complementary method. Ⅱ. Current Algal Bloom Response Policy 1. Algal bloom warning system □ Year of introduction: 1998 □ Legal basis: Article 21 of the Water Environment Conservation Act □ Target ㅇ 28 branches of water supply sources and hydrophilic activities ㅇ Issuer: Basin Environmental Office and local governments □ Analysis items ㅇ Measured numbers of harmful blue-green algae cells ㅇ Based on water source section - Attention: 1,000 (cells/mL) or more - Alert: 10,000 (cells/mL) or more - Large bloom: 1,000,000 (cells/mL) or more ㅇ Based on hydrophilic activities section - Attention: 20,000 (cells/mL) or more - Alert: 100,000 (cells/mL) or more 2. 
(Former) Water quality forecast system □ Year of Introduction: 2012 □ Legal basis: Article 21 of the Water Environment Conservation Act □ Target ㅇ 17 branches including 16 barrages and the Bukhan River Sambong-ri of the four major rivers of South Korea ㅇ Issuer: National Institute of Environmental Research □ Analysis items ㅇ Predicted water temperature and chlorophyll-a concentration ㅇ Currently, as the algal bloom warning system and the water quality forecast system are integrated, no forecast is issued although forecasting is performed. □ Providing forecasts for harmful blue-green algae cells ㅇ Twice a week, Monday and Thursday, six branches that are targets of the algal bloom system ㅇ Issuing the predicted number of harmful blue-green algae cells and water temperature predictions 3. Status of the water quality monitoring network □ Legal basis ㅇ Article 22 of the Basic Act on Environmental Policy and Article 9 of the Water Environment Conservation Act □ Organization ㅇ Water quality monitoring network - Target: water quality measurement data in rivers, lakes, agricultural water, urban streams, and industrial rivers - Provided information: water depth, hydrogen ion concentration, dissolved oxygen content, BOD, COD, suspended matter, total nitrogen, total phosphorus, total organic carbon (TOC), water temperature, phenols, electrical conductivity, total coliform group, dissolved total nitrogen, ammonia nitrogen, nitrate nitrogen, dissolved total phosphorus, phosphate phosphorus, chlorophyll a, transparency - Cycle: once a month, once a week for major locations ㅇ Total quantity measurement network - Target: basic data for total amount management in areas subject to the total water pollution rate system - Provided information: water temperature, hydrogen ion concentration, electrical conductivity, dissolved oxygen, BOD, COD, suspended matter, total nitrogen, total phosphorus, TOC, flow rate - Cycle: once a month ㅇ Automatic measurement network - Operated to complement 
the manual measurements of the water quality monitoring network - Provided information: (Common) water temperature, hydrogen ion concentration, dissolved oxygen content, electrical conductivity, TOC (Optional) turbidity, chlorophyll a, TN, TP, NH3-N, NO3-N, PO4-P, VOCs (nine types, ten items), phenol, heavy metals, biological monitoring items - Cycle: once a day ㅇ Sediment monitoring network - Purpose: investigation of the physicochemical properties of sediments in public waters subject to water quality conservation of South Korea - Provided information: (Common) water temperature, hydrogen ion concentration, dissolved oxygen content, electrical conductivity, TOC (Optional) maximum depth during collection, surface measurement depth, surface and bottom depth, water temperature, dissolved oxygen content, pH, electrical conductivity, sediment particle size, moisture content, ratio and grade of complete combustion potential, COD, TOC, TN, TN grade, TP, SRP, heavy metals, conservative element concentration - Cycle: (River) twice a year for the first and second halves, (Lake) once a year ㅇ In addition, there are radioactivity monitoring networks and biological monitoring networks. Ⅲ. Water Quality Prediction Models 1. Physics-based model □ Example ㅇ EFDC, QUAL2K, WASP, etc. ㅇ The National Institute of Environmental Research is operating an EFDC-based model. □ Organization ㅇ Construct a grid network by dividing the water system into sub-regions and set boundary conditions ㅇ Estimate the water quality in sub-area units within the grid 2. Deep learning algorithm □ Model structure ㅇ Multi-layer perceptron (MLP) - It mimics the neurons and synapses of a biological neural network and consists of an input layer, hidden layers, and an output layer. With more than one hidden layer, it has a multi-layered structure. ㅇ Recurrent Neural Network (RNN) - It additionally reflects the feedback effects of previous hidden nodes.
- Nowadays, GRU and LSTM models are used. These models utilize the long-term memory based on a simple recurrent neural network. 3. Physics-based model vs. Deep learning algorithm □ Physics-based model ㅇ Based on well-established mathematical/physical laws ㅇ Actual observations are used for model evaluation. ㅇ Prediction can be performed at a more detailed resolution than observed values based on physical equations. ㅇ Disadvantages - Errors due to uncertain initial/boundary conditions - Difficulty in predicting the abnormal phenomena - May not work due to problems such as poor input data, instability of model relations, modeling method, etc. □ Deep learning algorithm ㅇ Establish the relationship between input and output variables through machine learning ㅇ Actual observations are used for model construction. ㅇ Includes error conditions in the model by quantifying the error of the measurements ㅇ Advantages in short-term predictions with greater uncertainties compared to physics-based models ㅇ Disadvantages - Requires a huge amount of data - Cannot be performed at a more detailed resolution than observation resolution - Practical application is limited since the relationship between input and output variables cannot be explained. Ⅳ. Development of an Algal Bloom Forecast Algorithm Based on Deep Learning 1. Data collection and preprocessing □ Model construction target ㅇ Target point: algae observation point in the hydrophilic activity section of the Han River ㅇ Target variable - Direct prediction of the number of harmful blue-green algae cells which is the direct cause of the algal bloom - Differentiated from previous studies that indirectly predicted the algal bloom through chlorophyll a prediction □ Model construction period ㅇ Target period: April 2007 ~ August 2020 ㅇ Data in winter from December to March, which is relatively safe from algal blooms, are excluded. 2. 
Characteristics of algae data □ Descriptive statistics □ Characteristics ㅇ Extremely right-skewed asymmetric distribution ㅇ The distribution is extremely asymmetric because algal blooms occur intensively in summer, when the temperature is high. ㅇ Because of this, it is difficult to directly predict harmful blue-green algae using physics-based models or traditional statistical models. 3. Development of a prediction algorithm □ RNN model construction ㅇ Target of prediction: the number of harmful blue-green algae cells ㅇ An LSTM prediction algorithm is constructed to utilize long-term memory information ㅇ Loss function for optimization: least squares function; optimization algorithm: Adam ㅇ Training data: April 2007 ~ November 2016; test data: April 2017 ~ June 2020 □ Results ㅇ The increasing and decreasing patterns are predicted well, even though the hydrophilic activity section is located downstream of the river, which makes the data highly unstable and difficult to predict with traditional methods. ㅇ The timing of the largest extreme value is also predicted well ㅇ Prediction error Ⅴ. Conclusion and Achievements □ Achievements ㅇ Since prediction using a physical model rests on well-established theory, it is widely used to predict water quality properties such as water temperature, dissolved oxygen, total phosphorus, and total nitrogen. Prediction using physical equations based on the law of conservation of mass is well suited to conservative substances. However, it is limited in predicting algae cells, since their number results from the activity of living organisms. ㅇ Existing algal bloom prediction studies have not directly predicted the number of harmful blue-green algae cells, which is the direct cause of algal blooms. Instead, they have relied on predictions of chlorophyll a concentration.
ㅇ In this study, a deep learning algorithm based on recurrent neural networks was used as an alternative method to predict the number of harmful blue-green algae cells. It predicted well both the increasing and decreasing patterns of algae and the timing of abnormal phenomena. □ Limitations ㅇ Only water quality, upstream water quality, water level, and meteorological information were used as input variables. These variables are already used in the physical model. If social variables such as population change are also taken into account, the benefits of deep learning analytics can be leveraged to a greater extent. Unstructured information such as satellite images can also be considered. ㅇ The amount of data is limited. In this study, the model was trained on a total of 365 weekly observations collected from 2007 to 2016, which is not sufficient in itself. Whenever new data are added, the predictive model should be updated to improve prediction performance. ㅇ The black-box characteristic is a limitation. The detailed internal operation of the prediction model cannot be clearly observed. Implementing a policy requires evidence, and the black-box characteristic of deep learning prediction models makes it difficult to provide clear evidence. □ Conclusions and suggestions ㅇ Because it is very simple to perform predictions with a model that has already been trained, it can be used directly as reference information for current algal bloom forecasts. ㅇ Since deep learning models and physics-based models each have advantages and disadvantages, it is most desirable to integrate the two prediction methods. Starting from the deep learning model, the physical model can be integrated by including the physical equation as a constraint in the objective function. Alternatively, deep learning can be applied to selected modules within the physical model prediction.
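The suggestion above, adding a physical equation as a constraint in the objective function, can be sketched as a least squares loss plus a weighted penalty. This is a minimal illustration in plain Python, not the report's method. The logistic growth rule, the function names, and the constants RATE, CAPACITY, and WEIGHT are hypothetical stand-ins for whatever physical equation and tuning a real integration would use, and a real implementation would compute the same quantity inside a deep learning framework so that it can be minimized by gradient descent.

```python
def logistic_step(prev, rate, capacity):
    """One step of a toy logistic growth rule (hypothetical physical equation)."""
    return prev + rate * prev * (1.0 - prev / capacity)


def mse(pred, obs):
    """Least squares data term: mean squared error against observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def physics_penalty(pred, rate, capacity):
    """Mean squared mismatch between consecutive predictions and the growth rule."""
    steps = list(zip(pred[:-1], pred[1:]))
    return sum(
        (nxt - logistic_step(prev, rate, capacity)) ** 2 for prev, nxt in steps
    ) / len(steps)


def constrained_loss(pred, obs, weight, rate, capacity):
    """Data term plus weighted physics penalty (soft constraint)."""
    return mse(pred, obs) + weight * physics_penalty(pred, rate, capacity)


# Made-up constants and a synthetic series that follows the toy rule exactly.
RATE, CAPACITY, WEIGHT = 0.5, 10000.0, 0.1
obs = [100.0]
for _ in range(5):
    obs.append(logistic_step(obs[-1], RATE, CAPACITY))

consistent = list(obs)   # predictions that match data and the toy rule
perturbed = list(obs)
perturbed[2] += 50.0     # one prediction that breaks both

loss_consistent = constrained_loss(consistent, obs, WEIGHT, RATE, CAPACITY)
loss_perturbed = constrained_loss(perturbed, obs, WEIGHT, RATE, CAPACITY)
print(loss_consistent, loss_perturbed > mse(perturbed, obs))
```

A soft penalty of this kind keeps the data-driven model free to fit observations while discouraging predictions that contradict the assumed physical relation. The weight controls the balance between the two terms and would need to be tuned.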