Scene Mapping-based Video Registration Using Frame Similarity Measurement and Feature Tracking
Yeonseung Choo, Jinbeum Jang, Joonki Paik. The Institute of Electronics and Information Engineers (IEIE), 2019. IEIE Transactions on Smart Processing & Computing, Vol.8, No.6
Video registration is a technique that obtains pairs of frames showing the same scene in order to synchronize two videos. This paper presents a video registration algorithm based on scene mapping using frame similarity measurement and feature point tracking. The proposed algorithm consists of four steps: i) frame-to-frame matching based on similarity measurement to set the start frame, ii) feature tracking using feature points of the start frame to obtain a candidate matching scene, iii) end-frame matching for scene mapping, and iv) synchronization of the two videos using a linear equation. In the proposed algorithm, frame-to-frame matching is performed using traditional feature detection and matching. Additionally, the proposed algorithm measures the similarity between frames to register the videos with the most similar structural information, and feature point tracking is used to track and collect similar structures within a candidate scene. Experiments demonstrate that the proposed algorithm can synchronize frames between registered videos, suggesting that it can be applied to a range of video-analysis tasks.
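Step iv) above amounts to fitting a linear mapping between frame indices once a matched start pair and end pair are known. A minimal sketch of that mapping (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def fit_frame_mapping(start_a, start_b, end_a, end_b):
    """Fit t_b = a * t_a + b from matched start/end frame indices
    of videos A and B (the paper's linear-equation synchronization step)."""
    a = (end_b - start_b) / (end_a - start_a)
    b = start_b - a * start_a
    return a, b

def synchronized_index(t_a, a, b):
    """Map a frame index in video A to its counterpart in video B."""
    return int(round(a * t_a + b))
```

For example, if frames (10, 40) and (110, 240) are the matched start and end pairs, the fit gives a = 2.0, b = 20.0, so frame 60 of video A corresponds to frame 140 of video B.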
3D Cross-Modal Retrieval Using Noisy Center Loss and SimSiam for Small Batch Training
YeonSeung Choo, Boeun Kim, Hyun-Sik Kim, Yong-Suk Park. Korean Society for Internet Information (KSII), 2024. KSII Transactions on Internet and Information Systems, Vol.18, No.3
3D Cross-Modal Retrieval (3DCMR) is a task that retrieves 3D objects across modalities, such as images, meshes, and point clouds. One of the most prominent methods for 3DCMR is the Cross-Modal Center Loss Function (CLF), which applies the conventional center loss strategy to 3D cross-modal search and retrieval. Since CLF is based on center loss, its center features are susceptible to subtle changes in hyperparameters and to variations at inference time. For instance, performance degradation is observed when the batch size is too small. Furthermore, the Mean Squared Error (MSE) used in CLF cannot adapt to changes in batch size and is vulnerable to data variations that occur during actual inference, because it relies on a simple Euclidean distance between multi-modal features. To address the problems that arise from small-batch training, we propose a Noisy Center Loss (NCL) method to estimate the optimal center features. In addition, we apply the simple Siamese representation learning method (SimSiam) during optimal center feature estimation to compare projected features, making the proposed method robust to changes in batch size and variations in data. As a result, the proposed approach demonstrates improved performance on the ModelNet40 dataset compared to conventional methods.
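As a rough illustration of the idea behind a noisy center loss, the sketch below perturbs class centers with Gaussian noise inside a conventional center loss. This is a toy sketch under stated assumptions: the paper's actual NCL formulation and its SimSiam-based center estimation are more involved, and all names and the noise model here are assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_center_loss(features, labels, centers, noise_std=0.01):
    """Center loss computed against Gaussian-perturbed class centers.

    features : (N, D) array of multi-modal embeddings
    labels   : (N,) integer class labels
    centers  : (C, D) array of per-class center features
    The noise on the centers is the illustrative 'noisy' part.
    """
    noisy = centers + rng.normal(0.0, noise_std, centers.shape)
    diffs = features - noisy[labels]  # distance to each sample's (noisy) class center
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))
```

With `noise_std=0.0` this reduces to the plain center loss, which makes the perturbation easy to ablate.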
Multi-resolution Fusion Network for Human Pose Estimation in Low-resolution Images
Boeun Kim, YeonSeung Choo, Hea In Jeong, Chung-Il Kim, Saim Shin, 김정호. Korean Society for Internet Information (KSII), 2022. KSII Transactions on Internet and Information Systems, Vol.16, No.7
2D human pose estimation still faces difficulty in low-resolution images. Most existing top-down approaches scale the target human bounding box image up to a large size and feed the scaled image into the network. This up-sampling introduces artifacts into the low-resolution target images, and the degraded images adversely affect accurate estimation of the joint positions. To address this issue, we propose a multi-resolution input feature fusion network for human pose estimation. Specifically, the bounding box image of the target human is rescaled to multiple input images of various sizes, and the features extracted from the multiple images are fused in the network. Moreover, we introduce a guiding channel that induces the multi-resolution input features to selectively affect the network according to the resolution of the target image. We conduct experiments on the MS COCO dataset, a representative benchmark for 2D human pose estimation, where our method achieves superior performance compared to the strong HRNet baseline and previous state-of-the-art methods.
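The multi-resolution fusion idea can be caricatured as follows: rescale the person crop to several sizes, compute a per-branch feature, and weight the branches by a guiding signal tied to the original resolution. This is a toy numpy sketch, not the paper's learned network; the per-branch feature and the weighting scheme are hypothetical stand-ins for the learned fusion and guiding channel.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbor resize (stand-in for the paper's up/down-sampling)."""
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return img[ys][:, xs]

def multi_resolution_features(crop, sizes=(64, 128, 256), guide_resolution=64):
    """Rescale the person crop to several sizes and fuse per-scale features.

    The guiding weights emphasize branches whose size is close to the
    original crop resolution, loosely mimicking the guiding channel;
    in the paper this behavior is learned, not hand-set.
    """
    feats = []
    for s in sizes:
        resized = resize_nearest(crop, s)
        feats.append(resized.mean())  # toy per-branch feature
    w = np.array([1.0 / (1 + abs(s - guide_resolution)) for s in sizes])
    w = w / w.sum()  # normalized guiding weights over the branches
    return float(np.dot(w, feats))
```

The point of the sketch is only the data flow: multiple resolutions in, one fused representation out, with the mixing controlled by the input resolution.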