1 J. Redmon, "You Only Look Once: Unified, Real-Time Object Detection" 2016
2 J. Redmon, "YOLOv3: An Incremental Improvement"
3 J. Redmon, "YOLO9000: Better, Faster, Stronger" 2017
4 A. Das, "Visual Dialog" 2017
5 P. Anderson, "Vision-and-Language Navigation: Interpreting Visually-grounded Navigation Instructions in Real Environments" 2018
6 K. Simonyan, "Very Deep Convolutional Networks for Large-Scale Image Recognition" 2015
7 A. Agrawal, "VQA: Visual Question Answering" 2425-2433, 2015
8 C. Ma, "The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation" 2019
9 L. Ke, "Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation" 2019
10 W. Liu, "SSD: Single Shot MultiBox Detector" Springer 21-37, 2016
11 D. Fried, "Speaker-Follower Models for Vision-and-Language Navigation" 28 : 2018
12 C. Ma, "Self-Monitoring Navigation Agent via Auxiliary Progress Estimation" 2019
13 R. Girshick, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation" 2014
14 X. Wang, "Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation" 2019
15 B. Zhou, "Places: A 10 Million Image Database for Scene Recognition" 40 : 1452-1464, 2017
16 T.-Y. Lin, "Microsoft COCO: Common Objects in Context" 13 : 740-755, 2014
17 A. Chang, "Matterport3D: Learning from RGB-D Data in Indoor Environments" 5 : 2017
18 K. He, "Mask R-CNN" 2017
19 X. Wang, "Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation" 696-711, 2018
20 N. Silberman, "Indoor Segmentation and Support Inference from RGBD Images" 746-760, 2012
21 J. Deng, "ImageNet: A Large-Scale Hierarchical Image Database" 2009
22 D. Gordon, "IQA: Visual Question Answering in Interactive Environments" 2018
23 C. Szegedy, "Going Deeper with Convolutions" 1-9, 2015
24 S. Ren, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" 2015
25 R. Girshick, "Fast R-CNN" 2015
26 A. Das, "Embodied Question Answering" 5 : 2018
27 A. Hanni, "Deep Learning Framework for Scene Based Indoor Location Recognition" IEEE 2017
28 K. He, "Deep Residual Learning for Image Recognition" 770-778, 2016
29 K. Wang, "A Discriminative Algorithm for Indoor Place Recognition based on Clustering of Features and Images" 14 : 407-419, 2017