1 J. Redmon, "You Only Look Once: Unified, Real-Time Object Detection" 779-788, 2016
2 W. Liu, "SSD: Single Shot MultiBox Detector" 21-37, 2016
3 R. Hu, "Natural Language Object Retrieval" 4555-4564, 2016
4 R. Hu, "Modeling Relationships in Referential Expressions with Compositional Modular Networks" 1115-1124, 2017
5 L. Yu, "Modeling Context in Referring Expressions" 69-85, 2016
6 V. K. Nagaraja, "Modeling Context Between Objects for Referring Expression Understanding" 2016
7 T.-Y. Lin, "Microsoft COCO: Common Objects in Context" 740-755, 2014
8 J. Krishnamurthy, "Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World" 1:193-206, 2013
9 J. Pennington, "GloVe: Global Vectors for Word Representation" 1532-1543, 2014
10 J. Mao, "Generation and Comprehension of Unambiguous Object Descriptions" 11-20, 2016
11 S. Ren, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" 91-99, 2015
12 R. Luo, "Comprehension-Guided Referring Expressions" 2017
13 L. Yu, "A Joint Speaker-Listener-Reinforcer Model for Referring Expressions" 7282-7290, 2017