1 Chaves, R. P., "What Don't RNN Language Models Learn About Filler-Gap Dependencies?" 3 (3): 2020
2 Ettinger, A., "What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models" 2020
3 Giulianelli, M., "Under the hood: Using diagnostic classifiers to investigate and improve how language models track agreement information"
4 Mitchell, T. M., "The need for biases in learning generalizations" Rutgers Univ 1980
5 Lakoff, G., "The metaphorical structure of the human conceptual system" 1980
6 Hauser, M., "The faculty of language: What is it, who has it, and how did it evolve?" 298 : 2002
7 Lakretz, Y., "The emergence of number and syntax units in LSTM language models" 2019
8 Chomsky, N., "The Minimalist Program" MIT Press 1995
9 Marvin, R., "Targeted Syntactic Evaluation of Language Models" 2018
10 Linzen, T., "Syntactic structure from deep learning" 2021
11 van Schijndel, M., "Single-stage prediction models do not explain the magnitude of syntactic disambiguation difficulty" 45 (45): 2021
12 Khandelwal, U., "Sharp nearby, fuzzy far away: How neural language models use context"
13 Banko, M., "Scaling to very very large corpora for natural language disambiguation" 2001
14 Chomsky, N., "Rules and representations" 3 : 1980
15 Goodkind, A., "Predictive power of word surprisal for reading times is a linear function of language model quality" 2018
16 Lasnik, H., "Oxford Handbook of Universal Grammar" Oxford Univ. Press 2017
17 Baroni, M., "On the proper role of linguistically-oriented deep net analysis in linguistic theorizing"
18 Wilcox, E. G., "On the predictive power of neural language models for human real-time comprehension behavior"
19 Pinker, S., "On language and connectionism: analysis of a parallel distributed processing model of language acquisition" 28 : 1988
20 Warstadt, A., "Neural Network Acceptability Judgments" 2019
21 Futrell, R., "Neural language models as psycholinguistic subjects: Representation of syntactic state" 1 : 2019
22 van Schijndel, M., "Modeling garden path effects without explicit hierarchical syntax" 2018
23 Clark, A., "Microcognition: Philosophy, Cognitive Science, and Parallel Distributed Processing" MIT Press 1989
24 Gibson, E., "Memory limitations and structural forgetting: the perception of complex ungrammatical sentences as grammatical" 14 : 1999
25 Hart, B., "Meaningful differences in the everyday experience of young American children" Paul H Brookes Publishing 1995
26 Hochreiter, S., "Long short-term memory" 9 : 1997
27 Cho, K., "Learning phrase representations using RNN Encoder-Decoder for statistical machine translation" 2014
28 Chrupała, G., "Learning language through pictures" 2 : 2015
29 Radford, A., "Language models are unsupervised multitask learners" 9 : 2019
30 Gómez, R. L., "Infant artificial language learning and language acquisition" 2000
31 Chomsky, N., "Handbook of Mathematical Psychology Vol.2" 1963
32 Adi, Y., "Fine-grained analysis of sentence embeddings using auxiliary prediction tasks" 2017
33 Raffel, C., "Exploring the limits of transfer learning with a unified text-to-text transformer"
34 Shi, X., "Does string-based neural MT learn source syntax?" 2016
35 Linzen, T., "Distinct patterns of syntactic agreement errors in recurrent networks and humans" 2018
36 Weston, J. E., "Dialog-based language learning" 2016
37 Christiansen, M., "Connectionist natural language processing: the state of the art" 23 : 1999
38 Fodor, J., "Connectionism and cognitive architecture: a critical analysis" 28 : 1988
39 Gulordava, K., "Colorless green recurrent networks dream hierarchically" 1 : 2018
40 Bock, K., "Broken agreement" 23 : 1991
41 Warstadt, A., "BLiMP: The Benchmark of Linguistic Minimal Pairs for English" 2020
42 Devlin, J., "BERT: pre-training of deep bidirectional transformers for language understanding" 1 : 2016
43 Linzen, T., "Assessing the ability of LSTMs to learn syntax-sensitive dependencies" 4 : 2016
44 Chomsky, N., "Aspects of the Theory of Syntax" MIT Press 1965
45 Wilcox, E., "A Targeted Assessment of Incremental Processing in Neural Language Models and Humans" 2021
46 Hu, J., "A systematic assessment of syntactic generalization in neural language models"
47 Hewitt, J., "A structural probe for finding syntax in word representations" 2019
48 Churchland, P., "A Neurocomputational Perspective: The Nature of Mind and the Structure of Science" MIT Press 1989