Computation, Language, and Meaning
Band of Researchers

What do all of the languages of the world have in common? Why and how did these linguistic universals arise? What do they tell us about human cognition and how can we harness them to build better language technologies?

At CLMBR, we address these and related questions from a computational and experimental perspective. We are a lab hosted at the Linguistics Department of the University of Washington, as part of the larger Treehouse computational linguistics lab.

News

Two papers and an extended abstract accepted at BlackboxNLP! (September 2020)

Invited talks at the Human Interactivity and Language Lab (June 2020) and the Computation and Language Lab (October 2020)

Shane is co-organizing a workshop on Computational and Experimental Explanations in Semantics and Pragmatics, co-located with ESSLLI (August 2021; postponed from 2020)

Daniel is co-organizing a track on deep learning in search at NIST's TREC 2020 (November 2020)

"On the Spontaneous Emergence of Discrete and Compositional Singals" (with Nur Lan and Emmanuel Chemla) accepted at ACL; preprint available soon (April 2020)

"Complexity/informativeness trade-off in the domain of indefinite pronouns" presented at Semantics and Linguistic Theory (SALT 30) (April August 2020)

"Semantic Expressivism for Epistemic Modals" appears online at Linguistics and Philosophy (March 2020)

Invited talk at Language Change: Theoretical and Experimental Perspectives at the Hebrew University of Jerusalem (January 2021; postponed from March 2020)

"Most, but not more than half is proportion-dependent and sensitive to individual differences" appears in the proceedings of Sinn und Bedeutung (SuB 24) (February 2020)

Daniel is teaching Deep Learning in Search at ACM SIGIR/SIGKDD Africa Summer School on Machine Learning for Data Mining and Search (January 2020)

"Quantifiers in natural language optimize the simplicity/informativeness trade-off" presented at the 22nd Amsterdam Colloquium (December 2019)

"An Explanation of the Veridical Uniformity Universal" appears online at Journal of Semantics (open access) (December 2019)

"Ease of Learning Explains Semantic Universals" appears online at Cognition (November 2019)

"Learnability and Semantic Universals" appears online at Semantics & Pragmatics (November 2019)

"Towards the Emergence of Non-trivial Compositionality" accepted at Philosophy of Science (September 2019)

Shane presents two papers, "The emergence of monotone quantifiers via iterated learning" and "Complexity and learnability in the explanation of semantic universals", at CogSci (July 2019)

Lewis presents "Neural Models of the Psychosemantics of 'Most'" at Cognitive Modeling and Computational Linguistics (CMCL) (June 2019)

Research

Semantic Universals

A major line of recent work involves explaining semantic universals: properties of meaning shared by all natural languages. We have been using machine learning to argue that the meanings natural languages express are easier to learn than unattested alternatives. Ongoing work extends this approach to more linguistic domains, compares it with other candidate explanations, and integrates learnability with models of language change and evolution.
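
To make the paradigm concrete, here is a minimal sketch (not the lab's actual code) of a learnability comparison: the same small network is trained on an attested quantifier ("most") and on an artificial, non-monotone one ("exactly half"), and we record how many epochs each takes to learn. The scene encoding, architecture, and hyperparameters are illustrative assumptions; the sketch uses PyTorch.

```python
# Minimal learnability sketch: compare how quickly the same network learns
# an attested quantifier ("most") vs. an artificial one ("exactly half").
# All sizes and hyperparameters below are illustrative, not the lab's setup.
import torch
import torch.nn as nn

N = 20  # scene size: a scene is a binary vector over the restrictor set

def sample_scenes(n_scenes):
    return torch.randint(0, 2, (n_scenes, N)).float()

QUANTIFIERS = {
    "most":         lambda s: (s.sum(dim=1) > N / 2).float(),
    "exactly half": lambda s: (s.sum(dim=1) == N / 2).float(),
}

def epochs_to_learn(label_fn, threshold=0.95, max_epochs=500):
    torch.manual_seed(0)  # identical initialization for a fair comparison
    model = nn.Sequential(nn.Linear(N, 32), nn.ReLU(), nn.Linear(32, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    train_x, test_x = sample_scenes(4096), sample_scenes(1024)
    train_y, test_y = label_fn(train_x), label_fn(test_x)
    for epoch in range(1, max_epochs + 1):
        optimizer.zero_grad()
        loss = loss_fn(model(train_x).squeeze(1), train_y)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            preds = model(test_x).squeeze(1) > 0
            accuracy = (preds == test_y.bool()).float().mean().item()
        if accuracy >= threshold:
            return epoch  # fewer epochs = easier to learn
    return max_epochs

for name, label_fn in QUANTIFIERS.items():
    print(f"{name}: {epochs_to_learn(label_fn)} epochs to 95% accuracy")
```

Under the learnability hypothesis, attested meanings like "most" should reach criterion in fewer epochs than unattested alternatives.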

Representative papers:

Emergent Communication

By placing artificial agents in simulated environments, we can use reinforcement learning to study how fundamental features of human language emerge. A particular focus has been non-trivial compositionality: rudimentary forms of hierarchical structure in which one linguistic item modifies another, as in non-intersective adjectives. This line of research has both theoretical and practical interest, since autonomous agents, such as self-driving vehicles, may benefit from developing their own communication systems.
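
As an illustration of the simplest such setting, below is a minimal sketch of a Lewis signaling game with Roth-Erev (urn-style) reinforcement; our actual experiments involve richer environments and neural agents, and all sizes and reward values here are illustrative.

```python
# Minimal Lewis signaling game with Roth-Erev reinforcement: a sender and a
# receiver start with no shared language and converge on one through reward.
import random

N_STATES = N_SIGNALS = N_ACTS = 4
# Urn weights: sender[state][signal], receiver[signal][act]; start uniform.
sender = [[1.0] * N_SIGNALS for _ in range(N_STATES)]
receiver = [[1.0] * N_ACTS for _ in range(N_SIGNALS)]

def draw(weights):
    """Sample an index in proportion to its urn weight."""
    return random.choices(range(len(weights)), weights=weights)[0]

random.seed(0)
successes = 0
for round_num in range(1, 50_001):
    state = random.randrange(N_STATES)   # nature picks a state
    signal = draw(sender[state])         # sender chooses a signal
    act = draw(receiver[signal])         # receiver acts on the signal
    if act == state:                     # success: reinforce both choices
        sender[state][signal] += 1.0
        receiver[signal][act] += 1.0
        successes += 1
    if round_num % 10_000 == 0:
        print(f"round {round_num}: success rate {successes / round_num:.2f}")
```

Over many rounds the agents typically converge on a system in which each state reliably maps to a distinct signal; when and why such emergent conventions become discrete and compositional is the central question of this research line.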

Representative papers:

Cognitive Science and NLP

We also conduct studies investigating the cognitive underpinnings of language understanding. How do speakers represent meanings internally, and how do these representations interface with other cognitive modules?

Moreover, as natural language processing tools become increasingly powerful, we believe that two-way interaction between cognitive science and NLP will be increasingly important. On the one hand, insights from human language understanding can help us analyze, understand, and build better machines. On the other hand, increasingly sophisticated NLP models can provide inspiration and insights for modeling human behavior. Our lab has ongoing projects in both directions.
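
One concrete bridge between the two fields is surprisal: a language model's per-word surprisal is a standard predictor of human reading times. The sketch below computes per-token surprisal for a classic garden-path sentence; the use of GPT-2 via the Hugging Face transformers library is an illustrative choice, not a claim about our specific projects.

```python
# Per-token surprisal from a pretrained language model, a standard linking
# quantity between NLP models and human reading-time data.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# A garden-path sentence: reading times spike at the disambiguating word.
sentence = "The horse raced past the barn fell."
ids = tokenizer(sentence, return_tensors="pt").input_ids

with torch.no_grad():
    log_probs = torch.log_softmax(model(ids).logits, dim=-1)

# Surprisal of token t is -log2 P(token_t | tokens_<t), read off at t - 1.
for t in range(1, ids.shape[1]):
    bits = -log_probs[0, t - 1, ids[0, t]].item() / math.log(2)
    print(f"{tokenizer.decode(ids[0, t]):>8}  {bits:5.2f} bits")
```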

Representative papers:

People

Principal Investigator

Shane Steinert-Threlkeld

Assistant Professor, Linguistics
Personal Website
shanest AT uw DOT edu

Shane is an Assistant Professor in Linguistics, where he directs the CLMBR group and teaches in the CLMS program. When not researching or teaching, he spends as much time as possible climbing rocks.

Graduate Students

C.M. Downey

PhD Student in Computational Linguistics
Personal Website
cmdowney AT uw DOT edu

C.M. Downey specializes in NLP for cross-lingual and low-resource settings, especially as applied to the revitalization of Indigenous and otherwise endangered languages. Downey's current work spans machine translation, improving the zero-shot transferability of modern cross-lingual models, and fully unsupervised segmentation for morphologically rich or typographically unsegmented languages. Outside of research, Downey enjoys backpacking, cooking, literature, and musical styles from Dvořák to Lorde.

Devin Johnson

Master's Student in Computational Linguistics
Personal Website
dj1121 AT uw DOT edu

Devin is a master's student with research interests in machine learning, language modeling, and computational semantics. Outside of his studies he especially enjoys learning languages, playing music, cooking, and reading philosophy.

Simola Nayak

Master's Student in Computational Linguistics
Linkedin Profile
simnayak AT uw DOT edu

Simola is interested in the cognitive science and emergent communication side of computational linguistics. Her thesis is on the effects of network structure on languages of contact in a multi-agent setting. On weekends and evenings, she enjoys hiking, 3D design and printing, and raising houseplants.

Wes Rose

Master's Student in Computational Linguistics
LinkedIn Profile
warose91 AT uw DOT edu

Wes is primarily interested in using interactive computational models to explore the evolution of language and build useful systems. Outside of work and academics he loves to play volleyball and spend time outdoors.

Naomi Tachikawa Shapiro

PhD Student in Computational Linguistics
Personal Website
tsnaomi AT uw DOT edu

Naomi studies how humans and machines process language, drawing on methodologies from psycholinguistics and machine learning. Aside from research, she loves experimenting with art and design, playing piano, and watching too much TV.

Shunjie Wang

Master's Student in Computational Linguistics
LinkedIn Profile
shunjiew AT uw DOT edu

Shunjie's research interests lie in theory and formalisms for NLP, especially applications of formal language theory in deep learning and theoretical linguistics. When not coding, he explores the phonology of Sinitic languages and the writing systems of the languages of China. Outside the linguistics world, he is a fan of digital typography, subway systems, and air travel.

Meheresh Yeditha

Master's Student in Computational Linguistics
LinkedIn Profile
myeditha AT uw DOT edu

Meheresh is a master's student with research interests in machine learning for big code and natural language, especially semantic parsing and unsupervised machine translation. He is also a software engineer at Microsoft, where he works on data engineering for Microsoft Maps. In his spare time, he enjoys singing, hiking, and collecting and appreciating maps (he really likes maps).

Undergraduate Students

Pengfei He

Applied and Computational Math Sciences

Leroy Wang

Computer Science and Linguistics

Alumni

  • Daniel Campos (PhD student in CS @ UIUC)
    CLMS Thesis: Explorations In Curriculum Learning Methods For Training Language Models
  • Paige Finkelstein (Software Engineer @ turn.io)
    CLMS Thesis: Human-assisted Neural Machine Translation: Harnessing Human Feedback for Machine Translation
  • Benny Longwill
    CLMS Thesis: The Suitability of Generative Adversarial Training for BERT Natural Language Generation
  • Chih-chan Tien
    CLMS Thesis: Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining

Publications

Preprints

Referential and General Calls in Primate Semantics
Shane Steinert-Threlkeld, Philippe Schlenker, Emmanuel Chemla
preprint

Learning Compositional Negation in Populations of Roth-Erev and Neural Agents
Graham Todd, Shane Steinert-Threlkeld, Christopher Potts
preprint

The emergence of monotone quantifiers via iterated learning
Fausto Carcassi, Shane Steinert-Threlkeld, Jakub Szymanik
preprint

2020

Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets
Chuanrong Li, Lin Shengshuo, Zeyu Liu, Xinyi Wu, Xuhui Zhou, and Shane Steinert-Threlkeld, BlackboxNLP
official preprint code

Probing for Multilingual Numerical Understanding in Transformer-Based Language Models
Devin Johnson, Denise Mak, Drew Barker, and Lexi Loessberg-Zahl, BlackboxNLP
official preprint code

Ease of Learning Explains Semantic Universals
Shane Steinert-Threlkeld and Jakub Szymanik, Cognition, vol 195.
official preprint code

On the Spontaneous Emergence of Discrete and Compositional Signals
Nur Lan, Emmanuel Chemla, Shane Steinert-Threlkeld, Proceedings of the Association for Computational Linguistics (ACL)
official preprint code

Complexity/informativeness trade-off in the domain of indefinite pronouns
Milica Denic, Shane Steinert-Threlkeld, Jakub Szymanik, Proceedings of Semantics and Linguistic Theory (SALT 30)
preprint code

Semantic Expressivism for Epistemic Modals
Peter Hawke and Shane Steinert-Threlkeld (alphabetical order), Linguistics and Philosophy, forthcoming.
official (open access) preprint

Most, but not more than half is proportion-dependent and sensitive to individual differences
Sonia Ramotowska, Shane Steinert-Threlkeld, Leendert van Maanen, Jakub Szymanik, Proceedings of Sinn und Bedeutung (SuB 24)
preprint

Leading Conversational Search by Suggesting Useful Questions
Corbin Rosset, Chenyan Xiong, Xia Song, Daniel Campos, Nick Craswell, Saurabh Tiwary, and Paul Bennett, Proceedings of the 26th Annual Meeting of The Web Conference (WWW).
preprint data

Towards the Emergence of Non-trivial Compositionality
Shane Steinert-Threlkeld, Philosophy of Science, forthcoming.
official preprint code

An Explanation of the Veridical Uniformity Universal
Shane Steinert-Threlkeld, Journal of Semantics, vol 37 no 1, pp. 129-144.
official (open access) preprint code

2019

Quantifiers in natural language optimize the simplicity/informativeness trade-off
Shane Steinert-Threlkeld, Proceedings of the 22nd Amsterdam Colloquium, eds. Julian J. Schlöder, Dean McHugh & Floris Roelofsen, pp. 513-522.
official preprint code poster

Learnability and Semantic Universals
Shane Steinert-Threlkeld and Jakub Szymanik, Semantics & Pragmatics, vol 12 issue 4.
early access code

The emergence of monotone quantifiers via iterated learning
Fausto Carcassi, Shane Steinert-Threlkeld (co-first), and Jakub Szymanik, Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019).
preprint code

Complexity and learnability in the explanation of semantic universals
Iris van de Pol, Shane Steinert-Threlkeld, and Jakub Szymanik, Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019).
preprint code

Neural Models of the Psychosemantics of "Most"
Lewis O'Sullivan and Shane Steinert-Threlkeld, Proceedings of the 9th Workshop on Cognitive Modeling and Computational Linguistics (CMCL2019).
official poster code

2018

Paying Attention to Function Words
Shane Steinert-Threlkeld, Emergent Communication Workshop @ 32nd Conference on Neural Information Processing Systems (NeurIPS 2018).
paper poster code

Some of Them Can Be Guessed! Exploring the Effect of Linguistic Context in Predicting Quantifiers
Sandro Pezzelle, Shane Steinert-Threlkeld, Raffaella Bernardi, Jakub Szymanik, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018).
official code

Informational Dynamics of Epistemic Possibility Modals
Peter Hawke and Shane Steinert-Threlkeld, Synthese, vol 195 no 10, pp. 4309-4342.
official

2016

Compositional Signaling in a Complex World
Shane Steinert-Threlkeld, Journal of Logic, Language, and Information, vol 25 no 3, pp. 379-397.
official code

Compositionality and Competition in Monkey Alert Calls
Shane Steinert-Threlkeld, Theoretical Linguistics, vol 42 no 1-2, pp. 159-171.
official local