Descriptions

Words, Constructions and Corpora: Network Representations of Constructional Semantics for Mandarin Space Particles

Abstract of the Study

In this study, we aim to demonstrate the effectiveness of network science in exploring the emergence of constructional semantics from the connectedness and relationships between linguistic units. With Mandarin locative constructions (MLCs) as a case study, we extracted constructional tokens from a representative corpus, including their respective space particles (SPs) and the head nouns of the landmarks (LMs), which constitute the nodes of the network. We computed edges based on the lexical similarities of word embeddings learned from large text corpora and the SP-LM contingency from collostructional analysis. We address three issues: (1) For each LM, how prototypical is it of the meaning of the SP? (2) For each SP, how semantically cohesive are its LM exemplars? (3) What are the emerging semantic fields from the constructional network of MLCs? We address these questions by examining the quantitative properties of the network at three levels: microscopic (i.e., node centrality and local clustering coefficient), mesoscopic (i.e., community), and macroscopic (i.e., small-worldness and scale-freeness). Our network analyses bring to the foreground the importance of repeated language experiences in the shaping and entrenchment of linguistic knowledge.

Keywords: Usage-based grammar, collocation, collostruction analysis, network analysis, space particles, construction grammar

About this Supplementary Data

This HTML file provides the networks of the Mandarin Locative Constructions in a dynamic format.
The MLC Network tab includes the complete network of all nodes and edges discussed in the study.
The sub-networks of each space particle with their detected communities are provided in the Sub-Community Networks tab.

Network Functionalities

Readers can left-click and drag to navigate the graph, or zoom in and out of the networks for closer inspection of particular sections of the graph.
Each node is provided with the Chinese pinyin and English gloss of the word. This information pops up when the mouse hovers over a node.
Readers can highlight the nearest neighboring nodes and edges by clicking on a specific node. To reset the network, click anywhere other than on a node.
All LM node sizes are proportional to the centrality metric used in the study (i.e., PageRank centrality).
Constructions (i.e., space particles/locatives) and lexemes are presented in different colors in the graphs.
Some networks may need a couple of seconds to stabilize. If a network does not show up, please give it a few more seconds.
In the Sub-Community Networks, readers may use the drop-down menu to highlight particular communities in the sub-network. Communities with alphabetic labels (e.g., A, B, C) are the ones discussed in the main study. Sparse or one-node communities are marked as “NA” in the drop-down menu.
All of these graphs were created using the visNetwork library in R.
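
For readers curious how node sizes can be tied to PageRank centrality, the idea can be sketched as follows. This is a minimal illustration in Python, not the code used to build these graphs (which were created with visNetwork in R). The toy edge list, the node labels (SP, LM1, LM2, LM3), the damping factor of 0.85, the maximum display size, and the function names pagerank and node_sizes are all assumptions made for the example. Unlike the graphs here, where only LM nodes are sized by centrality, the sketch scales every node.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on an undirected, weighted edge list.

    edges: iterable of (node_u, node_v, positive_weight) triples.
    Returns a dict mapping each node to its PageRank score (scores sum to 1).
    """
    # Build a symmetric weighted adjacency map.
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    nodes = sorted(adj)
    n = len(nodes)
    # Total edge weight attached to each node (its "strength").
    strength = {node: sum(adj[node].values()) for node in nodes}
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        new_rank = {}
        for node in nodes:
            # Each neighbor passes on a share of its rank proportional
            # to the weight of the connecting edge.
            incoming = sum(rank[nb] * w / strength[nb]
                           for nb, w in adj[node].items())
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        delta = sum(abs(new_rank[x] - rank[x]) for x in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def node_sizes(rank, max_size=40.0):
    """Scale scores so the most central node gets max_size."""
    top = max(rank.values())
    return {node: max_size * (score / top) for node, score in rank.items()}


# Hypothetical toy network: one space particle linked to three landmarks,
# plus one weaker landmark-to-landmark similarity edge.
toy_edges = [
    ("SP", "LM1", 1.0),
    ("SP", "LM2", 1.0),
    ("SP", "LM3", 1.0),
    ("LM1", "LM2", 0.5),
]

rank = pagerank(toy_edges)
sizes = node_sizes(rank)
for node in sorted(sizes, key=sizes.get, reverse=True):
    print(f"{node}: rank={rank[node]:.3f}, size={sizes[node]:.1f}")
```

In this toy network, LM1 and LM2 receive larger sizes than LM3 because they have an additional connection to each other, which mirrors how better-connected landmarks appear as larger nodes in the graphs.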

Acknowledgements

This research was supported by the Taiwan Ministry of Science and Technology (104-2410-H-003-134 and 108-2410-H-003-023-MY2). Correspondence should be addressed to Alvin Cheng-Hsien Chen, alvinchen@ntnu.edu.tw.