Using a parsed corpus for linguistic research: A case study on the Coordinate Structure Constraint in Japanese
Yusuke Kubota, Ai Kubota
March 2018

This paper presents a case study of using the NINJAL Parsed Corpus of Modern Japanese (NPCMJ) for theoretical linguistics research. NPCMJ is the first phrase structure-based treebank for Japanese that is specifically designed for application to linguistic (in addition to NLP) research. After discussing some basic methodological issues pertaining to the use of treebanks for theoretical linguistics research, we introduce our case study of the Coordinate Structure Constraint (CSC) in Japanese, showing that NPCMJ enables us to easily retrieve examples that support the key theoretical claim of Kubota and Lee (2015). Moreover, the corpus-based study we conducted revealed a previously unnoticed tendency that is highly relevant for further clarifying the nature of the CSC. We conclude the paper by briefly discussing some further methodological issues, brought up by our case study, pertaining to the relationship between linguistic research and corpus development.
keywords: coordinate structure constraint, Japanese, treebank, discourse relation, coherence, corpus, semantics, syntax
