SIGHAN 2015 Bake-off: Chinese Spelling Check Task

Co-organizers:

Yuen-Hsien Tseng, National Taiwan Normal University
Lung-Hao Lee, National Taiwan Normal University
Li-Ping Chang, National Taiwan Normal University
Hsin-Hsi Chen, National Taiwan University

Introduction:

This paper introduces the SIGHAN 2015 Bake-off for Chinese Spelling Check, including the task description, data preparation, performance metrics, and evaluation results. The competition reflects the current state-of-the-art NLP techniques for Chinese spelling checking. All data sets with gold standards, as well as the evaluation tool used in this bake-off, are publicly available for future research.

Overview Paper:

Yuen-Hsien Tseng, Lung-Hao Lee, Li-Ping Chang, and Hsin-Hsi Chen (2015). Introduction to SIGHAN 2015 Bake-off for Chinese Spelling Check. Proceedings of the 8th SIGHAN Workshop on Chinese Language Processing (SIGHAN'15), Beijing, China, 30-31 July, 2015, pp. 32-37.

Data Release:

The data sets with gold standard annotation and the evaluation tool can be downloaded. Please comply with the following agreements.

Agreements

The undersigned party has been authorized to use the SIGHAN 2015 CSC Datasets for research purposes. The undersigned party agrees to abide by the following conditions on the use of these data sets:

  1. The SIGHAN 2015 CSC Datasets can only be used for academic research and cannot be used in profit-generating or commercial activities.

  2. The undersigned party will not transfer all or any part of the SIGHAN 2015 CSC Datasets to any third party.

  3. The undersigned party will indicate the use of the SIGHAN 2015 CSC Datasets and acknowledge them in any papers or reports of academic research results based on the SIGHAN 2015 CSC Datasets.

    Please cite the following paper as a reference when using the datasets:

    Yuen-Hsien Tseng, Lung-Hao Lee, Li-Ping Chang, and Hsin-Hsi Chen (2015). Introduction to SIGHAN 2015 Bake-off for Chinese Spelling Check. Proceedings of the 8th SIGHAN Workshop on Chinese Language Processing (SIGHAN'15), Beijing, China, 30-31 July, 2015, pp. 32-37.

  4. The undersigned party alone bears the legal responsibility for any possible infringement of copyrights or intellectual property rights that may arise in the process of using the SIGHAN 2015 CSC Datasets for profit-making