CLP 2014 Bake-off: Chinese Spelling Check Task


Liang-Chih Yu, Yuan-Ze University
Lung-Hao Lee, National Taiwan (Normal) University
Yuen-Hsien Tseng, National Taiwan Normal University

Hsin-Hsi Chen, National Taiwan University


We introduce a Chinese Spelling Check campaign organized for the SIGHAN 2014 bake-off, including the task description, data preparation, performance metrics, and evaluation results based on essays written by learners of Chinese as a foreign language. We hope that such evaluations will lead to more advanced Chinese spelling check techniques.

Overview Paper:

Liang-Chih Yu, Lung-Hao Lee, Yuen-Hsien Tseng, and Hsin-Hsi Chen (2014). Overview of SIGHAN 2014 Bake-off for Chinese Spelling Check. Proceedings of the 3rd CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP'14), Wuhan, China, 20-21 October, 2014, pp. 126-132.

Data Release:

The data sets with gold standard annotation and the evaluation tool can be downloaded. Please comply with the following agreement.


The undersigned party has been authorized to use the CLP 2014 CSC Datasets for research purposes. The undersigned party agrees to abide by the following conditions on the use of these data sets:

  1. The CLP 2014 CSC Datasets may be used only for academic research and may not be used in profit-generating or commercial activities.

  2. The undersigned party will not transfer all or any part of the CLP 2014 CSC Datasets to any third party.

  3. The undersigned party will indicate its use of the CLP 2014 CSC Datasets and acknowledge them in any papers or reported results of academic research based on the CLP 2014 CSC Datasets.

    Please cite the following paper as a reference when using the datasets:

    Liang-Chih Yu, Lung-Hao Lee, Yuen-Hsien Tseng, and Hsin-Hsi Chen (2014). Overview of SIGHAN 2014 Bake-off for Chinese Spelling Check. Proceedings of the 3rd CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP'14), Wuhan, China, 20-21 October, 2014, pp. 126-132.

  4. The undersigned party alone bears the legal responsibility for any possible infringement of copyrights or intellectual property rights that may arise in the process of using the CLP 2014 CSC Datasets for profit-making