Metadata-Version: 1.1
Name: zhon
Version: 1.1.5
Summary: Zhon provides constants used in Chinese text processing.
Home-page: https://github.com/tsroten/zhon
Author: Thomas Roten
Author-email: thomas@roten.us
License: MIT
Description: ====
        Zhon
        ====
        
        .. image:: https://badge.fury.io/py/zhon.png
            :target: http://badge.fury.io/py/zhon
        
        .. image:: https://travis-ci.org/tsroten/zhon.png?branch=develop
            :target: https://travis-ci.org/tsroten/zhon
        
        Zhon is a Python library that provides constants commonly used in Chinese text
        processing.
        
        * Documentation: http://zhon.rtfd.org
        * GitHub: https://github.com/tsroten/zhon
        * Support: https://github.com/tsroten/zhon/issues
        * Free software: `MIT license <http://opensource.org/licenses/MIT>`_
        
        About
        -----
        
        Zhon's constants can be used in Chinese text processing, for example:
        
        * Find CJK characters in a string:
        
          .. code:: python
        
            >>> import re
            >>> import zhon.hanzi
            >>> re.findall('[%s]' % zhon.hanzi.characters, 'I broke a plate: 我打破了一个盘子.')
            ['我', '打', '破', '了', '一', '个', '盘', '子']
        
        * Find Pinyin syllables, words, or sentences:
        
          .. code:: python
        
            >>> import re
            >>> import zhon.pinyin
            >>> re.findall(zhon.pinyin.syllable, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
            ['Yuàn', 'zi', 'lǐ', 'tíng', 'zhe', 'yí', 'liàng', 'chē']
        
            >>> re.findall(zhon.pinyin.word, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
            ['Yuànzi', 'lǐ', 'tíngzhe', 'yí', 'liàng', 'chē']
        
            >>> re.findall(zhon.pinyin.sentence, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
            ['Yuànzi lǐ tíngzhe yí liàng chē.']
        
        Features
        --------
        
        + Includes commonly used constants:
        
          - CJK characters and radicals
          - Chinese punctuation marks
          - Chinese sentence regular expression pattern
          - Pinyin vowels, consonants, lowercase, uppercase, and punctuation
          - Pinyin syllable, word, and sentence regular expression patterns
          - Zhuyin characters and marks
          - Zhuyin syllable regular expression pattern
          - CC-CEDICT characters
        
        + Runs on Python 2.7 and 3
        
        Getting Started
        ---------------
        
        * `Install Zhon <http://zhon.readthedocs.org/en/latest/#installation>`_
        * Read `Zhon's introduction <http://zhon.readthedocs.org/en/latest/#using-zhon>`_
        * Learn from the `API documentation <http://zhon.readthedocs.org/en/latest/#zhon-hanzi>`_
        * `Contribute <https://github.com/tsroten/zhon/blob/develop/CONTRIBUTING.rst>`_ documentation, code, or feedback
        
Keywords: chinese mandarin segmentation tokenization punctuation hanzi unicode radicals han cjk cedict cc-cedict traditional simplified characters pinyin zhuyin
Platform: Any
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
