1 | | -[](https://travis-ci.org/OpenGov/cython_hunspell) |
2 | | - |
3 | 1 | # CyHunspell |
4 | 2 | Cython wrapper on Hunspell Dictionary |
5 | 3 |
6 | | -## Description |
7 | | -This repository provides a wrapper on Hunspell to be used natively in Python. The |
8 | | -module uses cython to link between the C++ and Python code, with some additional |
9 | | -features. There's very little Python overhead as all the heavy lifting is done |
10 | | -on the C++ side of the module interface, which gives optimal performance. |
11 | | - |
12 | | -The hunspell library caches any corrections. You can enable persistent caching by
13 | | -adding the `use_disk_cache` argument to a Hunspell constructor; otherwise it uses
14 | | -in-memory caching.
15 | | - |
16 | | -## Dependencies |
17 | | -cacheman -- for (optionally asynchronous) persistent caching |
18 | | - |
19 | | -## Features |
20 | | -Spell checking & spell suggestions |
21 | | -* See http://hunspell.sourceforge.net/ |
22 | | - |
23 | | -## How to use |
24 | | -Below are some simple examples for how to use the repository. |
25 | | - |
26 | | -### Creating a Hunspell object |
27 | | - from hunspell import Hunspell |
28 | | - h = Hunspell()
29 | | - |
30 | | -You now have a usable hunspell object that can make basic queries for you. |
31 | | - |
32 | | - h.spell('test') # True |
33 | | - |
34 | | -### Spelling |
35 | | -It's a simple task to ask if a particular word is in the dictionary. |
36 | | - |
37 | | - h.spell('correct') # True |
38 | | - h.spell('incorect') # False |
39 | | - |
40 | | -This will only ever return True or False, and won't give suggestions about why it |
41 | | -might be wrong. It also depends on your choice of dictionary. |
42 | | - |
43 | | -### Suggestions |
44 | | -If you want suggestions from Hunspell, it can provide corrected spellings
45 | | -for a given string input.
46 | | - |
47 | | - h.suggest('incorect') # (u'incorrect', u'correction', u'corrector', u'correct', u'injector') |
48 | | - |
49 | | -The suggestions are sorted by closeness: the lower the index, the closer the
50 | | -suggestion is to the input string.
51 | | - |
52 | | -### Stemming |
53 | | -The module can also stem words, providing the stems for pluralization and other |
54 | | -inflections. |
55 | | - |
56 | | - h.stem('testers') # (u'tester', u'test') |
57 | | - h.stem('saves') # (u'save',) |
58 | | - |
59 | | -### Bulk Requests |
60 | | -You can also request bulk actions against Hunspell. This will trigger a threaded
61 | | -request (run without the GIL) to perform the requested action. Currently only 'suggest'
62 | | -and 'stem' can be requested in bulk.
63 | | - |
64 | | - h.bulk_suggest(['correct', 'incorect']) |
65 | | - # {'incorect': (u'incorrect', u'correction', u'corrector', u'correct', u'injector'), 'correct': ['correct']} |
66 | | - h.bulk_stem(['stems', 'currencies']) |
67 | | - # {'currencies': [u'currency'], 'stems': [u'stem']} |
68 | | - |
69 | | -By default it spawns one thread per CPU to perform the operation. You can
70 | | -override the concurrency as well.
71 | | - |
72 | | - h.set_concurrency(4) # Four threads will now be used for bulk requests |
73 | | - |
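The shape of a bulk request, a list of words in and a dictionary keyed by word out, can be sketched in plain Python. This is only an illustration under stated assumptions: `fake_stem` and `bulk_apply` are hypothetical names that do not exist in the library, the stemming rule is a toy, and Python threads here still hold the GIL, unlike the threaded requests described above.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fake_stem(word):
    """Hypothetical stand-in for a single-word lookup; this is NOT Hunspell."""
    # Toy rule for illustration only: strip one trailing 's'.
    return (word[:-1],) if word.endswith("s") else (word,)


def bulk_apply(func, words, concurrency=None):
    """Run func over a list of words on a thread pool, keyed by input word."""
    # Default to one worker per CPU, mirroring the default described above.
    workers = concurrency or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each word with its result.
        return dict(zip(words, pool.map(func, words)))


result = bulk_apply(fake_stem, ["stems", "saves"], concurrency=4)
```

Passing `concurrency=4` plays the role that `set_concurrency(4)` plays for the real object; leaving it out falls back to the CPU count.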
74 | | -### Dictionaries |
75 | | -You can also specify the language or dictionary you wish to use. |
76 | | - |
77 | | - h = Hunspell('en_CA') # Canadian English |
78 | | - |
79 | | -By default you have the following dictionaries available:
80 | | -* en_AU |
81 | | -* en_CA |
82 | | -* en_GB |
83 | | -* en_NZ |
84 | | -* en_US |
85 | | -* en_ZA |
86 | | - |
87 | | -However, you can download your own and point Hunspell to your custom dictionaries.
88 | | - |
89 | | - h = Hunspell('en_GB-large', hunspell_data_dir='/custom/dicts/dir') |
90 | | - |
91 | | -### Asynchronous Caching |
92 | | -If you want to have Hunspell cache suggestions and stems you can pass it a directory |
93 | | -to house such caches. |
94 | | - |
95 | | - h = Hunspell(disk_cache_dir='/tmp/hunspell/cache/dir') |
96 | | - |
97 | | -This will save all suggestion and stem requests periodically and in the background. |
98 | | -The cache will fork after a number of new requests over particular time ranges and |
99 | | -save the cache contents while the rest of the program continues onward. You'll never |
100 | | -have to explicitly save your caches to disk, but you can if you so choose. |
101 | | - |
102 | | - h.save_cache() |
103 | | - |
104 | | -Otherwise the Hunspell object will cache such requests locally in memory and not |
105 | | -persist that memory. |
106 | | - |
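The difference between in-memory and persistent caching can be sketched with a small wrapper. This is a simplified illustration, not the library's implementation: `CachedLookup` and `fake_suggest` are hypothetical names, the cache file is plain JSON, and saving is explicit and synchronous rather than periodic and in the background as described above.

```python
import json
import os
import tempfile


class CachedLookup:
    """Hypothetical memoizing wrapper; not part of the hunspell package."""

    def __init__(self, func, cache_path=None):
        self._func = func
        self._path = cache_path
        self._cache = {}
        self.misses = 0  # counts calls that had to reach the wrapped function
        # Load a previously saved cache if a path was given and the file exists.
        if cache_path and os.path.exists(cache_path):
            with open(cache_path) as fh:
                self._cache = json.load(fh)

    def __call__(self, word):
        if word not in self._cache:
            self.misses += 1
            # Store as a list so the entry survives a JSON round trip.
            self._cache[word] = list(self._func(word))
        return tuple(self._cache[word])

    def save(self):
        """Write the cache to disk; a no-op when no path was configured."""
        if self._path:
            with open(self._path, "w") as fh:
                json.dump(self._cache, fh)


def fake_suggest(word):
    """Toy stand-in for a suggestion lookup; this is NOT Hunspell."""
    return (word.lower(),)


path = os.path.join(tempfile.mkdtemp(), "suggest_cache.json")
first = CachedLookup(fake_suggest, path)
first("Test")
first("Test")  # second call is answered from memory
first.save()

second = CachedLookup(fake_suggest, path)
answer = second("Test")  # answered from the file written by `first`
```

Without a `cache_path`, the wrapper behaves like the in-memory case: results are reused within the process and lost when it exits.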
107 | | -## Platforms |
108 | | -### Linux |
109 | | -Tested on Ubuntu and Fedora with pre-built binaries of Hunspell as well as
110 | | -automatically built dependencies. It's unlikely to have trouble with other
111 | | -distributions.
112 | | - |
113 | | -### Windows |
114 | | -The base library comes with MSVC-built Hunspell libraries and will link
115 | | -against those at runtime. These were tested on Windows 7, 8, 10 and
116 | | -some older systems. It's possible that a Python build with a newer
117 | | -(or much older) version of MSVC will fail to load these pre-built libraries.
118 | | - |
119 | | -### Mac OSX |
120 | | -So far the library has been tested against 10.9 (Mavericks) and up. There
121 | | -shouldn't be any reason it would fail to run on any particular version of
122 | | -OSX.
123 | | - |
124 | | -## Building source libraries |
125 | | -See libs/README |
126 | | - |
127 | | -## Navigating the Repo |
128 | | -### hunspell |
129 | | -Package wrapper for the repo. |
130 | | - |
131 | | -### tests |
132 | | -All unit tests for the repo. |
133 | | - |
134 | | -## Language Preferences |
135 | | -* Google Style Guide |
136 | | -* Object Oriented (with a few exceptions) |
137 | | - |
138 | | -## TODO |
139 | | -* Convert cacheman dependency to be optional |
140 | | - |
141 | | -## Known Issues |
142 | | -- Exact spelling suggestions on different OSes differ slightly with identical
143 | | -inputs. This appears to be an issue with Hunspell 1.3 and not this library.
144 | | -- Older versions of pip and setuptools will build with incorrect Windows DLL bindings
145 | | -and complain about "ImportError: DLL load failed: %1 is not a valid Win32 application."
146 | | -- Sometimes Windows machines won't find the build tools appropriately. You may need
147 | | -to 'SET VS100COMNTOOLS=%VSxxxCOMNTOOLS%' before installing. Python 3 usually wants the
148 | | -xxx as '140' and Python 2 as '90'. There's not a lot the library can do to fix this,
149 | | -though pip and setuptools upgrades often resolve the issue by being smarter.
150 | | - |
151 | | -## Author |
152 | | -Author(s): Tim Rodriguez and Matthew Seal |
153 | | - |
154 | | -## License |
155 | | -MIT |
156 | | - |
157 | | -© Copyright 2015, [OpenGov](http://opengov.com) |
| 4 | +**This project is now maintained on the [MSeal Fork](https://github.com/MSeal/cython_hunspell)** |