This is a feature that I have personally wanted for a long time. Interrupted by FOSDEM, some Wayland research and many other things, I finally managed to get word prediction and error correction beyond prototype quality. The video shows just how amazingly good the Presage word prediction can be, even without extensive training (in fact, for the video we used the minimal language model training that comes with a regular Presage installation). The second part of the video shows how combining Presage with a spellchecker such as Hunspell further improves the provided word candidates.
Presage uses a very scalable approach based on text n-grams, that is, statistics about which short word sequences occur in a body of text. There is a lot of research in that area, but language models of contemporary language usage are either well guarded or cannot be freely distributed. Luckily, Presage comes with training tools such as text2ngram. Users can feed arbitrary language corpora to it, though one should be careful not to mix different languages too much.
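To make the n-gram idea a bit more concrete, here is a minimal sketch of a bigram (2-gram) model in plain Python. It is not how Presage or text2ngram are implemented; the function names train_bigrams and predict_next and the tiny corpus are made up for illustration, and a real model would use longer n-grams, smoothing and far more text.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each other word."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for previous, current in zip(words, words[1:]):
        following[previous][current] += 1
    return following


def predict_next(following, previous_word, limit=3):
    """Return the most frequent successors of previous_word."""
    candidates = following.get(previous_word.lower(), Counter())
    return [word for word, _ in candidates.most_common(limit)]


# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # ['cat', 'mat']
```

Since "cat" follows "the" twice and "mat" only once, "cat" is ranked first. Feeding a larger corpus only changes the counts, which hints at why mixing languages in one corpus would muddy the predictions.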
Matteo Vescovi, the author of Presage (formerly known as Soothsayer), started the work as part of his master's thesis a couple of years ago. At the heart of Presage are its different predictors. They can be queried in parallel, and their result lists are merged using probability analysis.
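As a rough illustration of merging result lists by probability, the sketch below combines the output of two predictors with a weighted sum. This is an assumption about what such a merge could look like, not Presage's actual combination strategy; merge_predictions, the predictor names, the weights and the probabilities are all hypothetical.

```python
from collections import defaultdict


def merge_predictions(results, weights):
    """Combine per-predictor {word: probability} maps by weighted sum."""
    combined = defaultdict(float)
    for predictions, weight in zip(results, weights):
        for word, probability in predictions.items():
            combined[word] += weight * probability
    # Highest combined score first; ties are broken alphabetically.
    return sorted(combined, key=lambda word: (-combined[word], word))


# Hypothetical outputs of two predictors for the prefix "hel".
ngram_result = {"hello": 0.5, "help": 0.25}
recency_result = {"help": 0.75, "helmet": 0.25}

merged = merge_predictions([ngram_result, recency_result], [0.5, 0.5])
print(merged)  # ['help', 'hello', 'helmet']
```

A word that several predictors agree on ("help" here) rises to the top, even though no single predictor ranked it with overwhelming confidence.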
Presage certainly has a lot of potential. It comes with an easy-to-use C++ API but also provides bindings for C and Python. It even provides a D-Bus API, which would make it possible to run it as a system service. The user could then benefit from (and train!) the same language models from different applications.
Hunspell probably doesn’t need much introduction. It is used in many Linux desktops. It’s a fine library and comes with many dictionaries. However, one should be aware that Hunspell itself cannot provide word prediction, which is why it wasn’t enough for mobile text input. As a fallback for Presage, it works very well though.
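The fallback idea can be sketched roughly as follows: ask the predictor first, and only top up the candidate list from the spellchecker when the predictor comes up short. This is an assumption about how the two could be combined, not the actual Maliit code; suggest, fake_predict and fake_spell_suggest are hypothetical stand-ins and do not call Presage or Hunspell.

```python
def suggest(prefix, predict, spell_suggest, limit=3):
    """Return predictor candidates, topped up from the spellchecker."""
    candidates = list(predict(prefix))[:limit]
    if len(candidates) < limit:
        for word in spell_suggest(prefix):
            if word not in candidates:
                candidates.append(word)
            if len(candidates) == limit:
                break
    return candidates


# Hypothetical stand-in for a word predictor: simple prefix matching.
def fake_predict(prefix):
    known = ["keyboard", "keys"]
    return [word for word in known if word.startswith(prefix)]


# Hypothetical stand-in for a spellchecker: a fixed correction table.
def fake_spell_suggest(prefix):
    corrections = {"kyeboard": ["keyboard"], "kye": ["key", "eye"]}
    return corrections.get(prefix, [])


print(suggest("key", fake_predict, fake_spell_suggest))       # ['keyboard', 'keys']
print(suggest("kyeboard", fake_predict, fake_spell_suggest))  # ['keyboard']
```

For the mistyped "kyeboard" the predictor has nothing to offer, so the spellchecker's correction is the only candidate, which is the case where the fallback pays off.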
Of course there are a couple of things we could do from here. As Jon mentioned in the video, the virtual keyboard’s word ribbon UI could host word suggestions from other applications, such as the Google search in the browser. For Unity’s dasher input or Gnome Shell’s search, the application names could be shown instead. Or we could hook it up to Bash completion.
PS: Anyone up to package Presage for Fremantle or Nemo? It could be pretty interesting to see the next release of Maliit running on the N900.
Source: Thoughts ‘n Stuff