Ever wanted to know the language of a Web site? Or whether the text within it is considered spam? Well, it’s a lot easier since the launch of uClassify, the free Web service and API out of Sweden that lets you create and train your own text classifiers.
According to uClassify’s about page, a text classifier answers the question: “To which predefined category is this text most likely to belong?” Text classifiers can be used to create spam filters, categorize Web pages, detect languages, classify a batch of blog posts, and more.
How uClassify Works
While there are different types of classifiers, uClassify is a machine learning classifier, meaning you need to train it before it can classify documents.
Training it is simple enough. You manually set up two or more classes, for instance spam and legitimate, and then manually attach documents (known as the training corpus) to the class they belong to. This supervised training helps the classifier understand the characteristics of various classes.
Once trained, the classifier determines which of the predefined classes a previously unseen document most likely belongs to and returns a percentage-based answer. You can keep training uClassify after you start classifying, and the more you train it, the more accurate the results will be.
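To make the train-then-classify cycle concrete, here is a minimal sketch in Python. It is not uClassify's code or API: the article does not say which algorithm uClassify uses, so this example substitutes a simple naive Bayes word-count model, and the names TinyClassifier, train and classify are hypothetical. It only illustrates the idea of attaching example documents to classes and then getting a percentage per class for unseen text.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyClassifier:
    """Illustrative naive Bayes classifier (an assumption, not uClassify's method)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # class name -> word frequencies
        self.doc_counts = Counter()              # class name -> number of training docs
        self.vocabulary = set()

    def train(self, class_name, text):
        """Attach one training document to the class it belongs to."""
        words = tokenize(text)
        self.word_counts[class_name].update(words)
        self.doc_counts[class_name] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return a dict mapping each class to a percentage that sums to 100."""
        if not self.doc_counts:
            raise ValueError("train the classifier before classifying")
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        log_scores = {}
        for class_name, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from the class prior, then add each word's log-likelihood.
            score = math.log(self.doc_counts[class_name] / total_docs)
            for word in words:
                # Add-one smoothing so unseen words do not zero out a class.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            log_scores[class_name] = score
        # Convert log scores to percentages, subtracting the max for stability.
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(weights.values())
        return {c: 100.0 * w / norm for c, w in weights.items()}


classifier = TinyClassifier()
classifier.train("spam", "win free money now claim your free prize")
classifier.train("legitimate", "meeting agenda attached see you at the office tomorrow")
result = classifier.classify("claim your free money")
print(result)
```

With only one toy document per class, the percentages mean little; the point is that each call to train adds evidence, which is why more training generally gives better answers.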
uClassify now has a “click ‘n’ classify” GUI, so you don’t need programming skills to create classifiers. All you need to do is create an account and log in, and it will walk you through the three-step process.
Image: Copy and paste text or enter a URL to train uClassify
To date, users have created over 200 classifiers. Three fun sites include:
Typealyzer: Classifies a blog's personality type using psychological text analysis.
Genderanalyzer: Decides whether a page was written by a man or a woman.
oFaust: Determines which classical author your text most resembles.
Started by Jon Kagstrom in 2004, uClassify was formed to share classifier technology with the masses.