Pattern is a collection of open source (BSD license) web mining modules for Python from the Computational Linguistics and Psycholinguistics Research Center. It contains tools for data retrieval, text analysis and data visualization and comes with over 30 sample scripts.
Pattern consists of six main modules:
- pattern.web: A toolkit with APIs for various Web services, including Google, Gmail, Bing, Twitter, Wikipedia and Flickr. It also has its own HTML parser and Web spider.
- pattern.table: A module for working with tabular data, used for storing data from the pattern.web module.
- pattern.en: A natural language processing toolkit for English.
- pattern.search: A module containing a search algorithm.
- pattern.vector: A module containing various tools for analyzing the text of a document.
- pattern.graph: A module for data visualization using Canvas.
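To give a feel for the kind of document analysis a module like pattern.vector is meant for, here is a minimal sketch of a bag-of-words comparison between short texts. It is written in plain Python and does not use Pattern's own API; the function names `word_counts` and `cosine_similarity` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Return a bag-of-words Counter of lowercase word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


doc1 = word_counts("The cat sat on the mat.")
doc2 = word_counts("The cat lay on the rug.")
doc3 = word_counts("Stock prices fell sharply today.")

print(cosine_similarity(doc1, doc2))  # shares several words: high score
print(cosine_similarity(doc1, doc3))  # shares no words: zero
```

Texts retrieved from the Web could be compared this way to group similar documents; a dedicated library would typically add refinements such as stop-word filtering and term weighting.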