Google Labs has released a new tool that it describes as “like Google Trends in reverse.” Google Correlate lets users enter a data series and get back search queries that follow a similar pattern. Correlate is based on the technology that Google used to create Google Flu Trends.
When you enter a data set into Correlate, it uses the Pearson correlation coefficient, a statistical measure of how closely two data series rise and fall together, to find the search queries whose activity correlates most strongly with your data. Data can be brought in from a spreadsheet or as an exported CSV file. Correlate also has pre-existing data sets for locations such as states.
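To make the statistic concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python. It is not Google's implementation, which works at a far larger scale. The function name `pearson` and all of the weekly values are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each point from its series mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread_x = sum(a * a for a in dx)
    spread_y = sum(b * b for b in dy)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical weekly values, invented for illustration only.
uploaded = [10, 12, 15, 11, 9, 14]
rises_together = [20, 24, 30, 22, 18, 28]   # always twice the uploaded value
moves_opposite = [20, 18, 15, 19, 21, 16]   # always 30 minus the uploaded value

print(pearson(uploaded, rises_together))   # close to +1
print(pearson(uploaded, moves_opposite))   # close to -1
```

The coefficient always falls between -1 and +1. Values near +1 mean the two series rise and fall together, values near -1 mean one rises when the other falls, and values near 0 mean there is little linear relationship.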
Like Google Flu Trends, Correlate is interesting for how it searches for statistical relationships within a time series. For instance, a data set can show correlations that correspond to the summer or winter solstice. See this example from Google Labs.
Google is trying to position itself as a leader in big data, and Correlate is one of the first tools it has launched to extract added value from data sets. Sean Power, co-author of Complete Web Monitoring (O’Reilly, 2009), weighed in on what Correlate will mean going forward.
“Correlate, We’re going to see this type of announcements and many more coming from Google in the upcoming quarters,” Power said. “Just like Amazon wants to own the infrastructure stack of the world to feed intelligence into their business, Google must own the Analytics stack to augment their own core. With major updates to Trends and Analytics, and unsung tools like BigQuery, Google is poised to be a major player in the data space.”
Think of Correlate as the inverse of a normal search. Say you are searching Google Trends for “mittens” (the example Google uses in the Correlate tutorial). You would enter “mittens” and see its search activity over a defined time period, a couple of years for instance. Correlate works the opposite way. You enter a time series, and Correlate runs it through its algorithm and returns the Google search queries whose activity follows the same pattern. It is not a matter of searching within a data set but of finding how that data set and its time series relate to the much larger body of search data.
Correlate can find positive as well as negative correlations across data sets. It can be a tool for building data models or for finding key trends in a location over a period of time. Correlate filters out queries with low correlation values, misspelled words, pornographic searches, rare queries and those that appear in only a small portion of a time series.
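The "reverse search" idea, together with positive and negative matching and the low-correlation filter, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Google's algorithm. The `rank_queries` function, the 0.6 cutoff and every number in the sample data are hypothetical. Only the "mittens" query name comes from Google's tutorial.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def rank_queries(target, candidates, min_abs_r=0.6, negative=False):
    """Rank candidate query series by how strongly they correlate with target.

    min_abs_r is an assumed cutoff; the article does not state Google's value.
    With negative=True, only inversely correlated queries are returned.
    """
    matches = []
    for query, series in candidates.items():
        r = pearson(target, series)
        if abs(r) < min_abs_r:
            continue  # drop weak correlations
        if (r < 0) == negative:
            matches.append((query, r))
    # Strongest relationship first, regardless of sign.
    matches.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return matches


# Invented data: a target series and three made-up query activity series.
target = [1, 2, 3, 4, 5, 6]
candidates = {
    "mittens": [2, 4, 6, 8, 10, 12],   # rises with the target
    "sunscreen": [6, 5, 4, 3, 2, 1],   # falls as the target rises
    "noise": [3, 1, 4, 1, 5, 2],       # weak relationship, filtered out
}

print(rank_queries(target, candidates))
print(rank_queries(target, candidates, negative=True))
```

With this sample data, the default call keeps only "mittens" and the `negative=True` call keeps only "sunscreen". The "noise" series falls below the cutoff in both cases.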
In terms of privacy, you may be hesitant to hand Google the large chunk of data needed to use Correlate effectively. Google relies on its millions of search queries to observe patterns across a large population. Google says “we are keenly aware of the trust our users place in us, and of our responsibility to protect their privacy. Google Correlate can never be used to identify individual users because we rely on anonymized, aggregated counts of how often certain search queries occur each week.”