You read “Beautiful Data” and highlighted parts to share with your data geek cohorts. You log everything. Deep down you know there will be metrics in this growing pile.
But have you ever stepped back to reflect on the growing piles of data and wondered how to begin to slice and dice them? You might want to find a data scientist.
Let’s take a look at a recent blog entry that caught RWH’s eye.
Revel in your O’s and S’s
The recent blog post A Taxonomy of Data Science presents “hack” as one part of a larger mix of areas of interest. Yes. You had us at “hack”.
In their dataists post, Hilary Mason and Chris Wiggins explore what it takes to be a data scientist. By boiling the data scientist’s pursuits down to a handful of areas, they give you a better understanding of the type of person you might want on your future projects. This approach makes the concepts around being a data scientist an accessible and interesting read. Go there now. We’ll wait.
In a kind of funny way (geeky hack funny ha ha way) the comments on the post are a reflection of how the data scientist can easily be misunderstood. It (probably) isn’t a coincidence that bit.ly links are in abundant use throughout. Mad props to @hmason for maximizing data potential!
In a very humble attempt to create a backronym, I’d propose only one change: adding “Prioritize” to the front of the OSEMN list compiled by Mason and Wiggins, i.e., Prioritize, Obtain, Scrub, Explore, Model, iNterpret.
Prioritize: The world is a very large place and the data generated is only getting more diverse. By the way, you might be running a business.
If you are a startup company, there are probably metrics of immediate interest tied to the business. What’s always a dicey proposition is saying you can monetize your marketing data. If only everyone thought your data was unique and beautiful like a snowflake! So, perhaps a priority or weighting has to be applied to the value of specific data sets to determine what is worthy of first pass, second pass, and deeper thinking. This is the business hat of being a data scientist that says what to look at first and where to apply resources that will materially improve the position of the company.
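For the hack-inclined, here is one way the weighting idea might look in code. This is a minimal sketch and not anything from the dataists post: the `DataSet` fields, the 0-to-1 scores, the weights, and the cutoffs are all hypothetical placeholders you would replace with your own business judgment.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    business_value: float  # hypothetical estimate, 0 (none) to 1 (critical)
    effort: float          # hypothetical cost to obtain and scrub, 0 to 1


def priority(ds, value_weight=0.7, effort_weight=0.3):
    """Score a data set: reward business value, penalize effort.

    The weights are illustrative assumptions, not recommendations.
    """
    return value_weight * ds.business_value - effort_weight * ds.effort


def triage(datasets, first_cutoff=0.4, second_cutoff=0.1):
    """Sort data sets into first pass, second pass, and deeper thinking."""
    buckets = {"first pass": [], "second pass": [], "deeper thinking": []}
    for ds in sorted(datasets, key=priority, reverse=True):
        score = priority(ds)
        if score >= first_cutoff:
            buckets["first pass"].append(ds.name)
        elif score >= second_cutoff:
            buckets["second pass"].append(ds.name)
        else:
            buckets["deeper thinking"].append(ds.name)
    return buckets
```

Feed `triage` a list of `DataSet` objects and you get back three lists of names, highest score first within each. The point is less the arithmetic than forcing yourself to write the weights down before anyone starts modeling.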
Barring a steroid scandal, it’s a good bet that data science hall of fame candidates will emerge from academia or from those with a heavy background in commercially applied statistics. Also, based on conversations with team members at Hadoop, Pig, etc. shops within some very large Internet companies, the next data scientist might even be someone who has acted as a systems engineer supporting those rigorous, discipline-oriented individuals. Here’s why: being a data scientist is (increasingly) about being well rounded.
If you happen to be a data scientist or if you work with them, what would you say is the profile of a data scientist? Let us know in the comments below!