What do a Russian math professor, a Harvard neurobiologist, a French actuary and a British finance quant all have in common? They were all recently identified as some of the top 10 Kaggle data scientists.
Each earned the designation by developing some of the best solutions to the website’s crowdsourced analytics competitions. Learn why three of them participate in Kaggle, and how they became the alpha data geeks that they are:
- Tim Salimans, a 26-year-old Ph.D. candidate in econometrics at Erasmus University Rotterdam in the Netherlands,
- David Slate, an older computer programmer from the Chicago area, and
- Jason Tigg, a 43-year-old with a Ph.D. in elementary particle physics from Oxford who is based in London, where he works trading statistical arbitrage in finance.
Salimans, who runs and plays a number of competitive sports, finds that “It’s mostly the competitive element of Kaggle that motivates me. I just like to be challenged this way.” The online leaderboard is another motivator: “The direct feedback it provides is quite unique in the area of data analysis and gives you a lot of motivation.”
But it helps to have some fame, too. After he won his first competition (a chess rating challenge), he was contacted by Thore Graepel of Microsoft Research and ended up interning with him. Kaggle also lets him bypass the slow academic review process when publishing his work: “Publishing an academic article is a very slow and tedious process that commonly takes over a year in my field, while the descriptions of my winning entries in the Kaggle competitions get read by a similar number of people and only take an hour to write.”
Another top 10 winner is David Slate. He has been a computer programmer for nearly 50 years after getting degrees in physics. He has been doing predictive analytics for several decades and is retired now. His team at Northwestern University won the World Computer Chess Championship from 1977 to 1980. He developed a credit-card fraud detection system that is still in commercial use. Most of his contests have been jointly entered with Peter Frey under the team name “Old Dogs With New Tricks.”
“Every contest is fun and has interesting data. I like to apply my skills to solve some real problems and especially in the medical area.” Slate is in his 60s, which he touts as an advantage. “We can bring an impressive amount of geezer power to bear on the problem,” he told me. “We have also developed our own software tools for predictive analytics, too.”
It also helps to be persistent because “there is a lot of trial and error, and the contests require a fair amount of time to spend on them.” Slate mentions that he often tweaks his algorithms daily, trying new tactics. It certainly helps not having a day job to distract him from his contests!
Kaggle has been around for two years now and has had more than 33,000 participants from around the world. Competitions may have cash prizes attached to them, or can be used by college students as part of an in-class homework assignment. We have written before about some of the very innovative things Kaggle is doing. Naming its top 10 data scientists seems appropriate, given how the site instantly tracks the leading entries in all of its contests.
Back when I was in my graduate statistics classes, I had no idea that the world of data science could be the wonderful and exciting place that it is now. In that era, we were slaves to problem sets, basically an upgrade to fifth-grade arithmetic homework assignments where you got a problem and had to show your work toward the solution. Can you say boring? It is no wonder that even Barbie thinks math is too tough.
But thanks to Kaggle in Class, students around the world have the opportunity to make math more fun, or at least more socially engaging. Salimans told me that he “first used Kaggle in Class last year, and I have never seen the students so enthusiastic about a class assignment. A lot of them worked on it for two weeks straight up to the deadline, while I had had trouble motivating them for some of the earlier assignments. An in-class competition is also great at getting the students to develop some real practical understanding of the different methods, in a way that most computer assignments fail to do.”
Jason Tigg, meanwhile, started doing assembly language programming as a teen, building a program to play Othello. He has done well in several Kaggle contests, including the Photo Quality Prediction competition and the Claim Prediction Challenge.
“My two biggest motivations are fun and learning,” he said. “I feel lucky to be living through this chapter in history where machine intelligence is ramping up so rapidly. I feel a buzz around the area, which I imagine was how physics felt around the turn of the last century. People are trying out new ideas, and no one knows for sure where we will all end up.” He has entered a variety of competitions, with the goal of increasing his knowledge about new machine-learning techniques. That said, he looks at the leaderboard because it is “extremely useful for judging how much you are missing, and how much you need to learn.”
Tigg also busted the myth about how much computing power you need to solve the contests’ problems: “Do not worry about needing huge amounts of compute power, it is possible to do well in these competitions with very cheap setups.”
So good work to everyone who has entered Kaggle and other data science contests. Hopefully you can find inspiration from these three who have risen to the top!