Big Data, Better Dating: How OKCupid Helps Users Find Their Perfect Match

Online dating services promise you help finding your “perfect match.” Complete your profile, answer a series of questions, choose a few filters as you search and you’re given a list of singles in your area who are most likely to be a suitable date.

But providing these search results is no easy task. It isn’t simply a matter of identifying the right people based on a single user’s dating criteria. The people whose profiles are returned in the results should also, in turn, “like” the person who’s searching. In other words, the matching has to occur at both ends.
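
The two-sided requirement can be sketched in a few lines of Python. This is only an illustration of the idea, not OKCupid's actual code: the `Profile` fields and the `accepts` and `mutual_matches` helpers are hypothetical names invented for this example, and real preferences would cover far more than age and gender.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical, heavily simplified profile; field names are assumptions.
    name: str
    age: int
    gender: str
    wants_gender: str
    min_age: int
    max_age: int


def accepts(seeker: Profile, candidate: Profile) -> bool:
    """True if `candidate` satisfies `seeker`'s stated preferences."""
    return (
        candidate.gender == seeker.wants_gender
        and seeker.min_age <= candidate.age <= seeker.max_age
    )


def mutual_matches(seeker: Profile, pool: list) -> list:
    """Keep only candidates where the preference check passes in both directions."""
    return [
        c for c in pool
        if c is not seeker and accepts(seeker, c) and accepts(c, seeker)
    ]
```

A one-directional search would only call `accepts(seeker, c)`. The second call, with the arguments swapped, is what makes the matching "occur at both ends."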

Today at the opening keynote of OSCON’s new Data sub-conference, OKCupid’s CTO Tom Quisel spoke about how the online dating service has built its architecture to handle these queries. As Quisel notes, the types of searches that OKCupid users conduct are different from those done via other search engines. After all, “Web pages don’t have personal preferences.”

OKCupid is well known for its data analysis and for releasing trends and insights that it has gleaned from user profiles. With over 7 million active users, there is indeed a lot of data to be had. On average, says Quisel, active users have answered about 3000 questions, hidden the profiles of several thousand users they aren’t interested in, and voted for about 4000 profiles. All that data is in addition to users’ personal demographic data and preferences, as well as their site usage information (how often they log in, how often they respond to messages and so on).

And all that data makes a simple search for a list of potential matches quite complex. In fact, says Quisel, it can take 13 billion seeks in order to load one page of results.

OKCupid’s Technical Architecture

The challenge for OKCupid, then, has been to build a system that is scalable, fast and reliable, but also low cost. In his talk today, Quisel detailed the distributed architecture that OKCupid uses. Interestingly, OKCupid has chosen C++: Quisel argues that it is three times faster than Java, uses a quarter of the memory and requires fewer support staff. OKCupid also primarily uses MySQL.

Users’ data is split across workers, says Quisel, and OKCupid uses a quadtree structure in order to split up the data. As one of the most important preferences for would-be daters is location, that’s the first filter utilized. Then, for each quadtree leaf node, the vector is sorted by last login, so that only recent visitors and active users are returned in search results.
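
As a rough sketch of that structure, the Python below builds a small point quadtree whose leaves keep their users ordered by most recent login. It is a toy illustration under stated assumptions, not OKCupid's C++ implementation: the `User` and `QuadTree` classes, the `capacity` and `MAX_DEPTH` settings, and the flat x/y coordinates standing in for location are all hypothetical.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 16  # assumed guard so identical coordinates cannot split forever


@dataclass
class User:
    name: str
    x: float          # simplified flat coordinates standing in for location
    y: float
    last_login: int   # e.g. a Unix timestamp; larger means more recent


@dataclass
class QuadTree:
    x0: float
    y0: float
    x1: float
    y1: float
    capacity: int = 4
    depth: int = 0
    users: list = field(default_factory=list)     # populated only in leaves
    children: list = field(default_factory=list)  # empty, or exactly four cells

    def insert(self, user):
        if self.children:
            self._child_for(user).insert(user)
            return
        self.users.append(user)
        if len(self.users) > self.capacity and self.depth < MAX_DEPTH:
            self._split()
        else:
            # Keep each leaf's vector sorted by last login, newest first.
            self.users.sort(key=lambda u: u.last_login, reverse=True)

    def _mid(self):
        return (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2

    def _child_for(self, user):
        mx, my = self._mid()
        # Index 0..3: bit 0 is the east half, bit 1 is the north half.
        return self.children[(user.x >= mx) + 2 * (user.y >= my)]

    def _split(self):
        mx, my = self._mid()
        c, d = self.capacity, self.depth + 1
        self.children = [
            QuadTree(self.x0, self.y0, mx, my, c, d),
            QuadTree(mx, self.y0, self.x1, my, c, d),
            QuadTree(self.x0, my, mx, self.y1, c, d),
            QuadTree(mx, my, self.x1, self.y1, c, d),
        ]
        pending, self.users = self.users, []
        for u in pending:
            self._child_for(u).insert(u)

    def query(self, qx0, qy0, qx1, qy1, limit):
        """Return up to `limit` most recently active users inside the rectangle."""
        if qx0 > self.x1 or qx1 < self.x0 or qy0 > self.y1 or qy1 < self.y0:
            return []  # location filter: this cell cannot contain a hit
        if self.children:
            found = [u for ch in self.children
                     for u in ch.query(qx0, qy0, qx1, qy1, limit)]
            found.sort(key=lambda u: u.last_login, reverse=True)
            return found[:limit]
        hits = []
        for u in self.users:  # already newest first, so we can stop early
            if qx0 <= u.x <= qx1 and qy0 <= u.y <= qy1:
                hits.append(u)
                if len(hits) == limit:
                    break
        return hits
```

Because location prunes whole cells before any user is examined, and each leaf is already ordered by recency, a query can stop as soon as it has enough recent users from each relevant leaf rather than scanning everyone in the area.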

Quisel also says that OKCupid utilizes SSDs – and consumer-grade SSDs, notably – but the company has done extensive SSD benchmarking to make the process efficient and reliable. In order to avoid problems with reads and writes, Quisel says that OKCupid has taken SSDs off “the most critical paths.”

More from OSCON…

Not able to make it to OSCON this year? ReadWriteWeb will be reporting from the conference all week, but you can also watch the livestream from OSCON here.
