by Nick Kolakowski | March 6, 2013
Facebook’s new, powerful search platform required a lot of engineering work, and a system known as Unicorn.
Facebook’s Graph Search needed to handle constant updates to the social network’s entities and relationships.
Facebook’s Graph Search, its new and powerful way of searching the social network for all manner of information, has drawn a lot of attention since its January unveiling. Some have praised its innovation; others have wondered openly whether its search abilities will end up threatening Google and LinkedIn. Still more have questioned what it all means for users’ privacy—always a touchy subject in conjunction with Facebook.
On its most basic level, Graph Search allows users to input natural-language queries into the social network’s search bar and receive incredibly granular data from Facebook in return. Unlike Bing or Google, it won’t tell you the answer to a complicated math problem or how to best navigate between Times Square and Wall Street; but if you type “Photos of friends in San Francisco who like sushi,” chances are you’ll get a set of matching photos back.
Facebook previously revealed how it’s adjusting its hardware infrastructure to deal with the spike in traffic that will come from interactions with Graph Search (short answer: the Disaggregated Rack, which will break up hardware resources and scale them independently of one another). Now, in a new blog posting, it’s offering a bit more with regard to the software side of things, and how the company repurposed an existing system to solve Graph Search’s enormous engineering challenge.
In the beginning, Facebook relied on an older, keyword-based search platform called PPS. By 2009, the social network’s engineers had begun tinkering with a new search system, known as Typeahead, which delivered results as the user typed into the search bar. Typing “Richard,” for example, would draw up all the people named Richard from your contacts and contacts-of-contacts, even before you entered a last name. Typeahead launched in 2010, after a total reimplementation of backend and frontend and much fussing with algorithms for better performance (PPS also remained a factor).
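To make the Typeahead idea concrete, here is a minimal sketch of prefix matching over a sorted list of names. It is not Facebook's implementation: the function names (`build_prefix_index`, `typeahead`) and the sample contacts are hypothetical, and the sketch only matches from the start of the full name, with no ranking by social distance.

```python
from bisect import bisect_left

def build_prefix_index(names):
    """Sort (lowercased name, original name) pairs once, up front."""
    return sorted((name.lower(), name) for name in names)

def typeahead(index, prefix, limit=5):
    """Return up to `limit` names whose lowercased form starts with `prefix`."""
    key = prefix.lower()
    # Binary search for the first entry that could match the prefix.
    start = bisect_left(index, (key,))
    results = []
    for lowered, original in index[start:]:
        if not lowered.startswith(key):
            break  # Entries are sorted, so no later entry can match.
        results.append(original)
        if len(results) == limit:
            break
    return results

# Hypothetical contact list, for illustration only.
contacts = ["Richard Lee", "Rachel Green", "Richard Stone", "Ricardo Diaz", "Sam Rich"]
index = build_prefix_index(contacts)
suggestions = typeahead(index, "Richard")
```

Because the list is sorted once, each keystroke costs a binary search plus a short scan, which is the kind of trade-off that lets results appear while the user is still typing.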
Facebook also had other tools with some sort of search functionality built in, including tagging the location of posts and photos. “In order to make Graph Search work, and return high-quality results,” read a March 6 note on Facebook’s corporate blog, “we needed to create an index that would support all of these systems and allow for the richer queries of Graph Search.”
While PPS and Typeahead rely on metadata as the foundation of search, Facebook needed something entirely different with Graph Search: a capability to search the connections between entities such as people and places, or applications and events. It would also need to support natural language inputs, so anyone could type in something like, “Restaurants in New York City liked by friends of friends” and receive something useful back.
Facebook’s engineers and executives finally decided on Unicorn, an inverted-index system they’d had in development for quite some time. Because it could serve as an in-memory database for quickly looking up entities based on various attributes, Unicorn formed the backend of many internal Facebook initiatives; that also made it perfect for what the company was trying to achieve with what would eventually become Graph Search.
Unicorn included an index that mapped attributes (strings) to the entities (fbids) that carry them, which is what makes it an inverted index. It also included a framework for building the index from persistent data and updates, and a framework for retrieving entities based on particular constraints on attributes.
“Unicorn is not only the software that holds our index, but the system that builds the index and the system that retrieves from the index,” Facebook’s posting added. “Unicorn (like all inverted indices) optimizes retrieval performance by spending time during indexing to build optimal retrieval data structures in memory.”
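The inverted-index idea can be sketched in a few lines. This is an illustration under stated assumptions, not Unicorn itself: the `InvertedIndex` class, the attribute strings such as `friend-of:1`, and the numeric ids are all hypothetical stand-ins for Facebook's attributes and fbids.

```python
from collections import defaultdict

class InvertedIndex:
    """Toy in-memory index: attribute string -> set of entity ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, fbid, attributes):
        """Record that entity `fbid` carries each of the given attributes."""
        for attribute in attributes:
            self._postings[attribute].add(fbid)

    def search(self, *attributes):
        """Return the sorted ids that carry every requested attribute."""
        if not attributes:
            return []
        # Start from the smallest posting set to keep the intersections cheap.
        sets = sorted((self._postings.get(a, set()) for a in attributes), key=len)
        result = set(sets[0])
        for postings in sets[1:]:
            result &= postings
        return sorted(result)

# Hypothetical entities and attributes, for illustration only.
index = InvertedIndex()
index.add(101, ["friend-of:1", "lives-in:san-francisco", "likes:sushi"])
index.add(102, ["friend-of:1", "lives-in:new-york", "likes:sushi"])
index.add(103, ["lives-in:san-francisco", "likes:sushi"])
matches = index.search("friend-of:1", "lives-in:san-francisco", "likes:sushi")
```

A query like “friends in San Francisco who like sushi” then becomes an intersection of three posting sets. The work of grouping ids by attribute happens when the index is built, so retrieval itself stays fast.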
Building those data structures required “a combination of map-reduce scripts that collect data from Hive tables, process them, and convert them into the inverted index data structures.” Facebook also had to deal with the always-shifting nature of its network, with billions of pieces of content added every day (along with billions of things “Liked”). But after all that work, the social network is now backed by a unified system, one in which users can interact with the graph in whole new ways.
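The build step described in the quote can be approximated with a map phase and a reduce phase in plain Python. This is only a sketch: the `rows` list stands in for data that would come from Hive tables, and `map_phase`, `reduce_phase`, and `apply_update` are hypothetical names, not part of any Facebook or Hadoop API.

```python
from bisect import bisect_left
from collections import defaultdict

def map_phase(rows):
    """Emit one (attribute, fbid) pair for each attribute on each row."""
    for fbid, attributes in rows:
        for attribute in attributes:
            yield attribute, fbid

def reduce_phase(pairs):
    """Group ids by attribute and store them as sorted posting lists."""
    grouped = defaultdict(set)
    for attribute, fbid in pairs:
        grouped[attribute].add(fbid)
    return {attribute: sorted(ids) for attribute, ids in grouped.items()}

def apply_update(index, fbid, attribute):
    """Add a new (attribute, fbid) pair in place, keeping the list sorted."""
    postings = index.setdefault(attribute, [])
    position = bisect_left(postings, fbid)
    if position == len(postings) or postings[position] != fbid:
        postings.insert(position, fbid)

# Hypothetical rows standing in for persistent data.
rows = [
    (101, ["likes:sushi", "lives-in:san-francisco"]),
    (102, ["likes:sushi"]),
]
built = reduce_phase(map_phase(rows))
apply_update(built, 100, "likes:sushi")   # a new "Like" arrives after the build
apply_update(built, 102, "lives-in:new-york")
```

The batch build handles the persistent data, while `apply_update` shows one simple way a live index might absorb the constant stream of new content and Likes without a full rebuild.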