
Big Data Case Study Facebook

If Facebook were a country, it would be the most populous nation on earth. Now in its 11th year, Facebook is one of the most popular social networking sites, with 1.59 billion accounts, roughly one-fifth of the world's population. It launched in 2004 and has grown tremendously since. As interest in social media has surged, the number of people on Facebook has increased enormously, producing a massive amount of data every minute.

Every time you open your Facebook account, you see on average more than 10 new news feed items, 5 notifications, 10 messages, and 2 friend requests. Surprised? Then you will want to know what Facebook does with all this data.

Growing Big With Big Data

There is no doubt that Facebook is one of the largest Big Data specialists. It deals with petabytes of historical and real-time data, and that volume will keep growing. While the world comes closer together on this platform, Facebook develops algorithms that track those connections, and users' presence on and off the site, to fetch the most suitable posts for each user. Whether it is your wall post, your favorite books and movies, or your workplace, Facebook analyzes every bit of your data to offer you better services each time you log in.

Star Performers Working Behind Facebook’s Big Data

A combined workforce of people and technology is constantly at work behind this platform. Although the platform is continuously being enriched, these are its prime technological components:


Hadoop

“Facebook runs the world’s largest Hadoop cluster,” says Jay Parikh, Vice President of Infrastructure Engineering at Facebook.

Facebook’s Hadoop cluster spans more than 4,000 machines and stores hundreds of millions of gigabytes. This extensive cluster gives developers some key abilities:

  • Developers can write map-reduce programs in any language.
  • SQL has been integrated to process large data sets, since most of the data in Hadoop’s file system is in table format. Developers can therefore reach it easily with a small subset of SQL.
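To make the map-reduce idea concrete, here is a minimal single-machine sketch in Python. It is not Facebook's code and does not use Hadoop at all; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical and chosen only for illustration. It shows the three phases that a map-reduce job goes through: map each record to key/value pairs, group the pairs by key, then reduce each group to a result.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all values that were emitted for the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse the values of one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

On a real cluster the map and reduce steps run in parallel on many machines and the framework performs the shuffle; the sketch only shows the shape of the computation.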

Hadoop gives Facebook a common infrastructure that is efficient and reliable. From search, log processing, recommendation systems, and data warehousing to video and image analysis, Hadoop powers this social networking platform in many ways. Facebook built its first user-facing application on this stack: Facebook Messenger runs on Apache HBase, the Hadoop database, whose layered architecture supports a huge volume of messages every day.


Scuba

With a huge amount of unstructured data arriving each day, Facebook realized that it needed a platform to speed up analysis. That’s when it developed Scuba, which helps Hadoop developers dive into massive data sets and run ad-hoc analyses in real time.

Facebook was not initially prepared to run across multiple data centers, and a single breakdown could bring down the entire platform. Scuba, another Big Data platform, lets developers keep bulk data in memory, which speeds up analysis. Small software agents collect data from multiple data centers and compress it into a log data format. Scuba then loads this compressed log data into in-memory systems, where it is instantly accessible.
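The flow described above (agents compress log data, an in-memory store makes it queryable) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `agent_compress`, `InMemoryStore`, and the field names `datacenter` and `latency_ms` are hypothetical, the JSON-plus-zlib encoding is a stand-in, and none of it reflects Scuba's real interfaces or storage format.

```python
import json
import zlib
from collections import defaultdict


def agent_compress(rows: list[dict]) -> bytes:
    """Hypothetical agent: serialize a batch of log rows and compress it."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


class InMemoryStore:
    """Toy in-memory store that answers ad-hoc aggregation queries."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def ingest(self, blob: bytes) -> None:
        """Decompress one batch from an agent and keep its rows in memory."""
        self.rows.extend(json.loads(zlib.decompress(blob).decode("utf-8")))

    def average_by(self, group_field: str, value_field: str) -> dict:
        """Average value_field for each distinct value of group_field."""
        totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
        for row in self.rows:
            bucket = totals[row[group_field]]
            bucket[0] += row[value_field]
            bucket[1] += 1
        return {group: s / n for group, (s, n) in totals.items()}


# Usage: two agents in different data centers ship batches to one store.
store = InMemoryStore()
store.ingest(agent_compress([
    {"datacenter": "dc1", "latency_ms": 10},
    {"datacenter": "dc1", "latency_ms": 30},
]))
store.ingest(agent_compress([{"datacenter": "dc2", "latency_ms": 50}]))
averages = store.average_by("datacenter", "latency_ms")
```

Because every row already sits in memory, a new question only needs a new call to `average_by`, with no batch job in between; that is the property the section attributes to Scuba.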

According to Jay Parikh, “Scuba gives us this very dynamic view into how our infrastructure is doing — how our servers are doing, how our network is doing, how the different software systems are interacting.”


Cassandra

“The amount of data to be stored, the rate of growth of the data, and the requirement to serve it within strict SLAs made it very apparent that a new storage solution was absolutely essential.”
- Avinash Lakshman, Search Team, Facebook

Traditional data storage started to fall behind when Facebook's search team ran into the Inbox Search problem. Developers were struggling to store the reverse indices of messages sent and received by users. The challenge was to develop a new storage solution that could solve the Inbox Search problem and similar problems in the future. That is when Prashant Malik and Avinash Lakshman started developing Cassandra.

The objective was to develop a distributed storage system for managing large amounts of structured data across many commodity servers with no single point of failure.
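For readers unfamiliar with the term, a reverse (inverted) index maps each search term to the messages that contain it, so a search does not have to scan every message. The following is a minimal in-process sketch of that idea in Python; the `InboxIndex` class and its methods are hypothetical and say nothing about how Cassandra stores or distributes such an index.

```python
from collections import defaultdict


class InboxIndex:
    """Toy per-user reverse index: user -> term -> ids of matching messages."""

    def __init__(self) -> None:
        self._index = defaultdict(lambda: defaultdict(set))

    def add_message(self, user_id: str, message_id: int, text: str) -> None:
        """Record every word of a message under the owning user."""
        for term in text.lower().split():
            self._index[user_id][term].add(message_id)

    def search(self, user_id: str, term: str) -> list[int]:
        """Return the sorted ids of this user's messages that contain term."""
        terms = self._index.get(user_id, {})
        return sorted(terms.get(term.lower(), set()))


# Usage: index three messages for two users, then search one inbox.
index = InboxIndex()
index.add_message("alice", 1, "Lunch tomorrow")
index.add_message("alice", 2, "Moving lunch to Friday")
index.add_message("bob", 3, "Lunch plans")
alice_hits = index.search("alice", "lunch")
```

At Facebook's scale the hard part is not the lookup itself but keeping such an index for every user available and growing across many machines, which is the storage problem the section describes.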


Hive

After Yahoo implemented Hadoop for its search engine, Facebook looked for a way to let its data scientists work with more data than its Oracle data warehouse could hold. Hence, Hive came into existence. It improved Hadoop's query capability by supporting a subset of SQL, and it soon gained popularity in the world of unstructured data. Today, thousands of jobs run on this system to process a range of applications quickly.


Prism

Hadoop wasn’t designed to run across multiple facilities. Typically, because it requires such heavy communication between servers, clusters are limited to a single data center.

When Facebook first implemented Hadoop, the framework was not designed to run across multiple data centers, and that is what led Facebook's team to develop Prism. Prism is a platform that exposes many namespaces instead of the single one that Hadoop governs, which in turn makes it possible to create many logical clusters.

The system can now expand to as many servers as needed without worrying about the growing number of data centers.


Corona

Developed by ex-Yahoo engineer Avery Ching and his team, Corona allows multiple jobs to run at the same time on a single Hadoop cluster without crashing the system. The idea for Corona took shape when developers started running into problems with Hadoop’s framework. Managing cluster resources and task trackers was getting harder. MapReduce was built on a pull-based scheduling model, which delayed the processing of small jobs. Hadoop was also limited by its slot-based resource management model, which wasted slots whenever the cluster size did not fit the configuration.

Corona introduced a new scheduling framework that separates cluster resource management from job coordination.
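Why a pull-based model delays small jobs can be shown with a little arithmetic. In a pull model a worker only asks for work at periodic check-ins, so a task that becomes ready between two check-ins has to wait for the next one; in a push model the scheduler hands the task over as soon as it is ready. The Python sketch below illustrates that difference only. The function names and the 3-second interval are assumptions for the example, not figures or interfaces from Hadoop or Corona.

```python
import math


def pull_based_wait(ready_at: float, heartbeat_interval: float) -> float:
    """Seconds a ready task waits for the worker's next periodic check-in.

    Check-ins are assumed to happen at 0, interval, 2 * interval, and so on.
    """
    next_heartbeat = math.ceil(ready_at / heartbeat_interval) * heartbeat_interval
    return next_heartbeat - ready_at


def push_based_wait(dispatch_latency: float = 0.0) -> float:
    """Seconds a ready task waits when the scheduler pushes it immediately."""
    return dispatch_latency


# Usage: a task becomes ready 1 second after a check-in, with an assumed
# 3-second interval between check-ins.
pull_delay = pull_based_wait(ready_at=1.0, heartbeat_interval=3.0)
push_delay = push_based_wait()
```

For a job that runs for hours, a wait of this size is noise; for a job that should finish in seconds, it can be a large share of the total time, which is why the delay hurts small jobs most.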


Peregrine

Another tool, developed by Murthy, is Peregrine, which is dedicated to answering queries on data as quickly as possible. Because Hadoop was developed as a batch system that takes time to run jobs, Peregrine brought the entire process close to real time.

Apart from these prime implementations, Facebook uses many other technologies, large and small, to support its Big Data infrastructure, such as Memcached, HipHop for PHP, Haystack, BigPipe, Scribe, Thrift, and Varnish.

Today Facebook is one of the biggest corporations on earth, thanks to its extensive data on over one and a half billion people. This has given it enough clout to negotiate with over 3 million advertisers on its platform and to clock staggering revenues north of 17 billion US dollars. But privacy and security concerns still loom large over whether Facebook will use those gargantuan volumes of data to serve humanity’s greater good or just to make more money.

But one thing is for sure: it is Big Data that has propelled Facebook from a small-time Harvard dorm startup into the constellation of the biggest corporations of all time!

By Rob Petersen, {grow} Community Member

Big Data is the collection of large amounts of data from places like web-browsing data trails, social network communications, sensor and surveillance data that is stored in computer clouds then searched for patterns, new revelations and insights.

In less than a decade, Big Data has become a multi-billion-dollar industry.

Who’s using it? How are they applying data? What are they achieving?

Here are 37 Big Data case studies where companies see big results.

  1. AETNA: Looks at patient results on a series of metabolic syndrome-detecting tests, assesses patient risk factors and focuses on treating one or two things that will have the most impact (statistically speaking) on improving their health. 90% of patients who didn’t have a previous visit with their doctor would benefit from a screening, and 60% would benefit from improving their adherence to their medicine regimen.
  2. AMERICAN EXPRESS: Looked for indicators that could predict loyalty and developed sophisticated predictive models that analyze historical transactions and 115 variables to forecast potential churn. The company believes it can now identify 24% of accounts that will close within the next four months.
  3. ATLANTA FALCONS: Use GPS technology to assess player movements during practices, which helps the coaches create more efficient plays.
  4. BANK OF AMERICA: “BankAmeriDeals” provides cash-back offers to credit and debit-card customers based upon analyses of their prior purchases.
  5. BASIS: Is a wrist-based health tracker and online personal dashboard that helps users incorporate small, progressive health changes over time—that ultimately add up to major results.
  6. BRITISH AIRWAYS: The “Know Me” program combines existing loyalty information with data collected from customers' online behavior. By blending these two sources of information, British Airways can make more targeted offers while responding to service lapses in ways that create a more positive experience for the flyer.
  7. CAESARS ENTERTAINMENT: Combines patrons’ gambling outcomes with their rewards program information to offer enticing perks to those who are losing at the tables.
  8. CATAPULT: Uncovers vitally important information like whether an athlete is developing an injury, or whether certain workouts are overly stressful. That helps teams keep their players safe and game-ready. Sales grew 64% last year and Catapult now works with nearly half of NFL teams, a third of NBA teams, and 30 major college programs.
  9. COMMONBOND: Is a student lending platform that connects students and graduates to alumni investors and accomplished professionals. Thus, students can access lower, fixed-rate financing—and save thousands of dollars on their repayments.
  10. DELTA: With over 130 million bags checked per year, Delta has a lot of tracking data about bags and became the first major airline to allow customers to track their bags from mobile devices. To date, the app has been downloaded over 11 million times and gives customers much greater peace of mind.
  11. DUETTO: Makes it easier for companies to personalize data to individuals searching online for hotels. Prices by hotels can be personalized by taking data such as how much you typically spend at the bar or casino to incentivize you with a lower price for your room. The hotel can give you a better price, knowing you’ll spend money on other services.
  12. EBAY: “The Feed” is a new homepage that lets customers follow entire categories of items, no matter how obscure. This makes it easier for customers to stay on top of the latest items in which they have a particular interest, especially if they are collectors.
  13. EVOLV: Helps large global companies make better hiring and management decisions through predictive analytics. Evolv crunches more than 500 million data points on gas prices, unemployment rates, and social media usage to help clients like Xerox (which has cut attrition by 20 percent) predict, for example, when an employee is most likely to leave his job. Companies like Xerox, AT&T, and Kelly Services use Evolv, and on average its clients see a $10 million impact on their P&L. Evolv’s sales grew a whopping 150% from Q3 2012 to Q3 2013.
  14. GENERAL ELECTRIC: Many machines—everything from power plants to locomotives to hospital equipment—now pump out data about how they’re operating. GE’s analytics team crunches it, then rejiggers machines to be more efficient. Even tiny improvements are substantial, given the scale: By GE’s estimates, data can boost productivity in the U.S. by 1.5%, which over a 20-year period could save enough cash to raise average national incomes by as much as 30%.
  15. GOOGLE: Working with the U.S. Centers for Disease Control, tracks when users are inputting search terms related to flu topics, to help predict which regions may experience outbreaks.
  16. HOMER: Handcrafted by top literacy experts, helps children learn to read. It has a complete phonics program, a library of beautifully illustrated stories, hundreds of science field trips, and exciting art and recording tools—combining the best early learning techniques into an engaging app that connects learning to read with learning to understand the world.
  17. IRS: Uses Big Data to stop identity theft, fraud, and improper payments, and to find those who should be paying taxes but are not. The system also helps to ensure compliance with tax rules and laws. So far, the IRS has stopped billions of dollars in fraud, specifically identity theft, and recovered more than $2 billion over the last three years.
  18. KAISER: Uses Big Data to study the incidence of blood clots within a group of women taking oral contraceptives. The analysis revealed that one formula contained a drug that increased the threat of blood clots by 77%—understanding these types of patterns can help many people avoid visits to the doctor or emergency room.
  19. KROGER: Accesses, collects, and manages data for about 770 million consumers. Claiming 95% of sales are rung up on the loyalty card, Kroger sees an impact from its award-winning loyalty program through nearly 60% redemption rates and over $12 billion in incremental revenue by using big data and analytics.
  20. LENDUP: A banking startup that decides whether to approve loan applicants based on how each user interacts with its site.
  21. NETFLIX: Having drawn in millions of users with its high-quality original programming, is now using its trove of data and analytics about international viewing habits to create and buy programming that it knows will be embraced by large, ready-made audiences.
  22. NEXT BIG SOUND: Explains through analytics of online activity (Wikipedia page views, Facebook Likes, YouTube views, and Twitter mentions) which bands are about to break, which late-night shows impact an artist’s trajectory, and many, many other quandaries that for decades had been the exclusive domain of mercurial executives.
  23. NORFOLK SOUTHERN: Deploys customized software to monitor rail traffic and reduce congestion, enabling trains to operate at higher speeds. The company forecasts $200 million in savings by making trains run just 1 mph faster.
  24. PALANTIR TECHNOLOGIES: Uses big data to solve security problems ranging from fraud to terrorism. Their systems were developed with funding from the CIA and are widely used by the US Government and their security agencies.
  25. PROCTER & GAMBLE: To examine the success of its business programs and react more quickly to changing market conditions, P&G needed to clearly and easily understand its vast and rapidly growing amount of data. It integrated vast amounts of structured and unstructured data across research and development, supply chain, customer-facing operations, and customer interactions, both from traditional data sources and from new sources of online data. Now, P&G can load and integrate data faster and execute reliable analysis at scales that were previously not possible.
  26. QSTREAM: Allows sales reps to engage in fun, scenario-based challenges—complete with leaderboards and scoring—and produces sophisticated, real-time analytics. With this information, companies gain important insights into their existing knowledge gaps and are given the tools to create dynamic sales forces.
  27. RED ROOF INN: Produces 10% growth year over year by helping people who are stranded due to bad weather. The marketing department used historical weather information to plan a campaign targeting stranded airport passengers. With an estimated 2-3% of flights cancelled daily, 500 planes don’t take off and 90,000 passengers get stranded. The company uses big data to identify the areas of demand and uses search advertising, a focus on mobile communications, and other methods to drive digital bookings with personalized messages like ‘Stranded at O’Hare? Check out Red Roof Inn.’
  28. RENTHOP: Is an apartment search platform that simplifies real estate decisions, allowing users to look at curated apartment listings from trustworthy sources and determine which apartments are worth investigating, then schedule appointments with reputable brokers and property managers.
  29. SEARS: Has consolidated data relating to customers, products, sales and campaigns to reduce the time needed to launch major marketing campaigns from eight weeks to one.
  30. SPRINT: Uses Big Data analytics to improve quality and customer experience while reducing network error rates and customer churn. It handles tens of billions of transactions per day for 53 million users, and its Big Data analytics put real-time intelligence into the network, driving a 90% increase in capacity.
  31. THE WEATHER CHANNEL: Is more than just a weather channel. By analyzing the behavior patterns of its digital and mobile users in 3 million locations worldwide—along with the unique climate data in each locale—the Weather Company has become an advertising powerhouse, letting shampoo brands, for example, target users in a humid climate with a new anti-frizz product. More than half of the Weather Company’s ad revenue is now generated from its digital operations.
  32. T-MOBILE: Has integrated Big Data across multiple IT systems to combine customer transaction and interaction data and better predict customer defections. By leveraging social media data (Big Data) along with transaction data from CRM and billing systems, T-Mobile USA could “cut customer defections in half in a single quarter”.
  33. UBER: Is cutting the number of cars on the roads of London by a third through UberPool, which caters to users who are interested in lowering their carbon footprint and fuel costs. Uber’s business is built on Big Data, with user data on both drivers and passengers fed into algorithms to find suitable and cost-effective matches and to set fare rates.
  34. UPS: Makes 16.9 million package and document deliveries every day and ships over 4 billion items per year using almost 100,000 vehicles. With this volume, there are numerous ways UPS uses Big Data, and one of the applications is fleet optimization. On-truck telematics and advanced algorithms help with routes, engine idle time, and predictive maintenance. Since starting the program, the company has saved over 39 million gallons of fuel and avoided driving 364 million miles.
  35. US XPRESS: A provider of a wide variety of transportation solutions, collects about a thousand data elements, ranging from fuel usage to tire condition to truck engine operations to GPS information, and uses this data for optimal fleet management and to drive productivity, saving millions of dollars in operating costs.
  36. VIROOL: Is a powerful video service, allowing clients to target desired audiences on its global network of more than 100 million viewers. With affordable campaigns starting as low as $10 per day, Virool gives anyone the ability to distribute YouTube video content through a series of online publishers and offers clients full transparency with accurate and detailed analytics.
  37. WAL-MART: Relies on text analysis, machine learning, and even synonym mining to produce relevant search results. Wal-Mart says adding semantic search has increased the share of online shoppers who complete a purchase by 10% to 15%. In Wal-Mart terms, that is billions of dollars.

Do these case studies show how Big Data can work and what results can be achieved? Does your company need help putting Big Data to use?

Rob Petersen is an experienced advertising and marketing executive and the founder of the BarnRaisers agency. Follow Rob on Twitter: @RobPetersen

Illustration courtesy Flickr CC and Kevin Dooley
