By: Dr. Rick Brattin
So just how ‘big’ is the world’s ‘big data’ and why should companies care? As discussed in my last column, “What makes today’s data ‘Big’?”, we have experienced an explosion of new data and data sources over the last two decades. This is data that companies are analyzing in pursuit of business insight and competitive advantage. Those who do this well are learning that the sheer size and complexity of today’s data can be a struggle to manage using traditional hardware and software.
We use the term ‘Big Data’ to indicate that today’s data is much larger and more complex than ever before. Doug Laney, of the IT research firm Gartner, Inc., is credited with creating what is perhaps the most widely used framework for describing how ‘big’ our data has become. The framework uses three dimensions: volume (a measure of scale), velocity (a measure of speed), and variety (a measure of form). In this column, I explore the volume dimension and the challenges companies face as a result.
International Data Corporation (IDC), a leading provider of market intelligence for information technology markets, predicts the world’s data will grow to a staggering 44 zettabytes by the year 2020. Huh, 44 what? Herein lies one of the difficulties concerning big data. Data has become so large that the words we use to describe its size are not part of our everyday vocabulary. This leads to confusion.
Mathematically, a zettabyte is equal to 1 trillion gigabytes, which is equal to 1,000,000,000,000,000,000,000 bytes. While this may help those who are fluent in technology-speak, it still falls far short for most people. I like to explain size by comparison to common things we use every day. Here are a few of my favorites collected over the years:
- If you think of a 60-watt light bulb as being a gigabyte, it would take the Hoover Dam 690 years to produce the energy required to power 44 zettabytes of light bulbs for just one hour.
- If a byte equals one character of text, it would take 14,212 trillion copies of “War and Peace” by Leo Tolstoy to reach 44 zettabytes of characters.
- If the 11-ounce coffee on your desk equals one gigabyte, one zettabyte would have the same volume as the Great Wall of China, so 44 zettabytes would equal 44 Great Walls of China.
- You would need to watch the entire Netflix catalog just short of 140 million times to view 44 zettabytes of data.
Of course, accuracy varies with broad statements like these, but they do help communicate the immensity of data in the world. Not all of this data is valuable to companies for analysis. According to IDC, 22 percent of the data available in 2013 was a candidate for analysis. By 2020, the useful percentage of data could grow to 35 percent.
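For readers who want to check the arithmetic behind these figures, here is a minimal sketch in Python. It assumes decimal units (a gigabyte as 10^9 bytes and a zettabyte as 10^21 bytes, matching the "1 trillion gigabytes" definition above), and it assumes the 35 percent figure applies to the full 44-zettabyte forecast. The variable names are my own and purely illustrative.

```python
# Decimal (SI) units, matching the column's "1 trillion gigabytes" definition.
GIGABYTE = 10**9
ZETTABYTE = 10**21

# IDC's 2020 forecast quoted in the column.
world_data_bytes = 44 * ZETTABYTE

# How many gigabytes fit in one zettabyte?
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE

# Characters per copy implied by "14,212 trillion copies" of War and Peace,
# taking one byte as one character as the column does.
copies = 14_212 * 10**12
implied_chars_per_copy = world_data_bytes / copies

# Assumption: the 35 percent "useful" share applies to the whole 44 ZB.
useful_zettabytes = 44 * 35 / 100

print(f"{gigabytes_per_zettabyte:,} gigabytes per zettabyte")
print(f"{implied_chars_per_copy:,.0f} characters per copy (implied)")
print(f"{useful_zettabytes} zettabytes potentially useful by 2020")
```

Run as written, the sketch confirms that one zettabyte is a trillion gigabytes, that the “War and Peace” comparison implies roughly 3.1 million characters per copy, and that 35 percent of 44 zettabytes is about 15.4 zettabytes.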
Data volume creates three important challenges for companies. First, the size of available data now exceeds the ability of typical database software to capture, store and manage it. The technology industry is responding with a variety of new and sophisticated hardware and software that can deal efficiently with high data volumes, but these often defy existing paradigms and time-honored traditions. Second, complex search, query and analytical methods are often required to analyze today’s big data. From the traditional statistics that we love to hate, to machine learning, to algorithmic automation, the variety of analytic choices available can be overwhelming. Third, an organization’s decision makers may have strong business skills but often lack the computer and data management skills required to produce insights from the great volume of data available today. They must rely on data scientists to navigate this sea of data and effectively summarize valuable findings.
There is an ever-expanding data universe available to us. The challenges surrounding it are daunting, but companies that learn to harness it for analytic insight will have a great advantage over their competition.
Dr. Rick Brattin is an assistant professor of Computer Information Systems at Missouri State University. He has 25 years of experience in data analytics, business intelligence and information governance with Fortune 100 companies. Email: email@example.com.
This article appeared in the May 28, 2016 edition of the News-Leader and can be accessed online here.