Matthew Cook -
Software Money Pit Blog
    Strategy & Management
    Trends & Technologies
    Case Studies
About Matt
Buy Matt's Book
Trends & Technologies

Big Data: Correlations, Not Cause-and-Effect

February 18, 2016 by Matt Cook

Image by Marcos Gasparutti, CC license

In their book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schönberger and Kenneth Cukier say that big data will provide a lot of information that can be used to establish correlations, but not necessarily precise cause and effect.

But that might be good enough to extract the value you need from big data.

Three examples from their book:

  1. Walmart discovered a sales spike in Pop-Tarts if storms were in the forecast. The correlation was also true of flashlights, but selling more flashlights made sense; selling more Pop-Tarts didn’t.
  2. Doctors in Canada now prevent fevers in premature infants by acting on a link between a period of unusually stable vital signs and a severe fever 24 hours later.
  3. Credit scores can be used to predict which people need to be reminded to take a prescription medicine.

Why did the people involved in the above examples compare such different sets of data? One possible reason: because they could, relatively quickly and at low cost, thanks to superfast data processing and cheap memory. If you could mash together all kinds of data in large volumes, and do so relatively cheaply, why wouldn’t you keep going until you found some correlations that looked interesting?
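
To make the "mash it together and look" idea concrete, here is a minimal sketch in Python. The daily figures are made up for illustration (they are not Walmart's data), and the column names and the 0.8 threshold are assumptions. It computes the Pearson correlation for every pair of columns and keeps only the strong ones.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def strong_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    found = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            found.append((name_a, name_b, r))
    return found

# Hypothetical daily figures, invented for this example only.
columns = {
    "storm_forecast":      [0, 0, 1, 1, 0, 1, 0, 1],   # 1 = storm predicted
    "pop_tart_units":      [20, 22, 35, 38, 21, 36, 19, 40],
    "flashlight_units":    [5, 4, 12, 14, 5, 13, 4, 15],
    "greeting_card_units": [9, 11, 11, 9, 10, 10, 10, 10],
}

for name_a, name_b, r in strong_pairs(columns):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

A scan like this only flags pairs that move together. It says nothing about why, which is the point of the book's argument: the correlation may be useful well before anyone can explain it.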

You can begin experimenting with Big Data, a process I endorse. You need three basic components:

  1. A way to get the data, whether out of your transaction systems or from external sources, and into a database.
  2. Superfast data processing (a database with enormous amounts of RAM and massively parallel processing). This can be had on a software-as-a-service basis from Amazon and other vendors.
  3. Analytics tools that present the data in the visual form you want. Vendors include Oracle, Teradata, Tableau, Information Builders, Qlikview, Hyperion, and many others.

Correlations are usually easier to spot visually. And visualization is where the market seems to be going, at least in terms of hype and vendor offerings. New insights are always welcome, so we shall see what sells and what doesn’t.

The assessment from Gartner seems about right to me for now: big data is both 1) currently in the phase they call the “trough of disillusionment” and 2) promising enough that its use in BI will grow sharply.

Trends & Technologies

Big Data 101

May 10, 2015 by Matt Cook

Image: “Data Center.” by Stan Wlechers, CC license

So what is Big Data, particularly Big Data analytics? Why all the hype?

Big Data is what it implies: tons of data. We’re talking millions or billions of rows here – way too much for standard query tools accessing data on a disk.

What would constitute “tons” of data? Every bottle of “spring,” “purified” or “mineral” water that was scanned at a grocery store checkout during the month of July 2011; the brand, the price, the size, the name and location of the store, and the day of the week it was bought. That’s six pieces of data, multiplied by the estimated 3.3 billion bottles of water sold monthly in the United States.
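
The arithmetic behind that example, as a quick sketch. The six fields and the 3.3 billion figure come from the paragraph above; the 20 bytes per value is purely an assumed average, included only to give a feel for the storage involved.

```python
FIELDS_PER_SCAN = 6                 # brand, price, size, store name, store location, day
BOTTLES_PER_MONTH = 3_300_000_000   # estimate quoted above
ASSUMED_BYTES_PER_VALUE = 20        # hypothetical average, not a measured number

values_per_month = FIELDS_PER_SCAN * BOTTLES_PER_MONTH
approx_gigabytes = values_per_month * ASSUMED_BYTES_PER_VALUE / 1e9

print(f"{values_per_month:,} values per month")
print(f"about {approx_gigabytes:,.0f} GB at {ASSUMED_BYTES_PER_VALUE} bytes per value")
```

That is roughly 19.8 billion individual values for one product category in one month.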

Big Data analytics is the process of extracting meaning from all that data.

The analysis of big data is made possible by two developments:

1) The continuation of Moore’s law; that is, computer speed and memory have multiplied exponentially. This has enabled the processing of huge amounts of data without retrieving that data from disk storage; and

2) “Distributed” computing structures such as Hadoop, which make it possible to process large amounts of data on multiple servers at once.
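
The split-the-work idea behind distributed processing can be sketched in a few lines of Python. This is not Hadoop's API. It is only an illustration of the pattern, with threads on one machine standing in for separate servers, and with invented brand names and function names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_by_brand(chunk):
    """'Map' step: tally the number of scans per brand within one chunk."""
    return Counter(brand for brand, _price in chunk)

def distributed_count(rows, workers=4):
    """Split rows into chunks, count each chunk in parallel, then merge."""
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_by_brand, chunks))
    total = Counter()
    for partial in partials:  # 'reduce' step: combine the partial tallies
        total.update(partial)
    return total

# Hypothetical checkout scans: (brand, price). Brand names are invented.
rows = [("BrandA", 1.29), ("BrandB", 0.99), ("BrandA", 1.19)] * 1000

print(distributed_count(rows))
```

Each worker sees only its own slice of the data, and the final answer is assembled from the partial results. That is what lets the same job scale out across many servers.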

The hype you read about Big Data may be justified. Big data does have potential and should not be ignored. With the right software, a virtual picture of the data can be painted with more detail than ever before. Think of it as a photograph, illustration or sketch – with every additional line of clarification or sharpening of detail, the picture comes more into focus.

Michael Malone, writing in The Wall Street Journal, says that some really big things might be possible with big data:

“It could mean capturing every step in the path of every shopper in a store over the course of a year, or monitoring every vital sign of a patient every second for the course of his illness….Big data offers measuring precision in science, business, medicine and almost every other sector never before possible.”

But should your enterprise pursue Big Data analytics? It may already have. If your company processes millions of transactions or has millions of customers, you have a lot of data to begin with.

You need three things to enable Big Data analytics:

  1. A way to get the data, whether out of your transaction systems or from external sources, and into a database. Typically this is done with ETL (Extract, Transform, and Load) software tools such as Informatica. Jobs are set up and the data is pulled every hour, day, etc., put into a file and either pushed or pulled into a storage environment.
  2. Superfast data processing. Today, an in-memory database (a database with enormous amounts of RAM and massively parallel processing) can be acquired and used on a software-as-a-service basis from Amazon Web Services at a very reasonable cost.
  3. User interface analytics tools that present the data in the visual form you prefer. Vendors include Oracle, Teradata, Tableau, Information Builders, Qlikview, Hyperion, and many others. The market here is moving toward data visualization via low-cost, software-as-a-service tools that allow you to aggregate disparate sources of data (internal and external systems, social media, and public sources like weather and demographic statistics).
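
For a feel of how the first two pieces fit together, here is a toy extract-transform-load job in Python. Everything in it is an assumption made for illustration: the sample export, the column names, and the use of SQLite's in-memory mode as a stand-in for a real in-memory analytical database. It does not show how Informatica or any vendor's product works.

```python
import csv
import io
import sqlite3

# Hypothetical export file from a transaction system (invented rows).
RAW_EXPORT = """brand,price,size_oz,store,day
 acme springs ,1.29,16.9,Store 12,Mon
ACME SPRINGS,1.19,16.9,Store 7,Tue
clear peak,0.99,20,Store 12,Tue
"""

def extract(text):
    """Extract: read the exported file into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: tidy brand names and convert numeric fields."""
    return [
        (r["brand"].strip().title(), float(r["price"]), float(r["size_oz"]),
         r["store"].strip(), r["day"].strip())
        for r in records
    ]

def load(rows, conn):
    """Load: write the cleaned rows into the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(brand TEXT, price REAL, size_oz REAL, store TEXT, day TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

def units_by_brand(conn):
    """A simple aggregate that an analytics tool might chart."""
    return conn.execute(
        "SELECT brand, COUNT(*) FROM sales GROUP BY brand ORDER BY brand"
    ).fetchall()

conn = sqlite3.connect(":memory:")  # stand-in for an in-memory database
load(transform(extract(RAW_EXPORT)), conn)
summary = units_by_brand(conn)
print(summary)
```

In a real setup the extract step would run on a schedule, the database would be far larger, and the final query would feed a visualization tool rather than a print statement.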

© 2017 Copyright Matthew David Cook // All rights reserved