Image: visualization of data showing places in New York City frequented by tourists (red) and locals (blue); by Eric Fisher; Creative Commons license.

Another emerging segment of the analytics software market is data virtualization (DV), referred to by some as Information-as-a-Service (IaaS), which enables access to multiple data sources, usually in real time, without the time and expense of traditional data warehousing and data extraction methods.

Forrester Research defines DV as solutions that “provide a virtualized data services layer that integrates data from heterogeneous data sources and content in real-time, near-real-time or batch as needed to support a wide range of applications and processes.” Data visualization, on the other hand, refers to methods of displaying data graphically, with the aim of revealing more insight than traditional reports do (see ‘What is Data Visualization?’).

Traditional BI or analytics methods rely on some form of data warehousing, in which pieces of data are extracted, usually from transaction systems, transformed or “normalized” (i.e., “formatted”), and stored in tables according to some type of schema. “Customer Account Number,” for example, may belong in the “Customer” table, and so on. As covered in the book, building a data warehouse and getting it to work properly can take years and require substantial technical skills that even many mid-sized to large companies just don’t have.
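
To make the extract, transform and store steps concrete, here is a minimal sketch in Python using the built-in sqlite3 module as a stand-in warehouse. The records, field names and table layout are all hypothetical and invented for illustration; a real warehouse load involves far more than this.

```python
import sqlite3

# Hypothetical raw records, as they might be extracted from a transaction system.
raw_orders = [
    {"acct": " a100 ", "customer": "Acme Corp", "total": "250.00"},
    {"acct": "B200", "customer": "Birch Ltd", "total": "75.50"},
    {"acct": "a100", "customer": "Acme Corp", "total": "100.00"},
]


def transform(record):
    """Normalize formats so every row fits the warehouse schema."""
    account_no = record["acct"].strip().upper()
    return account_no, record["customer"], float(record["total"])


# The "warehouse": a separate database with its own schema (assumed layout).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer (account_no TEXT PRIMARY KEY, name TEXT)")
warehouse.execute("CREATE TABLE orders (account_no TEXT, total REAL)")

# Load: each piece of data goes to the table it belongs in.
for account_no, name, total in map(transform, raw_orders):
    warehouse.execute("INSERT OR IGNORE INTO customer VALUES (?, ?)", (account_no, name))
    warehouse.execute("INSERT INTO orders VALUES (?, ?)", (account_no, total))
warehouse.commit()

customer_count = warehouse.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
order_count = warehouse.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(customer_count, order_count)
```

Note that the data now lives in two places, the source system and the warehouse, and the copy is only as current as the last load.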

Data Virtualization aims to overcome this disadvantage by leaving data in its original sources rather than extracting it; you view and manipulate the data through the DV tool or layer to build your analysis. In simple terms, a DV tool is supposed to let you “see” sources of data in different applications and databases, and to “select” data from those sources for your queries or analysis.
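
The “see” and “select” idea can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not how any particular DV product works: the two sources (a CRM database and a billing system that exposes CSV), their contents, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a CRM application's own database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (account_no TEXT, name TEXT)")
crm.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("A100", "Acme Corp"), ("B200", "Birch Ltd")],
)

# Source 2 (hypothetical): a billing system that only exposes CSV.
BILLING_CSV = "account_no,amount\nA100,250.0\nA100,100.0\nB200,75.5\n"


def read_customers():
    """Query the CRM source in place, at request time."""
    return crm.execute("SELECT account_no, name FROM customers").fetchall()


def read_invoices():
    """Read the billing source in place, at request time."""
    return list(csv.DictReader(io.StringIO(BILLING_CSV)))


def revenue_by_customer():
    """Virtual view: combine both sources on demand; nothing is stored."""
    totals = {}
    for row in read_invoices():
        acct = row["account_no"]
        totals[acct] = totals.get(acct, 0.0) + float(row["amount"])
    return {name: totals.get(acct, 0.0) for acct, name in read_customers()}


print(revenue_by_customer())
```

Because the view reads the sources each time it is called, a customer added to the CRM shows up in the next call without any reload step. The trade-off is that every query puts load on the underlying systems.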

While it’s technically feasible to connect directly to external applications and other data sources, whoever owns or manages the source may not allow it. They may refuse for security reasons, to avoid overloading the database, to avoid corrupting the data, or simply because the data is proprietary and the provider allows access only through an environment external to the data source. These are some of the barriers I have encountered.

Forrester estimates an $8 billion market for DV software. It notes that the current market is dominated by big companies such as SAP, Oracle, Informatica, Microsoft and Red Hat, along with specialized firms like Composite Software, Denodo Technologies and Radiant Market.

Experimenting on a small scale is a good idea here. Vendors are willing to demonstrate capabilities and run small pilots to prove out the use case you have in mind for the software.