Data Virtualization Is Absolutely Critical. Here’s Why…

Data Virtualization is a process that gathers and integrates data from multiple sources, locations, and formats to create a single stream of data without any overlap or redundancy.

In our world of non-stop data transmission and high-speed information sharing, data virtualization is a helpful tool for collecting, combining, and curating massive amounts of data.

With big data analytics, companies can locate revenue streams from existing data in storage, or they can find ways to reduce costs through efficiency. However, this is easier said than done. IT companies generally have multiple, dissimilar sources of information, so accessing that data can be time-consuming and difficult. Data virtualization systems can help.

Companies that have implemented data virtualization software integrate data faster and can improve and speed up their decision-making.

What is Data Virtualization?

Data virtualization (DV) creates one “virtual” layer of data that distributes unified data services across multiple users and applications. This gives users quicker access to all data, cuts down on replication, reduces costs, and keeps data flexible as needs change.

Though it performs like traditional data integration, DV uses modern technology to deliver real-time data integration for less money and with more flexibility. DV can replace current forms of data integration and lessens the need for replicated data marts and data warehouses.

Data virtualization can seamlessly function between derived data resources and original data resources, whether from an onsite server farm or a cloud-based storage facility. This allows businesses to bring their data together quickly and cleanly.

How Virtualization Works

Most people who use IT services are already familiar with the concept behind data virtualization, even if they don’t know the term. Let’s say you store photos on Facebook. When you upload a picture from your personal computer, you provide the upload tool with the photo’s file path.

After you upload to Facebook, however, you can get the photo back without knowing its new file path. Facebook maintains an abstraction layer that hides those technical details from you. This kind of layer is what is meant by data virtualization.
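The photo-sharing analogy above can be sketched in a few lines of code. This is a minimal illustration of an abstraction layer, not how Facebook actually works; `PhotoStore` and its methods are hypothetical names used only for this example.

```python
# A minimal sketch of the abstraction-layer idea: callers retrieve a photo
# by a stable name, never by its internal storage path. All names here are
# illustrative, not a real API.

class PhotoStore:
    def __init__(self):
        self._paths = {}  # metadata only: public name -> internal storage path

    def upload(self, name, local_path):
        # In a real system the bytes would be copied into managed storage;
        # here we just record where the photo "lives" internally.
        internal_path = f"/storage/{hash(name) % 1000}/{name}"
        self._paths[name] = internal_path

    def fetch(self, name):
        # The caller never sees or supplies the internal path.
        return self._paths[name]

store = PhotoStore()
store.upload("beach.jpg", "C:/Users/me/Pictures/beach.jpg")
print(store.fetch("beach.jpg").endswith("beach.jpg"))  # True
```

The point is that the caller’s view (a name) is decoupled from the physical location, which is exactly what a DV layer does for enterprise data sources.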

When a company wants to build Virtual Data Services, there are three steps to follow:

  • Connect & Virtualize Any Source: Quickly access disparate structured and unstructured data sources using connectors. Import their metadata and expose them as normalized source views in the DV layer.
  • Combine & Integrate into Business Data Views: Integrate and transform source views into business-level views of the data. This can be achieved in a GUI or scripted environment.
  • Publish & Secure Data Services: Expose any virtual data view as a SQL view or in a dozen other data formats.
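The three steps above can be sketched with SQLite standing in for real DV connectors: two tables play the role of disparate source systems, a SQL view is the integrated business view, and querying that view is the published data service. All table, column, and view names here are illustrative assumptions, not part of any DV product.

```python
import sqlite3

# Connect, combine, publish - a toy walk-through of the three DV steps.
conn = sqlite3.connect(":memory:")

# Step 1 - connect & virtualize: register the "sources".
conn.execute("CREATE TABLE crm_customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE erp_orders (customer_id INTEGER, amount REAL)")
conn.execute("INSERT INTO crm_customers VALUES (1, 'Acme'), (2, 'Globex')")
conn.execute("INSERT INTO erp_orders VALUES (1, 100.0), (1, 50.0), (2, 75.0)")

# Step 2 - combine & integrate: the view stores only integration logic,
# no copied data - which mirrors how a DV layer keeps metadata only.
conn.execute("""
    CREATE VIEW customer_revenue AS
    SELECT c.name, SUM(o.amount) AS revenue
    FROM crm_customers c JOIN erp_orders o ON o.customer_id = c.id
    GROUP BY c.name
""")

# Step 3 - publish & secure: consumers query the view like any table.
print(dict(conn.execute("SELECT * FROM customer_revenue")))
# {'Acme': 150.0, 'Globex': 75.0}
```

Note that dropping the view would lose nothing but the join logic; the source tables are untouched, just as a DV layer leaves source systems in place.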

Once a DV environment is in place, users will be able to accomplish tasks using integrated information. A DV environment allows for the search and discovery of information from varied streams.

  • Global Metadata: Global information search capability lets users access data through any format from anywhere in the world.
  • Hybrid Query Optimization: Allows for the optimization of queries, even with “on-demand pull and scheduled batch push data requests.”
  • Integrated Business Information: Data virtualization brings users integrated information while hiding the complexity of accessing varied data streams.
  • Data Governance: The DV layer serves as a unified layer that presents business metadata to users. At the same time, it helps users understand the underlying data layers through data profiling, data lineage, change-impact analysis, and other tools, and it exposes needs for data normalization and quality in the underlying sources.
  • Security and Service Level Policy: All integrated DV data views can be secured and authenticated for users, roles, and groups. Additional security and access policies can manage service levels to avoid system overuse.

Data Virtualization Tools

The capabilities that Data Virtualization delivers offer companies a newer, faster method of obtaining and integrating information from multiple sources. The core capabilities are as follows:

  • Logical abstraction and decoupling
  • Enhanced data federation
  • Semantic integration of structured & unstructured data
  • Agile data services provisioning
  • Unified data governance & security

No other integration middleware offers these capabilities together in one package. IT specialists can custom-code them, but doing so sacrifices the agility and speed advantages DV offers.

Data Virtualization creates many benefits for the companies using it:

  • Quickly combine multiple data sources as query-able services
  • Improve productivity for both IT and business data users (50%–90%)
  • Accelerate time-to-value
  • Improve quality and eliminate latency of data
  • Remove the costs associated with populating and maintaining a Data Warehouse
  • Significantly reduce the need for multiple copies of any data
  • Reduce hardware infrastructure

While this innovative new path to data collection and storage offers increased speed and agility, it is important to note what DV is not meant to be.

What Data Virtualization is Not

In the business world, particularly in IT, there are buzzwords flying about in marketing strategies and among industry analysts. It is therefore important to make note of what Data Virtualization is not:

  • Data visualization: Though it sounds similar, visualization is the graphical display of data to users. Data virtualization is middleware that streamlines the search for and collection of data.
  • A replicated data store: Data virtualization does not copy information to itself. It only stores metadata for virtual views and integration logic.
  • A Logical Data Warehouse: Logical DWH is an architecture, not a platform. Data Virtualization is technology used in “creating a logical DWH by combining multiple data sources, data warehouses and big data stores.”
  • Data federation: Data virtualization is a superset of capabilities that includes advanced data federation.
  • Virtualized data storage: VDS is database and storage hardware; it does not offer real-time data integration or services across multiple platforms.
  • Virtualization: When used alone, the term “virtualization” refers to hardware virtualization — servers, networks, storage disks, etc.

Myths and Inaccuracies

As with every new innovation in technology, there will always be myths and inaccuracies surrounding implementation.

We don’t need to virtualize our data – we already have a data warehouse.
The sources of unstructured data increase every day. You can still use your data warehouse, but virtualization allows you to tie in these new sources of data to produce better information and a competitive advantage for your business.

Implementing new data technology isn’t cost effective.
Data virtualization software costs are comparable to building a custom data center. DV also does not require as many IT specialists to use and maintain the system.

Querying virtual data can’t perform like physical data queries.
With the constant innovation and improvement of computing platforms, faster network connections, processor improvements, and new memory storage, virtualization software can process queries with multiple unconnected data sources at near real-time speeds.

Data virtualization is too complex.
When something is new in technology, humans have a tendency to question it based on their own lack of experience. Most virtualization software is simple enough to be used by experts and laymen alike.

The purpose of data virtualization is to emulate a virtual data warehouse.
While DV can work this way, it is more valuable when data marts are connected to data warehouses to supplement them. “The flexibility of data virtualization allows you to customize a data structure that fits your business without completely disrupting your current data solution.”

Data virtualization and data federation are the same thing.
Data federation is just one piece of the full data virtualization picture. Data federation can standardize data stored on different servers, in various access languages, or with dissimilar APIs. This standardizing capability allows for the successful mining of data from multiple sources and the maximizing of data integration.
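The standardizing role of federation described above can be illustrated with a small sketch: two sources expose the same kind of records in different formats and with different field names, and thin adapters map each into one common shape. The formats, field names, and functions here are assumptions made purely for illustration.

```python
import csv
import io
import json

# Two heterogeneous "sources": one CSV-like, one JSON-like, with
# different field names for the same concepts.
csv_source = io.StringIO("cust_name,total\nAcme,150\nGlobex,75\n")
json_source = '[{"customer": "Initech", "revenue": 42}]'

def from_csv(stream):
    # Adapter: map CSV field names onto the common record shape.
    for row in csv.DictReader(stream):
        yield {"name": row["cust_name"], "revenue": float(row["total"])}

def from_json(text):
    # Adapter: map JSON field names onto the same common shape.
    for row in json.loads(text):
        yield {"name": row["customer"], "revenue": float(row["revenue"])}

# The federated result is simply the union of the standardized streams.
unified = list(from_csv(csv_source)) + list(from_json(json_source))
print([r["name"] for r in unified])  # ['Acme', 'Globex', 'Initech']
```

Full data virtualization layers add much more on top of this (caching, security, query pushdown), which is why federation is only one piece of the picture.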

Data virtualization only provides limited data cleansing because of real-time conversion.
This is a claim that can be made about any number of data query software programs. It is best to clean up system data natively rather than burden query software with transformation of data.

Data virtualization requires shared storage.
Data virtualization is quite versatile: it works with whatever storage arrangement fits your system’s needs, shared or not.

Data virtualization can’t perform as fast as ETL.
Through data reduction, data virtualization performs more quickly than ETL. “Operations perform at higher speeds because the raw data is presented in a more concise method due to compression, algorithmic selection and redundancy elimination.”

Data virtualization can’t provide real-time data.
DV sources are updated live instead of providing snapshot data, which is often out of date. “It is closer to providing real-time data and faster than other data types that have to maintain persistent connections.”

Why Do We Need Virtualization?

Data is transferred among users at different speeds, in different formats, and by different methods. These variables make Data Virtualization a must-have in the global business world. DV helps companies search, collect, and integrate information from various users, platforms, and storage hubs much more quickly, saving time and money.

Data Virtualization is perfect when data demands change on the fly and when access to real-time data is critical to positive business outcomes. DV also provides you with access to any data storage system you are currently using. Despite the differences in storage platforms and systems, DV will allow you to integrate all the material in a single model.

Data Virtualization also helps with security challenges because the data is never transferred: it stays at the source while DV provides virtual access from anywhere. This is also cost-effective, as you will not be duplicating any data.


As we move further into the technical age of global systems, the need for Data Virtualization becomes clear. Access to information across platforms, languages, and storage types will precipitate a faster and more useful transfer of data that everyone can use.

The future is here. The future is now.


When Big Data Meets Tablets



Big Data and mobility are converging, offering mobile workers access to real-time data via mobile dashboard apps. In an interview published in TechRepublic, the Forerunner Group’s managing director Dwight deVera discusses this convergence. Big Data and mobile devices can seem like natural enemies in the wild at first glance. However, today’s current-generation tablets, especially the iPad with its Retina display and ever-growing processor and memory specifications, make it possible for mobile workers to use WiFi- or 4G-connected tablets as dashboard front ends to tap into Big Data residing in the cloud or on back-end servers.

Read Full Post: When Big Data Meets Tablets

Source: Forerunner Group

Image Courtesy of: franky242/

Advice for CFOs: Invest in New Technology

“Top Technology Trends for Today’s CFOs” is another insightful post from a blogger we frequently feature, Timo Elliott. In it he admits that the CFO’s relationship with the CEO and other business executives leaves something to be desired. He recommends that CFOs invest in the latest technology, which will increase productivity with real-time updates and continuous forecasting.



Elliott mentions a combination of new technology including: in-memory computing, big data, the cloud, and mobile.

He homes in on a key point: finance staff at large companies are extremely bogged down with just the basics of maintaining their financial reports. As Elliott puts it, “Staff have to spend too much time on basic duties and have no time to improve their understanding of the operational measures that drive and impact financial measures.” This lack of insight is the root of the relationship problem between CFOs and other business executives.

Elliott recommends new in-memory computing technologies because “they reduce complexity by combining real-time actuals with budgeting and analysis in a single, integrated system. Financial data is stored just once, making almost every aspect of financial operations faster, simpler, cheaper, and more effective.” We couldn’t agree more, as developers of a new in-memory technology ourselves.

The result of improved systems, improved speed, and better data is ultimately a better working relationship between business executives, and a more productive, effective workplace.

Read Timo Elliott’s post here



Which OLAP is Best?


OLAP technology has continued to develop, a good indicator of its broad applicability in the software solution market. And though newer doesn’t always mean improved, our opinion is that the most recent OLAP technologies are faster and (generally) better than their predecessors. These recent OLAP advances include aspects of in-memory OLAP combined with hybrid systems that couple the benefits of multidimensional modeling with the steadfastness of a relational database.

Which OLAP is best for your business?

When a company commits to purchasing an OLAP-based BI system, it’s essential that the system meets present and potential future needs. With the wide variety of OLAP technologies available, it has become critical to know the differences between the main types: MOLAP, ROLAP, and HOLAP, plus a new entrant, HTAP. While there are other versions of OLAP, this post aims to make the decision-making process a bit easier by providing descriptions of each type, along with their advantages and drawbacks.

MOLAP: Multi-dimensional OLAP
Data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats (one example is PowerOLAP’s .olp file). MOLAP products can be compatible with Excel, which can make interacting with the data very easy to learn.


Advantages:

  • Excellent performance: MOLAP cubes are built for fast data retrieval, and are optimal for slicing and dicing operations.
  • Can perform complex calculations quickly: often calculation logic can be handled by users (meaning, no relational database programming skills needed), and the main reason for MOLAP is precisely to speed up calculations in a multidimensional environment optimized for fast data calculation.


Disadvantages:

  • Sometimes limited in the amount of data it can handle: because all calculations are performed when the cube is built, it might not be possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data. Indeed, this is possible, but only summary-level information will be included in the cube itself.
  • MOLAP products are typically proprietary systems.
  • Relevant data must be transferred from relational tables, which can be cumbersome and, by definition, redundant.
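The MOLAP trade-off described above (expensive up-front build, instant lookups) can be shown with a toy cube: every aggregate, including the “ALL” roll-ups, is computed once at build time. The dimension names, fact rows, and `build_cube` function are assumptions made for illustration, not any vendor’s format.

```python
from itertools import product

# Toy fact rows: (year, region, amount). Purely illustrative data.
facts = [
    ("2023", "East", 100), ("2023", "West", 50),
    ("2024", "East", 70),  ("2024", "West", 30),
]

def build_cube(rows):
    # Pre-aggregate every (year, region) combination, including the
    # "ALL" roll-ups - this up-front work is why cube builds are costly
    # and why queries against the finished cube are instant.
    cube = {}
    for year, region, amount in rows:
        for y, r in product((year, "ALL"), (region, "ALL")):
            cube[(y, r)] = cube.get((y, r), 0) + amount
    return cube

cube = build_cube(facts)
print(cube[("2023", "ALL")])  # 150  (slice: all regions in 2023)
print(cube[("ALL", "West")])  # 80   (dice: West across all years)
```

With two dimensions the roll-up count is small, but it grows multiplicatively with each added dimension, which is exactly the data-volume limitation noted above.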

ROLAP: Relational OLAP
ROLAP products access a relational database by using SQL (structured query language), which is the standard language that is used to define and manipulate data in an RDBMS. Subsequent processing may occur in the RDBMS or within a mid-tier server, which accepts requests from clients, translates them into SQL statements, and passes them on to the RDBMS.


Advantages:

  • No data limitation, can handle large amounts of data
  • Can access and use the functionality inherent in the relational database


Disadvantages:

  • Performance can be slow because of large size of data sets
  • Can be limited to SQL functions, which can be inflexible
  • Data may need to be reformatted for end-users
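The ROLAP approach contrasts directly with a pre-built cube: nothing is computed in advance, and each analytical question is translated into SQL and run against the relational tables at query time. The table and column names below are illustrative, with SQLite standing in for a full RDBMS.

```python
import sqlite3

# A relational fact table - no pre-aggregation anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2023", "East", 100), ("2023", "West", 50),
    ("2024", "East", 70),  ("2024", "West", 30),
])

# The same "slice" a MOLAP cube would answer by lookup is instead
# computed on demand with a SQL aggregate.
total_2023, = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE year = '2023'").fetchone()
print(total_2023)  # 150.0
```

On four rows this is instant; on billions of rows the same query is where the “performance can be slow” drawback comes from, and where indexing and query optimization earn their keep.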

HOLAP: Hybrid OLAP
HOLAP merges the best features of MOLAP and ROLAP, allowing fast calculations from an RDBMS by using pre-calculated cubes. (New HTAP systems–see further below–may be considered HOLAP products, though they function differently from previous products that people may recognize as HOLAP.)


Advantages:

  • Has the best features of both MOLAP and ROLAP: scalability, flexibility, and speed
  • Uses RDBMS SQL functionality
  • Can “drill-down” from a cube to a relational table
  • Fast to use because of pre-calculated cubes


Disadvantages:

  • Has the limitations of both MOLAP and ROLAP: while fast, it may not be as fast as pure MOLAP, and while scalable, it may not be as scalable as pure ROLAP.

What’s New in the Market?

Hybrid Transaction / Analytical Processing (HTAP)

Gartner coined the term HTAP in a paper in early 2014 to describe new in-memory data systems that do both online transaction processing (OLTP) and online analytical processing (OLAP). HTAP represents a new way to tie data together in a way that hasn’t been possible before. Combining analytical-engine capabilities with relational data tables is at the root of HTAP, and we think it is the way data will be managed in the future.


Advantages:

  • The technology resides in the relational database
  • Powerful, often distributed, processing–which means it is fast
  • No more data replication
  • New transactional information becomes part of an analytical model as fast as technologically possible
  • Unites the relational data tables with the models that are used for decision making by business leaders


Disadvantages:

  • Change in existing architectures can be disruptive
  • New technologies and accompanying skills may have to be learned
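The defining HTAP behavior, transactional writes that are immediately visible to analytical queries with no ETL step or replicated copy in between, can be sketched with a single SQLite database playing both roles. This is a conceptual illustration only; the function and table names are assumptions, and real HTAP systems add distributed in-memory processing on top.

```python
import sqlite3

# One store serves both workloads - no separate warehouse, no ETL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")

def place_order(region, amount):
    # OLTP-style write: a small committed transaction.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (region, amount))

def revenue_by_region():
    # OLAP-style read: an aggregate over the same live tables.
    return dict(conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"))

place_order("East", 100.0)
place_order("East", 25.0)
print(revenue_by_region())  # {'East': 125.0}
```

The second order is reflected in the analytical result the moment its transaction commits, which is the “as fast as technologically possible” property listed above.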

For an example of an HTAP product, check out Olation® from PARIS Tech. Olation can be categorized as an HTAP product; even the name Olation implies the combination of “OLAP” and “relational” technologies.