Data virtualization is a process that gathers and integrates data from multiple sources, locations, and formats to create a single, unified view of data without any overlap or redundancy.
In our world of non-stop data transmission and high-speed information sharing, data virtualization is a helpful tool for collecting, combining, and curating massive amounts of data.
With big data analytics, companies can locate revenue streams in existing stored data, or find ways to reduce costs through greater efficiency. However, this is easier said than done. Companies generally have multiple, dissimilar sources of information, so accessing that data can be time-consuming and difficult. Data virtualization systems can help.
Companies that have implemented data virtualization software report faster integration and can improve and accelerate their decision-making.
What is Data Virtualization?
Data virtualization (DV) creates one “virtual” layer of data that delivers unified data services to multiple users and applications. This gives users quicker access to all data, cuts down on replication, reduces costs, and makes data more adaptable to change.
Though it performs the same role as traditional data integration, DV uses modern technology to deliver real-time data integration at lower cost and with greater flexibility. DV can replace current forms of data integration and lessen the need for replicated data marts and data warehouses.
Data virtualization can seamlessly function between derived data resources and original data resources, whether from an onsite server farm or a cloud-based storage facility. This allows businesses to bring their data together quickly and cleanly.
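The “virtual layer” idea can be sketched very roughly in code. This is a minimal illustration, not any vendor’s implementation: the class, source names, and sample rows are all invented. The key point it shows is that the layer holds no data of its own; it only maps unified names to on-demand fetches from the original sources.

```python
# Minimal sketch of a virtual data layer. Nothing is copied into the
# layer; it only knows how to reach each source on demand.
class VirtualLayer:
    def __init__(self):
        self._sources = {}  # unified name -> callable returning rows

    def register(self, name, fetch):
        """Register a data source under a unified name."""
        self._sources[name] = fetch

    def query(self, name):
        """Serve rows on demand -- no data is stored in the layer."""
        return self._sources[name]()

# Two hypothetical sources: one on-site, one cloud-based.
layer = VirtualLayer()
layer.register("crm", lambda: [{"customer": "Acme", "region": "EU"}])
layer.register("erp", lambda: [{"customer": "Acme", "revenue": 120}])

# A consumer sees one unified access point for both.
rows = layer.query("crm") + layer.query("erp")
```

In a real DV product the `fetch` callables would be connectors to databases, files, or APIs, but the decoupling principle is the same.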
How Virtualization Works
Most people who use modern IT services are already familiar with the idea behind data virtualization. Let’s say you store photos on Facebook. When you upload a picture from your personal computer, you provide the upload tool with the photo’s file path.
After you upload to Facebook, however, you can get the photo back without knowing its new file path. Facebook maintains an abstraction layer that hides these technical details. That abstraction layer is what is meant by data virtualization.
When a company wants to build Virtual Data Services, there are three steps to follow:
- Connect & Virtualize Any Source: Quickly access disparate structured and unstructured data sources using connectors. Import their metadata and create normalized source views in the DV layer.
- Combine & Integrate into Business Data Views: Integrate and transform source views into canonical business views of the data. This can be done in a GUI or scripted environment.
- Publish & Secure Data Services: Publish any virtual data view as a SQL view or in a dozen other data formats, with security applied.
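The three steps above can be sketched, very loosely, in code. This is a toy illustration under stated assumptions: the in-memory SQL source, the dict-based second source, and all table and field names are invented, and “publishing” is reduced to exposing a queryable function.

```python
import sqlite3

# 1. Connect & virtualize: two disparate sources. One is relational...
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?)",
                      [("Acme", 100.0), ("Globex", 250.0)])

# ...the other stands in for a NoSQL store or API feed.
customer_regions = {"Acme": "EU", "Globex": "US"}

# 2. Combine & integrate: join the two sources into one business view.
def business_view():
    for customer, amount in orders_db.execute(
            "SELECT customer, amount FROM orders"):
        yield {"customer": customer,
               "region": customer_regions.get(customer, "unknown"),
               "amount": amount}

# 3. Publish: expose the integrated view as a queryable service.
view = list(business_view())
```

A real DV platform would publish the view as a SQL endpoint or web service rather than a Python function, but the connect/combine/publish sequence is the same.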
Once a DV environment is in place, users will be able to accomplish tasks using integrated information. A DV environment allows for the search and discovery of information from varied streams.
- Global Metadata: A global metadata search capability lets users find and access data in any format from anywhere in the world.
- Hybrid Query Optimization: Allows for the optimization of queries, even with “on-demand pull and scheduled batch push data requests.”
- Integrated Business Information: Data virtualization brings users integrated information while hiding the complexity of accessing varied data streams.
- Data Governance: The DV layer serves as a unified layer that presents business metadata to users. At the same time, it helps teams understand the underlying data layers through data profiling, data lineage, change-impact analysis, and other tools, exposing needs for data normalization and quality improvements in the underlying sources.
- Security and Service Level Policy: All integrated DV data views can be secured and access-controlled by user, role, and group. Additional security and access policies can manage service levels to prevent system overuse.
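The per-role security described in the last point can be illustrated with a small sketch. The view names, roles, and access-control structure here are hypothetical, invented purely to show the idea of securing virtual views by role.

```python
# Hypothetical role-based access control on virtual views: each view
# lists the roles allowed to read it.
VIEW_ACL = {
    "sales_summary": {"analyst", "manager"},
    "salaries":      {"hr"},
}

VIEWS = {
    "sales_summary": [("Q1", 100)],
    "salaries":      [("Alice", 90000)],
}

def query_view(view, role):
    """Serve a virtual view only if the caller's role is authorized."""
    if role not in VIEW_ACL.get(view, set()):
        raise PermissionError(f"role {role!r} may not read {view!r}")
    return VIEWS[view]

summary = query_view("sales_summary", "analyst")  # allowed
```

A production DV layer would tie this to real authentication and add service-level limits (query quotas, timeouts) on top of the same check.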
Data Virtualization Tools
The capabilities that data virtualization delivers offer companies a newer, faster method of obtaining and integrating information from multiple sources. The core capabilities are as follows:
- Logical abstraction and decoupling
- Enhanced data federation
- Semantic integration of structured & unstructured data
- Agile data services provisioning
- Unified data governance & security
These capabilities cannot be found together in any other integration middleware. IT specialists can custom-code them, but doing so sacrifices the agility and speed advantages DV offers.
Data Virtualization creates many benefits for the companies using it:
- Quickly combine multiple data sources as query-able services
- Improve productivity for IT and business data users (by a reported 50%–90%)
- Accelerate time-to-value
- Improve quality and eliminate latency of data
- Remove the costs associated with populating and maintaining a Data Warehouse
- Significantly reduce the need for multiple copies of any data
- Reduce hardware infrastructure requirements
While this innovative new path to data collection and storage offers increased speed and agility, it is important to note what DV is not meant to be.
What Data Virtualization is Not
In the business world, particularly in IT, there are buzzwords flying about in marketing strategies and among industry analysts. It is therefore important to make note of what Data Virtualization is not:
- Data visualization: Though the terms look similar, visualization is the graphical display of data to users. Data virtualization is middleware that streamlines the search and collection of data.
- A replicated data store: Data virtualization does not copy information to itself. It only stores metadata for virtual views and integration logic.
- A Logical Data Warehouse: Logical DWH is an architecture, not a platform. Data Virtualization is technology used in “creating a logical DWH by combining multiple data sources, data warehouses and big data stores.”
- Data federation: Data virtualization is a superset of capabilities that includes advanced data federation.
- Virtualized data storage: VDS is database and storage hardware; it does not offer real-time data integration or services across multiple platforms.
- Virtualization: When used alone, the term “virtualization” refers to hardware virtualization — servers, networks, storage disks, etc.
Myths and Inaccuracies
As with every new innovation in technology, there will always be myths and inaccuracies surrounding implementation.
We don’t need to virtualize our data – we already have a data warehouse.
The sources of unstructured data increase every day. You can still use your data warehouse, but virtualization allows you to tie in these new sources of data to produce better information and a competitive advantage for your business.
Implementing new data technology isn’t cost effective.
Data virtualization software costs are comparable to building a custom data center. DV also does not require as many IT specialists to use and maintain the system.
Querying virtual data can’t perform like physical data queries.
With the constant innovation and improvement of computing platforms, faster network connections, processor improvements, and new memory storage, virtualization software can process queries with multiple unconnected data sources at near real-time speeds.
Data virtualization is too complex.
When something is new in technology, people tend to question it based on their own lack of experience. Most virtualization software is easy enough to be used by technical and non-technical users alike.
The purpose of data virtualization is to emulate a virtual data warehouse.
While DV can work this way, it is more valuable when data marts are connected to data warehouses to supplement them. “The flexibility of data virtualization allows you to customize a data structure that fits your business without completely disrupting your current data solution.”
Data virtualization and data federation are the same thing.
Data federation is just one piece of the full data virtualization picture. Data federation can standardize data stored on different servers, in various access languages, or with dissimilar APIs. This standardizing capability allows data to be successfully mined from multiple sources and data integration to be maximized.
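Federation’s standardizing role can be shown with a small sketch. Both sources, their row shapes, and the unified schema below are invented for illustration: each adapter converts one source’s native shape into a common schema before the rows are merged.

```python
# Two hypothetical sources that return rows in different shapes.
def from_sql():
    return [("Acme", 100)]            # e.g. tuples from a SQL server

def from_api():
    return [{"name": "Globex", "total": 250}]  # e.g. JSON from an API

# Adapters standardize each source into one common schema.
ADAPTERS = [
    lambda: [{"customer": c, "amount": a} for c, a in from_sql()],
    lambda: [{"customer": r["name"], "amount": r["total"]}
             for r in from_api()],
]

# Federated result: one uniform row shape, regardless of origin.
federated = [row for adapter in ADAPTERS for row in adapter()]
```

Full data virtualization layers this kind of federation under the additional capabilities listed earlier (governance, security, service provisioning).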
Data virtualization only provides limited data cleansing because of real-time conversion.
This is a claim that can be made about any number of data query software programs. It is best to clean up system data natively rather than burden query software with transformation of data.
Data virtualization requires shared storage.
Data virtualization is quite versatile: it does not depend on shared storage and can work with whatever storage arrangement fits your system’s needs.
Data virtualization can’t perform as fast as ETL.
Through data reduction, data virtualization performs more quickly than ETL. “Operations perform at higher speeds because the raw data is presented in a more concise method due to compression, algorithmic selection and redundancy elimination.”
Data virtualization can’t provide real-time data.
DV sources are updated live instead of providing snapshot data, which is often out of date. “It is closer to providing real-time data and faster than other data types that have to maintain persistent connections.”
Why Do We Need Virtualization?
Data is transferred among users at different speeds, in different formats, and by different methods. These variables make data virtualization a must-have in the global business world. DV helps companies search, collect, and integrate information from various users, platforms, and storage hubs much more quickly, saving time and money.
Data Virtualization is perfect when data demands change on the fly and when access to real-time data is critical to positive business outcomes. DV also provides you with access to any data storage system you are currently using. Despite the differences in storage platforms and systems, DV will allow you to integrate all the material in a single model.
Data virtualization also helps with security challenges because the data is never transferred: it stays at the source while DV provides virtual access from anywhere. This is cost-effective as well, since no data is duplicated.
As we move further into the technical age of global systems, the need for data virtualization becomes clear. Access to information across platforms, languages, and storage types will enable a faster and more useful transfer of data that everyone can use.
The future is here. The future is now.
OLAP and Hadoop: A Great Pairing
OLAP continues to be a relevant and exciting technology, most recently in pairing OLAP and Hadoop. As we are OLAP.com, we have ALWAYS seen the value of OLAP technology. We admit OLAP has been a bit out of style the last few years. Some companies even run Google ads about how “OLAP is obsolete,” but nothing could be further from the truth. (Check out our blog on that one.)
We see this in the fashion industry all the time: what is old is new again! This is rare in the technology realm, but it seems to be the case with OLAP. As developers struggled to get value out of Hadoop data, they discovered they needed the speed and flexibility of OLAP. OLAP and Hadoop make a powerful combination for reaching the ultimate goal of extracting value from Big Data.
Bringing OLAP to scale for Big Data
In an article from ZDNet, “Is this the age of Big OLAP?”, Andrew Brust writes about the new relationship between OLAP and Hadoop. He highlights that OLAP technology can be particularly beneficial when working with extremely large Big Data sets. Typically, OLAP has not been scalable enough for Big Data solutions, but as OLAP technology continues to progress, we find this new application of OLAP exciting. Brust discusses a few strategies for bringing the two technologies together, mentions a few OLAP vendors in detail, and describes how they manage the issue of scalability for OLAP software.
If you want to try using OLAP with Hadoop, perhaps you want to give PowerOLAP, the mature OLAP product from OLAP.com, a try. There is a free version of PowerOLAP available. If you plan to test PowerOLAP with your Hadoop data, contact PARIS Tech and they will lift the member limit that ships with the free version, since you will likely need to go beyond it.
In sum, OLAP.com is pleased to see OLAP rising in relevance once again and getting some of the recognition we felt it deserved all along. It is a testament to the power and value OLAP has as a technology.
The Power of OLAP and Excel
Should Excel be a key component of your company’s Business Performance Management (BPM) system? There’s no doubt how most IT managers would answer this question. Name IT’s top ten requirements for a successful BPM system, and they’ll quickly explain how Excel violates most of them. Even the user community is concerned. Companies are larger and more complex now than in the past; too complex, some say, for Excel. Managers need information more quickly now; they can’t wait for another Excel report. Excel spreadsheets don’t scale well. They can’t be shared among many different users. Excel reports are error-prone. Excel security is a joke. Excel output is ugly. Excel consolidation occupies a large corner of Spreadsheet Hell. For these reasons, and many more, a growing number of companies of all sizes have concluded that it’s time to replace Excel. But before your company takes that leap of faith, perhaps you should take another look at Excel, particularly when Excel can be enhanced by an Excel-friendly OLAP database. That technology eliminates the classic objections to using Excel for business performance management.
Excel-friendly OLAP products cure many of the problems that both users and IT managers have with Excel. But before I explain why this is so, I should explain what OLAP is, and how it can be Excel-friendly. Although OLAP technology has been available for years, it’s still quite obscure. One reason is that “OLAP” is an acronym for four words that are remarkably devoid of meaning: On-Line Analytical Processing. OLAP databases are more easily understood when they’re compared with relational databases. Both “OLAP” and “relational” are names for a type of database technology. Oversimplified, relational databases contain lists of stuff; OLAP databases contain cubes of stuff.
For example, you could keep your accounting general ledger data in a simple cube with three dimensions: Account, Division, and Month. At the intersection of any particular account, division, and month you would find one number. By convention, a positive number would be a debit and a negative number would be a credit. Most cubes have more than three dimensions. And they typically contain a wide variety of business data, not merely General Ledger data. OLAP cubes also could contain monthly headcounts, currency exchange rates, daily sales detail, budgets, forecasts, hourly production data, the quarterly financials of your publicly traded competitors, and so on.
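The general-ledger cube described above can be modeled as a toy sketch: one number at each (account, division, month) intersection, with positive amounts as debits and negative amounts as credits, as the text describes. The account and division names are invented for illustration.

```python
# A toy three-dimensional OLAP cube: (account, division, month) -> amount.
cube = {}

def post(account, division, month, amount):
    """Add an amount at one cell intersection (debit +, credit -)."""
    key = (account, division, month)
    cube[key] = cube.get(key, 0.0) + amount

post("Sales", "East", "Jan", -500.0)   # a credit
post("Rent",  "East", "Jan",  200.0)   # a debit
post("Sales", "West", "Jan", -300.0)

# Slicing the cube: total Sales across all divisions for January.
jan_sales = sum(v for (acct, div, mon), v in cube.items()
                if acct == "Sales" and mon == "Jan")
```

Real OLAP engines add hierarchies, aggregation, and many more dimensions on top of exactly this kind of keyed-intersection model.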
You probably could find at least 50 OLAP products on the market. But most of them lack a key characteristic: spreadsheet functions.
Excel-friendly OLAP products offer a wide variety of spreadsheet functions that read data from cubes into Excel. Most such products also offer spreadsheet functions that can write to the OLAP database from Excel…with full security, of course.
Read-write security typically can be defined down to the cell level by user. Therefore, only certain analysts can write to a forecast cube. A department manager can read only the salaries of people who report to him. And the OLAP administrator must use a special password to update the General Ledger cube.
Other OLAP products push data into Excel; Excel-friendly OLAP pulls data into Excel. To an Excel user, the difference between push and pull is significant.
Using the push technology, users typically must interact with their OLAP product’s user interface to choose data and then write it as a block of numbers to Excel. If a report relies on five different views of data, users must do this five times. Worse, the data typically isn’t written where it’s needed within the body of the report. Instead, the data merely is parked in the spreadsheet for use somewhere else.
Using the pull technology, spreadsheet users can write formulas that pull the data from any number of cells in any number of cubes in the database. Even a single spreadsheet cell can contain a formula that pulls data from several cubes.
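The pull model can be mimicked in a short sketch. The cube names, keys, and the `GET` helper are hypothetical stand-ins for whatever cube-reading spreadsheet function an Excel-friendly OLAP product actually provides; the point is that a single formula (here, one expression) reads individual cells from several server-side cubes instead of receiving a pushed block.

```python
# Two hypothetical server-side cubes keyed by (division, month).
sales_cube  = {("East", "Jan"): 1000.0, ("West", "Jan"): 700.0}
budget_cube = {("East", "Jan"):  900.0, ("West", "Jan"): 800.0}

def GET(cube, *key):
    """Stand-in for a cube-reading spreadsheet function: pull one cell."""
    return cube.get(key, 0.0)

# One "spreadsheet cell" pulling from two different cubes at once:
east_variance = (GET(sales_cube, "East", "Jan")
                 - GET(budget_cube, "East", "Jan"))
```

Because the formula references cube cells rather than copied data, recalculating it always reflects the current server-side values.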
At first reading, it’s easy to overlook the significant difference between this method of serving data to Excel and most others. Spreadsheets linked to Excel-friendly OLAP databases don’t contain data; they contain only formulas linked to data on the server. In contrast, most other technologies write blocks of data to Excel. It really doesn’t matter whether the data is imported as a text file, copied and pasted, generated by a PivotTable, or pushed to a spreadsheet by some other OLAP. The other technologies turn Excel into a data store. But Excel-friendly OLAP eliminates that problem, by giving you real-time data for a successful BPM system.
To learn more about OLAP, click here.
“There’s nothing inherently wrong with spreadsheets; they’re excellent tools for many different jobs. But data visualization and data communication is not one of them.” – Bernard Marr
We couldn’t agree more with what Bernard is saying in his article, “Why You Must STOP Reporting Data in Excel!” Excel is everywhere, and it has proven to be a valuable resource to companies across the globe. The problem is that many companies use spreadsheets as their main line of internal communication. Excel is great at displaying all the raw data you could possibly dream of; just ask any data analyst, who eats, sleeps, and dreams never-ending spreadsheets. Bernard gets right to the point and lays out the top four reasons that spreadsheets are not the right fit for visualizing data and communicating within an organization.
Most people don’t like them.
Bernard makes a great point: unless you work with Excel frequently, as a data analyst does, it has a reputation for being intimidating. Employees will be reluctant to use it, let alone analyze data in it. If employees are not working in Excel all day, they are likely to give Excel the cold shoulder when it comes to communicating data.
Important data is hidden.
It is safe to agree with Bernard on this. Spreadsheets are not the best visualization tool out there. Most spreadsheets today are full of endless numbers. If users can’t look at the data and quickly distinguish the valuable from the non-valuable, that is a problem. There are better visualization tools that paint a clearer picture and allow for effective communication.
Loss of historical data.
Users in Excel are constantly updating facts and figures as necessary. The downside is that this essentially erases all historical data. Without historical data there is no clear way to see trends and patterns, and the ability to make predictions for the future is lost.
It’s difficult to share.
Spreadsheets are not ideal for collaborative data sharing because they carry the risk of data being deleted or changed. Today, data is typically shared by emailing updated spreadsheets. That data is stale, or “dead”: it lacks the key quality of being “live,” in real time. This way of sharing is not only time-consuming but also cuts users off from collaborating on the most up-to-date information available.
The great news is, there’s an easy answer to all of the common frustrations of spreadsheets…
PowerOLAP is an example of a product developed to address all of these problems. It allows real-time collaboration between users while the data remains “live.” It can store historical data, which allows accurate analytical predictions to be reported. Take a deeper look into PowerOLAP and see how it can take your organization to the next level.
To read the entire article by Bernard Marr, click here.
Data planning is quickly becoming a top priority for businesses across the globe. Ben Rossi dives into the key factors that are making it vital for organizations to manage their data. According to Rossi’s post, two main components drive this. The first is the increasing amount of data being pulled into organizations for analysis: the volume keeps growing, and the pace is only accelerating. Large quantities of data and information are a great thing, but to retain any value from them, they must be managed the correct way.
The second is that organizations face tougher compliance policies, which require maintaining data for a much longer period of time. Not only are businesses overflowing with large quantities of data, they must now also solve the issue of where all of this data can be stored. Rossi gives the example of large credit card companies: in the past, they were required to keep records of all credit card transactions for seven years, but there has been recent talk of extending that to ten years or more.
Data planning can have a big positive impact on a company as a whole, but planning is essential to success. Proper planning ensures that things such as cloud storage and prioritized tiers of data storage within one’s network are set up correctly. Planning the process and details of employee data access is crucial, too: establishing the limits and accessibility of data for all employees early on helps ensure a positive workflow.
So, is it time to take a step back to re-evaluate just how effectively you are managing your data? What plan do you have in place and more importantly, has there been a positive impact on your business?
Want to read the entire article? Click Here.
What does December 18th, 2015 mean to you? Yes, it is one week before Christmas, but if you are a Star Wars fan, you know it means much more than that. The awaited release of “The Force Awakens” has generated quite a buzz this holiday season. Jim Hopkins does an awesome job of bringing Star Wars and business data and analytics side by side in his article, “How Star Wars Can Help with Your Data Problems.”
If you are new to Business Intelligence terms such as CRM, Big Data, and Analytics, this article does a good job of laying them out clearly.
Hopkins first relates “the Force” to “the Data” of the business world. “The Force” is an abstract power that connects the characters and controls their world, which is incredibly similar to “the Data.” What would we be without data? What would we analyze to gain crucial insight for the decisions that strengthen our businesses? The same analogy holds for businesses’ CRM systems: a CRM (Customer Relationship Management) system connects everyone from Marketing, Sales, and Finance to Support and Administration.
This leads into Hopkins’ next comparison: “the Dark Side.” Without proper care and management, a CRM system can quickly turn “dark,” as Hopkins puts it. Collecting large amounts of useful data is a great thing, and so is the ability to store and organize that information. Without close attention to detail, however, many businesses let their stored knowledge turn sour. Paying attention to detail and making sure that what is stored is accurate is crucial to a company’s success. What is meant to strengthen and prosper a business can quickly do the opposite if not properly maintained.
When Luke Skywalker is drawn to the Jedi lifestyle, he begins to learn more about the past of his father, a Jedi master, and through this he is able to strengthen the power he holds within himself. In the business world, we must truly understand what our goals are and how our decisions affect them. When we run analysis reports, what patterns do we see? What story is the data telling us?
As Darth Vader said of Luke Skywalker, “the Force is strong with this one.” Hopkins relates this to having a strong set of policies and mechanisms in place when implementing any data analytics strategy or program. Everyone must be on the same page and follow the same guidelines to ensure the highest-quality data outcomes within a business.
Click Here to read entire article.