19 February 2014 by J-F. Vannier, Business Intelligence Infrastructures Sales Manager, Bull
According to Hortonworks, by the end of 2015 half the data stored in the world will be held in Hadoop infrastructures. Some businesses that have to manage massive information flows, such as pure-play Internet companies, big retailers and financial companies, have already adopted these technologies. For all other organizations, there are questions about the usefulness, relevance and implementation costs of these new technologies. Why adopt Hadoop? For what kinds of activities, and for what data? What benefits will they get from implementing extensions to their Datawarehouses?
Access to unused data: Information systems are full of untapped data: partitioned databases, archived records and historical data, unused details, departmental databases… Some of that data may still be relevant and useful. Hadoop enables it to be brought into the Business Intelligence (BI) environment, stored easily at minimal cost and, finally, filtered to extract valuable nuggets. This data can be incorporated into the Datawarehouse via this ‘Extended ODS’ (Operational Data Store) in large quantities, without disrupting existing processes and using other tools if necessary.
Expanding to include new data formats: Data from Web sources is the obvious example. But there are also office documents, emails… disparate sources of information that are either under-utilized or not exploited at all. Because it doesn’t impose data or storage formats, Hadoop facilitates the exchange of information. Technological barriers to information sharing are lifted, although issues of security, functional consistency between different types of data, and discretion within the organization still need to be addressed. Even so, it provides a technological platform for eliminating data silos.
Implementing new tools: Hadoop is not just for storage. Various analytical, graphical, document management, ‘machine learning’, statistical, semantic analysis and other tools can be plugged into this framework to enable other ways of analyzing existing data and representing it for users. These new-generation tools will be easier to use and more relevant for users, and will give them greater freedom.
Lower costs, greater agility: Compared to conventional appliances (for a similar amount of data and equivalent performance across a common spectrum of usage), the cost of a solution based on Hadoop is 5 to 20 times lower. It also enables greater flexibility in terms of architecture and in the development of new applications, and it extends the whole scope of the Datawarehouse. Projects can therefore be enriched progressively, using agile approaches and keeping investment gradual.
Hadoop was not necessarily designed to replace current Datawarehouses, which are largely there to support reporting. However, as needs become more analytical and organizations seek to extract greater value from information, it will establish itself as a natural basis for an extended Datawarehouse. Ultimately, a Logical Datawarehouse will become essential.