Tuesday, 15 April 2014

Industry voice: Making the most of data: from experimentation to action


Wherever you look, there is no shortage of statistics or analysis pointing to the global explosion in data growth. According to CSC Insights, data production is expected to be 44 times greater in 2020 than it was in 2009, with business data volumes doubling globally every 1.2 years.


However, the problem in making the most of this increasingly valuable asset is not the sheer volume of data but the complexity of extracting value from it. Most of this growth comes from new forms of data – such as social media content, images, video and sensor data – often generically categorised as 'unstructured' data, because they don't follow the neat row-and-column format typically used for storing and analysing data.


Additionally, the optimal value of these complex sources can only be realised through the application of new, unfamiliar types of analysis.


Reaction times


Not surprisingly, companies are reacting to these dramatic changes to take advantage of this tremendous opportunity for business improvement. As a result, big data is moving decisively to the top of the boardroom agenda.


However, given the complexity of the topic, the action taken is often haphazard, without a clear direction or strategy, resulting in lost opportunities and a slow realisation of potential benefits.


A recent Teradata poll of European companies found that almost half (47%) are already running big data projects or plan to do so within the next two years. And momentum is growing, even through governmental support – for example, the European Commission is funding a Big Data Public Private Forum (BIG), designed to engage all stakeholders in advancing the big data debate.


In the US, larger firms have advanced even more rapidly. In 2009 there were only a small number of big data projects, worth just $100 million, yet today more than 90 per cent of Fortune 500 companies have some type of big data initiative underway.


Given that the growth in data is predominantly driven by new 'unstructured' data sources, there is also a significant impact on the methods employed to store and analyse this asset. This is mirrored by the growing interest in new storage frameworks, especially open-source solutions such as Hadoop.


Hadoop – moving beyond experimentation


As a first step into big data, many businesses have embarked on an exploration of Hadoop, attracted by the prospect of running free, open-source software on low-cost commodity servers to improve their ability to analyse data effectively within the business.


Yet this approach is not without risk. First, starting with the solution is looking through the wrong end of the telescope. Instead, the organisation should first consider the business problems to be addressed and then outline an appropriate response.


Second, any development should be subject to rigorous and continuous analysis as to whether it is working and fit for purpose as the best solution to the problem.


Having said that, Hadoop does offer a number of unique benefits to the business. As a large distributed file system, it allows the organisation to acquire and store large volumes of semi-structured and unstructured data cost-effectively. As a result, it is increasingly being perceived as a highly efficient long-term data storage platform.
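
To make the "acquire and store" step concrete, here is a minimal sketch of landing raw files in HDFS using the standard hadoop fs command-line tool, driven from Python. The directory names and file pattern are illustrative assumptions, not details from the article, and the hadoop client is assumed to be installed and configured on the machine running the script.

#!/usr/bin/env python3
# Minimal sketch: land raw, unstructured files in HDFS as-is, with no schema
# imposed at this stage. Paths below are hypothetical examples.
import subprocess
from pathlib import Path

LOCAL_LANDING = Path("/data/incoming")       # hypothetical local drop zone
HDFS_RAW_ZONE = "/raw/weblogs/2014-04-15"    # hypothetical HDFS target directory

def land_in_hdfs(local_dir: Path, hdfs_dir: str) -> None:
    # Create the target directory if it does not already exist.
    subprocess.run(["hadoop", "fs", "-mkdir", "-p", hdfs_dir], check=True)
    # Copy each raw log file unchanged; -f overwrites any earlier copy.
    for f in sorted(local_dir.glob("*.log")):
        subprocess.run(["hadoop", "fs", "-put", "-f", str(f), hdfs_dir], check=True)

if __name__ == "__main__":
    land_in_hdfs(LOCAL_LANDING, HDFS_RAW_ZONE)

The point of the sketch is simply that nothing about the data has to be modelled up front – files go into the distributed file system in their raw form and are interpreted later.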


Hadoop is also an efficient way of sequentially processing files. This is especially valuable for pre-processing tasks such as preparing web logs for loading into a data warehouse.
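
As an illustration of that kind of pre-processing, the sketch below is a Hadoop Streaming mapper in Python that turns raw web-log lines into clean, tab-separated records ready for bulk loading into a warehouse. The log format (Apache common log format) and the output columns are assumptions made for the example rather than anything specified in the article.

#!/usr/bin/env python3
# Illustrative Hadoop Streaming mapper: reads raw web-log lines from stdin,
# emits tab-separated records on stdout. Assumes Apache common log format.
import re
import sys

# Example line:
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

for raw in sys.stdin:
    m = LINE.match(raw)
    if not m:
        continue  # drop malformed lines rather than failing the whole job
    bytes_sent = "0" if m.group("bytes") == "-" else m.group("bytes")
    # One clean, tab-separated record per request.
    print("\t".join([m.group("ip"), m.group("ts"), m.group("method"),
                     m.group("path"), m.group("status"), bytes_sent]))

In a Hadoop Streaming job this script would be passed as the -mapper, with the raw logs as -input and the cleaned output written back to HDFS ready for the warehouse load; the exact invocation depends on the installation.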


However, as a batch-processing tool, Hadoop is less efficient than a traditional data warehouse at handling queries that require data from across different files, and it can only support a small number of user queries at any given time.


So where does that leave us? Those businesses implementing Hadoop typically find it quick and easy to store massive volumes of different data types and do much of the initial data manipulation and preparation required. However, they quickly recognise the limitations of running analytics in this environment – the truth is that there is no single silver bullet for the wide variety of analytics needed today.


Therefore, Hadoop is best adopted as part of a matrix of technologies which enable businesses to expand the boundaries of analytics and realise the full potential of their data.


This capability is known as a Unified Data Architecture – in this scenario, Hadoop provides a data lake capability, to source and store unlimited data volumes and types. Furthermore, Hadoop is invaluable in preparing information for the analytics most commonly undertaken in a data warehouse.


Data warehouses have evolved to become core business information engines, providing a robust solution that supports analytics across hundreds or thousands of users, with the predictable response times and reliable availability that ensure business operations can rely on analytics to drive tactical business decisions.


Hadoop complements a data warehouse by lowering the cost of data acquisition and preparation – a key element in the economics of data management.


Increasingly, the ad-hoc analytics that has traditionally taken place in a data warehouse does not go far enough to support the wide range of discovery analytics that organisations want to undertake, on new data sources and with new types of analysis.


Often the data required for this discovery activity is held in raw form in Hadoop, and an exploratory approach is required to unravel its structure and identify the elements of value.


A data warehouse is not the most efficient place to do this, but neither is Hadoop – most organisations lack the data scientists needed to write the complex code that Hadoop requires.


Increasingly, organisations are creating a 'Discovery' environment, which provides user-friendly access to a growing range of analytical techniques. Optimally, this environment is integrated with Hadoop and the existing data warehouse to simplify data management.


The next article looks at how fail-fast 'discovery' adopts an iterative approach to problem solving as the basis of a more responsive and flexible development strategy.



  • Duncan Ross is the director of data science at Teradata UK. He works across the International Area and in all industries. Current areas of focus include social network analysis, social media analytics, the integration of big data with transactional data, and driving business decisions through analytics.




















from Techradar - All the latest technology news http://ift.tt/1il15MC
