by Cindi Howson, BI Scorecard
Big Data took New York by storm last week at the annual Strata conference and, coincidentally, on the anniversary of Superstorm Sandy.
The conference sold out weeks in advance, with 3,400 attendees. For a conference that drew only 700 people just a few years ago, the attendance reflects the interest, enthusiasm, and innovation in big data.
Some key trends from this year's conference were a dose of big data reality, solutions to help business analysts get to all data, the ease of search, and visualizing big data.
Big Data Grows Up
A few years ago, some people in the big data world were predicting that Hadoop and NoSQL technologies would bring an end to the data warehouse as we know it. Slowly, a more pragmatic view of Hadoop's role in the enterprise seems to be emerging. Facebook Chief Analytics Officer Ken Rudin, formerly of Zynga, declared that "big data isn't about the technology; it's about the business needs." He described how Facebook started its analytics with Hadoop but has since added a traditional data warehouse to its analytics portfolio. It's all about the right technology for the particular data and type of analytics, and while Hadoop may be great for storing data, it's not so good for the analytics. To that end, MongoDB gave a great presentation on how it sees its best fit in the real-time storage of big data, while Hadoop is better suited to offline data. Two case studies in my just-published book (Successful Business Intelligence 2/E: Unlock the Value of BI and Big Data) reflect exactly these use cases. FlightStats, which tracks thousands of flights in real time, is using MongoDB to serve up that information to millions of customers. University of California, Irvine Health, on the other hand, is using Hadoop to store patient medical records, saving hundreds of thousands of dollars versus storing the data in the electronic medical records system that runs on a relational database.
Cloudera's Chief Strategy Officer Mike Olson announced Cloudera 5, which acts as an enterprise data hub from which the data warehouse can draw granular data on demand. It includes better security and data lineage. Also announced was Cloudera in the Cloud via its partners, initially Verizon and IBM.
Business Analysts Unleashed
Much of the big data community initially catered to statisticians and data scientists, who are skilled in math, statistics, programming, and the business. It's a job that requires a broad range of skills and expertise that is in short supply. To address that shortage of talent, a number of vendors are bringing these capabilities to the business power user, without the need to write SQL or MapReduce jobs. The original BI module that first helped unlock data in relational databases in the 1990s was the business or ad hoc query tool. A semantic view or business metadata layer helped these power users get to their data without knowing SQL, but only once that data was loaded into a data warehouse or mart. But the semantic layer and the data warehouse have become increasingly bogged down, and both are maintained by IT. At the same time, power users are trying to get to a broader range of data sources, from Hadoop to relational to XML files, cleansed, transformed, and mashed together on their own terms. These power users are not data scientists who want to code in MapReduce or SQL; they still want to point and click, wherever the data lives. I'm dubbing this new category of tools "Business ELT."
New solutions in this category include Microsoft Power BI, the company's cloud-based self-service BI solution, which was touted in the keynote and is currently in community preview. The Power Query module allows a business user to connect to multiple data sources, transform and cleanse the data, and create a view that others can readily consume via native Excel or Power View, Microsoft's visual data discovery solution. (For a detailed discussion of capabilities, strengths, and weaknesses, purchase the review on BI Scorecard.)
Meanwhile, Dell last year acquired Quest Software, best known for TOAD, the tool that helps DBAs write and optimize SQL queries. TOAD BI, however, is aimed at the business power user, allowing them to mash together multiple data sources and rapidly create subject areas. TOAD BI also has a good differentiator in that it can access an SAP BusinessObjects universe or Oracle BI EE data model, leveraging existing semantic models.
Start-up vendor Paxata showcased what it describes as the "industry's first adaptive data preparation platform for business users." The tool has built-in algorithms that detect both join relationships between tables and potential data quality issues. What's nice about this solution is that Paxata has also partnered with QlikTech and Tableau, so that once business power users have extracted and transformed their data, they can use an established visualization and dashboard front end.
BI Search 2.0
The concept of bringing Google's ease of use to BI has been attempted a couple of times in the BI space, going back to 2006, when Google first released Google OneBox for enterprise customers. A couple of BI vendors were quick to embrace this approach, feeding BI report metadata and, in some cases, cube structures to the search engine. It all sounds like a great idea. And yet few customers leveraged this approach, for a variety of reasons: poor marketing and high licensing costs in some cases, and difficulty implementing in others. But we seem to be on the cusp of a second wind for search-based BI. Here, Microsoft showcased its Power Q&A, a module of Power BI that allows users to ask simple questions like "What are my sales in New Jersey this year?" It returns a dynamically created, interactive visualization in which the casual user can refine the question and criteria.
Also at Strata was start-up DataRPM, currently in beta and expected to be generally available in the first half of 2014. It is also leveraging natural language processing, not just indexed keywords, with two key differentiators. First, it works on top of both structured and unstructured data. Second, it embeds search within existing processes rather than offering it as a separate BI solution. I met with DataRPM at the end of a long day, and they still managed to get a wow out of me. I look forward to seeing how this vendor hits the market.
Although it was not at Strata, I've also been tracking a start-up out of the UK, Neutrino BI, which also brings the concept of NLP and search to its dashboard solution. (Catch them live at TDWI in Orlando in my Cool BI class.)
With BI adoption still stuck at a paltry 24% of all employees (tell me your BI adoption in this survey, and for a limited time get a free copy of the 2012 report), the ease of use of BI search has the potential to bring BI to more mainstream users. It also might unlock data that users currently struggle to find and access.
Visualizing Big Data
There are still two main camps in the visualizing of big data: traditional BI vendors who can access relational data sources as well as Hadoop, and those who access primarily Hadoop. Tableau, QlikTech, TIBCO Spotfire, and MicroStrategy fall into the first category. DataMeer, Platfora, and Karmasphere fall into the latter.
DataMeer has differentiated itself on its ability to generate MapReduce jobs directly, without having to go through the slower Hive interface that other BI solutions may rely on. It has tripled its customer base in the last year and sees four key use cases: customer analytics, web log file analytics for IT operations, fraud detection, and lowering the cost of data storage. With DataMeer, the data scientist interacts with the Hadoop data via a spreadsheet interface and can then present the results via simple and appealing infographics.
Platfora, in beta at last year's conference, is now generally available and announced version 3 of its product, due in Q1 2014. Platfora creates a type of view, which it calls a lens, on data in Hadoop. Data is loaded into its own in-memory engine, where business users can visualize and interact with the data. New capabilities in version 3 allow users to organize the data into events (such as a store visit versus a web visit) and to perform iterative customer segmentation. Some early users of Platfora include Disney, Netflix, and Shopify.