The rise of self-service analytics

As self-service analytics (SSA) gains momentum, the need for data governance increases in order to drive true business value.

Data governance offers a collaborative framework for managing and defining enterprise-wide policies, business rules and assets.

By Paul Brunet

With the rise of big data – and the processes and tools related to utilizing and managing large data sets – organizations are recognizing the value of data as a critical business asset to identify trends, patterns and preferences to drive improved customer experiences and competitive advantage. The problem is, users too often can’t find the data they need to perform desired analytics. Data tends to be buried in different systems or siloed in departments across the organization.

This data chaos and uncertainty is costing businesses big money – as much as $3.1 trillion, according to a recent Harvard Business Review study. In addition to the time wasted on searching for the data, individual interpretation of the data through a subjective lens can result in inconsistencies that adversely affect a company’s business.

Making Trust a Priority for Reliable Analytics

The industry has seen a surge of self-service analytics (SSA) tools such as Tableau and Qlik that enable analysts and non-technical business users to gain insights and drive data-focused initiatives. SSA and business intelligence (BI) empower knowledge workers and business users to gather desired insights without reliance on IT to run reports.

However, investing in analytics tools alone can’t deliver business value. For an SSA tool to do its job, companies need to ensure that the people using the tool can easily access the data they need across the organization – including siloed data living in various systems – and have full confidence in this data to apply it for greater business insight and results. Effectively integrating disparate data from different systems and devices requires a complete understanding of the organization’s “data map” and the data’s journey and relationship to other similar – or sometimes contradictory – data throughout the organization. This is best achieved through data governance.

Data governance offers a collaborative framework for managing and defining enterprise-wide policies, business rules and assets to provide all business users with high-quality data that is easily accessible in a consistent manner. By adopting an overall policy through governance, users can determine data inventory, data ownership, critical data elements (CDE), data quality, information security, data lineage and data retention so they have a good understanding of the data across the organization and its meaning. True data governance breaks down data silos so users can find the trusted data they need, collaborate on it, and easily understand it so it’s consumable to drive competitive advantage. This is the new order of data governance today.

Ensuring data trust has become one of the most critical factors in driving successful BI initiatives. When users know they can trust the data, they are more likely to use it for business insight. And this element of trust becomes even more critical when we look at automated analysis through machine learning, a growing trend that offers great business value. The ability to sift through large volumes of data and draw conclusions can move a business forward. But in a business world where the volume of data has become increasingly massive, it’s impossible to manage this without automation.

Empowering Data Discovery for Greater Insight

Data governance is critical to the success of self-service BI models because it provides consistent and reliable data across the organization in a unified form that algorithms can understand. Additionally, business users need to know how best to explore this data for optimal discovery and insight without relying on IT’s hand-holding.

Initial user training on analytics tools is essential before getting started on any project. But even for those who possess a good understanding of how to use these tools, agreeing upfront on definitions and KPIs of BI models is essential, as is knowing whether certain reports already exist before duplicating efforts. This is another common bottleneck to leveraging SSA tools’ full potential – research shows 70 percent of a data analyst’s time is spent on preparing and analyzing data on questions already answered. The more visibility and understanding users have of the data, the more informed decisions they can make regarding which models to explore.

Consumerizing Data Discovery and Analytics via a Data Catalog

Deploying a data catalog as part of the data governance solution provides business users with a more strategic and simplified data overview. A data catalog organizes useful collections of data across existing boundaries. Whether those boundaries are systems, organizations or geographies, this cross-boundary visibility drives many of data’s more significant insights. This broader understanding of data through governance and a catalog empowers the user with a consumable data experience for capitalizing on analytics. Users can harness the expertise of data citizens around the organization, too, and use the catalog infrastructure to enable them to share their work. This gives the user a clear idea of which reports already exist and results in more effective BI analytics and reporting.

Through a catalog, business users can easily find (or shop for) the trusted data they need from one central location, just as they do on consumer sites such as Amazon. A catalog automatically links business terms from the business glossary to registered tables and columns, and leverages the organization’s agreed-upon vocabulary, providing users with a better understanding of the data’s context. This helps users determine whether that data is a good fit for the analysis in question. And different views of the data highlight different aspects of it, which feed into the analysis.
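
To make the linking of glossary terms to registered tables and columns concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular catalog product works: the glossary contents, the alias sets, the table and column names, and the functions `normalize` and `link_terms` are all hypothetical, and matching is done by simple name comparison rather than by anything more sophisticated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A registered physical column (table and column names are made up)."""
    table: str
    name: str


# Hypothetical business glossary: each agreed-upon term maps to the
# physical column names that are known to carry that meaning.
GLOSSARY = {
    "Customer ID": {"customer_id", "cust_id", "custid"},
    "Order Date": {"order_date", "order_dt"},
}

# Hypothetical registered columns from two source systems.
COLUMNS = [
    Column("crm.customers", "CUST_ID"),
    Column("sales.orders", "customer_id"),
    Column("sales.orders", "order_dt"),
    Column("sales.orders", "amount"),
]


def normalize(name: str) -> str:
    """Lower-case a name and unify separators so variants compare equal."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def link_terms(columns, glossary):
    """Return {business term: [columns whose name matches the term or an alias]}."""
    links = {term: [] for term in glossary}
    for col in columns:
        key = normalize(col.name)
        for term, aliases in glossary.items():
            if key == normalize(term) or key in aliases:
                links[term].append(col)
    return links


if __name__ == "__main__":
    for term, cols in link_terms(COLUMNS, GLOSSARY).items():
        print(term, "->", [f"{c.table}.{c.name}" for c in cols])
```

In this sketch a user searching for "Customer ID" is pointed to both the CRM and the sales columns, while `amount` stays unlinked until someone adds a glossary term for it.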

If the data for which users are searching is not properly cataloged, the self-service tools will not yield valuable results. A data catalog incorporated with machine learning delivers even greater insight and makes recommendations based on past user behaviors and “data purchases,” much like Amazon does for frequent shoppers. A catalog not only makes it easier and quicker for users to find the data for decision-making, but also enables users to define models earlier in the process. This is particularly helpful for making changes on the fly, as is the case with last-minute changes to definitions or KPIs.
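
The idea of recommending data based on past "data purchases" can be sketched with a simple co-occurrence count: data sets that other users have used together with the ones you already use are suggested first. This is a deliberately small stand-in for the machine learning described above, not the method of any specific catalog. The user names, data set names, and the functions `build_cooccurrence` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical usage history: which data sets each user has worked with.
HISTORIES = {
    "ana": ["orders", "customers"],
    "ben": ["orders", "customers", "returns"],
    "cy": ["orders", "returns"],
    "dee": ["web_clicks", "customers"],
}


def build_cooccurrence(histories):
    """Count how often each pair of data sets appears in the same user's history."""
    co = {}
    for datasets in histories.values():
        for a, b in combinations(sorted(set(datasets)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(user, histories, top_n=3):
    """Suggest data sets the user has not used, ranked by co-occurrence score."""
    co = build_cooccurrence(histories)
    seen = set(histories.get(user, []))
    scores = Counter()
    for ds in seen:
        for other, count in co.get(ds, {}).items():
            if other not in seen:
                scores[other] += count
    # Sort by score (highest first), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend("ana", HISTORIES))
```

A real deployment would presumably weigh recency, data quality and access rights as well; the sketch only shows why a well-populated usage history is a precondition for useful recommendations.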

To gain the full value of SSA tools, the data catalog should support five capabilities:

1. Consumability. Choosing a data catalog and self-service analytics tools with a user-friendly interface is important to business users who may not be tech savvy. Simple drag-and-drop functionality, intuitive mapping and navigation, and easy-to-read help sections are imperative. The data catalog should go beyond structural and usage metadata and provide an easy “data shopping” experience with the complete meaning, lineage and relationships of consistent and trusted governed data for the business user to capitalize on analytics for business advantage.

2. Business modeling. A catalog with out-of-the-box operating models complements self-service analytics tools by providing a flexible structure for consumable information about any type of data. This functionality then links to the data sources, business applications, data lakes, data quality systems, and all sources of metadata to create a responsive system – essentially aligning the data to the business. These connections enable changes to be detected and policies immediately applied, without manual steps, driving active data governance. The operating model feature enables the business user to create analytics models with specific definitions and KPIs, and easily search for data while having a full view and understanding of the data, how it differs and where it comes from, and trust that information because the data is linked to the data governance platform.

For example, when running analytics reports on customer loyalty, the data collected from website interaction and that of financial transactions on the backend can offer vital clues to customer buying behaviors, even when they have different meanings. This knowledge and understanding of the different but trusted data and meanings promotes data consumability and drives accurate insights and predictions.

3. Collaboration. After users have created their analytics models and prepared their data, they need the ability to explore the data in a way that suits their needs and objectives. They need to ensure algorithms are applied correctly and business rules are added, but collaboration is just as critical to how they work with the tools. A data catalog integrates the work of colleagues who may be looking for similar analyses, and can point the user to data sets they may have already created. This simplifies the process of finding relevant and comparable data and saves time if similar reports have already been developed. Because users can tag, document, and annotate data sets in the data catalog, the data is continuously enriched, increasing its value, eliminating data silos and encouraging collaboration and crowdsourcing.

4. Trust. Because policies and rules are applied to the data through governance, data owners have been assigned, and changes and updates are made consistently, users can feel confident that the data they’re using for analytics (including shared data) is data they are allowed to use or that has been previously approved, and that it complies with data privacy regulations. This is particularly essential given today’s heightened regulatory environment and the nature of analytics, which pulls insights from different sources (including customer information) and applies them in external efforts.

5. Machine learning. By incorporating machine learning functionality via semantic search capabilities, the catalog can serve up increasingly relevant data to users over time and offer an automated and efficient way to improve data searches to be used in analysis. This is the Amazon-like feature mentioned earlier, “consumerizing” the user’s data discovery experience and analytical capabilities.

As organizations adopt more self-service tools for BI and expand their analytical capabilities, leveraging a data catalog with these capabilities tied to data governance will give them confidence in knowing their business insights are based on trusted data. This is when we’ll start to see the true value of SSA tools in helping to drive business forward.

Paul Brunet is vice president of product marketing at Collibra. An earlier version of this article appeared in Analytics magazine.