Avoid These Mistakes on Your Data Warehouse and BI Projects


Click to learn more about author Wayne Yaddow.

Data warehousing (DW) and business intelligence (BI) projects
are a high priority for many organizations who seek to empower more and better
data-driven decisions and actions throughout their enterprises. These groups
want to expand their user base for data discovery, BI, and analytics so that
their business users make informed decisions. At the same time, users are
demanding high-quality and often complex BI reports.

Data warehouse projects are highly intricate and
fundamentally risky. Among their many tasks, project managers who lead the data
warehouse team must identify all Data Quality risks. A primary goal of this
process is to document essential information relating to project risk.

This article focuses on how to avoid the following four common
mistakes that other data warehouse and BI projects have experienced, in order
to plan and implement new functions and capabilities with success.

Failing to Introduce Quality Assurance Early in the Project

During the initial phases of data warehouse/BI projects, the
focus is often on BI requirements and data-related needs to build the
operational data store, the enterprise data warehouse, and application reporting
infrastructures. Somehow, the importance of end-to-end DW project testing and Data
Quality is often overlooked.

There is always an appreciation of Data Quality. Still, as data warehouse requirements and design progress, the overwhelming focus on Data Modeling, data capture, and ETL design may cause the team to lose focus on Data Quality. Eventually, issues such as these will arise:

  • Target data is not
    reconcilable with sources.”
  • “Duplicate data abounds.”
  • “Aggregations and
    report drill-downs are not correct.

Ultimately, the success of a data warehouse is highly
dependent on the ability to plan, design, and execute a set of tests that
expose early and ongoing issues: issues with data inconsistencies, Data Quality,
data security, the ETL process, performance, business-flow accuracy, and the
end-user experience.

Many data warehouse teams debate when to start testing as
they’re developing new software. For most DW projects, software testing should
begin as soon as the design and requirements are baselined. An early start to
QA provides several advantages that improve the overall efficacy of
software testing. QA participation at the beginning of a project makes testers
more effective throughout by enabling them to learn about the product and
business rules they will be testing. They will likely design better test plans
and test cases.

During the design and requirements phase, testers can work
with developers to determine what aspects of a design are testable and what
areas will have a higher risk. This knowledge will help prevent testing errors
and better equip testers to design test cases and identify defects.

Implementing a successful data warehouse project is challenging. It requires a balance of many factors, such as strong business involvement, thorough data analysis, a scalable system, Data Architecture, a comprehensive program, Data Governance, high-quality data, use of established standards and processes, excellent communications, and project management.

Failing to Adequately Profile and Validate Source Data Before Loading to the Data Warehouse

Analysts’ studies have consistently shown that over 75% of
data warehouse and data integration project teams have either overrun their
schedules or their budgets or otherwise experienced project failure. Why the
high failure rate?

Inadequate source Data Quality is the root cause of failure
across a wide variety of data warehousing projects. Profiling and validating
all source data upfront can generate significant benefits.

The traditional approach to data warehouse projects follows
these basic steps:

  1. Analyze
    the business, user, and the project’s technical requirements.
  2. Analyze
    the available internal and external data sources.
  3. Identify
    and analyze a set of data sources from legacy systems, operational
    systems, and external sources to determine their relevance to the
    requirements of the target database.

It may be folly to assume knowing your source data before
beginning to design your target data warehouse. The main weakness in the
traditional data integration approach is that it assumes that the data required
for an application is fully available from the data sources. Major corporations
have spent millions of dollars on data integration projects only to learn later
that the source data did not support the target model.

Data profiling should be conducted for each data source:
implement table analysis, row and column analysis, primary and foreign key
assessments, and cross-table analysis. Source data profiling should also be
considered to discover minimum, maximum, mean, mode, percentiles, and duplicate
values – even profiling metadata such as data types, data lengths, null values,
and string patterns. 

Giving Inadequate Attention to Test Automation

With the advent of DevOps for data warehousing,
organizations release new applications faster than ever – sometimes on demand
or multiple times a day. However, numerous businesses are still using manual
ETL test processes for highly visible or customer-facing applications. That
translates into a risk to customer loyalty, the brand, confidential data, and
worse. Even with new automation tools entering the marketplace, ETL and data
profiling testing today continues to be primarily accomplished with manual

Automating ETL tests allows frequent smoke and regression
testing without much user intervention. Automated testing of trusted code after
each new database build can save measurable time and costs.

A decision to implement automated tools for data warehouse
testing depends on a budget that supports additional spending to meet advanced
testing requirements. If implementing vendor-supplied test automation is deemed
cost-prohibitive, it is essential to consider test tools built and maintained
in-house because they are likely to be of a more significant advantage than no
test automation at all.

When developing scenarios for test automation, evaluate your
complete set of test scenarios to determine the best candidates for automation
based on risk and value (ROI): Which types of defects would cause you to stop
an integration or deployment? Which types of tests exercise critical, core
functionality? Which tests cover areas of the application that have been
historically known to fail? Which tests provide information not already covered
by other tests in the pipeline?

In the end, test automation saves time and money, and more
importantly, business users will appreciate the quality of BI deliverables and
accept the data from the data platform solution as a “single version of the

Implementing Flawed Project Change Management Controls

Change is constant on any data warehouse project. Regardless
of industry, there eventually comes a need for new requirements and other
changes. A drive for continuous improvement and new data warehouse requirements
commonly initiates a variation in the project scope or deliverables.

Change management is an essential component in the success of data warehouse initiatives, but how often is comprehensive change management minimized? According to Forrester’s Q1 2014 Global BI Maturity Survey, fully half of those surveyed believe that their processes for managing change based on new data warehouse data and functionality are not well established and do not function smoothly. 

Testers need to reference critical documents involved in all
changes – documentation such as business requirements, design and technical
specifications, data mapping documents, ETL job flows, and more. The ability to
identify and link these documents to the overall change management process and
to prepare test plans are critical for effective quality assurance.

Change for BI initiatives often comes from multiple sources,
including business owners or other stakeholders’ requests and impacts from
source system changes. The QA team should participate in how changes are
logged, managed, prioritized, and updated, and then should assure that all
changes are verified. If your organization already has a change-tracking tool,
it is a good idea to make use of it.

Data warehouse and BI/analytics projects tend to budget for
technology and implementation, but change management and post-go-live adoption
activities are often underfunded – or even overlooked entirely. Good change
management facilitates communication from the outset, motivating users to move
from resistance to acceptance and even excitement – increasing buy-in and
greatly enhancing successful adoption of the new functionality

Patrick Meehan, research director in Gartner Group’s CIO Research Group, states that most failing data warehouse/BI (and DM/A) projects are delayed or go over budget due to poor communication of new and changing requirements between the management and IT teams.

When your organization is rolling out a data warehouse
project, chances of success are greatly enhanced when change management is an
integral part of the initiative.

Final Thoughts

Mistakes described here focus on helping organizations
sidestep QA problems that many other data warehouse projects have experienced.
The tips offered will help ensure satisfaction as data warehouse teams plan and
implement new functions and capabilities. These time-tested recommendations may
save significant money, time, staff resources, and improve results from the
data warehouse application under development.

Credit: Source link