Data Warehouse Testing

by Amol 17. March 2011 20:03

What is Data Warehouse?

A data warehouse is a repository where data is stored for analysis, reporting, archival and security purposes. A data warehouse is a system that retrieves data from source systems and puts it into a dimensional data store or a normalized data store. Data warehouse is system capable of having strong Business Intelligence. It is primarily used by the top level management to make business critical decisions. 

Is data warehouse testing important?

Following questions will answer the question.

  • Are decisions based on correct data?
  • Do organization and its employees have sufficient confidence in the new system?
  • How can organization be sure there are no anomalies in the huge amount of migrated data from source to data warehouse?
  • Does it comply with all organization needs?
  • Is sufficient security applied on sensitive data?

To get confident answers for above questions, yes, testing is one of the critical and important part of data warehouse application. 

Challenges in Data Warehouse Testing:

There are many challenges in Data Warehouse testing. Main challenge is due to the huge amount data from heterogeneous data sources.

Other challenges are

  • Complete organization-wide enterprise data knowledge and business knowledge may not be easily feasible.
  • Asynchronously updating of data from heterogeneous sources may result in data inconsistency or may contain partial loading of data into data warehouse.
  • Data quality is not assured at all the sources and no accurate picture of quality of underlying data.
  • 100% data verification will not be feasible unless all extraction, transformation and loading component tested thoroughly to ensure all the data loaded as expected.

Due to such challenges there is difference between normal database testing and data warehouse testing. Below table illustrate difference between database testing and data warehouse testing. 

Database Testing Data Warehouse Testing
Smaller in size Larger in size
Normalized Data De-Normalized data
Usually Homogeneous Data Usually Heterogeneous data is getting loaded.
Usually Data is being tested at source instead of using UI Extraction, Transformation and Loading component are the major one to test
Usually all kind of operations happen Usually read-only operations happen

Data warehouse testing phases:

Following are the main processes involved in data warehouse testing

Requirement Testing:

The main purpose for doing Requirements testing is to check defined requirements for completeness.

For example:

Are the requirements Complete?
Are the requirements Testable?
Are the requirements Ambiguous?
Are the requirements Developable?
Are the requirements Singular?

Unit testing:

Unit testing is usually whitebox testing. It should check the all ETL components,   procedures used in ETL, source and target mappings and transformation for the same, all the scheduled jobs and the reports. This kind of testing is usually done by developers.

Unit testing will involve following

  • Whether the entire source mapping done as per stated and ETL components are accessing right data from right source. 
  • All the data transformations are correct according to the business rules and data warehouse is correctly populated with the transformed data.
  • Testing rejected records that don’t fulfill transformation rules.

Integration testing:

Integration testing involves

  • Initial loading of data into data warehouse.
  • Incremental loading of data into data warehouse.
  • Sequence of extraction objects, transformation objects and loading objects.
  • Sequence of ETLs jobs.
  • Testing rejected records that don’t fulfill transformation rules.
  • Log and error log generation.
  • Record count validation by executing queries against source and target.
  • Data integrity between the various source tables and relationships.
  • Statistical analysis by validation for various calculations.
  • Data Quality Validation to check for missing data.
  • Field by field data verification to check the consistency of source and target data.
  • Validate at the possible lowest granular level. 

Security Testing:

Under the security testing, make sure only the authenticated users should be able to access the resources. Security testing involve 

Confidentiality:
        A security measure which protects against the disclosure of information to parties other than the intended recipient.

Authentication:
        Authentication can be done in many ways like passwords, biometrics, radio frequency identification etc. 

Authorization:
        Authorization determines a requester is allowed to receive or perform an operation.

Performance Testing:

Performance testing verifies that the data warehouse or application can handle the required load and volume. Performance Testing involves

Scalability:
        Will the application handles the expected users load.

Response Time:
        Applications Response time, does the application respond quickly enough under various loads.

Capacity:
        Will the application able to handle defined capacity of data or load.

Throughput:
        For increasing number of concurrent users, high transaction rate should the hardware and software achieves the required throughput without any performance issues.

Acceptance Testing:

Application is tested with full functionality and is expected to function on production.
At the end of UAT, the system should be acceptable to the client for use in terms of ETL process integrity and business functionality and reporting.

Tags: ,

Data Warehouse

Category

Recent Posts

Tag cloud