Showing posts with label data sampling. Show all posts

Friday 15 February 2013

Data Subset in TDM

In my previous post, I discussed the challenges of the Production Cloning approach.  In this post, we will focus on the solution: the Data Subset process, also called data sub-setting.

Data subsetting is the process of slicing off a part of the Production Database and loading it into the Test Database.  For example, instead of cloning a 50 TB production database, you create a subset that holds only 50 GB of data and load it into the Test Database.  Let's assume that in a retail application you have a Customers table with 10 million customers, an Orders table with 100 million orders, and around 100 million records in other transaction tables.  The subset process will try to shrink these to reasonable sizes, as depicted in the picture below.
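The slicing idea can also be sketched in code.  The following is only a minimal illustration, not a real TDM tool: it assumes a toy schema with two tables (customers and orders), uses in-memory SQLite databases in place of Production and Test, and the function names build_subset and make_demo_source are made up for this example.  It picks a small number of customers and copies only the orders that belong to them, so that the subset stays consistent.

```python
import sqlite3

# Hypothetical toy schema: the table and column names are assumptions.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
"""


def make_demo_source(customer_count, orders_per_customer):
    """Build an in-memory stand-in for the production database."""
    source = sqlite3.connect(":memory:")
    source.executescript(SCHEMA)
    order_id = 1
    for cust_id in range(1, customer_count + 1):
        source.execute(
            "INSERT INTO customers VALUES (?, ?)", (cust_id, f"Customer {cust_id}")
        )
        for _ in range(orders_per_customer):
            source.execute(
                "INSERT INTO orders VALUES (?, ?, ?)", (order_id, cust_id, 10.0)
            )
            order_id += 1
    source.commit()
    return source


def build_subset(source, target, customer_limit):
    """Copy the first customer_limit customers and only their orders."""
    target.executescript(SCHEMA)
    customers = source.execute(
        "SELECT id, name FROM customers ORDER BY id LIMIT ?", (customer_limit,)
    ).fetchall()
    target.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    # Select child rows through the same customer slice, so that no order
    # in the subset points at a customer that was left behind.
    orders = source.execute(
        "SELECT id, customer_id, amount FROM orders "
        "WHERE customer_id IN "
        "(SELECT id FROM customers ORDER BY id LIMIT ?)",
        (customer_limit,),
    ).fetchall()
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    target.commit()


if __name__ == "__main__":
    production = make_demo_source(customer_count=100, orders_per_customer=10)
    test_db = sqlite3.connect(":memory:")
    build_subset(production, test_db, customer_limit=10)
    print(test_db.execute("SELECT COUNT(*) FROM customers").fetchone()[0])
    print(test_db.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

Taking the first N customers by id keeps the sketch short; a real subset would more likely select by business criteria (region, date range, customer type) and would have to follow every related table, not just one.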















Advantages of data sub-setting

Wednesday 13 February 2013

Top 3 Challenges in using Production data in Test Environments

In my previous post "How to create Test Data", I explained the concept of creating test data directly from the production data.  In this post we will concentrate on the Top 3 challenges in using the Production data for testing purposes.

Data Security

This is by far the most crucial challenge of using Production data in Test Environments.  Production data can contain a lot of sensitive information.  Even though the Production database offers rich data sets, using it for testing carries a lot of risk.  For example, if you are testing an application for a bank, production data will contain real customer information such as Names, Addresses, Account Numbers, Balances and Credit Card Numbers.  Using this data for testing exposes the bank to huge security risks.  So how do we overcome this?  The answer is Data Masking.

Data Masking is the process of hiding or replacing the sensitive fields in a data set, so that the real values never reach the Test Environment.  Please read my future post on Data Masking and the Techniques used for Data Masking for more details.  The following figure depicts the data security challenge and the approaches.

Data Security Challenge
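To make the idea of masking a little more concrete, here is a rough sketch in Python.  It is only one possible approach and not the technique this blog will necessarily recommend: the field names, the salt value and the functions mask_value and mask_record are all assumptions made for this example.  Numeric identifiers are replaced with digits derived from a hash (so the length and format stay the same), and text fields are replaced with a short pseudonym.

```python
import hashlib

# Assumed field names; a real application would have its own list.
NUMERIC_ID_FIELDS = {"account_number", "credit_card_number"}
TEXT_FIELDS = {"name", "address"}


def mask_value(field, value, salt="test-env-salt"):
    """Return a deterministic replacement for one sensitive value."""
    digest = hashlib.sha256(f"{salt}:{field}:{value}".encode("utf-8")).hexdigest()
    if field in NUMERIC_ID_FIELDS:
        # Turn the hex digest into digits and keep the original length
        # (the digest gives at most 64 digits), so format checks in the
        # application under test still pass.
        digits = "".join(str(int(ch, 16) % 10) for ch in digest)
        return digits[: len(str(value))]
    # Text fields become a short pseudonym such as "name_3fa1b2c4".
    return f"{field}_{digest[:8]}"


def mask_record(record):
    """Return a copy of the record with the sensitive fields masked."""
    masked = {}
    for field, value in record.items():
        if field in NUMERIC_ID_FIELDS or field in TEXT_FIELDS:
            masked[field] = mask_value(field, value)
        else:
            masked[field] = value  # non-sensitive fields pass through
    return masked


if __name__ == "__main__":
    customer = {
        "name": "Jane Doe",
        "account_number": "1234567890",
        "branch": "Main Street",
    }
    print(mask_record(customer))
```

Because the same input always produces the same masked value, a customer masked in one table will still match the same customer in another table.  Other sensitive fields, such as balances, are not handled in this sketch and would need their own rule.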


Saturday 9 February 2013

TDM Topics to be covered in this blog

Hello all,

The intention of this blog is to share my insight and knowledge in the area of Test Data Management.  I am looking forward to writing a few posts on the following topics.  I will write them whenever I get some free time.  Thanks.



Regards
Rajaraman R