Wednesday, 13 February 2013

Challenges in the Production Cloning Approach

In my previous articles, I have already discussed the topics "How to create Test Data" and "Top 3 Challenges in using Production data in Test Environments".  In this post we will focus on the challenges we face with the Production Cloning approach and how to overcome them.

1.  Infrastructure

Even though it is highly recommended to build the Test Environment along the same lines as Production, it is not always feasible to test under those real-time conditions.  Performance, load and stress tests should mimic the Production database exactly, but the expensive infrastructure this requires might be overkill for Functional Testing.  Cloning, however, might force you to provision production-like infrastructure, which translates into higher costs for the customer.

2.  High Storage Costs

Another major challenge associated with Production Cloning is that all of the production data needs to be stored in the test environment.  If the production data is 50 TB (terabytes), the Test Database also needs to hold 50 TB of data, and storage has to be provisioned accordingly.  With the databases also being backed up regularly, the result is even higher storage costs for the customer.

3.  Increased Load Time

Loading 50 TB of data from Production into the Test database will obviously take longer than loading a smaller volume.  You will also be locking the Test Database during the entire load operation to avoid deadlocks and contention, so the faster the data gets loaded, the better.  Sometimes, due to delays in the development schedule, the product or application reaches the testers' desks quite late.  In such cases, a long load time leaves the team with little or no time to test all the features.

4.  Multiple Test Environments

You might have several Test Environments to load data into.  For example, you might have separate environments for the QA, UAT, Performance and Dev teams.  One of the largest banks had around 8,000 Test environments to work with.  Multiple environments increase the infrastructure requirements, load time and storage needs, and so directly multiply the associated costs.

So how do we overcome these challenges?  The answer lies in a technique called Data Subsetting.  Instead of fetching the entire dataset from Production, we fetch a well-defined, structurally valid (referentially intact) subset of the Production database and load it into the Test Database.  For example, in the same banking scenario, we might take only a portion of the production database (say, the transactions of customers created in the last 5 years instead of all transactions).  That way we effectively reduce the volume of data in the Test Database.  Please read my detailed post on Data Subset for more information.
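To make the idea concrete, here is a minimal sketch of subsetting in Python with SQLite.  The schema (hypothetical `customers` and `transactions` tables with a foreign key between them) and the function name are illustrative assumptions, not part of any particular TDM product; the point is that the subset is driven by a date cutoff on the parent table, and child rows are copied only when their parent made it into the subset, so referential integrity is preserved in the Test Database.

```python
import sqlite3

# Hypothetical schema for illustration: customers(id, created_at) and
# transactions(id, customer_id, amount, created_at), where
# transactions.customer_id references customers.id.

def subset_to_test_db(prod: sqlite3.Connection, test: sqlite3.Connection,
                      cutoff_date: str) -> None:
    """Copy a referentially intact subset: customers created on or after
    cutoff_date, plus ONLY the transactions belonging to those customers."""
    test.executescript("""
        CREATE TABLE IF NOT EXISTS customers (
            id INTEGER PRIMARY KEY, created_at TEXT);
        CREATE TABLE IF NOT EXISTS transactions (
            id INTEGER PRIMARY KEY, customer_id INTEGER,
            amount REAL, created_at TEXT,
            FOREIGN KEY (customer_id) REFERENCES customers(id));
    """)
    # 1. Select the "driving" rows that define the subset.
    customers = prod.execute(
        "SELECT id, created_at FROM customers WHERE created_at >= ?",
        (cutoff_date,)).fetchall()
    test.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    if not customers:
        test.commit()
        return
    # 2. Pull only child rows whose parent is in the subset, so every
    #    foreign key in the Test Database still resolves.
    ids = [row[0] for row in customers]
    placeholders = ",".join("?" * len(ids))
    txns = prod.execute(
        f"SELECT id, customer_id, amount, created_at FROM transactions "
        f"WHERE customer_id IN ({placeholders})", ids).fetchall()
    test.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", txns)
    test.commit()
```

In a real TDM tool the same principle cascades through the whole foreign-key graph (parents first, then each level of children), but the driving-table-plus-children pattern above is the core of it.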

About the Author

Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing.  He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant.  He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing.  He blogs at Test Data Management Blog & Agile Blog.  Connect with him on Google+


  1. There could be another challenge too. Even if we clone the production environment, chances are that staging might not always get all the data it needs to work.

    i.e., there are times when production gets data from third-party websites which might not be available in test environments. It could be XML, CSV, JSON or any other file type, and the format might vary depending on the scenario. In such cases, we might need to create mocks for them and configure the test environment accordingly.

  2. Thanks Sreenath for your comments. Yes, that's very true: we can't always replicate production exactly due to complex data sources and third-party dependencies, and sometimes we might need to generate new data as well. I am planning to cover these topics in a separate blog post so that readers get more clarity.

    - Rajaraman R
    TDM Blog
    Agile Blog

  3. Does anybody know the subsetting process in the DB?