Thursday, 7 March 2013

Data Archive in Test Data Management (TDM)

In the previous posts, I covered Data Subset, Data Masking, Test Data Ageing and Test Data Refresh.  In this post, we will focus on Data Archival and why it matters to the process of Test Data Management.

What does Data Archival typically mean?
  • Size Management
    • You need an efficient mechanism for managing database size.  A database grows over time, and that growth has to be actively managed.
  • Archival of older data
    • Older data can be archived to a location that occupies less disk space and retrieved later whenever it is needed.

Types of Archive Mechanisms:
  • Live Archive
    • This mechanism archives live production data without disturbing the production database.  It is typically used for production databases that hold a lot of transaction data but do not need all of it for routine processing.  In this case, the data is archived, preferably to another database, so that the production database does not grow beyond its limits.
  • File Based Archive
    • In this mechanism, the data to be archived is converted to a proprietary format (text or preferably binary) so that it can be stored for future use.
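As a rough illustration of the live archive idea, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, the attached database name `archive` and the cutoff date are all assumptions made up for the example; a real production setup would use its own schema and database engine.

```python
import sqlite3


def archive_old_rows(conn, cutoff):
    """Move rows older than `cutoff` from main.orders to archive.orders.

    Both statements run in one transaction, so a failure leaves the
    production table untouched.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO archive.orders "
            "SELECT * FROM main.orders WHERE created < ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM main.orders WHERE created < ?", (cutoff,)
        )
    return cur.rowcount  # number of rows moved out of production


def build_demo():
    """Create a hypothetical production database with an attached archive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS archive")
    for schema in ("main", "archive"):
        conn.execute(
            f"CREATE TABLE {schema}.orders "
            "(id INTEGER PRIMARY KEY, created TEXT, amount REAL)"
        )
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?, ?)",
        [(1, "2010-05-01", 10.0), (2, "2011-11-30", 20.0), (3, "2013-02-15", 30.0)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo()
    moved = archive_old_rows(conn, "2012-01-01")
    print("rows archived:", moved)
```

The dates are stored as ISO strings, so a plain string comparison is enough to pick the older rows.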

Challenges in Data Archive
  • Security
    • Since data from the database is archived and stored in file formats, utmost care should be taken to secure the file contents
  • Compression
    • Since one of the objectives is to reduce storage costs, effective data compression algorithms need to be used to compress the data files.
  • Data sources
    • The data can come from multiple data sources and may need to be restored to multiple data targets.  This has to be taken into account when designing a data archival solution or choosing a commercial one.
  • Data Relationships (Yet again)
    • We generally archive only a portion of the data, depending on the need.  Hence effective subsetting techniques should be used so that related records are archived together, and the same data integrity has to be preserved when the data is restored.
Of these challenges, security is less of a concern in TDM because you are dealing with test data, provided that data has already been masked.  The other challenges still apply.
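The compression and data relationship challenges can be sketched together.  The example below is only an illustration under assumed names: a hypothetical `customers` / `orders` schema, a gzip-compressed JSON file as the archive format, and sqlite3 as the database.  It archives a subset of parent rows together with their child rows, then restores them parents first so that foreign keys stay valid.

```python
import gzip
import json
import os
import sqlite3
import tempfile

# Hypothetical schema, used for both the source and the restore target.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     amount REAL);
"""


def archive_subset(conn, customer_ids, path):
    """Write the chosen customers and all of their orders to a gzip file."""
    marks = ",".join("?" * len(customer_ids))
    payload = {
        "customers": conn.execute(
            f"SELECT id, name FROM customers WHERE id IN ({marks})",
            customer_ids,
        ).fetchall(),
        "orders": conn.execute(
            "SELECT id, customer_id, amount FROM orders "
            f"WHERE customer_id IN ({marks})",
            customer_ids,
        ).fetchall(),
    }
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(payload, fh)


def restore_subset(conn, path):
    """Load an archive file, inserting parent rows before child rows."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        payload = json.load(fh)
    with conn:
        conn.executemany("INSERT INTO customers VALUES (?, ?)", payload["customers"])
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", payload["orders"])


def demo():
    src = sqlite3.connect(":memory:")
    src.executescript(SCHEMA)
    src.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "A"), (2, "B"), (3, "C")])
    src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [(10, 1, 5.0), (11, 2, 7.5), (12, 1, 1.0)])
    src.commit()

    path = os.path.join(tempfile.mkdtemp(), "subset.json.gz")
    archive_subset(src, [1], path)

    dst = sqlite3.connect(":memory:")
    dst.execute("PRAGMA foreign_keys = ON")  # reject orphaned child rows
    dst.executescript(SCHEMA)
    restore_subset(dst, path)
    customers = dst.execute("SELECT id FROM customers").fetchall()
    orders = dst.execute("SELECT id FROM orders ORDER BY id").fetchall()
    return customers, orders


if __name__ == "__main__":
    print(demo())
```

JSON is used here only because it is easy to read back; a binary format would likely compress better for large tables.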


Data Archive in the Scope of Test Data Management

  • Maintenance of test data
    • Archival is typically used to maintain test data across a series of releases.
    • Archival of older release data
      • You can always archive your older release test data so that it remains intact for future use
    • Archival of multiple environments' test data
      • If there are multiple test environments, the test database size grows in proportion to the number of environments.  In this case, archiving the data would save a lot of disk space.
    • Restore whenever necessary
      • An archive should be easily restorable.  
    • Release/Build/Cycle wise snapshots for easy restore
      • Snapshots can be maintained in line with the project release cycles.  This is useful for production support, where an older environment is needed to test a production support release.
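A release/build/cycle-wise snapshot scheme could look roughly like the sketch below.  The file naming convention, the function names and the single-file test database are assumptions for illustration only, and the sketch assumes nothing is writing to the database file while it is copied.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def snapshot_path(archive_dir, release, build, cycle):
    """Hypothetical naming convention: one compressed file per release/build/cycle."""
    return Path(archive_dir) / f"{release}_build{build}_cycle{cycle}.db.gz"


def take_snapshot(db_file, archive_dir, release, build, cycle):
    """Compress the test database file into the archive directory."""
    dest = snapshot_path(archive_dir, release, build, cycle)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(db_file, "rb") as src, gzip.open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    return dest


def restore_snapshot(archive_dir, release, build, cycle, target):
    """Decompress a stored snapshot into `target`."""
    src = snapshot_path(archive_dir, release, build, cycle)
    with gzip.open(src, "rb") as inp, open(target, "wb") as out:
        shutil.copyfileobj(inp, out)
    return Path(target)


def demo():
    work = Path(tempfile.mkdtemp())
    db = work / "test.db"
    archive = work / "archive"

    # Stand-in contents; a real test database file would be copied the same way.
    db.write_bytes(b"release 1.0 test data")
    take_snapshot(db, archive, "R1.0", 12, 1)

    db.write_bytes(b"release 2.0 test data")
    take_snapshot(db, archive, "R2.0", 3, 1)

    # Production support needs the older environment back.
    restored = restore_snapshot(archive, "R1.0", 12, 1, work / "restored.db")
    return restored.read_bytes()


if __name__ == "__main__":
    print(demo())
```

Keying the file name on release, build and cycle means a restore only needs those three values, not a separate catalogue.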

    In summary, we have seen what data archival is, its challenges and its application in Test Data Management.  Hope you found the post useful.

    Do you think there are any other challenges that you see with data archive in the scope of Test Data Management?  Please feel free to comment.


    About the Author

    Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing.  He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant.  He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing.  He blogs at Test Data Management Blog & Agile Blog.  Connect with him on Google+
