Incremental Submission Feature
Overview
Incremental Submission adds the ability to upload only the new data for the current release. All data from previous releases is maintained by the system and automatically copied to the current release. Additionally, submitters are now able to download their submitted data. This provides a model in which submitters are in full control of their effective data set and can use the submission system as a canonical data store.
File Naming
To help submitters manage their data as they see fit, the dictionary now allows a looser naming convention for files. As a consequence, multiple files per file schema may be submitted. For example, the donor schema now allows submitters to include an optional custom segment in their file names, provided the full name conforms to the following regex:
"^donor(\.[a-zA-Z0-9]+)?\.txt(?:\.gz|\.bz2)?$"
Thus one may choose to adopt a naming scheme such as:
donor.01.txt, donor.02.txt, donor.03.txt
Alternatively one could embed a date:
donor.20130101.txt, donor.20130201.txt, donor.20130301.txt
With this scheme in place, a submitter can upload donor.20130101.txt in Release 1, donor.20130201.txt in Release 2 and donor.20130301.txt in Release 3. The effective submission will be the combined set of files.
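As a rough illustration of the naming rule and of how the effective submission accumulates, the following Python sketch checks file names against the donor regex quoted above and combines the files uploaded across releases. The regex is copied from this document; the helper names (is_valid_donor_file, effective_submission) and the dictionary of uploads are hypothetical and not part of the submission system, and the assumption that a re-uploaded name counts only once is ours.

```python
import re

# Pattern copied verbatim from the donor schema example in this document.
DONOR_FILE_PATTERN = re.compile(r"^donor(\.[a-zA-Z0-9]+)?\.txt(?:\.gz|\.bz2)?$")


def is_valid_donor_file(name: str) -> bool:
    """Return True if the file name conforms to the donor naming regex."""
    return DONOR_FILE_PATTERN.match(name) is not None


def effective_submission(uploads_by_release: dict, current_release: int) -> list:
    """Hypothetical helper: combine the valid files uploaded up to a release.

    Assumes a file name uploaded in more than one release counts once.
    """
    files = set()
    for release, names in uploads_by_release.items():
        if release <= current_release:
            files.update(n for n in names if is_valid_donor_file(n))
    return sorted(files)


# One new file per release, using the date-based scheme.
uploads = {
    1: ["donor.20130101.txt"],
    2: ["donor.20130201.txt"],
    3: ["donor.20130301.txt"],
}
print(effective_submission(uploads, 3))
```

Note that the regex permits a single alphanumeric segment only, so a name such as donor.2013-01-01.txt or donor.01.02.txt would be rejected by this check.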
Data Management
It is the responsibility of the submitter to ensure that data remains consistent from release to release. In the case of deleted records, one must remove the records and their dependent records from all files and resubmit. Submitters should choose appropriate file split strategies to simplify operations between releases and to interoperate with existing pipelines.
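The deletion step described above could be scripted along the following lines. This is only a sketch under stated assumptions: it assumes the files are tab-delimited text with a header row, and the column names donor_id and specimen_id, like the drop_records helper itself, are hypothetical examples rather than anything defined by the dictionary.

```python
import csv
import io


def drop_records(tsv_text: str, key_column: str, deleted_ids: set) -> str:
    """Return the tab-delimited text without rows whose key is in deleted_ids.

    Assumption: the first line is a header that contains key_column.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row[key_column] not in deleted_ids:
            writer.writerow(row)
    return out.getvalue()


# Hypothetical contents of a donor file and of a dependent specimen file.
donor_file = "donor_id\tsex\nD1\tmale\nD2\tfemale\n"
specimen_file = "specimen_id\tdonor_id\nS1\tD1\nS2\tD2\n"

deleted = {"D1"}
# The same filter is applied to the parent file and to every dependent file.
clean_donor = drop_records(donor_file, "donor_id", deleted)
clean_specimen = drop_records(specimen_file, "donor_id", deleted)
print(clean_donor)
print(clean_specimen)
```

Splitting files by a stable key, such as submission date, keeps the number of files that must be rewritten and resubmitted after a deletion small.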
Notes
In the current implementation, each validation run validates the entire data set. However, when combined with _Selective Validation_, the total validation time should be greatly reduced.