DefaultRecordSeparatorPolicy and Unterminated Double Quotes
Last winter I got the opportunity to lead the development of a batch application that would run on our midrange servers. This gave me the chance to explore using Spring Batch to load large flat files from clients into our system. The exposure would be small since we were really only loading a few files. The process had been running great until last week.
Last week we got a file that caused an OutOfMemory error when it was processed. Looking into it, I noticed a double quote in the name field of one record. Once I removed that single double quote, the process loaded the data successfully in a test environment. The file contained double quotes in other places, though, so I didn't understand why this one caused the file to fail.
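After attaching the source for Spring Batch I was able to step through the code to see what was going on. Eventually I got into my FlatFileItemReader and found it using a RecordSeparatorPolicy to decide where a record ends. The DefaultRecordSeparatorPolicy checks each line for an unterminated double quote. If it finds one, it assumes the record continues on the next line and keeps appending lines until the quote is closed. With a single stray quote that closing quote never arrives, so the reader keeps accumulating lines and never finds the end of the record, which would explain the OutOfMemory error. The other double quotes in the file presumably came in pairs, so they never tripped this check. Since this is the default, it really is a problem if your client sends you a stray double quote.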
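To make the failure mode concrete, here is a minimal sketch of the quote-counting idea in plain Java. It is a simplified stand-in, not the Spring Batch source: the class and method names (QuoteAwareRecordDemo, hasUnterminatedQuote, assembleRecords) are made up for illustration, and unlike the real reader it keeps the dangling record at end of file instead of failing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: mimics the idea behind quote-aware record
// separation. It is NOT the actual Spring Batch implementation.
class QuoteAwareRecordDemo {

    // Text with an odd number of double quotes is treated as having
    // a quoted field that is still open.
    static boolean hasUnterminatedQuote(String text) {
        int quotes = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '"') {
                quotes++;
            }
        }
        return quotes % 2 != 0;
    }

    // Groups physical lines into logical records: keep appending lines
    // while the accumulated text still has an open quote.
    static List<String> assembleRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            current.append(line);
            if (!hasUnterminatedQuote(current.toString())) {
                records.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            // Simplification: keep the record that never closed so it can be inspected.
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "1,\"Smith, John\",NY",  // balanced quotes: one complete record
            "2,Bob \"Bud Jones,TX",  // stray quote: the record stays open
            "3,Ann Lee,CA",
            "4,Raj Patel,WA");
        System.out.println(assembleRecords(lines).size() + " records");
    }
}
```

Run against these four sample lines, the sketch produces two records rather than four: the first line on its own, and everything from the stray quote onward glued into a single record. On a large client file that second record just keeps growing.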
The solution was to use a different RecordSeparatorPolicy in my FlatFileItemReader. Thankfully Spring Batch offers another class called SimpleRecordSeparatorPolicy, which treats every line as a complete record and doesn't care about unterminated quotes. After making this change in my code I was able to load the original file in my test environment with no issues. I'm wondering if this has been noticed by others using Spring Batch. I think this really makes a case for testing corrupt files in the QA phase just to see what happens.
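For anyone who wants to try the same change, the wiring looks roughly like this. This is a sketch rather than my production code: the factory class name, the file path parameter, and the use of PassThroughLineMapper are placeholders, it needs Spring Batch on the classpath, and package names may differ between Spring Batch versions.

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.batch.item.file.separator.SimpleRecordSeparatorPolicy;
import org.springframework.core.io.FileSystemResource;

// Hypothetical factory class, for illustration only.
class ClientFileReaderFactory {

    static FlatFileItemReader<String> newReader(String path) {
        FlatFileItemReader<String> reader = new FlatFileItemReader<>();
        reader.setResource(new FileSystemResource(path));
        // Placeholder mapper: hands each record back as a raw String.
        reader.setLineMapper(new PassThroughLineMapper());
        // Treat every physical line as one record, regardless of quotes.
        reader.setRecordSeparatorPolicy(new SimpleRecordSeparatorPolicy());
        return reader;
    }
}
```

The trade-off is that a quoted field which legitimately spans several lines will no longer be stitched back together, so this only makes sense when every record in the file sits on a single line, as ours do.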

