Issue 15621

ArchiveFactory.readFileHeaders misinterprets first line as header row, when it is obviously source values

Reporter: mdoering
Type: Bug
Summary: ArchiveFactory.readFileHeaders misinterprets first line as header row, when it is obviously source values
Priority: Major
Status: Open
Created: 2014-05-19 10:36:18.659
Updated: 2014-05-19 10:36:58.287
        
Description: Moved from https://code.google.com/p/darwincore/issues/detail?id=159

The problem occurs when the first row of the file doesn't contain actual term names, only source values, like so:

1	TDWG:NSW	New South Wales

readFileHeaders() interprets TDWG:NSW and New South Wales as UnknownTerms, and assigns the file a default ignoreHeaderLines = 1.

I wonder whether it would be better to say that if the first line contains only UnknownTerms, it isn't a header row, but rather a line of source values.

This causes problems for the IPT, because it analyzes individual source files like so:

Archive arch = ArchiveFactory.openArchive(file);
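
The heuristic proposed above could look roughly like the sketch below. It is not the dwca-reader implementation: the class HeaderHeuristic, the method guessIgnoreHeaderLines and the tiny KNOWN_TERMS set are hypothetical stand-ins for whatever term lookup the library actually performs. It only illustrates the rule "no recognised term in the first line means no header row".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class HeaderHeuristic {

    // Hypothetical, deliberately tiny list of recognised term names.
    // A real implementation would consult the library's own term lookup instead.
    private static final Set<String> KNOWN_TERMS = new HashSet<>(Arrays.asList(
        "id", "taxonid", "scientificname", "locality", "stateprovince", "country"));

    // True if the cell matches a recognised term name, ignoring case and surrounding whitespace.
    static boolean isKnownTerm(String cell) {
        return KNOWN_TERMS.contains(cell.trim().toLowerCase(Locale.ROOT));
    }

    // Returns 1 if at least one cell of the first line is a known term (treat it as a header),
    // otherwise 0 (treat the first line as data).
    static int guessIgnoreHeaderLines(String firstLine, String delimiter) {
        for (String cell : firstLine.split(Pattern.quote(delimiter), -1)) {
            if (isKnownTerm(cell)) {
                return 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // The line from this report: only source values, so no header line is skipped.
        System.out.println(guessIgnoreHeaderLines("1\tTDWG:NSW\tNew South Wales", "\t")); // prints 0
        // A line with recognisable term names is still treated as a header.
        System.out.println(guessIgnoreHeaderLines("id\tlocality\tstateProvince", "\t"));  // prints 1
    }
}
```

With this rule, a first line made up entirely of unrecognised values is kept as data, while a single recognised term is enough to treat the line as a header.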

****
What version of dwca-reader am I using?
1.11


Original thread copied as a single comment below
    


Author: mdoering@gbif.org
Created: 2014-05-19 10:36:58.287
Updated: 2014-05-19 10:36:58.287
        
Aug 22, 2012 #1 wixner
Need to check the actual case, but for a dwca, if there is only a single file, it is REQUIRED to have headers. If there is a meta.xml, the read file headers method should never be called.


Aug 24, 2012 #2 kyle.braak
I can understand that requirement for a dwca. Therefore, perhaps things should be more strict:

If there is no meta.xml, and the file headers method is called, and it doesn't encounter any known terms, it throws an error and doesn't accept the file. Alternatively, it could accept the file, but treat it as having no headers. What do you think?

To clarify, in the IPT, there is no requirement that the source file uploaded on the resource overview page has to have headers. For convenience in reading in a variety of file formats, we prefer nonetheless to use the ArchiveFactory.openArchive(file) in this case.
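
The two options raised in comment #2 (reject the file, or accept it as header-less) can be sketched as a single switchable policy. This is an illustration only, under the assumption that the caller supplies its own known-term check. HeaderPolicy, Mode and resolveHeaderLines are hypothetical names and are not part of dwca-reader.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class HeaderPolicy {

    // STRICT rejects files whose first row has no known term; LENIENT treats such a row as data.
    enum Mode { STRICT, LENIENT }

    // Returns the number of header lines to skip for a data file that has no meta.xml.
    // isKnownTerm is supplied by the caller (assumption: some term lookup is available).
    static int resolveHeaderLines(List<String> firstRow, Predicate<String> isKnownTerm, Mode mode) {
        boolean anyKnown = firstRow.stream().anyMatch(isKnownTerm);
        if (anyKnown) {
            return 1; // first row is a header
        }
        if (mode == Mode.STRICT) {
            throw new IllegalArgumentException(
                "First row contains no known terms; a single data file without meta.xml must have headers");
        }
        return 0; // lenient: accept the file and treat the first row as data
    }

    public static void main(String[] args) {
        // Toy term check used only for this demonstration.
        Predicate<String> known = s -> s.equalsIgnoreCase("locality");
        List<String> row = Arrays.asList("1", "TDWG:NSW", "New South Wales");

        System.out.println(resolveHeaderLines(row, known, Mode.LENIENT)); // prints 0
        try {
            resolveHeaderLines(row, known, Mode.STRICT);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this sketch, a strict mode would suit archive validation, while the lenient mode would match the IPT use case described above, where uploaded source files are not required to have headers.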