Issue 18261

Occurrences from 8 datasets at the Freshwater Biodiversity IPT are not harvested

Reporter: peterdesmet
Assignee: mblissett
Type: Feedback
Summary: Occurrences from 8 datasets at the Freshwater Biodiversity IPT are not harvested
Resolution: Duplicate
Status: Closed
Created: 2016-02-25 11:04:05.507
Updated: 2017-10-06 10:11:23.355
Resolved: 2017-10-06 10:11:23.333
        
        
Description: I just noticed that records from these 8 datasets on the Freshwater Biodiversity IPT (http://data.freshwaterbiodiversity.eu/ipt/) are not harvested by GBIF: each DwC-A contains occurrence records, but the GBIF dataset page shows 0 records:

* http://www.gbif.org/dataset/929d10ff-6f80-4ab3-a422-509c6721d402
* http://www.gbif.org/dataset/5221e970-757c-43cb-bdd4-2f085bf36ae4
* http://www.gbif.org/dataset/9c4e36c1-d3f9-49ce-8ec1-8c434fa9e6eb
* http://www.gbif.org/dataset/9a0a4061-c7b5-43f2-abf2-bbd978a686ad
* http://www.gbif.org/dataset/ce0f1750-ad92-46b7-b8c7-59033460de43
* http://www.gbif.org/dataset/6ec2600f-768a-4240-9575-0972fef6c76f
* http://www.gbif.org/dataset/cb9ee548-5c8b-415d-90c8-58de7782d7d0
* http://www.gbif.org/dataset/a779af82-1422-4b00-9e7f-8e1c1f07bea2

What is causing this? Is it possible to trigger a harvest of these datasets?
    


Author: peterdesmet
Comment: The total number of missing occurrences is 217,469.
Created: 2016-02-25 11:06:50.178
Updated: 2016-02-25 11:06:50.178


Author: mblissett
Comment: All eight of these datasets are in the stats file on POR-3045.  I'll look into this today.
Created: 2016-02-25 11:12:50.382
Updated: 2016-02-25 11:12:50.382


Author: mblissett
Comment: The first one I looked at, http://www.gbif.org/dataset/929d10ff-6f80-4ab3-a422-509c6721d402, doesn't have any core ids in the occurrence file, and doesn't have a collection code either.
Created: 2016-02-25 11:53:06.029
Updated: 2016-02-25 11:53:06.029
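
The kind of check described in the comment above can be sketched in a few lines of Python. This is only an illustration, not GBIF's validator: it assumes a tab-delimited occurrence file with Darwin Core column names, treats the triplet as institutionCode, collectionCode and catalogNumber, and uses made-up sample rows rather than data from the datasets in this issue.

```python
import csv
import io

# Assumption: the "triplet" is institutionCode + collectionCode + catalogNumber.
TRIPLET_FIELDS = ("institutionCode", "collectionCode", "catalogNumber")


def summarize_identifiers(rows):
    """Count rows lacking an occurrenceID and rows with an incomplete triplet."""
    total = missing_id = invalid_triplet = 0
    for row in rows:
        total += 1
        if not (row.get("occurrenceID") or "").strip():
            missing_id += 1
        if any(not (row.get(field) or "").strip() for field in TRIPLET_FIELDS):
            invalid_triplet += 1
    return {
        "total": total,
        "missing_occurrence_id": missing_id,
        "invalid_triplets": invalid_triplet,
    }


# Hypothetical sample rows (not taken from the affected archives).
sample = (
    "occurrenceID\tinstitutionCode\tcollectionCode\tcatalogNumber\n"
    "\tINST\t\t1\n"          # no occurrenceID, no collectionCode
    "occ-2\tINST\tCOLL\t2\n"  # complete record
    "\tINST\tCOLL\t3\n"      # no occurrenceID, complete triplet
)

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
summary = summarize_identifiers(reader)
print(summary)
```

To run it against a real archive, the `io.StringIO(sample)` argument would be replaced with an open handle on the extracted occurrence file, and the delimiter adjusted to whatever the archive's meta.xml declares.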


Author: mblissett
Created: 2016-02-25 12:14:45.515
Updated: 2016-02-25 12:16:46.188
        
All eight datasets have problems with identifiers.  Hopefully later this year we'll have a better way to report this:

Finished validating DwC-A for dataset [929d10ff-6f80-4ab3-a422-509c6721d402], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 119936 records without an occurrence id (should be 0)]

Finished validating DwC-A for dataset [5221e970-757c-43cb-bdd4-2f085bf36ae4], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 1965 records without an occurrence id (should be 0)]

Finished validating DwC-A for dataset [9c4e36c1-d3f9-49ce-8ec1-8c434fa9e6eb], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 5323 records without an occurrence id (should be 0)]

Finished validating DwC-A for dataset [9a0a4061-c7b5-43f2-abf2-bbd978a686ad], ... Archive invalid because [1300 duplicate triplets detected; 21499 records without an occurrence id (should be 0)]

Finished validating DwC-A for dataset [ce0f1750-ad92-46b7-b8c7-59033460de43], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 39190 records without an occurrence id (should be 0)]

Finished validating DwC-A for dataset [6ec2600f-768a-4240-9575-0972fef6c76f], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 2544 records without an occurrence id (should be 0)]

Finished validating DwC-A for dataset [cb9ee548-5c8b-415d-90c8-58de7782d7d0], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 985 records without an occurrence id (should be 0)]

Finished validating DwC-A for dataset [a779af82-1422-4b00-9e7f-8e1c1f07bea2], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 26027 records without an occurrence id (should be 0)]
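
As a cross-check, the per-dataset counts in the log lines above can be extracted and summed with a short script. This is a rough sketch: the regular expression is an assumption based only on the eight lines quoted here, and `missing_by_dataset` is a hypothetical helper name, not part of any GBIF tool.

```python
import re

# Log lines copied from this issue (the "..." is elided in the original).
LOG_TEXT = """\
Finished validating DwC-A for dataset [929d10ff-6f80-4ab3-a422-509c6721d402], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 119936 records without an occurrence id (should be 0)]
Finished validating DwC-A for dataset [5221e970-757c-43cb-bdd4-2f085bf36ae4], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 1965 records without an occurrence id (should be 0)]
Finished validating DwC-A for dataset [9c4e36c1-d3f9-49ce-8ec1-8c434fa9e6eb], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 5323 records without an occurrence id (should be 0)]
Finished validating DwC-A for dataset [9a0a4061-c7b5-43f2-abf2-bbd978a686ad], ... Archive invalid because [1300 duplicate triplets detected; 21499 records without an occurrence id (should be 0)]
Finished validating DwC-A for dataset [ce0f1750-ad92-46b7-b8c7-59033460de43], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 39190 records without an occurrence id (should be 0)]
Finished validating DwC-A for dataset [6ec2600f-768a-4240-9575-0972fef6c76f], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 2544 records without an occurrence id (should be 0)]
Finished validating DwC-A for dataset [cb9ee548-5c8b-415d-90c8-58de7782d7d0], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 985 records without an occurrence id (should be 0)]
Finished validating DwC-A for dataset [a779af82-1422-4b00-9e7f-8e1c1f07bea2], ... Archive invalid because [100% invalid triplets is > than threshold of 25%; 26027 records without an occurrence id (should be 0)]
"""

# Assumed line shape: a dataset key in brackets, later a record count
# followed by the phrase "records without an occurrence id".
LOG_PATTERN = re.compile(
    r"dataset \[(?P<key>[0-9a-f-]{36})\].*?"
    r"(?P<missing>\d+) records without an occurrence id"
)


def missing_by_dataset(log_text):
    """Map each dataset key to its count of records lacking an occurrence id."""
    return {
        match.group("key"): int(match.group("missing"))
        for match in LOG_PATTERN.finditer(log_text)
    }


counts = missing_by_dataset(LOG_TEXT)
total_missing = sum(counts.values())
print(len(counts), total_missing)  # prints "8 217469"
```

The total of 217469 agrees with the 217,469 missing occurrences reported earlier in this issue.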

    


Author: peterdesmet
Comment: Thanks, we'll forward this information to the maintainers of that IPT.
Created: 2016-02-25 12:26:46.705
Updated: 2016-02-25 12:26:46.705


Author: hoefft
Created: 2017-10-06 10:11:23.352
Updated: 2017-10-06 10:11:23.352
        
The data issue has been moved to
https://github.com/gbif/portal-feedback/issues/540