Issue 18266

56 DWCA datasets have zero occurrences

Reporter: mblissett
Type: Task
Summary: 56 DWCA datasets have zero occurrences
Priority: Unassessed
Status: Open
Created: 2016-02-25 15:19:41.977
Updated: 2016-02-25 15:19:41.977
        
Description: 56 occurrence datasets, as found in debugging POR-3045, have zero occurrences.

The following 54 datasets fail because of problems with identifiers:

{code}
0041409c-519a-4738-9a25-a38e641e16ef 100% invalid triplets is > than threshold of 25%; 2011 records without an occurrence id (should be 0)
01479f71-9e66-4aa9-9481-dabeb5aceaf4 100% invalid triplets is > than threshold of 25%; 471 records without an occurrence id (should be 0)
04838789-732b-4cbf-8370-0187e49b5d9c 100% invalid triplets is > than threshold of 25%; 20540 records without an occurrence id (should be 0)
0ae1b0fb-28fe-4ce4-83b6-380336db6ad9 100% invalid triplets is > than threshold of 25%; 15 records without an occurrence id (should be 0)
103fa8e2-5be7-402c-b147-708d0755d9b0 15932 duplicate triplets detected; 139994 records without an occurrence id (should be 0)
18680b2b-84db-4d45-8acd-0b0e9467fb33 29% invalid triplets is > than threshold of 25%; 7637 records without an occurrence id (should be 0)
1c714c57-da3e-457e-9752-edd1cd5e0463 100% invalid triplets is > than threshold of 25%; 173681 records without an occurrence id (should be 0)
29fbd154-5eab-4066-8dbc-f7b00567ac80 1 duplicate triplets detected; 607526 records without an occurrence id (should be 0)
390d8ae9-6ada-4485-8d32-39784f66fe32 100% invalid triplets is > than threshold of 25%; 17 records without an occurrence id (should be 0)
3c436b0f-fb55-4e16-9c51-2966cae36e65 27 duplicate triplets detected; 3979 records without an occurrence id (should be 0)
3c6ecec0-7c9c-4ce1-a3b8-5ff3eafc14a1 6 duplicate triplets detected; 529 records without an occurrence id (should be 0)
3cba6a56-256f-4d38-ae9b-b201274fe2c6 100% invalid triplets is > than threshold of 25%; 413 records without an occurrence id (should be 0)
42aceab6-e193-4df1-87a1-81eeca0871af 65 duplicate triplets detected; 3756 records without an occurrence id (should be 0)
457f9aab-5dce-4dd9-9a71-2d123f80d2f6 100% invalid triplets is > than threshold of 25%; 697 records without an occurrence id (should be 0)
4fd5035e-d565-4f43-9385-08ca572fa6e5 2 duplicate triplets detected; 838 records without an occurrence id (should be 0)
5221e970-757c-43cb-bdd4-2f085bf36ae4 100% invalid triplets is > than threshold of 25%; 1965 records without an occurrence id (should be 0)
5ff795ba-b051-4c89-8489-43e053bef0d0 100% invalid triplets is > than threshold of 25%; 785 records without an occurrence id (should be 0)
65e34334-66b3-4536-ac3c-857893fe0f6a 100% invalid triplets is > than threshold of 25%; 57977 records without an occurrence id (should be 0)
6ec2600f-768a-4240-9575-0972fef6c76f 100% invalid triplets is > than threshold of 25%; 2544 records without an occurrence id (should be 0)
73ac859f-a43c-488b-9d9f-92b838b61730 100% invalid triplets is > than threshold of 25%; 539 records without an occurrence id (should be 0)
863eed07-c5d8-40ae-baee-23f8d3fa475a 3 duplicate triplets detected; 22648 records without an occurrence id (should be 0)
89df1c06-1f0f-432a-bae7-c495837317c6 100% invalid triplets is > than threshold of 25%; 690 records without an occurrence id (should be 0)
8b34c1da-d502-4705-9fb9-f6ad4e2aea4d 103 duplicate triplets detected; 942 records without an occurrence id (should be 0)
929d10ff-6f80-4ab3-a422-509c6721d402 100% invalid triplets is > than threshold of 25%; 119936 records without an occurrence id (should be 0)
9324c04c-7ac6-4b69-b98c-4472c51c73c5 100% invalid triplets is > than threshold of 25%; 928 records without an occurrence id (should be 0)
93efdc85-10ca-448c-a3e3-01bce24479c9 103 duplicate triplets detected; 12101 records without an occurrence id (should be 0)
99f7df58-b668-4c3a-a7ef-dd30b698084c 100% invalid triplets is > than threshold of 25%; 155 records without an occurrence id (should be 0)
9a0a4061-c7b5-43f2-abf2-bbd978a686ad 1300 duplicate triplets detected; 21499 records without an occurrence id (should be 0)
9bf16572-a62e-4a87-80b5-4847ca7ff31f 100% invalid triplets is > than threshold of 25%; 11668 records without an occurrence id (should be 0)
9c4e36c1-d3f9-49ce-8ec1-8c434fa9e6eb 100% invalid triplets is > than threshold of 25%; 5323 records without an occurrence id (should be 0)
a2e308bf-e9ec-4651-906e-956c963df0ca 100% invalid triplets is > than threshold of 25%; 5420 records without an occurrence id (should be 0)
a387baf5-bd53-4c75-b391-326e892ea0c8 100% invalid triplets is > than threshold of 25%; 30 records without an occurrence id (should be 0)
a779af82-1422-4b00-9e7f-8e1c1f07bea2 100% invalid triplets is > than threshold of 25%; 26027 records without an occurrence id (should be 0)
b9718c0d-08ff-437b-a159-909b50506c97 26 duplicate triplets detected; 52 records without an occurrence id (should be 0)
c02fff39-7869-42c5-93ac-8ef3ca59a7e1 1 duplicate triplets detected; 269 records without an occurrence id (should be 0)
cb9ee548-5c8b-415d-90c8-58de7782d7d0 100% invalid triplets is > than threshold of 25%; 985 records without an occurrence id (should be 0)
ce0f1750-ad92-46b7-b8c7-59033460de43 100% invalid triplets is > than threshold of 25%; 39190 records without an occurrence id (should be 0)
d0121865-5b34-43c5-8fdd-fec96672cf4d 259 duplicate triplets detected; 569 records without an occurrence id (should be 0)
d20dfc7c-3800-4afd-8b82-44c0b5a62d91 100% invalid triplets is > than threshold of 25%; 149 records without an occurrence id (should be 0)
d3a00072-72b7-4894-96c4-ecd008892eb4 100% invalid triplets is > than threshold of 25%; 377 records without an occurrence id (should be 0)
d82168d0-b006-4cc1-9b70-1d9c2001da74 100% invalid triplets is > than threshold of 25%; 1 records without an occurrence id (should be 0)
d8774f09-ebd8-451c-b133-41e4500981d9 100% invalid triplets is > than threshold of 25%; 117 records without an occurrence id (should be 0)
d8b06df0-81b3-41c9-bcf8-6ba5242e2b95 100% invalid triplets is > than threshold of 25%; 6164 records without an occurrence id (should be 0)
d912f677-e998-4beb-a61f-b68406c2b66b 2157 duplicate triplets detected; 23896 records without an occurrence id (should be 0)
db504a46-955f-45a6-bdc9-d9a7ebc85668 100% invalid triplets is > than threshold of 25%; 89071 records without an occurrence id (should be 0)
db9fd50a-169a-448c-85d3-863b0eb705f2 1083 duplicate triplets detected; 2166 records without an occurrence id (should be 0)
dd92c709-8f7d-4bf3-9897-901aa88486e5 100% invalid triplets is > than threshold of 25%; 861 records without an occurrence id (should be 0)
e5c679a3-cb99-42f2-bd45-6b68a6ff96e0 1 duplicate triplets detected; 17202 records without an occurrence id (should be 0)
e9a36843-9a63-46fe-8065-e80be0f86f49 6 duplicate triplets detected; 3739 records without an occurrence id (should be 0)
faaaca08-81cc-4c6b-9505-581470dae732 100% invalid triplets is > than threshold of 25%; 1106 records without an occurrence id (should be 0)
fab6edb3-9311-4219-9d32-6114125f86a1 2 duplicate triplets detected; 3007 records without an occurrence id (should be 0)
fb1840bb-6b48-4ee9-81b1-c4f135108d29 100% invalid triplets is > than threshold of 25%; 23 records without an occurrence id (should be 0)
fd6c18af-ea6c-4b85-90ca-26dbe0cadec2 13 duplicate triplets detected; 3220 records without an occurrence id (should be 0)
feb769dd-6096-4b25-8a69-9ece3a9e4b66 31% invalid triplets is > than threshold of 25%; 3019 records without an occurrence id (should be 0)
{code}

5f9180cc-64f6-4c6f-87b1-893daa3af6a3 also fails due to identifiers. It is a large dataset (4.5 million records), so the testing is only partial. It fails because there are 4372 duplicated triplets (of which 80 appear three times).
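For anyone wanting to reproduce these counts locally, here is a rough sketch of the identifier checks, not the actual crawler/validator code. It assumes a triplet is (institutionCode, collectionCode, catalogNumber), that a triplet is invalid when any of the three is empty, and that the 25% threshold is a strict "greater than" as the log message suggests. The function name, the result keys and the sample rows are made up for illustration, and the way duplicates are counted here (distinct triplets seen more than once) may not match how the validator counts them.

```python
import csv
import io
from collections import Counter

# Assumption: these three Darwin Core terms form the "triplet".
TRIPLET_TERMS = ("institutionCode", "collectionCode", "catalogNumber")


def summarise_identifiers(rows, invalid_threshold=0.25):
    """Count invalid/duplicated triplets and records lacking an occurrenceID."""
    total = invalid = missing_occurrence_id = 0
    seen = Counter()
    for row in rows:
        total += 1
        triplet = tuple((row.get(term) or "").strip() for term in TRIPLET_TERMS)
        if all(triplet):
            seen[triplet] += 1
        else:
            # Assumption: any empty part makes the whole triplet invalid.
            invalid += 1
        if not (row.get("occurrenceID") or "").strip():
            missing_occurrence_id += 1
    invalid_ratio = invalid / total if total else 0.0
    return {
        "total": total,
        "invalid_triplets": invalid,
        "invalid_ratio": invalid_ratio,
        "over_threshold": invalid_ratio > invalid_threshold,
        # Distinct triplets that occur more than once.
        "duplicated_triplets": sum(1 for n in seen.values() if n > 1),
        # Distinct triplets that occur exactly three times.
        "triplets_seen_three_times": sum(1 for n in seen.values() if n == 3),
        "missing_occurrence_id": missing_occurrence_id,
    }


# Hypothetical tab-separated sample; a real archive would be read from its
# extracted occurrence file instead.
SAMPLE = (
    "institutionCode\tcollectionCode\tcatalogNumber\toccurrenceID\n"
    "A\tB\t1\t\n"
    "A\tB\t1\t\n"
    "A\tB\t2\t\n"
    "A\t\t3\t\n"
)

summary = summarise_identifiers(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
print(summary)
```

Streaming the rows and keeping only a Counter of triplets keeps memory proportional to the number of distinct triplets, which matters for a 4.5 million record dataset.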

The link for http://www.gbif.org/dataset/e70580b8-b1df-4566-9a33-b32f30aab526 doesn't work any more, and there's something odd with the installation: it doesn't show up on the publisher's page. The publisher has another installation with several datasets.