Issue 18631

Wrong indexing of Occurrence field "Issues" in Solr index

Reporter: cgendreau
Type: Bug
Summary: Wrong indexing of Occurrence field "Issues" in Solr index
Priority: Unassessed
Status: Open
Created: 2016-07-06 11:39:01.946
Updated: 2016-08-11 16:25:38.764
        
Description: The Occurrence Solr index in production contains 2 types of values for "Issue":
-name="COORDINATE_ROUNDED"
-name="org.gbif.api.vocabulary.OccurrenceIssue:COORDINATE_ROUNDED"

The second one, containing the fully qualified name, is wrong and seems to come from the [SolrOccurrenceWriter|https://github.com/gbif/occurrence/blob/master/occurrence-search/src/main/java/org/gbif/occurrence/search/writer/SolrOccurrenceWriter.java] class.

This hypothesis is based on the fact that this class is doing
doc.setField(ISSUE.getFieldName(), occurrence.getIssues());

while for other enums we index the result of .name() of the enums.
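
To illustrate the difference, here is a minimal sketch of indexing enum names instead of the enum objects themselves. It is not the code from the linked class: the Issue enum is a hypothetical stand-in for org.gbif.api.vocabulary.OccurrenceIssue, and toNames is an illustrative helper name, so the sketch runs without SolrJ on the classpath.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.EnumSet;
import java.util.List;

public class IssueFieldValues {

  // Hypothetical stand-in for org.gbif.api.vocabulary.OccurrenceIssue.
  enum Issue { COORDINATE_ROUNDED, ZERO_COORDINATE }

  // Convert enum constants to their name() strings so that the value handed
  // to the index is a plain String, whatever the transport does with objects.
  static List<String> toNames(Collection<? extends Enum<?>> values) {
    List<String> names = new ArrayList<>(values.size());
    for (Enum<?> value : values) {
      names.add(value.name());
    }
    return names;
  }

  public static void main(String[] args) {
    // The resulting list is what would be passed as the field value,
    // in place of the raw collection of enum constants.
    System.out.println(toNames(EnumSet.of(Issue.COORDINATE_ROUNDED)));
  }
}
```

With a helper like this, the call above would presumably become something like doc.setField(ISSUE.getFieldName(), toNames(occurrence.getIssues())), assuming getIssues() returns a collection of enum constants.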
    


Author: cgendreau
Comment: https://github.com/gbif/occurrence/commit/631145b1a16527b6b1778d7f94d1104ab6ea1c7c
Created: 2016-07-07 16:45:03.191
Updated: 2016-07-07 16:45:03.191


Author: cgendreau
Created: 2016-07-08 10:11:04.533
Updated: 2016-07-08 12:23:37.685
        
Information:

It seems Solr will handle queries differently when we use the EmbeddedServer vs Solr Cloud.

I can confirm that the EmbeddedServer transfers data as XML and that .toString() is called on the enum, so we end up with the correct value. See org.apache.solr.client.solrj.request.UpdateRequest.

In CloudSolrClient, it seems the same method is eventually used, but this still needs to be debugged. The behavior looks like a binary transfer using a [DocumentObjectBinder|https://lucene.apache.org/solr/5_0_0/solr-solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.html], but I have no proof.
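
One way to picture the two behaviors is the sketch below. It is a simulation, not SolrJ code, and it rests on an unconfirmed assumption: that the cloud path uses a binary serializer which, for types it does not recognize, writes the class name, a colon, and the toString() value. The Issue enum and both method names are hypothetical.

```java
public class EnumSerializationSketch {

  // Hypothetical stand-in for org.gbif.api.vocabulary.OccurrenceIssue.
  enum Issue { COORDINATE_ROUNDED }

  // Text-style path: the value is rendered through toString(),
  // which for a plain enum constant returns the same string as name().
  static String asText(Object value) {
    return String.valueOf(value);
  }

  // Assumed fallback of a binary codec for an unrecognized type:
  // class name, then ':', then toString().
  static String asUnknownObject(Object value) {
    return value.getClass().getName() + ":" + value;
  }

  public static void main(String[] args) {
    System.out.println(asText(Issue.COORDINATE_ROUNDED));
    System.out.println(asUnknownObject(Issue.COORDINATE_ROUNDED));
  }
}
```

The second form has the same shape as the wrong value seen in production (class name, colon, constant name), which would be consistent with the hypothesis but does not prove it.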
    


Author: fmendez@gbif.org
Comment: The fix applied in https://github.com/gbif/occurrence/commit/631145b1a16527b6b1778d7f94d1104ab6ea1c7c seems to be OK. We should remove the toUpperCase in https://github.com/gbif/occurrence/commit/631145b1a16527b6b1778d7f94d1104ab6ea1c7c#diff-0eed5a1b5d91dada53af34ea13bda30eR196 if we are not doing the same in the batch indexer.
Created: 2016-08-11 16:25:38.764
Updated: 2016-08-11 16:25:38.764