Issue 16531

Occurrence filter: records with "Associated Media: Media Link" do not get caught in a filter for "MEDIA_TYPE=*"

16531
Reporter: ahahn
Type: Bug
Summary: Occurrence filter: records with "Associated Media: Media Link" do not get caught in a filter for "MEDIA_TYPE=*"
Priority: Major
Status: Open
Created: 2014-10-14 12:33:34.276
Updated: 2016-02-15 13:45:38.409
        
Description: Dataset "DORSA - German Orthoptera Collections" (8ea44a78-c6af-11e2-9b88-00145eb45e9a) contains numerous multimedia links, e.g. for record http://www.gbif.org/occurrence/1024623968. The verbatim version of the record shows that the links are marked with a format such as "tiff" (there are three other formats, including sound files marked as "x-wav"): http://www.gbif.org/occurrence/1024623968/verbatim. Because this format apparently is not interpreted during indexing, the records do show Associated Media on the individual occurrence page, but a search filter for "multimedia types: all" (http://www.gbif.org/occurrence/search?DATASET_KEY=8ea44a78-c6af-11e2-9b88-00145eb45e9a&MEDIA_TYPE=*) does not return any records.

Expected behaviour: a search for "media-type: all" should return every record that has a multimedia link, regardless of a) whether a file type marker exists or b) what it contains. A search for "media-type: image", by contrast, would not be able to return such a record as long as its type is not interpreted.
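To make the expected filter semantics concrete, here is a minimal sketch assuming a simplified, hypothetical record model (not GBIF's actual index schema): each record carries a list of media items, each with an "identifier" (the link) and an optional interpreted "type" that is None when the format was not interpreted. The function name matches_media_filter is hypothetical. The wildcard only requires that a link exists; a specific type requires an interpreted type.

```python
def matches_media_filter(record: dict, media_type: str) -> bool:
    """Return True if the record should be returned for MEDIA_TYPE=media_type.

    Hypothetical record model: record["media"] is a list of dicts with an
    "identifier" (the media link) and an optional interpreted "type"
    ("image", "video", "sound") that is None when uninterpreted.
    """
    media = record.get("media", [])
    if media_type == "*":
        # "All": any record with at least one media link, whatever its format marker.
        return any(item.get("identifier") for item in media)
    # A specific type can only match once the type has been interpreted.
    return any(item.get("type") == media_type for item in media)
```

With a record whose only link is an uninterpreted TIFF, the wildcard filter matches while an "image" filter does not, which is the behaviour described above.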
    


Author: ahahn@gbif.org
Created: 2014-11-17 15:53:33.399
Updated: 2014-11-17 15:53:33.399
        
In addition, we will want to interpret the file types so that records can be filtered as image, video or sound.

Example records with multimedia content:
Image example:
http://www.gbif.org/occurrence/search?DATASET_KEY=8ea44a78-c6af-11e2-9b88-00145eb45e9a&CATALOG_NUMBER=72508
Sound examples:
http://www.gbif.org/occurrence/search?DATASET_KEY=8ea44a78-c6af-11e2-9b88-00145eb45e9a&CATALOG_NUMBER=297257
http://www.gbif.org/occurrence/search?DATASET_KEY=8ea44a78-c6af-11e2-9b88-00145eb45e9a&CATALOG_NUMBER=76805
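A minimal sketch of how the verbatim format values could be interpreted as image, video or sound. It assumes the format field holds either a bare subtype as in this dataset ("tiff", "x-wav") or a full MIME type ("image/tiff"); the lookup table and the function name interpret_format are hypothetical and deliberately not exhaustive (ambiguous values such as "mpeg" are left out).

```python
from typing import Optional

# Full MIME types: the top-level type decides the coarse media type.
_TOP_LEVEL_TO_MEDIA = {"image": "image", "audio": "sound", "video": "video"}

# Hypothetical lookup for bare subtypes as seen in verbatim data (not exhaustive).
_SUBTYPE_TO_MEDIA = {
    "jpeg": "image", "jpg": "image", "png": "image", "gif": "image", "tiff": "image",
    "wav": "sound", "x-wav": "sound", "mp3": "sound",
    "mp4": "video",
}


def interpret_format(fmt: Optional[str]) -> Optional[str]:
    """Map a verbatim format value to "image", "video", "sound", or None if unknown."""
    if not fmt:
        return None
    value = fmt.strip().lower().split(";", 1)[0].strip()  # drop parameters such as "; charset=..."
    if "/" in value:
        top, sub = value.split("/", 1)
        if top in _TOP_LEVEL_TO_MEDIA:
            return _TOP_LEVEL_TO_MEDIA[top]
        value = sub  # e.g. "application/..." falls back to the subtype lookup
    return _SUBTYPE_TO_MEDIA.get(value)
```

Values that cannot be mapped stay None, so such records would still match "all" but not a specific type.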
    


Author: ahahn@gbif.org
Created: 2014-11-18 09:38:10.47
Updated: 2014-11-18 09:38:10.47
        
New examples after reindexing:
Sound:
http://www.gbif.org/occurrence/1024631656
http://www.gbif.org/occurrence/788892194
Image:
http://www.gbif.org/occurrence/1024631663
http://www.gbif.org/occurrence/1024631660
    


Author: fmendez@gbif.org
Comment: [~omeyn@gbif.org] it seems that the interpretation of these examples can be extended. One option is to send a HEAD request for uninterpreted links and take the media type from the response. For example, http://www.gbif.org/occurrence/1024623968 contains the associated media link http://www.biologie.uni-ulm.de/cgi-bin/perl/bild.pl?sid=T&mode=thumb&objid=61720, which is a JPEG image; in this case the image type can be obtained from the response.
Created: 2015-03-04 14:14:29.039
Updated: 2015-03-04 14:14:29.039
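The HEAD-request idea can be sketched with the Python standard library. This assumes the remote server answers HEAD and reports an accurate Content-Type header; some servers reject HEAD or return a generic type, in which case the link stays uninterpreted. The function names are hypothetical, and one request per link adds network cost at indexing time.

```python
import urllib.request
from typing import Optional

_TOP_LEVEL_TO_MEDIA = {"image": "image", "audio": "sound", "video": "video"}


def media_type_from_content_type(header: Optional[str]) -> Optional[str]:
    """Map a Content-Type value such as "image/jpeg" to "image", "sound" or "video"."""
    if not header:
        return None
    mime = header.split(";", 1)[0].strip().lower()  # drop "; charset=..." etc.
    return _TOP_LEVEL_TO_MEDIA.get(mime.split("/", 1)[0])


def probe_media_type(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request and interpret the Content-Type of the response.

    Returns None if the request fails or the type is not image, audio or video.
    """
    try:
        # Request() itself raises ValueError for malformed URLs, so build it inside the try.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return media_type_from_content_type(response.headers.get("Content-Type"))
    except (OSError, ValueError):
        # URLError/HTTPError (e.g. 405 for a rejected HEAD) and timeouts are OSError subclasses.
        return None
```

For the Ulm link above, a server reporting "image/jpeg" would yield "image".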