Issue 14492

date last modified filter should be explained & improved + calendar crash

Reporter: sant
Assignee: fmendez
Type: Improvement
Summary: date last modified filter should be explained & improved  +  calendar crash
Priority: Critical
Status: Open
Created: 2013-12-19 19:35:14.102
Updated: 2014-02-12 17:12:04.02
        
        
Description: When I try to search for recent changes in our datasets using that filter, either the whole dataset or nothing is returned.

When I get data returned, the reported "date last modified" is a fake value.
Looks like the filter is based on the date last modified reported by the Tapirlink provider. So it is not filtering individual records, but individual datasets.

This should be clearly stated and explained to the end user, who will usually be looking for individual records (not whole datasets).

For example, if I want to find new records of a certain species reported after January 2013, all records of that species in our dataset will be returned if our provider reports any modification made this year (even if it is completely unrelated to the taxon being searched).
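
The difference between the two behaviours can be sketched as follows. This is only an illustration of the suspected dataset-level filtering described above, not the portal's actual code; the data, field names, and both function names are hypothetical.

```python
from datetime import date

# Hypothetical dataset: one provider-reported modification date for the
# whole dataset, plus a modification date on each record (invented values).
dataset = {
    "provider_modified": date(2013, 6, 1),
    "records": [
        {"species": "Quercus robur", "modified": date(2010, 3, 15)},
        {"species": "Quercus robur", "modified": date(2013, 2, 20)},
        {"species": "Pinus pinea", "modified": date(2013, 6, 1)},
    ],
}


def filter_by_dataset_date(ds, species, after):
    """Suspected behaviour: all matching records or none, decided per dataset."""
    if ds["provider_modified"] <= after:
        return []
    return [r for r in ds["records"] if r["species"] == species]


def filter_by_record_date(ds, species, after):
    """Expected behaviour: each record is checked against its own date."""
    return [
        r
        for r in ds["records"]
        if r["species"] == species and r["modified"] > after
    ]


cutoff = date(2013, 1, 1)
dataset_level = filter_by_dataset_date(dataset, "Quercus robur", cutoff)
record_level = filter_by_record_date(dataset, "Quercus robur", cutoff)
print(len(dataset_level), len(record_level))
```

With this sample data the dataset-level filter returns both records of the species, including the one untouched since 2010, while the record-level filter returns only the record changed in 2013.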

Also, the page crashes about 20% of the time when I try to select the "modified after" date using the calendar (that is the reason this feedback form was opened). I am using Chrome version 31.0.1650.57 m
    


Author: kbraak@gbif.org
Comment: The issue with the modified date filter is being addressed in POR-1395
Created: 2013-12-20 11:53:17.896
Updated: 2013-12-20 11:55:12.888


Author: kbraak@gbif.org
Created: 2013-12-20 12:11:22.553
Updated: 2013-12-20 12:11:22.553
        
Thank you for reporting the issue. Some explanation on the date last modified filter is definitely needed.

In the meantime, please let me explain: we currently don't index http://rs.tdwg.org/dwc/terms/#dcterms:modified (or its equivalent in older versions of DwC or ABCD). The date last modified shown in the portal is the date the record was last updated during indexing.

This is terribly confusing. Some help text needs to be added for the scientific name filter also, as described in POR-1388. We can address both of these changes at the same time. 
    


Author: sant
Created: 2013-12-20 19:03:45.099
Updated: 2013-12-20 19:42:35.802
        
Thanks Kyle

I understand a certain dataset will only be indexed if the "modified" date reported by the provider metadata has changed since the previous indexing run.

Can you explain the criteria used to decide WHICH records of a certain dataset will be updated during the indexing? (and so their "portal datelastmodified value" will be changed to the current indexing date):

- All records in the dataset? (even if they have not been changed for years)

- Just records where dcterms:modified differs from the datelastmodified currently stored at the GBIF portal?

In other words: is the datelastmodified value reported by the portal identical for ALL records of a certain dataset?

Any chance that you could start indexing (and allow filtering by) dcterms:modified, please?
Thanks

PS - I know not all providers take care of dcterms:modified, so these values could be wrong.
But that is a different problem. It does not happen in my case.

    


Author: omeyn@gbif.org
Created: 2014-01-06 11:13:24.263
Updated: 2014-01-06 11:13:24.263
        
The portal last modified date will only change when the record is (re-)interpreted by us. That would happen if the record were updated in the original dataset and picked up in a recent crawl, but would also happen if we improved e.g. our lat/lng interpretation and reran on all the data. During a crawl, if no change is detected in an already-indexed record, no re-interpretation would be done, and hence the date last modified would not be updated. So, barring improvements in our interpretation, it would be possible for the records of a dataset to have different portal last modified dates.
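
A minimal sketch of that rule, assuming the index is a simple in-memory dictionary and that interpretation improvements are tracked by a version number. The function name, field names, and version counter are all hypothetical and only illustrate the logic described above, not the real indexing code.

```python
from datetime import datetime


def process_record(index, record_id, raw, interpreter_version, now):
    """Re-interpret a crawled record only when something changed.

    Returns True if the record was (re-)interpreted and its date updated.
    """
    existing = index.get(record_id)
    unchanged = (
        existing is not None
        and existing["raw"] == raw
        and existing["interpreter_version"] == interpreter_version
    )
    if unchanged:
        return False  # no re-interpretation, date last modified is kept
    index[record_id] = {
        "raw": raw,
        "interpreter_version": interpreter_version,
        "last_modified": now,
    }
    return True


index = {}
first_crawl = datetime(2013, 6, 1)
second_crawl = datetime(2013, 12, 1)

# First crawl: both records are new, so both are interpreted.
process_record(index, "rec-1", {"lat": "40.4"}, 1, first_crawl)
process_record(index, "rec-2", {"lat": "41.0"}, 1, first_crawl)

# Second crawl: only rec-2 changed at the source.
process_record(index, "rec-1", {"lat": "40.4"}, 1, second_crawl)
process_record(index, "rec-2", {"lat": "41.5"}, 1, second_crawl)

print(index["rec-1"]["last_modified"])
print(index["rec-2"]["last_modified"])
```

After the second crawl the two records of the same dataset carry different dates. Calling the function again with a higher `interpreter_version` would update every record, which corresponds to rerunning an improved interpretation on all the data.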

In the next couple of months we'll be rolling out a much wider indexing of dwc terms, which will include dcterms:modified. Whether or not it will be searchable is as yet unknown - there is a limit to how much we can make searchable (i.e. a "search filter") and we don't yet know what that limit is.