Issue 18251

Descriptions in species pages very inconsistent

Reporter: thirsch
Type: Feedback
Summary: Descriptions in species pages very inconsistent
Resolution: Fixed
Status: Closed
Created: 2016-02-21 10:16:56.561
Updated: 2017-10-05 15:30:04.456
Resolved: 2017-10-05 15:29:30.171
        
        
Description: I am not sure whether there has been a recent change here, but the text descriptions in the species/taxon pages now seem very inconsistent and a bit bizarre. Some, but by no means all, taxa use the German (only) version of the Wikipedia entry - such as this page. Many others, including common species/taxa, now have no description. As it stands this makes us look unprofessional.

*Reporter*: Tim Hirsch
*E-mail*: [mailto:thirsch]
    


Author: mdoering@gbif.org
Created: 2016-02-22 10:30:36.921
Updated: 2016-02-22 10:30:36.921
        
Tim, is the issue that we seem to have lost English descriptions for the species?
I can see a few problems that come together with our current situation:

1) The current, old backbone only dynamically links to species information such as descriptions, vernacular names and distributions. That means that as we index checklist data and match the names found to our backbone, this dynamically linked information keeps changing and is never verified. You can see the currently linked list of all checklist records for Tapirus terrestris here: http://www.gbif.org/species/2440898/datasets?type=CHECKLIST If any of those "name usages" contains species descriptions, we expose them in our backbone species page. I would like to change that and instead explicitly copy species information at the time the backbone is built, so it's stable. This also allows us to apply some basic filtering, see POR-307 and POR-358

2) We seem to have missing matches for some records right now, e.g. the English Wikipedia is not matched for some reason: http://www.gbif.org/species/113279983 I have filed a new issue for that: DM-287
    


Author: thirsch@gbif.org
Created: 2016-02-24 04:50:15.812
Updated: 2016-02-24 04:53:22.845
        
[~mdoering@gbif.org] the issue is that it looks very odd to have German Wikipedia descriptions show up on a site that is basically in English. It has been made all the more strange because the English Wikimedia descriptions are for whatever reason no longer appearing. Frankly it would be better to have no species description (unless and until we work out a sensible way of doing this) rather than the current situation.

Rather than jump to an arbitrary solution for this, I do think it should be brought into the overall content review for GBIF.org being led by [~kylecopas], as it is a fundamental editorial decision as to what text users can expect to find on a GBIF taxon page. 
    


Author: kylecopas
Created: 2016-02-24 07:45:11.881
Updated: 2016-02-24 07:45:11.881
        
I've seen German content pulled in since I arrived, so this doesn't strike me as new. Maybe it's just more widespread.

Not sure what I can offer on this, to be honest, as I really don't have a good grasp on what the pool of other syndicated-content options includes, at least not at the scale we need it. Was this scoped during the development of the previous site? 
    


Author: thirsch@gbif.org
Created: 2016-02-24 08:30:55.059
Updated: 2016-02-24 08:30:55.059
        
The difference is that previously the German content was usually accompanied by an English-language entry, which for whatever reason is now missing.

At some point we just need a strategic answer to the following: Does a user of GBIF searching for a species or other taxon expect some minimum encyclopedia-type description from an outside source (text plus images), or is that in itself duplicating what e.g. EoL is doing, so that we should stick only to the core info such as taxonomy, synonyms, common names, occurrences, inclusion in checklists etc., with any remaining info available by linking out to external sources (already in place for BHL, EoL, CoL)? I see this as part of the overall review of how GBIF.org serves its audiences, which is why I batted it back to [~kylecopas]. 
    


Author: rdmpage
Created: 2016-02-24 10:35:00.158
Updated: 2016-02-24 10:35:00.158
        
I think a text description can be very useful, partly for information, and partly as a reality check (if the text says an animal is African and GBIF shows it occurring in the US, then this should prompt the user to investigate further). Presumably, given its international audience, GBIF should be thinking in terms of supporting descriptions in multiple languages. In practice these descriptions could be sourced from EOL and/or Wikipedia/Wikidata.

Personally I'm not a huge fan of simply linking to external sources. By all means do that, but bringing the information together in one place provides more information for the user, and opens up the possibility to combine that information in useful ways. Indeed, by simply linking we are missing an opportunity to generate a much more detailed and useful resource for biodiversity researchers.

I've not heard about the content review by [~kcopas@gbif.org], is this public?
    


Author: thirsch@gbif.org
Comment: Just to clarify I am not announcing anything new here - I was referring to the item included in the 2016 Work Programme which [~kylecopas] is leading, entitled '2.2 Strategy and Updates for GBIF.org' , including the following: "This data-driven, audience-focused approach will improve access, usefulness and discovery of targeted, relevant content on GBIF.org for the audiences identified in the GBIF Communications Strategy." While this is focussed on the 'communications' part of GBIF.org rather than the organization of data, my point was that things like the content on species pages can be seen as part of this strategy to best serve our audiences. Rod's point about the usefulness of having at least some species description text, in addition to external links, is well taken.
Created: 2016-02-24 14:36:45.13
Updated: 2016-02-24 14:36:45.13


Author: mdoering@gbif.org
Created: 2016-02-24 15:08:37.808
Updated: 2016-02-24 15:10:19.368
        
As you probably know we index various kinds of "extension" data for species information and so far also display them all in our portal:
http://rs.gbif.org/extension/gbif/1.0/
I would very much welcome a discussion around these data types and the use and visualization of them in the GBIF portal.

The extensions that cause most irritation in the secretariat seem to be the description and distribution extensions:
http://rs.gbif.org/extension/gbif/1.0/distribution.xml
http://rs.gbif.org/extension/gbif/1.0/description.xml

The idea of the description extension is to publish unparsed, human-readable text paragraphs in the wider sense, not species descriptions sensu stricto. There are publishers using it for species distributions which could not be parsed into distinct distribution records. The same goes for type information. I think this is very useful and allows publishing lots of information that does not fit into the more precisely defined formats.
English is not required as the language, but we could of course enforce it. I think it is far more useful to not restrict languages though. Especially for the original species treatments as they were first published (re-published through Plazi for example). In those days German, Latin, French and even Danish were more common than English: http://zookeys.pensoft.net//lib/ajax_srv/article_elements_srv.php?action=zoom_figure&instance_id=12&article_id=6242

For the distribution extension see http://dev.gbif.org/issues/browse/PF-2357 for a recent use case

    


Author: thirsch@gbif.org
Comment: [~mdoering@gbif.org] I think my point is much simpler and more basic common sense on the question of language. I fully agree the information from these extensions provides useful context and there may be occasions when non-English text could be appropriate if e.g. we want to include original species treatments. The common sense part, though, is that if we are mainly including Wikipedia text for the prominent human-readable part of that description, it looks odd and arbitrary to have content in German. Either harvest multiple language versions, or just English.
Created: 2016-02-24 15:38:00.875
Updated: 2016-02-24 15:38:00.875


Author: mdoering@gbif.org
Created: 2016-02-24 15:49:11.98
Updated: 2016-02-24 15:50:53.501
        
As I said at the beginning of this thread, English Wikipedia descriptions not showing up here when one actually exists is a bug:
http://www.gbif.org/species/113279983
http://www.gbif.org/species/100373490

I am sure there must be rare cases, though, when there is only a German Wikipedia entry and not an English one. That species page would then just show German text. Also, the different Wikipedias (we index English, German, Spanish & French) are the dominant source of descriptions, especially for common taxa, but they are not the only source.
    


Author: mdoering@gbif.org
Comment: Would there be a better way to present such data if it's available in any language and we cannot guarantee an English version?
Created: 2016-02-24 15:49:58.897
Updated: 2016-02-24 15:49:58.897


Author: rdmpage
Created: 2016-02-24 16:45:11.047
Updated: 2016-02-24 16:45:11.047
        
At some point it would be useful to be able to offer information in multiple languages, intelligently guess the language the user expects, and if information is not available in that language but is in another, give the user the option of seeing that (and/or offer automatic translation as Facebook and Airbnb do).

This will come up in other circumstances, such as:
* choosing which vernacular name to display by default
* which language to display a bibliographic reference in. For example, there is a lot of Chinese, Japanese, Spanish, and Portuguese taxonomic literature that I'm working with where I have both English and non-English titles and it would be elegant, say, to display the Chinese text if we knew that the user's preferred language was Chinese.
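
The fallback behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `pick_description`, the record shape with `language` and `text` keys, and the default fallback to English are all hypothetical and do not describe how the GBIF portal actually works.

```python
from typing import Optional, Sequence


def pick_description(
    descriptions: Sequence[dict],
    preferred: str,
    fallbacks: Sequence[str] = ("en",),
) -> Optional[tuple[dict, bool]]:
    """Pick one description for display.

    Returns (description, is_fallback), or None if nothing is available.
    is_fallback is True when the text is not in the user's preferred
    language, so the page can explain that this is deliberate.
    """
    # Index by language code, keeping the first record seen per language.
    by_lang: dict[str, dict] = {}
    for d in descriptions:
        by_lang.setdefault(d["language"], d)

    # 1) Exact match on the user's preferred language.
    if preferred in by_lang:
        return by_lang[preferred], False

    # 2) Configured fallback languages, in order (assumed default: English).
    for lang in fallbacks:
        if lang in by_lang:
            return by_lang[lang], True

    # 3) Last resort: whatever exists, flagged as a fallback.
    if descriptions:
        return descriptions[0], True
    return None
```

The `is_fallback` flag is the part that matters for presentation: it lets the page label a German-only description as the best available text rather than leaving it to look like an error. The same pattern would apply to choosing a default vernacular name or reference title.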

    


Author: hoefft
Created: 2016-02-24 16:57:48.739
Updated: 2016-02-24 16:57:48.739
        
It is only fair to assume the educated reader speaks the main languages - Danish included in that list (thank you Marcus).

I agree with Tim that it looks odd; there should probably be an explanation to frame it as what it is - a service and not an error. That said - those of us who are not native English speakers see this fairly often the other way around.

And if the ambition is to have the site in multiple languages this will be even more widespread: articles and interface in Spanish, and species/dataset/etc. information in English. Some accompanying text telling you that this is not a mistake but the best we can do with the information at hand is probably a good idea. How to present that in a nice way is still unclear.

As for including a full Wikipedia article, I'm not convinced it is the best way. I would think a (possibly long) snippet and a link would be more useful. At the least, scrolling within that 10 cm² container is annoying; I'd rather read the article full page. And then Wikipedia seems the more natural option. Who knows, they might even provide me with the Danish version.

[~kylecopas] A way to find out would be to add Google Analytics tracking and aggregate data about species page usage. Do our users actually click around in the TOC?
    


Author: kylecopas
Created: 2016-02-24 17:02:10.611
Updated: 2016-02-24 17:02:10.611
        
'species page usage'

I think all of this discussion is pointing back at what we don't know about this, Morten. I had an off-JIRA discussion with Rod about some of this, but it was more about what he would like to see and finds useful than, for instance, how others view them, at what rate, and to what end.

GA will provide some good data to start, but I also feel a desperate need for some usage-based on-site surveys or ad hoc focus groups, in that order.  
    


Author: rdmpage
Created: 2016-02-24 17:39:30.455
Updated: 2016-02-24 17:39:30.455
        
Re species descriptions, it would be worthwhile looking at EOL, both for text and for structured data. They serve JSON-LD for some attributes (e.g., body weight) and these could be presented in multiple languages. On the flip side, judging by https://gitter.im/EOL/eol EOL seems to be having some serious issues with their existing infrastructure, so exactly how reliable a source they will be seems unclear. But the idea of using structured data and generating natural language text from that is worth thinking about (see also Wikidata).
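
As a minimal sketch of what generating text from structured data might look like: the snippet below fills per-language sentence templates from a dictionary of attributes. Everything in it is assumed for illustration, including the `TEMPLATES` table, the `render_description` helper, and the attribute names `name` and `body_mass_kg`. It does not reflect the actual EOL JSON-LD or Wikidata schemas.

```python
# Hypothetical per-language sentence templates; real coverage would need
# many more attributes and proper handling of grammar in each language.
TEMPLATES = {
    "en": "{name} has a body weight of about {body_mass_kg} kg.",
    "de": "{name} hat ein Körpergewicht von etwa {body_mass_kg} kg.",
}


def render_description(attributes: dict, language: str, default: str = "en") -> str:
    """Render one sentence from structured attributes.

    Falls back to the default language when no template exists for the
    requested one.
    """
    template = TEMPLATES.get(language, TEMPLATES[default])
    return template.format(**attributes)
```

Because the sentence is built from a single structured value, the same record yields consistent text in every supported language, which sidesteps the question of which Wikipedia edition happens to have an article.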

Re user testing, there's a nice app for Mac called SilverBack which is useful for recording sessions (and users' facial expressions): https://silverbackapp.com The latest version (3) is broken, but you can grab version 2 from the web site. Would need willing subjects, maybe Copenhagen students bribed with beer? ;)
    


Author: kylecopas
Created: 2016-02-26 11:10:22.972
Updated: 2016-02-26 11:10:22.972
        
I know SilverBack by reputation—thanks for the reminder.

W.r.t. subjects, I'd much prefer actual current users to willing beer-drinking bodies (though some overlap is acceptable). I have some ideas about recruitment through the site. 
    


Author: hoefft
Created: 2017-10-05 15:28:48.024
Updated: 2017-10-05 15:30:04.453
        
Analytics showed that our users did not use the description widget. It was decided to remove Wikipedia descriptions from the site.
Descriptions still appear in multiple languages based on what is published.