Issue 18704

Several GBIF resources have dead links

18704
Reporter: mblissett
Assignee: kylecopas
Type: Feedback
Summary: Several GBIF resources have dead links
Priority: Major
Resolution: Fixed
Status: Resolved
Created: 2016-08-24 10:41:10.858
Updated: 2016-09-06 14:24:52.956
Resolved: 2016-09-06 14:24:52.848
        
Description: Here are resources that return a 404 Not Found.

http://www.gbif.org/resource/81299 http://www.thecarbonproject.com/gaia.php HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81413 http://dps.plants.ox.ac.uk/bol/brahms/ HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81443 http://insects.oeb.harvard.edu/mantis/ HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81709 http://www.natural-solutions.eu/solutions.aspx HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81710 http://www.natural-solutions.eu/solutions.aspx HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81726 http://www.thomsonecologysoftware.com/trex-2 HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81734 http://tools.sibcolombia.net/excel-a-dwca/ HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81750 http://tools.sibcolombia.net/taxon/index.php/taxon/busqueda?lang=es HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81752 http://tools.sibcolombia.net/taxon/index.php/taxon/busqueda?lang=en HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81763 http://www.soc.napier.ac.uk/~cs22/vesperDemo/vesper/demoNew.html HTTP/1.1 404 Not Found
http://www.gbif.org/resource/80582 http://www.gbif.org/orc/?doc_id=2724 HTTP/1.1 404 Not Found
http://www.gbif.org/resource/80873 http://imsgbif.gbif.org/CMS_ORC/?doc_id=5530&download=1 HTTP/1.1 404 Not Found
http://www.gbif.org/resource/80973 http://elearning.gbif.es/AContent/home/course/content.php?_cid=77 HTTP/1.1 404 Not Found
http://www.gbif.org/resource/80982 http://ec.europa.eu/europeaid/infopoint/publications/europeaid/49a_en.htm HTTP/1.1 404 Not Found
http://www.gbif.org/resource/80983 http://ec.europa.eu/europeaid/infopoint/publications/europeaid/49a_en.htm HTTP/1.1 404 Not Found
http://www.gbif.org/resource/80984 http://www.ala.org.au/tools-services/onlinedesktop-tools-review/ HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81005 http://www.iaia.org/publicdocuments/special-publications/sp7_web.pdf HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81176 http://manisnet.org/CoordCalcManual.html HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81190 http://share.biodiversity.aq/Promotional%20material/Flyers/Quick_Guide_Publishing%20_data.pdf HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81213 http://elearning.gbif.es/AContent/home/course/content.php?_cid=194 HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81214 http://elearning.gbif.es/AContent/home/course/content.php?_cid=167 HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81218 http://www.lucsus.lu.se/FranklinEurope_Sept08.pdf HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81231 http://data.iucn.org/dbtw-wpd/edocs/2007-059.pdf HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81232 http://data.iucn.org/dbtw-wpd/edocs/CD-029-Es.pdf HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81233 http://data.iucn.org/dbtw-wpd/edocs/CD-029-Fr.pdf HTTP/1.1 404 Not Found
http://www.gbif.org/resource/81274 http://tools.gbif.org/namefinder/ HTTP/1.1 404 Not Found

And here are more, which don't have an active web server at all:

http://www.gbif.org/resource/81236 http://enbi.utu.fi/Documents/lcc9_observations_on_obsdata%2B%2B.pdf
http://www.gbif.org/resource/81239 https://wiki.biovel.eu/display/doc/Training+manual+-+Ecological+Niche+Modelling+and+related+workflows
http://www.gbif.org/resource/80578 http://www2.gbif.org/corporate_EN.pdf
http://www.gbif.org/resource/81027 http://www.animalbase.org/
http://www.gbif.org/resource/81385 http://www.eti.uva.nl/products/linnaeus.php
http://www.gbif.org/resource/81000 http://www-old.gbif.org/participation/participant-nodes/cepdec/sep-cepdec/
http://www.gbif.org/resource/80994 http://www-old.gbif.org/participation/training/networks/
http://www.gbif.org/resource/81163 http://www.redeimpactos.org/upload/IAIA_GBIF_iaia.pdf
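A script can tell these two failure modes apart by checking curl's exit status as well as the HTTP code. A server that answers 404 still lets curl exit 0. An unreachable host makes curl itself fail (exit code 6 for an unresolvable host, 7 for a refused connection), and `%{http_code}` then comes back as `000`. The bash functions below are a hypothetical sketch, not part of the original check. `classify_link`, `check_link`, the labels and the 15-second timeout are all illustrative choices.

```shell
# Hypothetical helper: label a link check from curl's exit code and HTTP status.
# Any non-zero curl exit (6 = can't resolve host, 7 = can't connect, 28 = timeout, ...)
# is treated as "unreachable"; otherwise the HTTP status decides.
classify_link() {
    local curl_exit="$1" http_code="$2"
    if [ "$curl_exit" -ne 0 ]; then
        echo "unreachable"
    elif [ "$http_code" = "404" ]; then
        echo "not-found"
    else
        echo "http-$http_code"
    fi
}

# Probe one URL: -o /dev/null discards the body, -w prints only the status code.
check_link() {
    local url="$1" code rc
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$url")
    rc=$?   # curl's exit status, captured before any other command runs
    classify_link "$rc" "$code"
}
```

Usage, e.g. `check_link http://www.animalbase.org/`. Redirects are not followed here, so a moved page shows up as `http-301` or `http-302`, not as a failure.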

There are others that have broken links, but they're harder to catch automatically. For example, http://www.gbif.org/resource/81275 http://bionomenclature-glossary.gbif.org/ redirects to the GBIF homepage, and a script can't tell that kind of redirect apart from one that sends the user to a page's new location.
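One partial workaround is to follow the redirect chain with `curl -L` and flag any link whose final URL (`%{url_effective}`) is the bare GBIF homepage. This is an untested sketch built on an assumption. It cannot recognise a homepage redirect that is actually intentional, so anything it flags still needs manual review. `is_soft_dead`, `check_redirect` and the `HOMEPAGE_RE` pattern are hypothetical names.

```shell
# Assumption: a redirect that lands on the bare gbif.org homepage means the target is gone.
HOMEPAGE_RE='^https?://(www\.)?gbif\.org/?$'

# Succeeds (exit 0) if the original URL was NOT the homepage but the final URL is.
is_soft_dead() {
    local original="$1" final="$2"
    # A link that points at the homepage itself is not a broken link.
    if [[ "$original" =~ $HOMEPAGE_RE ]]; then
        return 1
    fi
    [[ "$final" =~ $HOMEPAGE_RE ]]
}

# -L follows redirects; %{url_effective} is the last URL curl actually fetched.
check_redirect() {
    local url="$1" final
    final=$(curl -sL -o /dev/null -w '%{url_effective}' --max-time 15 "$url")
    if is_soft_dead "$url" "$final"; then
        echo "suspect $url -> $final"
    else
        echo "ok $url -> $final"
    fi
}
```

Usage, e.g. `check_redirect http://bionomenclature-glossary.gbif.org/`.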
    


Author: kylecopas
Created: 2016-08-24 10:54:57.996
Updated: 2016-08-24 10:54:57.996
        
[sigh]

Could we just remove all the resources and curate a clean set from scratch?
    


Author: mblissett
Created: 2016-08-24 11:11:15.516
Updated: 2016-08-24 11:12:25.131
        
Of course — I did this without checking if it would be useful. (Though deleting everything would break any links people have to them.)

Mostly to see if it could be quickly scripted:

{code}
    # Page through the CMS resource API; print each resource's id, URL and HTTP status line.
    for n in $(seq 15); do
        curl -Ss "http://cms.gbif-dev.org/api/v1/resource/?page=$n" \
            | jq -r '.data[] | select(.resourceUrl != null) | [.id, .resourceUrl] | @tsv' \
            | while IFS=$'\t' read -r i u; do
                echo "$i $u $(curl -I -Ss "$u" | head -n 1 | tr -d '\r')"
            done
    done
{code}
    


Author: kylecopas
Comment: sorry, that was a joke! I hate how many marginal 'resources' we have that just add to the festering scent of link rot...
Created: 2016-08-24 11:20:05.45
Updated: 2016-08-24 11:20:05.45


Author: kylecopas
Comment: I suspect it will be best to unpublish most of these, but a qualitative curatorial assessment is probably in order for them all.
Created: 2016-08-24 11:21:21.803
Updated: 2016-08-24 11:21:21.803


Author: dnoesgaard
Created: 2016-09-02 16:07:11.881
Updated: 2016-09-02 16:07:11.881
        
Working on this and tracking progress here:
https://docs.google.com/spreadsheets/d/1zLH_QU7N-l2CgLKYdrPUcWNSneF6FWsDseoVY6nakmc/edit?usp=sharing
    


Author: dnoesgaard
Comment: Is there an easy way of bulk unpublishing? I'm down to 23 resources with fewer than 50 pageviews since early 2015...
Created: 2016-09-05 16:10:28.787
Updated: 2016-09-05 16:10:28.787


Author: mblissett
Comment: [~bko@gbif.org] might know a way.
Created: 2016-09-05 16:48:03.136
Updated: 2016-09-05 16:48:03.136


Author: bko@gbif.org
Created: 2016-09-05 17:16:25.237
Updated: 2016-09-05 17:16:25.237
        
Responding to the mention...

Yes, you can: go to the content list page (admin/content), select the ones you want to unpublish, and run the bulk action (unpublish).

Or just send me the node IDs of those resources; for me it's a single SQL statement.

Let me know.
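For the SQL route, a small helper can turn a list of node IDs into a single statement. This is a hypothetical sketch. It assumes a Drupal 7-style `node` table where `status = 0` means unpublished. A real update may also need to cover revision or moderation tables and clear caches, so check it against the actual CMS schema before running anything.

```shell
# Hypothetical helper: print one UPDATE statement that unpublishes the given node IDs.
# Table and column names (node, status, nid) are assumptions about the CMS schema.
build_unpublish_sql() {
    local id ids=""
    for id in "$@"; do
        # Accept only integer IDs so nothing else can end up in the SQL string.
        [[ "$id" =~ ^[0-9]+$ ]] || { echo "bad id: $id" >&2; return 1; }
        ids+="${ids:+,}$id"   # prepend a comma to every ID except the first
    done
    [ -n "$ids" ] || { echo "no ids given" >&2; return 1; }
    echo "UPDATE node SET status = 0 WHERE nid IN ($ids);"
}
```

Usage, e.g. `build_unpublish_sql 81750 81236` prints `UPDATE node SET status = 0 WHERE nid IN (81750,81236);`. The helper only prints the statement and does not run it, so it can be reviewed first.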
    


Author: dnoesgaard
Created: 2016-09-06 10:08:10.589
Updated: 2016-09-06 10:08:10.589
        
Burke,

Could you please do your SQL magic and unpublish these resources?

81750
81236
80984
81176
81710
81709
80582
81218
80873
80578
81000
80994

For the record, they all have <10 pageviews since early 2015...
    


Author: bko@gbif.org
Comment: I have updated these via SQL statements, so they are all unpublished. I only just realised that content types with moderation enabled don't have an unpublish button the way other content types do. So if you need more help, let me know.
Created: 2016-09-06 14:14:58.929
Updated: 2016-09-06 14:14:58.929


Author: dnoesgaard
Comment: Thanks Burke. If they are unpublished, I'm happy :)
Created: 2016-09-06 14:23:24.835
Updated: 2016-09-06 14:23:24.835