13787
Reporter: mdoering
Assignee: mdoering
Type: Improvement
Summary: Make Contact address field a list
Description: To render long addresses well we need the individual address lines stored separately; see http://dev.gbif.org/issues/browse/PF-924
Priority: Major
Resolution: Fixed
Status: Resolved
Created: 2013-09-06 14:37:25.899
Updated: 2014-07-07 11:04:03.886
Resolved: 2014-07-07 11:04:03.863
Author: mdoering@gbif.org
Created: 2014-05-01 11:41:22.75
Updated: 2014-05-01 11:41:22.75
Contacts are not only coming from the IMS but mostly from EML.
In EML a "party" can have multiple address fields, emails, phone numbers or home pages.
Also, the organisation or position name can exist in multiple languages; see the attached XSD screenshot from this schema:
http://rs.gbif.org/schema/eml-2.1.1/eml-party.xsd
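A minimal sketch of what these repeated elements mean for a parser, using only the JDK's built-in DOM API. The party fragment, its values and the class name `PartyFieldsDemo` are made up for illustration. The element names (`address`, `deliveryPoint`, `electronicMailAddress`) follow the eml-party schema linked above, but namespaces and all other elements are ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class PartyFieldsDemo {

  // Made-up EML-style party: two addresses, the first with two delivery points.
  static final String PARTY_XML =
      "<contact>"
          + "<organizationName>Example Herbarium</organizationName>"
          + "<address>"
          + "<deliveryPoint>Building 3</deliveryPoint>"
          + "<deliveryPoint>1 Example Road</deliveryPoint>"
          + "<city>Copenhagen</city>"
          + "</address>"
          + "<address><deliveryPoint>PO Box 12</deliveryPoint><city>Aarhus</city></address>"
          + "<electronicMailAddress>a@example.org</electronicMailAddress>"
          + "<electronicMailAddress>b@example.org</electronicMailAddress>"
          + "</contact>";

  /** Parses an XML string and returns its root element. */
  static Element parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement();
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Cannot parse party XML", e);
    }
  }

  /** Trimmed text of every descendant element with the given tag name, in document order. */
  static List<String> texts(Element parent, String tag) {
    List<String> values = new ArrayList<>();
    NodeList nodes = parent.getElementsByTagName(tag);
    for (int i = 0; i < nodes.getLength(); i++) {
      values.add(nodes.item(i).getTextContent().trim());
    }
    return values;
  }

  public static void main(String[] args) {
    Element party = parse(PARTY_XML);
    NodeList addresses = party.getElementsByTagName("address");
    System.out.println(addresses.getLength());                                 // 2
    System.out.println(texts(party, "deliveryPoint"));                         // [Building 3, 1 Example Road, PO Box 12]
    System.out.println(texts((Element) addresses.item(0), "deliveryPoint"));   // [Building 3, 1 Example Road]
    System.out.println(texts(party, "electronicMailAddress"));                 // [a@example.org, b@example.org]
  }
}
```

Collecting with `getElementsByTagName` on the party flattens the delivery points of both addresses into one list, whereas calling it on each `address` element keeps them grouped. That is the structural choice the rest of this thread is about.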
Author: kbraak@gbif.org
Created: 2014-06-26 17:17:58.104
Updated: 2014-06-26 17:18:42.694
If we want to handle parsing pure EML (as opposed to parsing the GBIF Metadata Profile, which is based primarily on EML), simply changing Contact.address (which, by the way, only represents deliveryPoint) from a String to a List won't be enough.
What we will need to do is change Contact so that it has a list of Address objects, each with a list of deliveryPoints, a city, a country, etc. That is because:
* EML 2.1.1 "party" can have multiple addresses
* EML 2.1.1 "address" can have multiple deliveryPoints
This represents a much more significant change, but it is necessary if we want to parse all contacts from pure EML published by KNB, for example.
Otherwise, our API (e.g. Contact) is only adequate for parsing the GBIF Metadata Profile since:
* GBIF Metadata Profile 1.0.2 "party" can only have 1 address
* GBIF Metadata Profile 1.0.2 "address" can only have 1 deliveryPoint
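The two cardinality profiles can be sketched as a nested Java model. `EmlParty`, `PostalAddress` and `fitsGbifProfile()` are hypothetical names, not gbif-api classes. The sketch only shows the EML shape (many addresses, each with many delivery points) and a check for whether a given party would also fit the single-address, single-deliveryPoint GBIF Metadata Profile 1.0.2, reading "can only have 1" as "at most 1". It assumes Java 16+ for records.

```java
import java.util.List;

// Hypothetical nested model mirroring EML 2.1.1 cardinalities:
// a party may have several addresses, and each address several delivery points.
record PostalAddress(List<String> deliveryPoints, String city, String postalCode, String country) {
  PostalAddress {
    deliveryPoints = List.copyOf(deliveryPoints); // immutable defensive copy
  }
}

record EmlParty(String organizationName, List<PostalAddress> addresses) {
  EmlParty {
    addresses = List.copyOf(addresses);
  }

  /** True if the party fits GBIF Metadata Profile 1.0.2: at most 1 address with at most 1 deliveryPoint. */
  boolean fitsGbifProfile() {
    return addresses.size() <= 1
        && addresses.stream().allMatch(a -> a.deliveryPoints().size() <= 1);
  }
}

class NestedModelDemo {
  public static void main(String[] args) {
    EmlParty party = new EmlParty("Example Herbarium", List.of(
        new PostalAddress(List.of("Building 3", "1 Example Road"), "Copenhagen", "2100", "DK"),
        new PostalAddress(List.of("PO Box 12"), "Aarhus", null, "DK")));
    System.out.println(party.addresses().size()); // 2
    System.out.println(party.fitsGbifProfile());  // false
  }
}
```

Any party for which `fitsGbifProfile()` returns false is one the current single-address Contact cannot represent without losing data.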
I'm sure that the deeper we look, the more discrepancies we'll find. A decision is therefore needed on whether we try to handle parsing pure EML, or only the GBIF Metadata Profile. If we are committed to harvesting from KNB later this year, we probably have to handle pure EML. What do you guys think [~trobertson@gbif.org], [~omeyn@gbif.org], [~fmendez@gbif.org], [~mdoering@gbif.org]?
***
For reference:
EML party schema: http://rs.gbif.org/schema/eml-2.1.1/eml-party.xsd
GBIF Metadata profile schema: http://rs.gbif.org/schema/eml-gbif-profile/1.0.2/eml-gbif-profile.xsd
Author: mdoering@gbif.org
Comment: As we will get EML not only from KNB later this year but already get it via DwC archives that adhere to the full EML schema, I think we should at least be able to extract the absolutely vital information correctly. To me this is, first and foremost, the title, description and contact information. See also POR-523 and POR-493.
Created: 2014-06-26 21:57:08.795
Updated: 2014-06-26 21:57:08.795
Author: mdoering@gbif.org
Created: 2014-07-01 12:15:02.427
Updated: 2014-07-01 12:15:02.427
I would say we have two options:
1) Implement it exactly like EML, in which case we need a new PostalAddress class (and Postgres table) that in turn has a deliveryPoint/address list (plus city, zip, country, etc.). In addition, make most contact properties a list (email, phone, url, position, organization). firstName & lastName probably need to become a new class too, so that we can make this complex class repeatable in Contact.
2) Keep a single Contact class, but allow lists for many non-complex properties, including deliveryPoint/address, email, phone, url and position.
I would suggest going with number 2, as it has far less impact on our current code, is straightforward to use and should capture most of the information in EML. Having multiple postal addresses, e.g. in different cities, might be useful but is a rather rare and extreme case.
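Option 2 might look roughly like the following. `FlatContact`, its field names and `addAddress` are hypothetical and only loosely follow the property list above. The point is that the repeatable simple properties become lists while city, postal code and country stay single-valued, so a second EML address can only be merged in lossily.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical flat contact for option 2: repeatable simple properties are lists,
// while city, postal code and country remain single values.
class FlatContact {
  String firstName;
  String lastName;
  String organization;
  final List<String> position = new ArrayList<>();
  final List<String> address = new ArrayList<>(); // one element per EML deliveryPoint
  String city;
  String postalCode;
  String country;
  final List<String> email = new ArrayList<>();
  final List<String> phone = new ArrayList<>();
  final List<String> url = new ArrayList<>();

  /**
   * Merges one EML address into this contact. Delivery points are appended,
   * but only the first non-null city, postal code and country are kept, so the
   * grouping of lines per postal address is lost.
   */
  void addAddress(List<String> deliveryPoints, String addrCity, String addrPostalCode, String addrCountry) {
    address.addAll(deliveryPoints);
    if (city == null) city = addrCity;
    if (postalCode == null) postalCode = addrPostalCode;
    if (country == null) country = addrCountry;
  }
}

class FlatContactDemo {
  public static void main(String[] args) {
    FlatContact c = new FlatContact();
    c.addAddress(List.of("Building 3", "1 Example Road"), "Copenhagen", "2100", "DK");
    c.addAddress(List.of("PO Box 12"), "Aarhus", null, "DK");
    c.email.add("a@example.org");
    c.email.add("b@example.org");
    System.out.println(c.address); // [Building 3, 1 Example Road, PO Box 12]
    System.out.println(c.city);    // Copenhagen
  }
}
```

With two EML addresses the delivery points end up in one list and the second city is dropped, which is the trade-off accepted above for the rare multi-address case.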
Author: mdoering@gbif.org
Created: 2014-07-03 12:54:42.908
Updated: 2014-07-03 12:54:42.908
Going for #2 with list properties in Address & Contact:
https://github.com/gbif/gbif-api/commit/4df7fdb019b962ebfd6713d79e49bf9ab051be10
https://github.com/gbif/registry/commit/71ef5dd4f47987f1d906bfe464f8a492dbf4e7cd
As the portal template treats nodes and organizations as if they had implemented our Address interface, we also applied the list changes to Node, Organization and Network:
https://github.com/gbif/gbif-api/commit/11edb27b5dea615b1dbcded77263bb776c690186
https://github.com/gbif/registry/commit/dba762cdee6ea178f842825906004a66d0af0949
https://github.com/gbif/drupal-mybatis/commit/746a871c03c9c247ab6f995d11071d631abb0764
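Applying the same list changes to Node, Organization and Network is what lets a single template render all of them. A hypothetical sketch: the `Address` interface below is a simplified stand-in with illustrative method names (the real gbif-api interface differs and has more fields), and `AddressRenderer.toHtml` plays the role of the portal template. Records are used for brevity (Java 16+).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the shared Address interface;
// method names are illustrative, not the actual gbif-api signatures.
interface Address {
  List<String> address(); // one element per delivery point line
  String city();
  String country();
}

// The implicit record accessors implement the interface methods directly.
record Organization(String title, List<String> address, String city, String country) implements Address {}

record Node(String title, List<String> address, String city, String country) implements Address {}

final class AddressRenderer {
  private AddressRenderer() {}

  /** Renders any Address line by line, as a portal template might (no HTML escaping in this sketch). */
  static String toHtml(Address a) {
    List<String> lines = new ArrayList<>(a.address());
    if (a.city() != null) lines.add(a.city());
    if (a.country() != null) lines.add(a.country());
    return String.join("<br/>", lines);
  }

  public static void main(String[] args) {
    Address org = new Organization("Example Herbarium",
        List.of("Building 3", "1 Example Road"), "Copenhagen", "DK");
    Address node = new Node("Example Node", List.of("PO Box 12"), "Aarhus", null);
    System.out.println(toHtml(org));  // Building 3<br/>1 Example Road<br/>Copenhagen<br/>DK
    System.out.println(toHtml(node)); // PO Box 12<br/>Aarhus
  }
}
```

Because the renderer only depends on the interface, changing `address` from a String to a list in one implementing class but not the others would break the shared template, which is why the change had to be applied to all of them.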
Author: kbraak@gbif.org
Comment: Reopening: there are 2 failing ITs in the Registry: http://builds.gbif.org/job/registry/181/console
Created: 2014-07-04 17:59:18.653
Updated: 2014-07-04 17:59:18.653
Author: mdoering@gbif.org
Created: 2014-07-07 10:37:49.652
Updated: 2014-07-07 11:03:59.944
Funnily enough, these 2 tests do not fail on my machine. It's a character encoding issue with umlauts that has always existed but was never detected before. Fixed in https://github.com/gbif/registry/commit/5079354c7fdd034713f39904afa22a0a6054ed48
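The actual fix is in the linked commit. As a generic illustration of how an umlaut bug can pass on one machine and fail on a build server: decoding UTF-8 bytes with a different charset mangles the umlaut, and code that relies on the JVM's default charset gets different charsets on different machines (before Java 18 the default depended on the platform and locale). `UmlautEncodingDemo` is a hypothetical name, and this is not necessarily the cause of the registry failures.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UmlautEncodingDemo {
  /** Decodes bytes with the given charset, as code relying on a default charset implicitly does. */
  static String decode(byte[] bytes, Charset charset) {
    return new String(bytes, charset);
  }

  public static void main(String[] args) {
    // Unicode escape keeps the source file itself ASCII-only.
    byte[] utf8 = "M\u00fcller".getBytes(StandardCharsets.UTF_8);
    System.out.println(decode(utf8, StandardCharsets.UTF_8));      // Müller
    System.out.println(decode(utf8, StandardCharsets.ISO_8859_1)); // MÃ¼ller
    System.out.println("Default charset on this JVM: " + Charset.defaultCharset());
  }
}
```

Passing an explicit charset such as `StandardCharsets.UTF_8`, rather than relying on the default, removes the machine dependence.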