This repository has been archived by the owner on Dec 2, 2021. It is now read-only.
Christina Harlow edited this page Aug 14, 2018 · 17 revisions

RIALTO ETL Data Scratch Space

Named Graphs

Create one named graph per data source.

Namespaces & Schemas

Organizations (CAP) Mapping

  • Organization Identifier == $.alias (string)
  • RDF.type == FOAF.Agent, FOAF.Organization
  • Organization URI == RIALTO organizations namespace + organization identifier
  • Organization Alias == $.alias (string)
  • Children == $.children (array of strings, identifiers for each child), mapped to OBO.BFO_0000051 for each child identifier as a child organization URI
  • Organization Name == $.name (string), mapped to SKOS.prefLabel & RDFS.label as a Literal
  • Organization Codes == $.orgCodes (array of strings), mapped to DCTERMS.identifier as a Literal
  • Parent == $.parent (string, identifier for parent), mapped to OBO.BFO_0000050 for parent identifier as a parent organization URI
  • Organization Types == $.type (string), mapped to an additional RDF.type based on its value:
    • "DEPARTMENT": RDF.type, VIVO.Department
    • "DIVISION": RDF.type, VIVO.Division
    • "ROOT": RDF.type, VIVO.University (always Stanford University)
    • "SCHOOL": RDF.type, VIVO.School
    • "SUB_DIVISION": RDF.type, VIVO.Division
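The organization rules above can be sketched in plain Python. This is only an illustration: `ORG_NS` is a hypothetical placeholder (the real RIALTO organizations namespace is not given on this page), `org_triples` is an invented helper name, and plain strings such as "VIVO.Department" stand in for the rdflib terms so the sketch runs without dependencies.

```python
# Value of $.type -> additional RDF type, as listed in the mapping above.
ORG_TYPE_CLASSES = {
    "DEPARTMENT": "VIVO.Department",
    "DIVISION": "VIVO.Division",
    "ROOT": "VIVO.University",
    "SCHOOL": "VIVO.School",
    "SUB_DIVISION": "VIVO.Division",
}

# Hypothetical placeholder, not the real RIALTO organizations namespace.
ORG_NS = "http://example.org/organizations/"


def org_triples(org):
    """Return (subject, predicate, object) tuples for one CAP organization record."""
    uri = ORG_NS + org["alias"]
    triples = [
        (uri, "RDF.type", "FOAF.Agent"),
        (uri, "RDF.type", "FOAF.Organization"),
    ]
    extra_type = ORG_TYPE_CLASSES.get(org.get("type"))
    if extra_type:
        triples.append((uri, "RDF.type", extra_type))
    if org.get("name"):
        triples.append((uri, "SKOS.prefLabel", org["name"]))
        triples.append((uri, "RDFS.label", org["name"]))
    for code in org.get("orgCodes") or []:
        triples.append((uri, "DCTERMS.identifier", code))
    if org.get("parent"):
        # BFO_0000050 is "part of": points from child to parent.
        triples.append((uri, "OBO.BFO_0000050", ORG_NS + org["parent"]))
    for child in org.get("children") or []:
        # BFO_0000051 is "has part": points from parent to child.
        triples.append((uri, "OBO.BFO_0000051", ORG_NS + child))
    return triples
```

Unknown `$.type` values simply add no extra type here; whether the real ETL should log or reject them is left open.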

People (Profiles) Mapping

  • Person Identifier == $.profileId (string)

  • RDF.type == FOAF.Agent, FOAF.Person

  • Person URI == RIALTO people namespace + person identifier

  • Person Label == $.names.preferred.firstName (string) + " " + $.names.preferred.middleName (string) + " " + $.names.preferred.lastName (string), mapped to SKOS.prefLabel & VCARD.fn as a Literal

  • Person Name URI == RIALTO names namespace (in contexts) + person identifier

  • Person Name

    • Person URI VCARD.hasName Person Name URI .
    • Person Name URI RDF.type VCARD.Name .
    • Person Name URI VCARD.given-name $.names.preferred.firstName (string) .
    • Person Name URI VCARD.middle-name $.names.preferred.middleName (string) .
    • Person Name URI VCARD.family-name $.names.preferred.lastName (string) .
  • Person Affiliation:

    • if $.affiliations.capPhdStudent (Boolean) == True or $.affiliations.capMsStudent (Boolean) == True or $.affiliations.capMdStudent (Boolean) == True: Person URI RDF.type VIVO.Student
    • if $.affiliations.capFaculty (Boolean) == True: Person URI RDF.type VIVO.FacultyMember
    • if $.affiliations.capFellow (Boolean) == True or $.affiliations.capResident (Boolean) == True or $.affiliations.capPostdoc (Boolean) == True: Person URI RDF.type VIVO.NonFacultyAcademic
    • if $.affiliations.physician (Boolean) == True or $.affiliations.capStaff (Boolean) == True: Person URI RDF.type VIVO.NonAcademic
    • Ignoring $.affiliations.capRegistry & $.affiliations.capOther at present
  • Person Biography: Person URI VIVO.overview $.bio.text (Literal)

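  • Person Address: for each entry in $.contacts where $.contacts.type == "academic":

    • Person Address URI == RIALTO addresses namespace (in contexts) + $.contacts.address (string) + $.contacts.zip (string), with spaces and other URI-unsafe characters encoded or replaced

    • Person URI VCARD.hasAddress Person Address URI .

    • Person Address URI RDF.type VCARD.Address .

    • Person Address URI VCARD.street-address $.contacts.address (Literal) .

    • Person Address URI VCARD.locality $.contacts.city (Literal) .

    • Person Address URI VCARD.region $.contacts.state (Literal) .

    • Person Address URI VCARD.postal-code $.contacts.zip (Literal) .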
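The label and affiliation rules earlier in this list can be sketched as two small functions. The names `person_label`, `affiliation_types`, and `AFFILIATION_RULES` are hypothetical, and skipping empty name parts is an assumption: the mapping above only says to join first, middle, and last name with spaces.

```python
def person_label(preferred):
    """Join the preferred name parts with single spaces, skipping missing ones (assumption)."""
    parts = [
        preferred.get("firstName"),
        preferred.get("middleName"),
        preferred.get("lastName"),
    ]
    return " ".join(p.strip() for p in parts if p and p.strip())


# Each rule: (affiliation flags, RDF type added when any flag is True).
AFFILIATION_RULES = [
    (("capPhdStudent", "capMsStudent", "capMdStudent"), "VIVO.Student"),
    (("capFaculty",), "VIVO.FacultyMember"),
    (("capFellow", "capResident", "capPostdoc"), "VIVO.NonFacultyAcademic"),
    (("physician", "capStaff"), "VIVO.NonAcademic"),
]


def affiliation_types(affiliations):
    """Return the extra RDF types implied by the $.affiliations booleans.

    capRegistry and capOther are ignored, as in the mapping above.
    """
    return [
        rdf_type
        for flags, rdf_type in AFFILIATION_RULES
        if any(affiliations.get(flag) is True for flag in flags)
    ]
```

A person can match several rules at once, so the function returns a list instead of a single type.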

                  # ZIP codes starting with "9430" are treated as U.S. addresses
                  # (GeoNames 6252001 is the United States).
                  if postal_code.startswith("9430"):
                      country_uri = "http://sws.geonames.org/6252001/"
                      graph.add( (address_uri, DCTERMS.spatial, URIRef(country_uri)) )
                      graph.add( (address_uri, VCARD["country-name"], Literal("United States")) )
                  # Link the person to the address, then describe the address itself.
                  graph.add( (person_uri, VCARD.hasAddress, URIRef(address_uri)) )
                  graph.add( (address_uri, VCARD["street-address"], Literal(street_address)) )
                  graph.add( (address_uri, VCARD.locality, Literal(locality)) )
                  graph.add( (address_uri, VCARD.region, Literal(region)) )
                  graph.add( (address_uri, VCARD["postal-code"], Literal(postal_code)) )
      
                  department = contact.get("department")
                  department2 = profile_data.get("department")
                  position = contact.get("position")
                  title = contact.get("longTitle")
                  affiliation_type = contact.get("affiliationType")
                  current_role = profile_data.get("currentRoleAtStanford")
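The fragment above uses `address_uri` without showing how it is built. One possible reading of the "namespace + address + zip, with bad characters encoded" rule is sketched below. `ADDRESS_NS` is a hypothetical placeholder, the function names are invented, and joining the parts with underscores before percent-encoding is an assumption, not something this page specifies.

```python
from urllib.parse import quote

# Hypothetical placeholder, not the real RIALTO addresses namespace.
ADDRESS_NS = "http://example.org/addresses/"


def build_address_uri(street_address, postal_code):
    """Build an address URI from street address and ZIP code.

    Whitespace runs become underscores; anything else that is not
    URI-safe is percent-encoded.
    """
    key = "_".join((street_address + " " + postal_code).split())
    return ADDRESS_NS + quote(key, safe="")


def country_for_zip(postal_code):
    """Mirror the ZIP prefix rule in the fragment above.

    Returns (GeoNames URI, country name) or None when the rule does not apply.
    """
    if postal_code.startswith("9430"):
        return ("http://sws.geonames.org/6252001/", "United States")
    return None
```

Because the URI is derived only from street address and ZIP code, two people at the same address would share one address node under this sketch.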
      
      advisees = profile_data.get("advisees")
      organizations = profile_data.get("organizations")
      primary_contact = profile_data.get("primaryContact")
      employeeId = profile_data.get("universityId")
      keywords = profile_data.get("keywords")

      if advisees:
          for advisee in advisees:
              if advisee.get("advisee"):
                  advisee = advisee.get("advisee")
                  advisee_alias = advisee.get("alias")
                  advisee_firstName = advisee.get("firstName")
                  advisee_lastName = advisee.get("lastName")
                  if advisee.get("label"):
                      advisee_label = advisee.get("label").get("text")
                  else:
                      advisee_label = None
                  advisee_id = advisee.get("profileId")
                  advisor_role = advisee.get("role")
                  advisor_code = advisee.get("code")

                  # Create advisee entity
                  advisee_uri = people_ns[str(advisee_id)]
                  graph.add( (advisee_uri, RDF.type, FOAF.Agent) )
                  graph.add( (advisee_uri, RDF.type, FOAF.Person) )
                  advisee_name_uri = name_ns[str(advisee_id)]
                  graph.add( (advisee_uri, VCARD.fn, Literal(advisee_label)) )
                  graph.add( (advisee_uri, VCARD.hasName, advisee_name_uri) )
                  graph.add( (advisee_name_uri, RDF.type, VCARD.Name) )
                  graph.add( (advisee_name_uri, VCARD["given-name"], Literal(advisee_firstName)) )
                  graph.add( (advisee_name_uri, VCARD["family-name"], Literal(advisee_lastName)) )
    
                  # Create advisor relationship URI, advisor role URI
                  relationship_uri = rel_ns[str(advisee_id) + "_" + str(person_id)]
                  advisor_role_uri = role_ns["AdvisorRole"]
                  advisee_role_uri = role_ns["AdviseeRole"]
                  graph.add( (relationship_uri, RDF.type, VIVO.AdvisingRelationship) )
                  graph.add( (advisor_role_uri, RDF.type, VIVO.AdvisorRole) )
                  graph.add( (advisee_role_uri, RDF.type, VIVO.AdviseeRole) )
                  graph.add( (person_uri, VIVO.relatedBy, relationship_uri) )
                  graph.add( (advisee_uri, VIVO.relatedBy, relationship_uri) )
                  graph.add( (relationship_uri, VIVO.relates, person_uri) )
                  graph.add( (relationship_uri, VIVO.relates, advisee_uri) )
                  graph.add( (person_uri, OBO.RO_0000053, advisor_role_uri) )
                  graph.add( (advisor_role_uri, OBO.RO_0000052, person_uri) )
                  graph.add( (advisee_uri, OBO.RO_0000053, advisee_role_uri) )
                  graph.add( (advisee_role_uri, OBO.RO_0000052, advisee_uri) )
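The advisee handling above passes `advisee_label` on even when it is None. A more defensive extraction step could look like the sketch below. `parse_advisee` is a hypothetical helper, and falling back to "firstName lastName" when no label text exists is an assumption, not part of the original mapping.

```python
def parse_advisee(entry, advisor_id):
    """Extract the fields needed to describe one advisee.

    Returns None when the entry has no usable advisee record. The
    relationship key follows the advisee_id + "_" + person_id pattern
    used for relationship_uri above.
    """
    advisee = entry.get("advisee")
    if not advisee or advisee.get("profileId") is None:
        return None
    label = (advisee.get("label") or {}).get("text")
    if not label:
        # Assumed fallback: build a label from the name parts.
        names = [advisee.get("firstName"), advisee.get("lastName")]
        label = " ".join(n for n in names if n) or None
    advisee_id = str(advisee["profileId"])
    return {
        "id": advisee_id,
        "label": label,
        "relationship_id": advisee_id + "_" + str(advisor_id),
    }
```

The caller can then skip the `VCARD.fn` triple whenever `label` is still None.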
    
      if organizations:
          for org in organizations:
              if org.get("organization"):
                  if org.get("organization").get("label"):
                      # Prefer the plain-text label; fall back to the HTML label.
                      if org.get("organization").get("label").get("text"):
                          org_id = org["organization"].get("label").get("text").replace(" ", "")
                          org_uri = org_ns[org_id]
                          org_aff = org["affiliation"]
                      else:
                          org_id = org["organization"].get("label").get("html").replace(" ", "")
                          org_uri = org_ns[org_id]
                          org_aff = org["affiliation"]

                      # Create Position URI
                      # (separator assumed to be "_", as in relationship_uri)
                      position_uri = position_ns[org_aff + "_" + org_id + "_" + str(person_id)]
                      graph.add( (person_uri, VIVO.relatedBy, position_uri) )
                      graph.add( (org_uri, VIVO.relatedBy, position_uri) )
                      graph.add( (position_uri, VIVO.relates, org_uri) )
                      graph.add( (position_uri, VIVO.relates, person_uri) )
                      graph.add( (position_uri, RDFS.label, Literal(org_aff)) )
                      graph.add( (position_uri, DCTERMS.date, Literal('unknown/2018-08')) )
                      graph.add( (position_uri, RDF.type, VIVO.Position) )
                      if primary_contact:
                          if primary_contact.get("title"):
                              graph.add( (position_uri, VIVO.hrJobTitle, Literal(primary_contact.get("title"))) )
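The two label branches above differ only in which key they read, so the organization identifier and position key can be sketched as small helpers. Both function names are hypothetical, and the "_" separator in the position key is an assumption modelled on the relationship URI pattern used for advisees.

```python
def org_id_from_label(org_entry):
    """Derive the organization identifier from the label text (or HTML), spaces removed."""
    label = (org_entry.get("organization") or {}).get("label") or {}
    text = label.get("text") or label.get("html")
    return text.replace(" ", "") if text else None


def position_key(org_aff, org_id, person_id, sep="_"):
    """Combine affiliation, organization id, and person id into one local name."""
    return sep.join([org_aff, org_id, str(person_id)])
```

Returning None for a missing label lets the caller skip the position entirely instead of minting a URI from an empty identifier.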

      if primary_contact:
          if primary_contact.get("email"):
              graph.add( (person_uri, VCARD.hasEmail, Literal(primary_contact.get("email"))) )

      if keywords:
          for keyword in keywords:
              if keyword:
                  keyword = keyword.strip()
                  # Python 3 (requires "import urllib.parse"); this was urllib.quote_plus in Python 2.
                  keyword = urllib.parse.quote_plus(keyword.replace(" ", "_"))
                  keyword_uri = concept_ns[keyword]
                  graph.add( (keyword_uri, RDF.type, SKOS.Concept) )
                  graph.add( (keyword_uri, RDFS.label, Literal(keyword)) )
                  graph.add( (person_uri, VIVO.hasResearchArea, keyword_uri) )
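The keyword handling above reuses the URL-encoded string as the concept's RDFS.label. If a human-readable label is wanted, the slug and the label can be kept apart, as in this sketch. `keyword_concept` is a hypothetical helper, and treating the readable form as the intended label is an assumption about intent.

```python
from urllib.parse import quote_plus


def keyword_concept(keyword):
    """Return (slug, label) for one keyword, or None for blank input.

    The slug is meant for the concept URI's local name; the label keeps
    the original wording for RDFS.label.
    """
    label = keyword.strip()
    if not label:
        return None
    slug = quote_plus(label.replace(" ", "_"))
    return slug, label
```

For example, `keyword_concept("  machine learning ")` gives the slug "machine_learning" and the label "machine learning".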
