Gazetteer data and services
============================

What are gazetteers?
---------------------

* Datasets that identify (stably) and describe (in both structured and unstructured terms) entities of interest to users of the gazetteer and users of other datasets
* Combine characteristics of encyclopedias and controlled vocabularies
* Datasets that, though not designed for it, can be used or refactored to be used, in this manner
* Entity examples: geographic places, persons, objects

Core use cases
---------------

* Dataset alignment (existing local strings or IDs augmented by identifying and acquiring gazetteer IDs for the entities of interest)
* Text parsing (named entity recognition)
* Text annotation (string "Alabama" means place:15354)
* Discovery that exploits dataset alignment: e.g., find all records in photo databases for object:coin from place:Atlantis
* Data preservation and normalization

  * E.g. special-purpose or static gazetteers and datasets migrated to larger or more general systems still active (active curation as opposed to putting datasets "on ice")

* Visualize texts, corpora, datasets, arbitrary collections of records from single or multiple datasets, e.g.:

  * Geographic gazetteer: map it (where, density heat map, etc.)
  * Prosopographic gazetteer: social graph
  * Temporal gazetteer: timeline
  * Multiple gazetteers: dynamic map over time

* Gazetteer alignment (My "foo" is the same as your "bar")
* Gazetteer aggregation (your uniqueness will be joined to our collective)

"Special" problems in humanities gazetteers
--------------------------------------------

* Incomplete data
* Uncertainty
* Citation
* Attribution

Example ancient history gazetteers
-----------------------------------

* PeriodO
* Pleiades
* SNAP:DRAGN
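
As a rough illustration of the text annotation and dataset alignment use cases listed under "Core use cases", here is a minimal Python sketch. It assumes a toy in-memory gazetteer that maps name strings to IDs. The identifiers `place:15354` and `place:Atlantis` are taken from the examples above; the names `GAZETTEER`, `annotate`, `align`, and the `findspot` field are hypothetical and do not belong to any real gazetteer service.

```python
import re

# Hypothetical toy gazetteer: name string -> stable gazetteer ID.
# Both IDs are the illustrative ones used in the outline above.
GAZETTEER = {
    "Alabama": "place:15354",
    "Atlantis": "place:Atlantis",
}


def annotate(text, gazetteer=GAZETTEER):
    """Text annotation: return (start, end, string, gazetteer ID) per match."""
    # Try longer names first so a long name wins over a shorter one it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (m.start(), m.end(), m.group(1), gazetteer[m.group(1)])
        for m in pattern.finditer(text)
    ]


def align(records, field, gazetteer=GAZETTEER):
    """Dataset alignment: copy each record and add a 'gazetteer_id' key.

    Records whose local string is not in the gazetteer get None.
    """
    return [{**r, "gazetteer_id": gazetteer.get(r[field])} for r in records]


if __name__ == "__main__":
    print(annotate("Coins from Alabama and Atlantis."))
    print(align([{"findspot": "Alabama"}, {"findspot": "Nowhere"}], "findspot"))
```

Exact string matching is used only to keep the sketch short. Real alignment usually has to cope with variant spellings, ambiguous names, and the incomplete or uncertain data noted under the "special" problems above.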