Data integration requires a way to specify the structure of data resources ...
- Unstructured and semi-structured sources (document collections, message traffic, web pages, ...)
- Structured data without an explicit data schema (non-local databases, data tables, charts and reports, ...)
- Non-text collections (image, video, sound, ...)
- Streams of data from sensors, programs, services
... so a processor can tell how the "attributes" and "values" are related
- What is required vs. optional?
- How many values for a particular attribute?
- What attributes are keys for other attributes?
- Which attributes are necessarily related to other attributes, and in what way?
- How do the attributes (and values) in one data source map to attributes and values describing another source?
A machine-readable collection of these entity-relationship terms, and of how they relate to one another (especially for unstructured sources), is called an Ontology.
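The questions above can be made concrete with a small sketch. It is only an illustration, not a real ontology language: the names `Attribute`, `SourceSchema`, `translate`, and the sample "people" and "HR" sources are all hypothetical. It shows one way a processor might record which attributes are required, how many values each may carry, which attribute is a key, and how one source's attribute names map to another's.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Attribute:
    """One attribute of a data source (hypothetical structure)."""
    name: str
    required: bool = True            # required vs. optional
    max_values: Optional[int] = 1    # how many values; None means unbounded
    is_key: bool = False             # does this attribute identify the record?


@dataclass
class SourceSchema:
    """A machine-readable description of one source's attributes."""
    name: str
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def add(self, attr: Attribute) -> None:
        self.attributes[attr.name] = attr

    def validate(self, record: Dict[str, List[str]]) -> List[str]:
        """Return a list of problems; an empty list means the record conforms."""
        problems = []
        for attr in self.attributes.values():
            values = record.get(attr.name, [])
            if attr.required and not values:
                problems.append(f"missing required attribute: {attr.name}")
            if attr.max_values is not None and len(values) > attr.max_values:
                problems.append(f"too many values for: {attr.name}")
        for name in record:
            if name not in self.attributes:
                problems.append(f"unknown attribute: {name}")
        return problems


def translate(record: Dict[str, List[str]], mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Rename a record's attributes using a source-to-target name mapping."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


# Target description: "id" is the key, "email" is optional and multi-valued.
people = SourceSchema("people")
people.add(Attribute("id", is_key=True))
people.add(Attribute("name"))
people.add(Attribute("email", required=False, max_values=None))

# Assumed mapping from a second (hypothetical) HR source to the target names.
hr_to_people = {"emp_no": "id", "full_name": "name", "mail": "email"}
hr_record = {"emp_no": ["17"], "full_name": ["Ada"], "mail": ["a@x.org", "b@x.org"]}

record = translate(hr_record, hr_to_people)
problems = people.validate(record)
```

Here `problems` comes back empty because the translated record satisfies every constraint. A record such as `{"id": ["1", "2"]}` would instead be reported as having too many values for `id` and a missing `name`.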