[Notes] CSCI 585 Spatial DBs
06 Mar 2020 | CSCI585, English/英文, Note
Credit to: Prof. Saty Raghavachary, CSCI 585, Spring 2020
Outline
- spatial DBs: definition, characteristics, need, creation.
- spatial datatypes
- spatial operators
- spatial indices
- implementations
- miscellany
Spatial Database (SD)
What is a spatial database?
“A spatial database is a database that is optimized to store and query data related to objects in space, including points, lines and polygons.”
In other words, it includes objects that have a SPATIAL location (and extent). A chief category of spatial data is geospatial data - derived from the geography of our earth.
Characteristics of geographic data:
- has location
- has size
- is auto-correlated
- is scale dependent
- might be temporally dependent too
Geographic data is NOT ‘business as usual’!
Entity view vs field view
In spatial data analysis, we distinguish between two conceptions of space:
- entity view: space as an area filled with a set of discrete objects
- field view: space as an area covered with essentially continuous surfaces
For our purposes, we will adopt the ‘entity’ view, where space is populated by discrete objects (roads, buildings, rivers..).
Components
So a spatial DB is a collection of the following, specifically built to handle spatial data:
- types
- operators
- indices
What can be plotted on to a map?
- crime data
- spread of disease, risk of disease [look at this too]
- drug overdoses - over time
- census data
- income distribution, home prices
- locations of Starbucks (!)
- (real-time) traffic
- agricultural land use, deforestation
Who creates/uses spatial data?
Various government agencies routinely coordinate spatial data collection and use, operating in effect, a national spatial data infrastructure (NSDI) - these include federal, state and local agencies. At the federal level, participating agencies include:
- Department of Commerce
  - Bureau of the Census
  - NIST
  - NOAA
- Department of Defense
  - Army Corps of Engineers
  - Defense Mapping Agency
- Department of the Interior
  - Bureau of Land Management
  - Fish and Wildlife Service
  - U.S. Geological Survey
- Department of Agriculture
  - Agricultural Stabilization and Conservation Service
  - Economic Research Service
  - Forest Service
  - National Agricultural Statistics Service
  - Soil Conservation Service
- Department of Transportation
  - Federal Highway Administration
- Environmental Protection Agency
- NASA
As you can see, spatial data is a SERIOUS resource, vital to national interests.
Where does spatial data come from?
Spatial data is created in a variety of ways:
- CAD: user creation
- CAD: reverse engineering
- maps: cartography (surveying, plotting)
- maps: satellite imagery
- maps: ‘copter, drone imagery
- maps: driving around
- maps: walking around
What to store?
All spatial data can be described via the following entities/types:
- points/vertices/nodes
- polylines/arcs/linestrings
- polygons/regions
- pixels/raster
Points, lines, polys => models and non-spatial attrs
Once we have spatial data (points, lines, polygons), we can:
- ‘model’ features such as lakes, soil type, highways, buildings etc, using the geometric primitives as underlying types
- add ‘extra’, non-spatial attributes/features to the underlying spatial data
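To make the 'geometry plus attributes' idea concrete, here is a minimal sketch in Python. The `Feature` class, its field names, and the sample lake/highway values are all made up for illustration; a real SDBMS would use its own geometry types instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y)

@dataclass
class Feature:
    """A modeled real-world feature: a geometric primitive plus non-spatial attributes."""
    kind: str            # "point", "linestring" or "polygon"
    coords: List[Point]  # vertices of the underlying primitive
    attrs: Dict[str, object] = field(default_factory=dict)

# Hypothetical features; coordinates and attribute values are invented.
lake = Feature("polygon", [(0, 0), (4, 0), (4, 3), (0, 3)],
               {"name": "Example Lake", "max_depth_m": 12})
highway = Feature("linestring", [(0, 5), (10, 5)],
                  {"name": "Example Hwy", "lanes": 4})

# A non-spatial question asked of spatial features: names of all polygons.
polygon_names = [f.attrs["name"] for f in (lake, highway) if f.kind == "polygon"]
print(polygon_names)
```

The point of the sketch is only that the spatial part (`coords`) and the non-spatial part (`attrs`) travel together in one record.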
SDBMS architecture
GIS vs SDBMS
GIS is a specific application architecture built on top of a [more general purpose] SDBMS.
A GIS typically tends to be used for:
Spatial relationships
In 1D (and higher), spatial relationships can be expressed using ‘intersects’, ‘crosses’, ‘within’, ‘touches’ (these are T/F predicates).
Here is a sampling of spatial relationships in 2D:
Minimum Bounding Rectangles (MBRs) are used to compute the results of the operations shown above:
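A rough sketch of how an MBR can be computed and used as a cheap first test, assuming axis-aligned rectangles stored as (min_x, min_y, max_x, max_y). The function names and sample shapes are illustrative, not taken from any particular library.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

def mbr(points: Iterable[Point]) -> MBR:
    """Smallest axis-aligned rectangle enclosing all the given vertices."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def mbrs_intersect(a: MBR, b: MBR) -> bool:
    """True if the two rectangles share at least one point (edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

triangle = [(0, 0), (4, 0), (2, 3)]
square = [(3, 2), (5, 2), (5, 4), (3, 4)]
far_line = [(10, 10), (12, 11)]

print(mbr(triangle))                                  # (0, 0, 4, 3)
print(mbrs_intersect(mbr(triangle), mbr(square)))     # True
print(mbrs_intersect(mbr(triangle), mbr(far_line)))   # False
```

Note that the triangle and the square above do not actually touch, even though their MBRs overlap. An MBR test can only rule pairs out quickly; pairs that pass still need an exact geometric check.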
Spatial relations - categories
Spatial relationships can be:
- topology-based [using defns of boundary, interior, exterior]
- metric-based [distance/Euclidean, angle measures]
- direction-based
- network-based [eg. shortest path]
Topological relationships could be further grouped like so:
- proximity
- overlap
- containment
How can we put these relations to use?
We can perform the following, on spatial data:
- spatial measurements: find the distance between points, find polygon area..
- spatial functions: find nearest neighbors..
- spatial predicates: test for proximity, containment..
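The three kinds of operations above can be sketched in plain Python as follows. This is a simplified illustration on planar (x, y) coordinates: the function names are made up, polygons are assumed simple and given as an ordered, unclosed vertex list, and points lying exactly on a boundary are not treated specially.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def distance(p: Point, q: Point) -> float:
    """Spatial measurement: Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def polygon_area(ring: Sequence[Point]) -> float:
    """Spatial measurement: area of a simple polygon (shoelace formula)."""
    n = len(ring)
    twice_area = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

def nearest_neighbor(target: Point, candidates: Sequence[Point]) -> Point:
    """Spatial function: the candidate closest to target (linear scan)."""
    return min(candidates, key=lambda c: distance(target, c))

def contains(ring: Sequence[Point], p: Point) -> bool:
    """Spatial predicate: point-in-polygon by ray casting."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

rect = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(distance((0, 0), (3, 4)))                             # 5.0
print(polygon_area(rect))                                   # 12.0
print(nearest_neighbor((0, 0), [(5, 5), (1, 1), (-3, 0)]))  # (1, 1)
print(contains(rect, (2, 1)), contains(rect, (5, 1)))       # True False
```

The nearest-neighbor search here scans every candidate; avoiding that scan is exactly what spatial indexes are for.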
Spatial operators, functions
Oracle Spatial
Oracle offers a ‘Spatial’ library for spatial queries - this includes UDTs and custom functions to process them.
Spatial Indexing
- Used to optimize spatial query performance
- R-tree Indexing
- Based on minimum bounding rectangles (MBRs) for 2D data or minimum bounding volumes (MBVs) for 3D data
- Indexes 2, 3, or 4 dimensions
- Provides an exclusive and exhaustive coverage of spatial objects
- Indexes all elements within a geometry, including points, lines, and polygons
Postgres PostGIS
Function names for queries differ across geodatabases. The following list contains commonly used functions built into PostGIS, a free geodatabase that is a PostgreSQL extension (the term ‘geometry’ refers to a point, line, box or other two- or three-dimensional shape). In PostGIS itself these functions carry an ST_ prefix, e.g. ST_Distance:
- Distance (geometry, geometry): number
- Equals (geometry, geometry): boolean
- Disjoint (geometry, geometry): boolean
- Intersects (geometry, geometry): boolean
- Touches (geometry, geometry): boolean
- Crosses (geometry, geometry): boolean
- Overlaps (geometry, geometry): boolean
- Contains (geometry, geometry): boolean
- Intersects (geometry): boolean
- Length (geometry): number
- Area (geometry): number
- Centroid (geometry): geometry
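To give a feel for what two of the single-geometry functions in this list compute, here is a plain-Python sketch of Length and Centroid. It is not PostGIS code: the function names, the planar coordinates, and the vertex-list representation are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def length(line: Sequence[Point]) -> float:
    """Length of a linestring: sum of its segment lengths."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(line, line[1:])
    )

def centroid(ring: Sequence[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (ordered, unclosed ring).

    Assumes the polygon has non-zero area.
    """
    n = len(ring)
    twice_area = cx = cy = 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Standard formula divides by 6 * signed_area, i.e. 3 * twice_area.
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))

path = [(0, 0), (3, 4), (3, 10)]
rect = [(0, 0), (4, 0), (4, 2), (0, 2)]
print(length(path))    # 11.0
print(centroid(rect))  # (2.0, 1.0)
```

Note that the return types match the signatures in the list: Length returns a number, Centroid returns a geometry (here, a point).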
Creating spatial indexes
As with non-spatial data (and even more so), creating and using spatial indexes VASTLY speeds up processing!
Can B Trees index spatial data?
In short, YES, if we pair them up with a ‘z curve’ indexing scheme (using a space-filling curve):
The idea is to quantize every (x,y) location into a recursively-divided ‘quadtree’ cell, and use the cell’s binary (x,y) location to create a (binary) ‘z’ key, which is ordered along the unit (0..1) interval - in other words, 2D (x,y) points get mapped (indexed) to ordered 1D ‘z’ locations.
But, this is of academic interest mostly, not commonly practiced in industry - Apple’s FoundationDB is an exception.
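The z-key construction described above can be sketched as follows, assuming coordinates are already normalized to the unit square [0, 1). The names `z_key` and `z_unit` and the `bits` parameter are made up for this example.

```python
def z_key(x: float, y: float, bits: int = 16) -> int:
    """Quantize (x, y) to a 2^bits x 2^bits grid and interleave the cell's bits."""
    cells = 1 << bits
    ix = min(int(x * cells), cells - 1)  # column index of the grid cell
    iy = min(int(y * cells), cells - 1)  # row index of the grid cell
    key = 0
    for i in range(bits):
        key |= ((ix >> i) & 1) << (2 * i)      # x bits go to even positions
        key |= ((iy >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return key

def z_unit(x: float, y: float, bits: int = 16) -> float:
    """The same key scaled onto the unit interval [0, 1)."""
    return z_key(x, y, bits) / float(1 << (2 * bits))

# With one bit per axis, the four quadrants are visited in a 'Z' pattern.
points = [(0.75, 0.75), (0.25, 0.25), (0.25, 0.75), (0.75, 0.25)]
ordered = sorted(points, key=lambda p: z_key(*p, bits=1))
print(ordered)  # [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
print(z_key(0.3, 0.6, bits=2))  # 9
```

Since the keys are plain integers with a total order, any ordered 1D index (a B tree, or even a sorted list) can store them. Points that are close in 2D often, though not always, end up with nearby keys.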
R trees
R trees use MBRs to create a hierarchy of bounds.
Variations, FYI: R+ tree, R* tree, Buddy trees, Packed R trees..
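A minimal sketch of the 'hierarchy of bounds' idea: entries are packed into leaves, each node stores the MBR of everything below it, and a search skips any subtree whose MBR misses the query. This is a simplified bulk-loaded structure (sorted by MBR center x), not a full R tree with insertion and node splitting; the class and function names are made up, and the input is assumed non-empty.

```python
from typing import List, Tuple

MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

def union(boxes: List[MBR]) -> MBR:
    """MBR enclosing all the given MBRs."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def intersects(a: MBR, b: MBR) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

class Node:
    def __init__(self, children, is_leaf: bool):
        # Leaf: list of (mbr, object_id) pairs. Internal: list of Node.
        self.children = children
        self.is_leaf = is_leaf
        boxes = [c[0] for c in children] if is_leaf else [c.mbr for c in children]
        self.mbr = union(boxes)

def build(entries, fanout: int = 4) -> Node:
    """Bulk-load: sort by MBR center x, pack into leaves, then pack upward."""
    entries = sorted(entries, key=lambda e: (e[0][0] + e[0][2]) / 2)
    level = [Node(entries[i:i + fanout], True)
             for i in range(0, len(entries), fanout)]
    while len(level) > 1:
        level = [Node(level[i:i + fanout], False)
                 for i in range(0, len(level), fanout)]
    return level[0]

def search(node: Node, query: MBR) -> list:
    """Ids of all objects whose MBR intersects the query rectangle."""
    if not intersects(node.mbr, query):
        return []  # prune the whole subtree
    if node.is_leaf:
        return [oid for box, oid in node.children if intersects(box, query)]
    hits = []
    for child in node.children:
        hits.extend(search(child, query))
    return hits

# Ten unit squares along the diagonal, labelled obj0..obj9.
entries = [((i, i, i + 1, i + 1), f"obj{i}") for i in range(10)]
root = build(entries, fanout=3)
print(root.mbr)                                # (0, 0, 10, 10)
print(search(root, (2.5, 2.5, 4.5, 4.5)))      # ['obj2', 'obj3', 'obj4']
```

Real R tree variants differ mainly in how they choose which entries share a node, which affects how much the node MBRs overlap.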
k-d trees
K-D-B trees
Quadtrees (and octrees):
Each node is either a leaf node, with indexed points or null, or an internal (non-leaf) node that has exactly 4 children. The hierarchy of such nodes forms the quadtree.
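The node definition above can be sketched as a small point quadtree. Assumptions for this example: half-open cell bounds [x0, x1) x [y0, y1), a leaf capacity of 2, and a depth cap so that repeated identical points cannot split forever. The class name and parameters are illustrative.

```python
class QuadTree:
    MAX_DEPTH = 8  # assumed cap; a leaf at this depth may exceed capacity

    def __init__(self, x0, y0, x1, y1, capacity=2, depth=0):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.points = []      # used only while this node is a leaf
        self.children = None  # None for a leaf, else exactly 4 child nodes

    def insert(self, p) -> bool:
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= p[0] < x1 and y0 <= p[1] < y1):
            return False  # point lies outside this cell
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.MAX_DEPTH:
                self.points.append(p)
                return True
            self._split()
        return any(child.insert(p) for child in self.children)

    def _split(self):
        """Turn this leaf into an internal node with exactly 4 children."""
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1)
        self.children = [
            QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args),
        ]
        old, self.points = self.points, []
        for q in old:  # push existing points down into the children
            for child in self.children:
                if child.insert(q):
                    break

    def query(self, qx0, qy0, qx1, qy1) -> list:
        """All stored points inside the closed query rectangle."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []  # query misses this cell: prune
        if self.children is None:
            return [p for p in self.points
                    if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        found = []
        for child in self.children:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found

tree = QuadTree(0, 0, 8, 8)
for pt in [(1, 1), (2, 2), (6, 6), (7, 1), (1, 7)]:
    tree.insert(pt)
print(len(tree.children))      # 4
print(tree.query(0, 0, 3, 3))  # [(1, 1), (2, 2)]
```

An octree is the same idea in 3D, with 8 children per internal node instead of 4.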
Indexing evolution
Query processing
Visualizing spatial data
A variety of non-spatial attrs can be mapped on to spatial data, providing an intuitive grasp of patterns, trends and abnormalities. Following are some examples.
Dot map:
Proportional symbol map:
Diagram map:
Another diagram map:
Also possible to plot multivariate data this way.
Choropleth maps (plotting of a variable of interest, to cover an entire region of a map):
So who (else) has spatial extensions?
Everyone!
Thanks to SQL’s facility for custom datatype (‘UDT’) and function creation (‘functional extension’), “spatial” has been implemented for every major DB out there:
- Oracle: Locator, Spatial, SDO
- Postgres: PostGIS
- DB2: Spatial Datablade
- Informix: Geodetic Datablade
- SQL Server: Geometric and Geodetic Geography types
- MySQL: spatial library comes ‘built in’
- SQLite: SpatiaLite
..
Google KML
Google’s KML format is used to encode spatial data for Google Earth, etc. Here is a page on importing other geospatial dataset formats into Google Earth.
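As a small illustration of what KML looks like, the sketch below builds a single-Placemark document using Python's standard library. The helper name `placemark_kml` is made up, and the sample name and coordinates are only approximate, illustrative values. KML lists coordinates as longitude, latitude, then an optional altitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def placemark_kml(name: str, lon: float, lat: float) -> str:
    """Return a KML document string containing one point Placemark."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # Order matters: longitude first, then latitude, then altitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")

# Illustrative values only (roughly the USC campus).
doc = placemark_kml("USC", -118.2851, 34.0224)
print(doc)
```

Saving the returned string to a `.kml` file should give something Google Earth can open, though that step is not verified here.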
OpenLayers
OpenLayers is an open GIS platform.
ESRI: Arc*
ESRI is the home of the powerful, flexible family of ArcGIS products - and they are local!
QGIS etc.
There is a variety of inexpensive/open source mapping platforms, competing with pricier commercial offerings (from ESRI etc). Here are several:
- QGIS
- MapBox
- Carto
- Boundless
- GIS Cloud