[Notes] CSCI 585 Spatial DBs

06 Mar 2020 | CSCI585, English/英文, Note

Credit to: Prof. Saty Raghavachary, CSCI 585, Spring 2020

Outline

spatial DBs: definition, characteristics, need, creation.
spatial datatypes
spatial operators
spatial indices
implementations
miscellany

Spatial Database (SD)

What is a spatial database?

“A spatial database is a database that is optimized to store and query data related to objects in space, including points, lines and polygons.”

In other words, it includes objects that have a SPATIAL location (and extent). A chief category of spatial data is geospatial data - derived from the geography of our earth.

Characteristics of geographic data:

has location
has size
is auto-correlated
scale dependent
might be temporally dependent too

Geographic data is NOT ‘business as usual’!

Entity view vs field view

In spatial data analysis, we distinguish between two conceptions of space:

entity view: space as an area filled with a set of discrete objects
field view: space as an area covered with essentially continuous surfaces

For our purposes, we will adopt the ‘entity’ view, where space is populated by discrete objects (roads, buildings, rivers..).

Components

So a spatial DB is a collection of the following, specifically built to handle spatial data:

types
operators
indices

What can be plotted on to a map?

crime data
spread of disease, risk of disease [look at this too]
drug overdoses - over time
census data
income distribution, home prices
locations of Starbucks (!)
(real-time) traffic
agricultural land use, deforestation

Who creates/uses spatial data?

Various government agencies routinely coordinate spatial data collection and use, operating in effect, a national spatial data infrastructure (NSDI) - these include federal, state and local agencies. At the federal level, participating agencies include:

Department of Commerce
- Bureau of the Census
- NIST
- NOAA
Department of Defense
- Army Corps of Engineers
- Defense Mapping Agency
Department of the Interior
- Bureau of Land Management
- Fish and Wildlife Service
- U.S Geological Survey
Department of Agriculture
- Agricultural Stabilization and Conservation Service
- Economic Research Service
- Forest Service
- National Agriculture Statistical Service
- Soil Conservation Service
Department of Transportation
- Federal Highway Administration
Environmental Protection Agency
NASA

As you can see, spatial data is a SERIOUS resource, vital to national interests.

Where does spatial data come from?

Spatial data is created in a variety of ways:

CAD: user creation
CAD: reverse engineering
maps: cartography (surveying, plotting)
maps: satellite imagery
maps: ‘copter, drone imagery
maps: driving around
maps: walking around

What to store?

All spatial data can be described via the following entities/types:

points/vertices/nodes
polylines/arcs/linestrings
polygons/regions
pixels/raster

Points, lines, polys => models and non-spatial attrs

Once we have spatial data (points, lines, polygons), we can:

‘model’ features such as lakes, soil type, highways, buildings etc, using the geometric primitives as underlying types
add ‘extra’, non-spatial attributes/features to the underlying spatial data
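As a minimal sketch of the two steps above, a feature can be held as a geometric primitive plus a dictionary of non-spatial attributes. The `Feature` class, the `features_of_kind` helper, and the sample lake and highway are hypothetical names and data made up for illustration; they are not from the course material.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y)


@dataclass
class Feature:
    """A modeled real-world feature: one geometry plus non-spatial attributes."""
    kind: str              # "point", "linestring", or "polygon"
    coords: List[Point]    # vertices of the underlying geometric primitive
    attrs: Dict[str, object] = field(default_factory=dict)  # non-spatial data


# Hypothetical sample data, for illustration only.
lake = Feature("polygon", [(0, 0), (4, 0), (4, 3), (0, 3)],
               {"name": "Sample Lake", "depth_m": 12})
highway = Feature("linestring", [(0, 5), (10, 5)],
                  {"name": "Route 1", "lanes": 4})


def features_of_kind(features: List[Feature], kind: str) -> List[Feature]:
    """Select features by the type of their underlying geometry."""
    return [f for f in features if f.kind == kind]


polygons = features_of_kind([lake, highway], "polygon")
print([f.attrs["name"] for f in polygons])
```

The geometry answers 'where', while the attribute dictionary carries everything else (names, depths, lane counts and so on).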

SDBMS architecture

GIS vs SDBMS

GIS is a specific application architecture built on top of a [more general purpose] SDBMS.

GIS typically tend to be used for:

Spatial relationships

In 1D (and higher), spatial relationships can be expressed using ‘intersects’, ‘crosses’, ‘within’, ‘touches’ (these are T/F predicates).

Here is a sampling of spatial relationships in 2D:

Minimum Bounding Rectangles (MBRs) are used to compute the results of the operations shown above:
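A rough sketch of how an MBR might be computed and tested, assuming axis-aligned rectangles stored as `(min_x, min_y, max_x, max_y)` tuples. The function names and sample shapes are made up for illustration. Note that an MBR test is only a cheap first pass: two MBRs can overlap even when the shapes inside them do not, so an exact geometry check would normally follow.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def mbr_of(points: Iterable[Point]) -> MBR:
    """Smallest axis-aligned rectangle enclosing all the given vertices."""
    pts = list(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def mbrs_intersect(a: MBR, b: MBR) -> bool:
    """True if the two rectangles share at least one point (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr_contains(outer: MBR, inner: MBR) -> bool:
    """True if 'inner' lies entirely within 'outer'."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


# Hypothetical shapes, for illustration only.
triangle = [(0, 0), (4, 0), (2, 3)]
square = [(3, 2), (6, 2), (6, 5), (3, 5)]
far_away = [(10, 10), (11, 12)]

print(mbr_of(triangle))                                    # (0, 0, 4, 3)
print(mbrs_intersect(mbr_of(triangle), mbr_of(square)))    # True
print(mbrs_intersect(mbr_of(triangle), mbr_of(far_away)))  # False
```

Here the triangle and the square do not actually touch, yet their MBRs overlap; the MBR test only says the pair is worth a closer look.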

Spatial relations - categories

Spatial relationships can be:

topology-based [using defns of boundary, interior, exterior]
metric-based [distance/Euclidean, angle measures]
direction-based
network-based [eg. shortest path]

Topological relationships could be further grouped like so:

proximity
overlap
containment

How can we put these relations to use?

We can perform the following, on spatial data:

spatial measurements: find the distance between points, find polygon area..
spatial functions: find nearest neighbors..
spatial predicates: test for proximity, containment..
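The three kinds of operations above can be sketched in plain Python, assuming planar (x, y) coordinates and simple polygons given as ordered vertex lists. The function names are illustrative only; a real spatial DB would provide its own, and would also handle geodetic coordinates, which this sketch ignores.

```python
import math


def distance(p, q):
    """Spatial measurement: Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def polygon_area(vertices):
    """Spatial measurement: area of a simple polygon (shoelace formula)."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def nearest_neighbor(query, points):
    """Spatial function: the point closest to 'query' (brute force, no index)."""
    return min(points, key=lambda p: distance(query, p))


def point_in_polygon(pt, vertices):
    """Spatial predicate: containment test by ray casting toward +x."""
    x, y = pt
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


rect = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(distance((0, 0), (3, 4)))                           # 5.0
print(polygon_area(rect))                                 # 12.0
print(nearest_neighbor((1, 1), [(5, 5), (2, 1), (0, 4)])) # (2, 1)
print(point_in_polygon((2, 1), rect))                     # True
```

The brute-force nearest-neighbor search scans every point, which is the cost that spatial indexes are meant to avoid.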

Spatial operators, functions

Oracle Spatial

Oracle offers a ‘Spatial’ library for spatial queries - this includes UDTs and custom functions to process them.

Spatial Indexing

Used to optimize spatial query performance
R-tree Indexing
- Based on minimum bounding rectangles (MBRs) for 2D data or minimum bounding volumes (MBVs) for 3D data
- Indexes 2, 3, or 4 dimensions
Provides an exclusive and exhaustive coverage of spatial objects
Indexes all elements within a geometry, including points, lines, and polygons

Postgres PostGIS

The function names for queries differ across geodatabases. The following list contains commonly used functions built into PostGIS, a free geodatabase which is a PostgreSQL extension (the term ‘geometry’ refers to a point, line, box or other two or three dimensional shape):

Distance (geometry, geometry): number
Equals (geometry, geometry): boolean
Disjoint (geometry, geometry): boolean
Intersects (geometry, geometry): boolean
Touches (geometry, geometry): boolean
Crosses (geometry, geometry): boolean
Overlaps (geometry, geometry): boolean
Contains (geometry, geometry): boolean
Intersects (geometry): boolean
Length (geometry): number
Area (geometry): number
Centroid (geometry): geometry

Creating spatial indexes

As (more so than) with non-spatial data, the creation and use of spatial indexes VASTLY speed up processing!

Can B Trees index spatial data?

In short, YES, if we pair it up with a ‘z curve’ indexing scheme (using a space-filling curve):

The idea is to quantize every (x,y) location into a recursively-divided ‘quadtree’ cell, and use the cell’s binary (x,y) location to create a (binary) ‘z’ key, which is ordered along the unit (0..1) interval - in other words, 2D (x,y) points get mapped (indexed) to ordered 1D ‘z’ locations.
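A minimal sketch of the z-key idea, assuming coordinates already normalized to the range [0, 1). The names `quantize`, `interleave` and `z_key`, and the choice of putting x bits in the even positions, are assumptions made for illustration; the sorted list stands in for the ordered keys a B tree would hold.

```python
def quantize(value, bits):
    """Map a coordinate in [0, 1) to an integer cell index in [0, 2**bits)."""
    cells = 1 << bits
    return min(int(value * cells), cells - 1)


def interleave(ix, iy, bits):
    """Interleave the bits of ix and iy: x in even positions, y in odd ones."""
    z = 0
    for i in range(bits):
        z |= ((ix >> i) & 1) << (2 * i)
        z |= ((iy >> i) & 1) << (2 * i + 1)
    return z


def z_key(x, y, bits=16):
    """1D key for a 2D point; dividing by 4**bits would place it in [0, 1)."""
    return interleave(quantize(x, bits), quantize(y, bits), bits)


# One point per quadrant of the unit square (hypothetical sample data).
points = [(0.75, 0.75), (0.25, 0.25), (0.75, 0.25), (0.25, 0.75)]

# Sorting by z key gives the 1D order in which a B tree would store them.
ordered = sorted(points, key=lambda p: z_key(p[0], p[1]))
print(ordered)
```

With this bit order the four quadrants are visited lower-left, lower-right, upper-left, upper-right, and the same pattern repeats inside each quadrant at every level of subdivision.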

But, this is of academic interest mostly, not commonly practiced in industry - Apple’s FoundationDB is an exception.

R trees

R trees use MBRs to create a hierarchy of bounds.
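The hierarchy of bounds can be sketched as follows. This is a simplified, static version that bulk-loads entries by sorting on the x center of each MBR and packing them into nodes, loosely in the spirit of a packed R tree; it is not the dynamic insert-and-split algorithm of a full R tree, and all names here are illustrative assumptions.

```python
def union(boxes):
    """MBR enclosing all the given (min_x, min_y, max_x, max_y) boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    """True if two MBRs share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, children, is_leaf):
        # Leaf: list of (mbr, object_id) pairs. Internal: list of Node.
        self.children = children
        self.is_leaf = is_leaf
        boxes = [c[0] for c in children] if is_leaf else [c.mbr for c in children]
        self.mbr = union(boxes)  # bound covering everything below this node


def build(entries, capacity=4):
    """Bulk-load a tree from a non-empty list of (mbr, object_id) pairs."""
    entries = sorted(entries, key=lambda e: (e[0][0] + e[0][2]) / 2)
    level = [Node(entries[i:i + capacity], True)
             for i in range(0, len(entries), capacity)]
    while len(level) > 1:  # pack nodes upward until a single root remains
        level = [Node(level[i:i + capacity], False)
                 for i in range(0, len(level), capacity)]
    return level[0]


def search(node, window):
    """Return ids of objects whose MBR intersects the query window."""
    if not intersects(node.mbr, window):
        return []  # the whole subtree is pruned with one comparison
    if node.is_leaf:
        return [obj for mbr, obj in node.children if intersects(mbr, window)]
    hits = []
    for child in node.children:
        hits.extend(search(child, window))
    return hits


# Ten unit squares in a row (hypothetical sample data).
entries = [((i, 0, i + 1, 1), f"obj{i}") for i in range(10)]
root = build(entries, capacity=3)
print(search(root, (2.5, 0.2, 4.5, 0.8)))
```

The benefit comes from pruning: a subtree whose bounding rectangle misses the query window is skipped without visiting any of its objects.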

Variations, FYI: R+ tree, R* tree, Buddy trees, Packed R trees..

k-d trees

K-D-B trees

Quadtrees (and octrees):

Each node is either a leaf node, with indexed points or null, or an internal (non-leaf) node that has exactly 4 children. The hierarchy of such nodes forms the quadtree.
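A minimal point-quadtree sketch along those lines, assuming a square root cell and a fixed per-leaf capacity. The class name, the `capacity` parameter and the `MIN_SIZE` guard (which stops subdivision when many points coincide) are assumptions added for illustration.

```python
class QuadTree:
    MIN_SIZE = 1e-9  # stop splitting below this cell size (guards duplicates)

    def __init__(self, x, y, size, capacity=4):
        self.x, self.y, self.size = x, y, size  # lower-left corner, side length
        self.capacity = capacity
        self.points = []      # points held while this node is a leaf
        self.children = None  # None for a leaf, else exactly 4 child nodes

    def contains(self, p):
        return (self.x <= p[0] < self.x + self.size
                and self.y <= p[1] < self.y + self.size)

    def insert(self, p):
        """Insert a point; returns False if it lies outside the root cell."""
        if not self.contains(p):
            return False
        self._insert(p)
        return True

    def _child_for(self, p):
        h = self.size / 2
        ix = 1 if p[0] >= self.x + h else 0
        iy = 1 if p[1] >= self.y + h else 0
        return self.children[2 * ix + iy]

    def _insert(self, p):
        if self.children is None:
            if len(self.points) < self.capacity or self.size <= self.MIN_SIZE:
                self.points.append(p)
                return
            self._split()
        self._child_for(p)._insert(p)

    def _split(self):
        """Turn a full leaf into an internal node with 4 children."""
        h = self.size / 2
        self.children = [QuadTree(self.x + dx, self.y + dy, h, self.capacity)
                         for dx in (0, h) for dy in (0, h)]
        old, self.points = self.points, []
        for q in old:
            self._child_for(q)._insert(q)

    def query(self, xmin, ymin, xmax, ymax):
        """Points with xmin <= x <= xmax and ymin <= y <= ymax."""
        if (xmax < self.x or xmin >= self.x + self.size
                or ymax < self.y or ymin >= self.y + self.size):
            return []  # query window misses this cell entirely
        if self.children is None:
            return [p for p in self.points
                    if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
        out = []
        for c in self.children:
            out.extend(c.query(xmin, ymin, xmax, ymax))
        return out


qt = QuadTree(0, 0, 8, capacity=2)
for pt in [(1, 1), (2, 2), (5, 5), (6, 1), (1, 6), (7, 7)]:
    qt.insert(pt)
print(sorted(qt.query(0, 0, 3, 3)))
```

An octree follows the same pattern in 3D, with each internal node having 8 children instead of 4.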

indexing evolution

Query processing

Visualizing spatial data

A variety of non-spatial attrs can be mapped on to spatial data, providing an intuitive grasp of patterns, trends and abnormalities. Following are some examples.

Dot map:

Proportional symbol map:

Diagram map:

Another diagram map:

Also possible to plot multivariate data this way.

Choropleth maps (plotting of a variable of interest, to cover an entire region of a map):

So who (else) has spatial extensions?

Everyone!

Thanks to SQL’s facility for custom datatype (‘UDT’) and function creation (‘functional extension’), “spatial” has been implemented for every major DB out there:

Oracle: Locator, Spatial, SDO
Postgres: PostGIS
DB2: Spatial Datablade
Informix: Geodetic Datablade
SQL Server: Geometric and Geodetic Geography types
MySQL: spatial library comes ‘built in’
SQLite: SpatiaLite
..

Google KML

Google’s KML format is used to encode spatial data for Google Earth, etc. Here is a page on importing other geospatial dataset formats into Google Earth.

OpenLayers

OpenLayers is an open GIS platform.

ESRI: Arc*

ESRI is the home of the powerful, flexible family of ArcGIS products - and they are local!

QGIS etc.

There is a variety of inexpensive/open source mapping platforms, competing with more pricey commercial offerings (from ESRI etc). Here are several:

QGIS
MapBox
Carto
Boundless
GIS Cloud
