[Notes] CSCI 585 Spatial DBs

Credit to: Prof. Saty Raghavachary, CSCI 585, Spring 2020

Outline

  • spatial DBs: definition, characteristics, need, creation.
  • spatial datatypes
  • spatial operators
  • spatial indices
  • implementations
  • miscellany

Spatial Database (SD)

What is a spatial database?

“A spatial database is a database that is optimized to store and query data related to objects in space, including points, lines and polygons.”

In other words, it includes objects that have a SPATIAL location (and extent). A chief category of spatial data is geospatial data - derived from the geography of our earth.

Characteristics of geographic data:

  • has location
  • has size
  • is auto-correlated
  • scale dependent
  • might be temporally dependent too

Geographic data is NOT ‘business as usual’!

Entity view vs field view

In spatial data analysis, we distinguish between two conceptions of space:

  • entity view: space as an area filled with a set of discrete objects
  • field view: space as an area covered with essentially continuous surfaces

For our purposes, we will adopt the ‘entity’ view, where space is populated by discrete objects (roads, buildings, rivers..).

Components

So a spatial DB is a collection of the following, specifically built to handle spatial data:

  • types
  • operators
  • indices

What can be plotted on to a map?

  • crime data
  • spread of disease, risk of disease
  • drug overdoses - over time
  • census data
  • income distribution, home prices
  • locations of Starbucks (!)
  • (real-time) traffic
  • agricultural land use, deforestation

Who creates/uses spatial data?

Various government agencies routinely coordinate spatial data collection and use, operating, in effect, a national spatial data infrastructure (NSDI). These include federal, state and local agencies. At the federal level, participating agencies include:

  • Department of Commerce
    • Bureau of the Census
    • NIST
    • NOAA
  • Department of Defense
    • Army Corps of Engineers
    • Defense Mapping Agency
  • Department of the Interior
    • Bureau of Land Management
    • Fish and Wildlife Service
    • U.S. Geological Survey
  • Department of Agriculture
    • Agricultural Stabilization and Conservation Service
    • Economic Research Service
    • Forest Service
    • National Agriculture Statistical Service
    • Soil Conservation Service
  • Department of Transportation
    • Federal Highway Administration
  • Environmental Protection Agency
  • NASA

As you can see, spatial data is a SERIOUS resource, vital to national interests.

Where does spatial data come from?

Spatial data is created in a variety of ways:

  • CAD: user creation
  • CAD: reverse engineering
  • maps: cartography (surveying, plotting)
  • maps: satellite imagery
  • maps: ‘copter, drone imagery
  • maps: driving around
  • maps: walking around

What to store?

All spatial data can be described via the following entities/types:

  • points/vertices/nodes
  • polylines/arcs/linestrings
  • polygons/regions
  • pixels/raster

Points, lines, polys => models and non-spatial attrs

Once we have spatial data (points, lines, polygons), we can:

  • ‘model’ features such as lakes, soil type, highways, buildings etc, using the geometric primitives as underlying types
  • add ‘extra’, non-spatial attributes/features to the underlying spatial data

SDBMS architecture

GIS vs SDBMS

GIS is a specific application architecture built on top of a [more general purpose] SDBMS.

GIS typically tend to be used for:

Spatial relationships

In 1D (and higher), spatial relationships can be expressed using ‘intersects’, ‘crosses’, ‘within’, ‘touches’ (these are T/F predicates).

Here is a sampling of spatial relationships in 2D:

Minimum Bounding Rectangles (MBRs) are used to approximate the geometries when computing the results of the operations shown above:
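To make the MBR idea concrete, here is a minimal sketch in plain Python (no spatial library). The names `MBR`, `mbr_of` and `mbrs_intersect`, and the sample `lake`/`road`/`park` coordinates, are made up for illustration. Note that an MBR test is only an approximation: disjoint MBRs mean the geometries are disjoint, but overlapping MBRs only mean the geometries MIGHT intersect.

```python
from typing import Iterable, NamedTuple, Tuple

Point = Tuple[float, float]


class MBR(NamedTuple):
    """Axis-aligned minimum bounding rectangle (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def mbr_of(points: Iterable[Point]) -> MBR:
    """Smallest axis-aligned rectangle that encloses all the given points."""
    xs, ys = zip(*points)
    return MBR(min(xs), min(ys), max(xs), max(ys))


def mbrs_intersect(a: MBR, b: MBR) -> bool:
    """True if the rectangles share at least one point (touching edges count)."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


# Illustrative geometries, given as vertex lists.
lake = [(1, 1), (4, 2), (3, 5), (0, 3)]   # polygon
road = [(3, 0), (6, 4)]                   # line segment
park = [(7, 7), (9, 8), (8, 9)]           # polygon

lake_box, road_box, park_box = mbr_of(lake), mbr_of(road), mbr_of(park)

# The boxes overlap, so lake/road is only a CANDIDATE pair; an exact
# geometry test would still be needed to confirm a real intersection.
candidate = mbrs_intersect(lake_box, road_box)

# The boxes are disjoint, so lake and park cannot intersect.
ruled_out = not mbrs_intersect(lake_box, park_box)
```

The point of the sketch is that the four comparisons in `mbrs_intersect` are far cheaper than testing the actual shapes, which is why bounding boxes are a natural first step.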

Spatial relations - categories

Spatial relationships can be:

  • topology-based [using defns of boundary, interior, exterior]
  • metric-based [distance/Euclidean, angle measures]
  • direction-based
  • network-based [eg. shortest path]

Topological relationships could be further grouped like so:

  • proximity
  • overlap
  • containment

How can we put these relations to use?

We can perform the following, on spatial data:

  • spatial measurements: find the distance between points, find polygon area..
  • spatial functions: find nearest neighbors..
  • spatial predicates: test for proximity, containment..
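As a rough illustration of the three kinds of operations above, here is a small plain-Python sketch: two measurements (distance, polygon area via the shoelace formula), one function (brute-force nearest neighbor) and one predicate (point-in-polygon via ray casting). All function names and sample data are made up for this example; a real spatial DB would provide its own built-in equivalents.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def distance(p: Point, q: Point) -> float:
    """Spatial measurement: Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def polygon_area(ring: Sequence[Point]) -> float:
    """Spatial measurement: area of a simple polygon (shoelace formula).

    `ring` lists the vertices in order, without repeating the first one.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def nearest_neighbor(query: Point, candidates: Sequence[Point]) -> Point:
    """Spatial function: brute-force nearest neighbor (no index used)."""
    return min(candidates, key=lambda c: distance(query, c))


def contains_point(ring: Sequence[Point], p: Point) -> bool:
    """Spatial predicate: ray casting with the even-odd rule.

    Points lying exactly on the boundary are not treated specially.
    """
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
d = distance((0, 0), (3, 4))                                  # 5.0
area = polygon_area(square)                                   # 16.0
nearest = nearest_neighbor((1, 1), [(5, 5), (2, 1), (0, 4)])  # (2, 1)
inside = contains_point(square, (2, 2))                       # True
outside = contains_point(square, (5, 2))                      # False
```

The brute-force `nearest_neighbor` scans every candidate, which is exactly the cost that spatial indexes are meant to avoid.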

Spatial operators, functions

Oracle Spatial

Oracle offers a ‘Spatial’ library for spatial queries - this includes UDTs and custom functions to process them.

Spatial Indexing

  • Used to optimize spatial query performance
  • R-tree Indexing
    • Based on minimum bounding rectangles (MBRs) for 2D data or minimum bounding volumes (MBVs) for 3D data
    • Indexes 2, 3, or 4 dimensions
  • Provides an exclusive and exhaustive coverage of spatial objects
  • Indexes all elements within a geometry, including points, lines, and polygons

Postgres PostGIS

The function names for queries differ across geodatabases. The following list contains commonly used functions built into PostGIS, a free geodatabase which is a PostgreSQL extension (the term ‘geometry’ refers to a point, line, box or other two- or three-dimensional shape). In current PostGIS these functions carry an ST_ prefix, e.g. ST_Distance:

  1. Distance (geometry, geometry): number
  2. Equals (geometry, geometry): boolean
  3. Disjoint (geometry, geometry): boolean
  4. Intersects (geometry, geometry): boolean
  5. Touches (geometry, geometry): boolean
  6. Crosses (geometry, geometry): boolean
  7. Overlaps (geometry, geometry): boolean
  8. Contains (geometry, geometry): boolean
  9. Intersects (geometry): boolean
  10. Length (geometry): number
  11. Area (geometry): number
  12. Centroid (geometry): geometry

Creating spatial indexes

As (more so than) with non-spatial data, the creation and use of spatial indexes VASTLY speed up processing!

Can B Trees index spatial data?

In short, YES, if we pair them up with a ‘z curve’ indexing scheme (using a space-filling curve):

The idea is to quantize every (x,y) location into a recursively-divided ‘quadtree’ cell, and use the cell’s binary (x,y) location to create a (binary) ‘z’ key, which is ordered along the unit (0..1) interval - in other words, 2D (x,y) points get mapped (indexed) to ordered 1D ‘z’ locations.
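The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: coordinates are taken to lie in [0, 1), the function names `quantize`, `z_key` and `z_unit` are made up, and putting x in the even bits (rather than y) is an arbitrary convention.

```python
def quantize(v: float, bits: int) -> int:
    """Map v in [0, 1) to an integer cell index in [0, 2**bits)."""
    cells = 1 << bits
    return min(int(v * cells), cells - 1)


def z_key(x: float, y: float, bits: int = 16) -> int:
    """Interleave the bits of the quantized x and y into one integer key.

    Bit i of x goes to bit 2*i of the key, bit i of y to bit 2*i + 1
    (an assumed convention; the opposite order works equally well).
    """
    ix, iy = quantize(x, bits), quantize(y, bits)
    key = 0
    for i in range(bits):
        key |= ((ix >> i) & 1) << (2 * i)
        key |= ((iy >> i) & 1) << (2 * i + 1)
    return key


def z_unit(x: float, y: float, bits: int = 16) -> float:
    """The same key, rescaled to a position on the unit interval [0, 1)."""
    return z_key(x, y, bits) / (1 << (2 * bits))


# With bits=2 the unit square is a 4x4 grid of cells, keyed 0..15.
points = [(0.75, 0.75), (0.1, 0.1), (0.6, 0.2), (0.2, 0.7)]
keys = [z_key(x, y, bits=2) for x, y in points]   # [15, 0, 4, 8]

# Sorting by the 1D key is what lets an ordinary ordered index
# (such as a B tree) store the 2D points.
ordered = sorted(points, key=lambda p: z_key(p[0], p[1], bits=2))
```

Points that are close on the z curve are usually, though not always, close in 2D, which is why a range scan over keys generally needs a follow-up filter on the real coordinates.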

But, this is of academic interest mostly, not commonly practiced in industry - Apple’s FoundationDB is an exception.

R trees

R trees use MBRs to create a hierarchy of bounds.
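Here is a minimal sketch of such a hierarchy in plain Python. To keep it short, it bulk-loads the tree by sorting on x-center and packing groups bottom-up (in the spirit of a packed R tree); it is NOT the classic dynamic R tree with insert-time node splitting. The names `Node`, `build`, `search` and the sample entries are made up, and `build` assumes a non-empty input.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def union(rects: List[Rect]) -> Rect:
    """MBR that encloses all the given MBRs."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect
    children: List["Node"] = field(default_factory=list)  # empty for data entries
    item: Optional[str] = None                            # set only on data entries


def build(entries: List[Tuple[str, Rect]], fanout: int = 4) -> Node:
    """Bulk-load: sort entries by x-center, pack groups of `fanout`, repeat upward."""
    level = [Node(mbr=rect, item=name) for name, rect in entries]
    level.sort(key=lambda n: (n.mbr[0] + n.mbr[2]) / 2)
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [Node(mbr=union([c.mbr for c in g]), children=g) for g in groups]
    return level[0]


def search(node: Node, query: Rect) -> List[str]:
    """Return items whose MBR intersects the query, pruning whole subtrees."""
    if not intersects(node.mbr, query):
        return []            # nothing below this node can match
    if node.item is not None:
        return [node.item]
    found: List[str] = []
    for child in node.children:
        found.extend(search(child, query))
    return found


entries = [("a", (0, 0, 1, 1)), ("b", (2, 2, 3, 3)), ("c", (5, 5, 6, 6)),
           ("d", (8, 0, 9, 1)), ("e", (4, 1, 5, 2))]
root = build(entries, fanout=2)
hits = sorted(search(root, (2.5, 0.5, 4.5, 2.5)))   # ['b', 'e']
```

In this run the subtree holding only "d" is rejected with a single MBR comparison, which is the whole benefit of the hierarchy.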

Variations, FYI: R+ tree, R* tree, Buddy trees, Packed R trees..

k-d trees

K-D-B trees

Quadtrees (and octrees):

Each node is either a leaf node, with indexed points or null, or an internal (non-leaf) node that has exactly 4 children. The hierarchy of such nodes forms the quadtree.
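A small Python sketch of this structure follows. It is one possible variant, under assumptions of my own: a leaf holds up to `capacity` points before splitting, cells are half-open squares, and a `max_depth` cap is added so that repeated identical points cannot split forever. The class and parameter names are made up.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class QuadTree:
    """Leaf: holds points (possibly none). Internal node: exactly 4 children."""

    def __init__(self, xmin: float, ymin: float, xmax: float, ymax: float,
                 capacity: int = 2, depth: int = 0, max_depth: int = 16):
        self.bounds = (xmin, ymin, xmax, ymax)   # half-open: [min, max)
        self.capacity, self.depth, self.max_depth = capacity, depth, max_depth
        self.points: List[Point] = []
        self.children: Optional[List["QuadTree"]] = None

    def insert(self, p: Point) -> bool:
        xmin, ymin, xmax, ymax = self.bounds
        if not (xmin <= p[0] < xmax and ymin <= p[1] < ymax):
            return False                         # point is outside this cell
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._split()
        return self._child_for(p).insert(p)

    def _split(self) -> None:
        xmin, ymin, xmax, ymax = self.bounds
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(xmin, ymin, xmid, ymid, *args),   # 0: low x, low y
            QuadTree(xmid, ymin, xmax, ymid, *args),   # 1: high x, low y
            QuadTree(xmin, ymid, xmid, ymax, *args),   # 2: low x, high y
            QuadTree(xmid, ymid, xmax, ymax, *args),   # 3: high x, high y
        ]
        old, self.points = self.points, []
        for q in old:                                  # push points down a level
            self._child_for(q).insert(q)

    def _child_for(self, p: Point) -> "QuadTree":
        xmin, ymin, xmax, ymax = self.bounds
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        return self.children[(1 if p[0] >= xmid else 0) + (2 if p[1] >= ymid else 0)]

    def query(self, qxmin: float, qymin: float, qxmax: float, qymax: float) -> List[Point]:
        """Points inside the closed query rectangle; skips non-overlapping cells."""
        xmin, ymin, xmax, ymax = self.bounds
        if qxmax < xmin or qxmin >= xmax or qymax < ymin or qymin >= ymax:
            return []
        if self.children is None:
            return [p for p in self.points
                    if qxmin <= p[0] <= qxmax and qymin <= p[1] <= qymax]
        found: List[Point] = []
        for child in self.children:
            found.extend(child.query(qxmin, qymin, qxmax, qymax))
        return found


tree = QuadTree(0, 0, 8, 8, capacity=2)
for pt in [(1, 1), (2, 6), (5, 5), (6, 2), (7, 7)]:
    tree.insert(pt)
in_corner = tree.query(4, 4, 8, 8)   # [(5, 5), (7, 7)]
```

An octree is the same idea in 3D, with each internal node having 8 children instead of 4.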

indexing evolution

Query processing

Visualizing spatial data

A variety of non-spatial attrs can be mapped on to spatial data, providing an intuitive grasp of patterns, trends and abnormalities. Following are some examples.

Dot map:

Proportional symbol map:

Diagram map:

Another diagram map:

Also possible to plot multivariate data this way.

Choropleth maps (plotting of a variable of interest, to cover an entire region of a map):

So who (else) has spatial extensions?

Everyone!

Thanks to SQL’s facility for custom datatype (‘UDT’) and function creation (‘functional extension’), “spatial” has been implemented for every major DB out there:

  • Oracle: Locator, Spatial, SDO
  • Postgres: PostGIS
  • DB2: Spatial Datablade
  • Informix: Geodetic Datablade
  • SQL Server: Geometric and Geodetic Geography types
  • MySQL: spatial library comes ‘built in’
  • SQLite: SpatiaLite
  • ..

Google KML

Google’s KML format is used to encode spatial data for Google Earth, etc. Here is a page on importing other geospatial dataset formats into Google Earth.

OpenLayers

OpenLayers is an open GIS platform.

ESRI: Arc*

ESRI is the home of the powerful, flexible family of ArcGIS products - and they are local!

QGIS etc.

There is a variety of inexpensive/open source mapping platforms, competing with more pricey commercial offerings (from ESRI etc). Here are several:

  • QGIS
  • MapBox
  • Carto
  • Boundless
  • GIS Cloud