[Notes] CSCI 585 Intro' to NoSQL

06 Mar 2020 | CSCI585, English, Note

Credit to: Prof. Saty Raghavachary, CSCI 585, Spring 2020

Outline

What NoSQL means
Advantages
JSON
…

What does ‘No’ stand for?

NoSQL DB means (or has meant):

Non-relational, non-SQL
NO SQL
Not Only SQL
..

SQL is dead! Long live SQL!

Relational DBs held sway for almost 3 DECADES: 80s, 90s, 00s.

What started to change was this - the Internet, coupled with the storage and processing revolution, has enabled:

BigUsers - LOTS of people using databases
BigData - LOTS of data being generated
BigVariety - there is a huge VARIETY in the types of data being stored and searched
BigFlux - there is rapid CHANGE in a lot of data being handled

RDBMSs are simply unsuitable for the above! A different kind of NON-RELATIONAL database scheme was needed - enter ‘NoSQL’ DBs.

Need for NoSQL

Again - what is happening is this:

lots of new data, new types of data, are being rapidly generated
developers are finding it hard to ‘shoehorn’ all this data into a relational model
also hard to scale up to fit more data, more users
and hard to maintain performance, too

We need a flexible, efficient, highly available, scalable DB design! THAT is what NoSQL provides - high performance and high availability at large scale.

NoSQL - history

The term NoSQL was used as early as... 1998!

The term was reused in 2009 by a Last.fm developer, and was subsequently popularized by a Rackspace employee who blogged about it.

Today, NoSQL is an umbrella term for a range of non-relational database technologies.

NoSQL DBs

A NoSQL DB is:

schema-less: no tables, no relations!
flexible: easy to add new types of data
(data) scalable: specifically, the ability to ‘scale out’, i.e. to do ‘horizontal scaling’ - both terms mean that we can simply add more nodes (e.g. servers) to an existing cluster, to accommodate more users or to store more data for existing users.
fast: easy to process large (massive) volumes of data
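‘Scale out’ needs some rule for deciding which node holds which data. One widely used rule is consistent hashing; the notes don't name a specific scheme, so treat the sketch below as an illustration under that assumption (the `HashRing` class and the node names are hypothetical). Its point is that adding a node relocates only about 1/N of the keys rather than reshuffling everything, which is what makes "just add more servers" practical.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Stable 128-bit hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring that assigns each key to one node."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual points per node, for a more even spread
        self._points = []      # sorted hash positions on the ring
        self._owner = {}       # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Scale out: place the new node's virtual points on the ring."""
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key: str) -> str:
        """Walk clockwise to the first point at or after the key's hash, wrapping at the end."""
        idx = bisect.bisect_left(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    ring = HashRing(["node-a", "node-b", "node-c"])
    before = {k: ring.node_for(k) for k in keys}

    ring.add_node("node-d")  # one more commodity server joins the cluster
    after = {k: ring.node_for(k) for k in keys}

    moved = [k for k in keys if before[k] != after[k]]
    print(len(moved), "of", len(keys), "keys moved")
    print(all(after[k] == "node-d" for k in moved))  # True: only keys node-d took over move
```

With `vnodes=100`, each physical node owns many small arcs of the ring, which evens out the load; production systems typically tune this and layer replication on top.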

Drivers of the NoSQL movement

What led to NoSQL starting to get adopted?

Avoidance of complexity [e.g. no need to worry about immediate consistency]
High throughput
Easy, quick scalability
Ability to run on commodity hardware
Avoidance of the need for O-R (object-relational) mapping
Avoidance of complexity associated with cluster setup
Realization that ‘one-size-fits-all’ was wrong [Stonebraker]
Availability of DB APIs for various programming languages that mirror the languages’ own data structures

JSON (or XML) for storing an ENTIRE db!!

db == a JSON [] array of {} objects, where each object (‘row’) is a set of key-value pairs.
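Treating a JSON array of objects as an entire db is easy to try with Python's standard `json` module. A minimal sketch follows; the records and the helper names (`load_db`, `find`, `insert`) are hypothetical. Note that the second row carries a field the others lack, and nothing complains.

```python
import json

# A whole (tiny) "database": one JSON array of objects, one object per 'row'.
# Records and field names are made up for illustration.
DB_TEXT = """
[
  {"id": 1, "name": "Ana", "city": "London"},
  {"id": 2, "name": "Bo",  "city": "Paris", "nickname": "B"},
  {"id": 3, "name": "Cy",  "city": "London"}
]
"""


def load_db(text):
    """Parse the JSON text into a list of dicts, one dict per row."""
    return json.loads(text)


def find(db, **criteria):
    """A minimal 'WHERE': rows whose fields equal every given criterion."""
    return [row for row in db if all(row.get(k) == v for k, v in criteria.items())]


def insert(db, row):
    """Append a row; it may carry fields that no other row has."""
    db.append(row)


if __name__ == "__main__":
    db = load_db(DB_TEXT)
    print([row["name"] for row in find(db, city="London")])  # ['Ana', 'Cy']
    insert(db, {"id": 4, "name": "Di", "tags": ["new"]})
    print(json.dumps(db, indent=2))  # write this string back to disk to persist the db
```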

http://www.data.gov has almost 200,000 datasets, 50,000+ of which are in JSON.

Example: https://health.data.ny.gov/api/views/es3k-2aus/rows.json

The simplicity of the JSON scheme also makes it easy to add extra data (e.g. metadata) to the files.

RDB -> JSON

It is very straightforward to represent table data as JSON:
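One way to see the table-to-JSON mapping concretely is to build a small table with Python's built-in `sqlite3` module and dump it: each row becomes one object, and each column name becomes a key. This is a sketch; the `student` table and its columns are hypothetical, not the table from the lecture.

```python
import json
import sqlite3


def make_demo_db():
    """An in-memory relational table (table and column names are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
    conn.executemany(
        "INSERT INTO student (id, name, major) VALUES (?, ?, ?)",
        [(1, "Ana", "CS"), (2, "Bo", "EE"), (3, "Cy", "CS")],
    )
    return conn


def table_to_json(conn, table):
    """Dump every row of `table` as a JSON array of objects keyed by column name."""
    # The table name is interpolated directly, so only pass trusted names.
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY rowid")
    columns = [d[0] for d in cur.description]  # column names reported by the cursor
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(table_to_json(make_demo_db(), "student"))
```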

RDB -> JSON, second example

Joined data again:
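For joined data, the flat rows that a SQL JOIN returns can be folded into nested JSON: one object per parent row, with the child rows collected into an array. A minimal sketch with `sqlite3`; the `student` and `enrollment` tables are hypothetical.

```python
import json
import sqlite3


def make_demo_db():
    """Two related tables (hypothetical): students and the courses they take."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE enrollment (student_id INTEGER REFERENCES student(id), course TEXT);
        INSERT INTO student VALUES (1, 'Ana'), (2, 'Bo'), (3, 'Cy');
        INSERT INTO enrollment VALUES (1, 'Databases'), (1, 'Algorithms'), (2, 'Databases');
    """)
    return conn


def joined_to_nested_json(conn):
    """Run the JOIN, then fold each student's rows into one object with a nested array."""
    cur = conn.execute("""
        SELECT s.id, s.name, e.course
        FROM student AS s LEFT JOIN enrollment AS e ON e.student_id = s.id
        ORDER BY s.id, e.course
    """)
    students = {}
    for sid, name, course in cur:
        doc = students.setdefault(sid, {"id": sid, "name": name, "courses": []})
        if course is not None:  # LEFT JOIN gives NULL (None) for a student with no courses
            doc["courses"].append(course)
    return json.dumps(list(students.values()), indent=2)


if __name__ == "__main__":
    print(joined_to_nested_json(make_demo_db()))
```

The flat JOIN output repeats a student's id and name once per course; the nested form states them once and lists the courses inside.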

RDB -> JSON, third example

Note - we also have the choice of using a ‘related model’ (where a JSON value is contained in a related, pointed-to structure) instead of a ‘nested model’ (which represents joined data by embedding it):
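The difference between the two choices is easiest to see side by side. In this sketch (all field names are hypothetical), the nested model embeds a copy of the department in every employee, while the related model stores it once and points to it by key; the hypothetical helper `resolve` follows the pointers to recover the nested view.

```python
# Nested model: each employee embeds a full copy of its department.
nested = [
    {"id": 1, "name": "Ana", "dept": {"code": "ENG", "floor": 3}},
    {"id": 2, "name": "Bo", "dept": {"code": "ENG", "floor": 3}},
]

# Related model: employees hold only a key; departments live in their own structure.
related = {
    "employees": [
        {"id": 1, "name": "Ana", "dept": "ENG"},
        {"id": 2, "name": "Bo", "dept": "ENG"},
    ],
    "departments": {"ENG": {"code": "ENG", "floor": 3}},
}


def resolve(db):
    """Follow each employee's 'dept' pointer to rebuild the nested form."""
    depts = db["departments"]
    return [{**emp, "dept": depts[emp["dept"]]} for emp in db["employees"]]


if __name__ == "__main__":
    print(resolve(related) == nested)  # True: same information, different layout
    related["departments"]["ENG"]["floor"] = 4  # one update, seen by every employee
    print([e["dept"]["floor"] for e in resolve(related)])  # [4, 4]
```

The trade-off: nesting makes each record self-contained but duplicates data (moving the department means updating every copy), while the related model keeps one copy at the cost of an extra lookup.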

An XML db example

Compared to JSON, XML is more verbose (because of its opening and closing tags), but it is a popular alternative format for creating DBs. Here is an example file (besides viewing the data in the browser, you can use an XML viewer to view the file, or save it locally and open it with a text editor).

BASE, not ACID

NoSQL DBs are characterized by the ‘BASE’ property, in contrast with the ‘ACID’ property of RDBMSs.

BASE stands for:

BAsic availability - the db is up most of the time
Soft state - consistency is not guaranteed while data is being written to a node, or between replicas
Eventual consistency - at a later point in time, by push or pull, data becomes consistent across nodes and replicas
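A toy simulation can make the soft-state and eventual-consistency ideas concrete. This is an illustration, not how any particular NoSQL DB works: it assumes last-writer-wins conflict resolution driven by a single global version counter, and a hypothetical `anti_entropy` round standing in for the push/pull sync described above (real systems use mechanisms such as vector clocks or read repair).

```python
import itertools


class Replica:
    """One copy of a key-value store; every value carries a version number."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """One sync round: every replica adopts the highest-version value of every key."""
    keys = set().union(*(r.data.keys() for r in replicas))
    for key in keys:
        newest = max((r.data[key] for r in replicas if key in r.data), key=lambda e: e[0])
        for r in replicas:
            r.data[key] = newest


if __name__ == "__main__":
    clock = itertools.count(1)  # a single global version counter (a simplifying assumption)
    a, b, c = Replica("a"), Replica("b"), Replica("c")

    a.write("cart:42", ["book"], next(clock))  # the write reaches node a only
    print(b.read("cart:42"))  # None: b has not seen it yet (soft state)

    anti_entropy([a, b, c])  # later, a background sync runs
    print(b.read("cart:42"), c.read("cart:42"))  # ['book'] ['book']: converged
```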

‘Schema-less’

A NoSQL database does not have an explicit schema that describes the relationships between its data items.

Calling this situation ‘schema-less’ is not quite accurate: the schema is implicit, residing in the application code that reads and writes data.

To put it differently, the DB itself “doesn’t care” what it is storing; it is the application code that imparts meaning to the data. As a result, changing the data model (e.g. adding or deleting an attribute) is trivial - just write and run (application) code to make the change in the DB!
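Because the schema is implicit, the application code is where it actually lives. A minimal sketch of that idea in plain Python, with a list of dicts standing in for a document collection (the `users` data and the helper names are hypothetical):

```python
# Two documents from the same (hypothetical) 'users' collection with different shapes:
# the DB stores both as-is and enforces nothing.
users = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "full_name": "Bo Li", "email": "bo@example.com"},
]


def display_name(doc):
    """The implicit schema lives here: the application decides how to read either shape."""
    return doc.get("name") or doc.get("full_name", "(unknown)")


def add_attribute(docs, field, default):
    """Changing the data model = running code over the documents."""
    for doc in docs:
        doc.setdefault(field, default)


def drop_attribute(docs, field):
    """Deleting an attribute is just as direct."""
    for doc in docs:
        doc.pop(field, None)


if __name__ == "__main__":
    print([display_name(u) for u in users])  # ['Ana', 'Bo Li']
    add_attribute(users, "verified", False)
    drop_attribute(users, "email")
    print(users)
```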

In a schema-less environment, developers use intuitive data structures (well supported by underlying host languages) to do data manipulation (including querying and updating).

E.g. JSON is very easy to grasp, comprising just six underlying datatypes (number, string, boolean, array, object, null). It is also quite easy to parse.
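To see how directly the six JSON datatypes map onto a host language's own structures, here is a short check with Python's standard `json` module (the document contents and the helper `json_type_names` are made up for illustration):

```python
import json

# One document that uses all six JSON datatypes (field names are illustrative).
DOC_TEXT = '{"n": 3.5, "s": "hi", "b": true, "a": [1, 2], "o": {"k": 1}, "z": null}'


def json_type_names(text):
    """Parse JSON text and report which Python type each top-level value became."""
    doc = json.loads(text)  # parsing is a single standard-library call
    return {key: type(value).__name__ for key, value in doc.items()}


if __name__ == "__main__":
    print(json_type_names(DOC_TEXT))
    # {'n': 'float', 's': 'str', 'b': 'bool', 'a': 'list', 'o': 'dict', 'z': 'NoneType'}
```

Python's `json` module turns a JSON number into `int` when it has no fractional part or exponent, and into `float` otherwise.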
