Sunday, December 6, 2020

No SQL DB

Please read through this first: http://arun-architect.blogspot.com/2016/11/why-nosql-vs-sql.html

The NoSQL ecosystem has four flavors: 

  1. Key-Value DB
  2. Columnar DB
  3. Graph DB
  4. Document DB

The SQL ecosystem comes in only a single flavor: the relational data store.

Each of the four NoSQL data abstractions is easy in some respects and complex in others. SQL databases, on the other hand, are extremely flexible but struggle at large scale, as they can only grow vertically (at least traditionally).

Let us dive into the details of these data abstractions.

1) Key-Value Stores:

  • Key-value stores do not impose a specific schema. They treat the data as a single opaque collection, which may have different fields for every record. The simplicity of this model makes a key-value store fast, easy to use, scalable, portable and flexible.
  • They provide extremely fast lookups and updates of values by key, thanks to the underlying hash implementation.
  • Because the keys can be partitioned easily, the systems grow horizontally instead of vertically, which makes scaling a lot easier (and makes the model a natural fit for a NoSQL solution).
  • Key-value stores often use less memory than a relational database to store the same data, which can lead to large performance gains in certain workloads.
  • Key-value stores handle large data volumes well and are good at processing a constant stream of read/write operations with low latency.
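To make the point about partitioning keys concrete, here is a minimal sketch, assuming a fixed number of in-memory partitions. The names `partition_for` and `PartitionedStore` are made up for this illustration and do not come from any real product.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedStore:
    """Hypothetical key-value store split across several partitions.

    Each partition is a plain dict here; in a real system each one
    would typically live on a different machine.
    """

    def __init__(self, num_partitions: int = 4) -> None:
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition(self, key: str) -> dict:
        return self.partitions[partition_for(key, len(self.partitions))]

    def put(self, key: str, value: bytes) -> None:
        self._partition(key)[key] = value

    def get(self, key: str, default=None):
        return self._partition(key).get(key, default)


store = PartitionedStore(num_partitions=4)
store.put("user:1", b"first value")
store.put("user:2", b"second value")
print(store.get("user:1"))
```

Plain modulo hashing moves most keys when the number of partitions changes, so this sketch only illustrates the idea that a key alone decides where the data lives.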

How do key-value stores work?

In each key-value pair, the key is an arbitrary string such as a filename, URI or hash. The value can be any kind of data, such as an image, a user preference file or a document. The value is stored as a blob, requiring no upfront data modeling or schema definition.

Because the value is stored as an opaque blob and is always fetched by its key, there is no need to index its contents to improve performance.

In general, key-value stores have no query language. They provide a way to store, retrieve and update data using simple get, put and delete commands; the path to retrieve data is a direct request to the object in memory or on disk. 
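The get, put and delete commands described above can be sketched as follows. This is a toy in-memory version, assuming values are opaque bytes; the class name `KeyValueStore` and the key format are invented for the example.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy key-value store: the value is an opaque blob, looked up only by key."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Store or overwrite the blob; the store never inspects its contents.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Direct lookup by key; returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("user:42:prefs", b'{"theme": "dark"}')
prefs = store.get("user:42:prefs")
store.delete("user:42:prefs")
```

Note that there is no way to ask "which users prefer the dark theme" here; answering that would require reading every value, which is exactly the trade-off of having no query language.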

List of NoSQL Key-Value DB:

  1. Amazon - DynamoDB
  2. Azure - Cosmos DB
  3. Google - Cloud Datastore/Memorystore
  4. Others:
    • Aerospike
    • Apache Cassandra
    • Berkeley DB
    • FoundationDB
    • Memcached
    • Couchbase Server
    • Redis
    • Riak

2) Columnar Data Stores:

Columnar stores leverage the fact that while a single document has a number of attributes, not all attributes are created equal. Usually, a certain attribute or group of attributes is accessed more frequently than others. The data is stored and managed according to this characteristic, which makes it easy to scale systems horizontally (and natural to add columns later).

Data locality is significantly improved, making systems very fast (because only the frequently accessed data is stored together, wasting no space).
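As a rough illustration of grouping attributes that are accessed together, here is a small sketch of a column-family layout. The class `ColumnFamilyStore`, the family names and the sample data are all assumptions made for this example, not the design of any particular product.

```python
class ColumnFamilyStore:
    """Hypothetical store that keeps each group of columns (family) separately."""

    def __init__(self) -> None:
        # family name -> row key -> {column name: value}
        self._families = {}

    def put(self, row_key: str, family: str, column: str, value) -> None:
        rows = self._families.setdefault(family, {})
        rows.setdefault(row_key, {})[column] = value

    def read_family(self, row_key: str, family: str) -> dict:
        # Only the requested family is touched; other families are never read.
        return dict(self._families.get(family, {}).get(row_key, {}))


store = ColumnFamilyStore()
# Frequently accessed attributes live in one family...
store.put("emp1", "profile", "name", "Asha")
store.put("emp1", "profile", "team", "Platform")
# ...rarely accessed ones in another, and new columns can be added at any time.
store.put("emp1", "history", "review_2019", "exceeds expectations")

print(store.read_family("emp1", "profile"))
```

Reading the `profile` family never touches the `history` data, which is the data-locality benefit described above.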

List of Columnar DB:

  1. Amazon - MariaDB, Cassandra
  2. Azure - Cosmos DB
  3. Google - BigQuery
  4. Apache - Kudu, Parquet, HBase

3) Graph Data Stores:

A NoSQL graph database is a data management technology designed to handle very large sets of structured, semi-structured or unstructured data. It is capable of integrating heterogeneous data from many sources and making links between datasets. It focuses on the relationships between entities and is able to infer new knowledge from existing information.

Thus it excels at maintaining relationships across documents (and navigating across documents through those relationships) very quickly. Nodes in the graph (think documents or references to documents) can be partitioned fairly easily, which makes it conducive to building horizontally scalable systems.
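Navigating through relationships can be sketched with a tiny in-memory graph. This is only an illustration, assuming labeled, directed edges and a breadth-first walk; the class `Graph`, the relationship names and the sample nodes are invented for the example.

```python
from collections import defaultdict, deque


class Graph:
    """Toy graph: nodes are strings, edges carry a relationship label."""

    def __init__(self) -> None:
        # node -> list of (relationship, target node)
        self._edges = defaultdict(list)

    def add_edge(self, source: str, relationship: str, target: str) -> None:
        self._edges[source].append((relationship, target))

    def neighbours(self, node: str, relationship=None) -> list:
        # Follow all edges, or only those with the given relationship label.
        return [
            target
            for label, target in self._edges.get(node, [])
            if relationship is None or label == relationship
        ]

    def reachable(self, start: str, relationship=None) -> list:
        """Breadth-first walk from start, returning nodes in visit order."""
        seen = {start}
        queue = deque([start])
        order = []
        while queue:
            node = queue.popleft()
            for nxt in self.neighbours(node, relationship):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


g = Graph()
g.add_edge("alice", "FOLLOWS", "bob")
g.add_edge("bob", "FOLLOWS", "carol")
g.add_edge("alice", "WORKS_AT", "acme")

print(g.reachable("alice", "FOLLOWS"))  # ['bob', 'carol']
```

The link from alice to carol is never stored directly; it is discovered by following relationships, which is the kind of navigation a graph store is built for.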

List of Graph DB:

  1. Amazon - Neptune
  2. Azure - Cosmos DB
  3. Google - 
  4. Apache - Titan, Giraph
  5. Others
    • Neo4j
    • ArangoDB
    • OrientDB
    • FlockDB
    • DataStax
    • Cassandra
    • Titan

4) Document Data Stores:

As the name suggests, document stores organize data as documents. There are no tables, rows or columns. All the information related to one entity or aggregate unit is stored in one document. Thus when we query for that entity, we get all the information, ideally without requiring multiple references or joins.

An aggregate is a collection of data that we interact with as a unit. These units of data, or aggregates, form the boundaries for ACID operations with the database. Aggregates make it easier for the database to manage data storage over clusters, since a unit of data can reside on any machine and, when retrieved from the database, brings all its related data along with it.

A document database is, at its core, a key/value store with one major exception. Instead of storing just any blob, a document DB requires that the data be stored in a format the database can understand. The format can be XML, JSON, Binary JSON (MongoDB), or just about anything else, as long as the database can understand it. So a document database would store an employee's information as one document, along with its metadata, enabling searches based on the fields of the entity. Thus document stores are suitable for loosely structured or semi-structured data.
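The difference from a plain key-value store can be sketched like this: because the store understands the document format (Python dicts stand in for JSON here), it can search by field. The class `DocumentStore` and the employee records are made up for the illustration.

```python
class DocumentStore:
    """Toy document store: documents are dicts, so fields can be queried."""

    def __init__(self) -> None:
        self._docs = {}

    def insert(self, doc_id: str, document: dict) -> None:
        self._docs[doc_id] = document

    def get(self, doc_id: str):
        # Key-based lookup, same as in a key-value store.
        return self._docs.get(doc_id)

    def find(self, **criteria) -> list:
        # Field-based search, possible only because the format is understood.
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


store = DocumentStore()
store.insert("emp1", {"name": "Asha", "dept": "Engineering", "skills": ["go", "sql"]})
store.insert("emp2", {"name": "Ravi", "dept": "Sales"})  # different fields are fine

engineers = store.find(dept="Engineering")
print([doc["name"] for doc in engineers])  # ['Asha']
```

Each employee is one self-contained document, and the two documents do not need to share the same set of fields.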

Unlike relational databases, document stores are not strongly typed. Document databases get their type information from the data itself, normally store all related information together, and allow every instance of data to be different from any other. This makes them more flexible in dealing with change and optional values, maps more easily into program objects, and often reduces database size.

Thus document databases are schema-agnostic, but they can enforce a schema when needed because they are also structure-aware. This approach, having a schema only when you need it, is a huge change from the relational world, where it might take months of work to manage changes to schema design.
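"Schema when you need it" can be pictured as an optional check applied only to the collections that want one. The sketch below assumes a very simple schema form (field name mapped to a Python type); the function `validate` and this schema form are inventions for the example, not the mechanism of any specific database.

```python
def validate(document: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the document passes.

    Fields that are not mentioned in the schema are allowed, so documents
    can still carry extra, optional data.
    """
    errors = []
    for field, expected_type in schema.items():
        if field not in document:
            errors.append(f"missing field: {field}")
        elif not isinstance(document[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors


employee_schema = {"name": str, "age": int}

print(validate({"name": "Asha", "age": 31, "nickname": "Ash"}, employee_schema))  # []
print(validate({"name": "Ravi", "age": "31"}, employee_schema))  # ['age: expected int']
```

Changing the rules means editing `employee_schema`, with no migration of the documents already stored.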

List of Document DB:

  1. Amazon - DynamoDB
  2. Azure - Cosmos DB
  3. Google - Cloud Bigtable
  4. Apache - 
  5. Others
    • MongoDB
    • CouchDB
    • Firestore
    • Firebase

Finally, the relational data abstraction provided by the SQL ecosystem allows data to be sliced and analyzed in an extremely flexible manner. The main drawback of this approach is that the systems supporting this model traditionally scaled only vertically (though modern systems like AWS's Amazon Aurora try to stretch the level of horizontal scaling). All cloud vendors provide SQL offerings (AWS: Amazon Aurora, Amazon RDS; Azure: Azure SQL, Managed MySQL, MariaDB; GCP: Cloud SQL).

Hope This helps!!

Arun Manglick