NoSQL databases: What you should know

The relational database model has been with us for around a quarter of a century (a long time, right?), but a new class of database has emerged in the enterprise. I’m talking about NoSQL.

 

What is NoSQL?

NoSQL, also known as “non-relational” or “cloud”, is a broad class of database management systems that differ significantly from a classic relational database management system (RDBMS). The stored data does not require fixed table schemas. These systems usually avoid join operations and typically scale horizontally.
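
As a minimal sketch of what "no fixed table schema" and "no joins" can mean in practice, the Python snippet below models a document collection as a plain list of dictionaries. The `users` collection, its field names and the `find_by_city` helper are hypothetical and chosen only for illustration; no real NoSQL product is involved.

```python
# Hypothetical in-memory "collection": each document is a dict, and
# documents in the same collection need not share the same fields.
users = [
    {"_id": 1, "name": "Ana", "address": {"city": "Madrid"}},
    {"_id": 2, "name": "Luis", "address": {"city": "Lima"},
     "orders": [{"sku": "A-1", "qty": 2}]},   # extra embedded field
    {"_id": 3, "name": "Mia"},                # no address at all
]


def find_by_city(collection, city):
    """Return documents whose embedded address matches the given city."""
    return [doc for doc in collection
            if doc.get("address", {}).get("city") == city]


# Related data (address, orders) is embedded in the document itself,
# so reading it back needs no join against another table.
matches = find_by_city(users, "Lima")
print([doc["name"] for doc in matches])
```

The trade-off of this style is that the application, not the database, has to cope with documents that lack a given field.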

 

Architecture with NoSQL

A NoSQL database is characterized by a move away from the complexity of SQL-based servers. The logic for validation, access control, mapping queryable indexed data, correlating related data, conflict resolution, maintaining integrity constraints and triggered procedures is moved out of the database layer. This enables NoSQL database engines to focus on exceptional performance and scalability.
A key concept of NoSQL systems is to have databases focus on the task of high-performance, scalable data storage and to provide low-level access to a data management layer.
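
The split between a simple storage engine and a data management layer might look roughly like the following sketch. Both classes, `KeyValueStore` and `UserRepository`, are hypothetical names invented for this example and do not correspond to any particular product's API. The store only reads and writes values by key, while validation and a uniqueness constraint are enforced in application code.

```python
class KeyValueStore:
    """Hypothetical storage layer: it only stores and retrieves values by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class UserRepository:
    """Hypothetical data management layer holding logic the store does not."""

    def __init__(self, store):
        self._store = store

    def save(self, user):
        # Validation lives in the application, not in the database.
        if not user.get("email") or "@" not in user["email"]:
            raise ValueError("a valid email is required")
        # The integrity constraint (unique email) is also enforced here,
        # using a secondary index maintained by the application.
        index_key = "email:" + user["email"]
        owner = self._store.get(index_key)
        if owner is not None and owner != user["id"]:
            raise ValueError("email already in use")
        self._store.put("user:" + str(user["id"]), user)
        self._store.put(index_key, user["id"])


repo = UserRepository(KeyValueStore())
repo.save({"id": 1, "email": "ana@example.com"})
```

Note that the check and the write above are two separate steps. Against a real distributed store they would not be atomic unless the store offers some form of conditional write, which is part of the price of moving this logic out of the database.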

Pros & Cons

Pros

  • Improved performance – Benchmarks have shown significant improvements over relational access. For example, these figures compare MySQL and Cassandra:
    Facebook Search
    MySQL > 50 GB Data
    – Writes Average: ~300 ms
    – Reads Average: ~350 ms
    Rewritten with Cassandra > 50 GB Data
    – Writes Average: 0.12 ms
    – Reads Average: 15 ms
  • Scaling – NoSQL databases are designed to expand transparently and they’re usually designed with low-cost commodity hardware in mind.
  • Big data handling – Over the last decade, the volume of data has increased massively. NoSQL systems can handle big data as well as the largest RDBMS installations.
  • Less DBA time – NoSQL databases are generally designed to require less management: automatic repair, data distribution and simpler data models.
  • Reduced costs – An RDBMS typically relies on expensive proprietary servers and storage systems, while NoSQL databases use clusters of cheap commodity servers. So the cost per gigabyte or per transaction/second for NoSQL can be many times lower.
  • Flexible data models – NoSQL key-value stores and document databases allow the application to store virtually any structure it wants in a data element.
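
To make the "Scaling" point above more concrete, here is a minimal sketch of how keys can be spread across a cluster of commodity nodes by hashing. The node names and the `node_for` and `distribute` helpers are hypothetical, and simple modulo placement is used only because it is the easiest scheme to read; it is not meant to describe how any specific NoSQL database assigns data.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node for a key by hashing the key (simple modulo partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = ["user:%d" % i for i in range(1000)]

# Hypothetical node names; adding a node spreads the same keys more thinly.
three = distribute(keys, ["node-a", "node-b", "node-c"])
four = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])

print({node: len(ks) for node, ks in three.items()})
print({node: len(ks) for node, ks in four.items()})
```

Plain modulo placement reassigns most keys whenever the number of nodes changes. Consistent hashing, as used by Dynamo-style stores, is a common way to limit that movement.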

Cons

  • Maturity – RDBMSs are stable and richly functional, whereas most NoSQL alternatives are still in pre-production versions, with many key features yet to be implemented.
  • Support – Most NoSQL systems are open source projects. Usually only a couple of small companies offer support for each NoSQL database.
  • Analytics & BI – NoSQL databases do not offer facilities for ad-hoc query and analysis. Commonly used business intelligence tools do not provide connectivity to NoSQL systems.
  • Administration – Although the design goal of NoSQL systems is to provide a zero-admin solution, in practice they require a lot of skill to install and a lot of effort to maintain.
  • Expertise – As NoSQL is a new paradigm, most developers are still in learning mode.

Use an RDBMS or a NoSQL Database?

It depends mainly on what you’re trying to achieve. NoSQL is certainly mature enough to use, but few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, applications that do need that scale will quite likely become more common (though probably not dominant).

NoSQL Implementations

There are currently more than 122 NoSQL databases. They can be categorized by their manner of implementation:
  • Wide column store / column families – Cassandra, Hadoop/HBase, Hypertable, Cloudata, Cloudera, Amazon SimpleDB, SciDB
  • Document store – MongoDB, CouchDB, Terrastore, ThruDB, OrientDB, RavenDB, Citrusleaf, SisoDB
  • Key Value / Tuple store – Azure Table Storage, Membase, Riak, Redis, Chordless, GenieDB, Scalaris, BerkeleyDB, MemcacheDB
  • Eventually consistent Key Value store – Amazon Dynamo, Voldemort, Dynomite, KAI
  • Graph databases – Neo4J, Infinite Graph, Sones, InfoGrid, HyperGraphDB, Trinity, etc
  • And others, and others…
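
To give a feel for how the main categories above differ, the sketch below represents the same made-up user record in four shapes using plain Python structures. All keys, field names and the `followed_by` helper are hypothetical; this is an illustration of the data models only, not of how any listed product stores data internally.

```python
import json

# Key-value store: the value is opaque to the database.
key_value = {"user:42": json.dumps({"name": "Ana", "city": "Madrid"})}

# Document store: the value is a structured document with named fields.
document = {"_id": 42, "name": "Ana", "address": {"city": "Madrid"}}

# Wide column store: row key -> column family -> columns.
wide_column = {
    "42": {
        "profile": {"name": "Ana", "city": "Madrid"},
        "activity": {"last_login": "2011-05-01"},
    }
}

# Graph database: nodes plus explicit relationships (edges) between them.
nodes = {42: {"name": "Ana"}, 43: {"name": "Luis"}}
edges = [(42, "FOLLOWS", 43)]


def followed_by(user_id):
    """Names of the users that user_id follows, found by walking edges."""
    return [nodes[dst]["name"] for src, rel, dst in edges
            if src == user_id and rel == "FOLLOWS"]


print(json.loads(key_value["user:42"])["city"])   # key-value lookup
print(document["address"]["city"])                # document field access
print(wide_column["42"]["profile"]["city"])       # column lookup
print(followed_by(42))                            # graph traversal
```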

Early Adopters of NoSQL

Social media corporations are the primary trailblazers of NoSQL implementations. The list includes:
  • Facebook
  • Twitter
  • MySpace
  • Google (Bigtable, Google App Engine)
  • Amazon (Dynamo)

Recommended Books & Papers

  • Professional NoSQL (Wiley/Wrox. 2011)
  • NoSQL Database Technology (CouchBase. 2011)
  • NoSQL Handbook (Mathias Meyer)
  • No Relation: The Mixed Blessings of Non-Relational Databases (Ian Thomas Varley. 2009)
  • Cassandra: The Definitive Guide (Eben Hewitt. 2010)
  • CouchDB: The Definitive Guide: Time to Relax (J. Chris Anderson, Jan Lehnardt, Noah Slater. 2010)
  • Hadoop in Action (Chuck Lam. 2010)
  • HBase: The Definitive Guide (Lars George. 2011)
  • MongoDB in Action (Kyle Banker. 2010)
  • Beginning SimpleDB (Kevin Marshall, Tyler Freeling. 2009)

Conclusions

NoSQL databases solve problems that were born of the global growth in digital data, problems that DBAs have so far been tackling with the well-known RDBMS.
Outside of scalability, it really seems that NoSQL databases do not have a killer feature.
Even so, I think this is a new opportunity to become a professional in this paradigm.