Posted on September 17, 2012
The differences between the relational database model and NoSQL database models are vast. NoSQL is a set of technologies that address the problems that begin to plague Codd's relational model in very large systems; these technologies have a lot of drawbacks, but also some very important advantages. I selected Cassandra because it is a very robust, performant and decentralized system that I've had the opportunity to work with on multiple projects. It's not the only solution, but being well documented and backed by a strong and helpful community, it is one of the best options.
Cassandra combines ideas from two big-data technologies, Amazon's Dynamo and Google's BigTable, and was open sourced by Facebook in 2008. Cassandra is currently under very active development, and it can be downloaded from the Apache Cassandra website.
In the relational model, the database is the outer layer. A database contains tables, and each table contains one or more named columns. A new record (row) is defined by providing values for all defined columns; if a value doesn't exist, a null value is used. Records can be accessed if the row's unique identifier (primary key) is known, or by using the SQL query language to retrieve rows that satisfy certain criteria.
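The relational concepts above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the `users` table, its columns and the two helper functions are hypothetical names chosen for the example, not something from a real schema.

```python
import sqlite3

# In-memory database; the table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
)

# Each row supplies a value for every column; a missing value is stored as NULL.
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [(1, "alice", "alice@example.com"), (2, "bob", None)],
)


def get_by_key(user_id):
    """Access a single record directly through its primary key."""
    return conn.execute(
        "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()


def find_without_email():
    """Retrieve all rows that satisfy a criterion, using an SQL query."""
    return conn.execute(
        "SELECT name FROM users WHERE email IS NULL ORDER BY id"
    ).fetchall()
```

Calling `get_by_key(2)` looks a row up by its primary key, while `find_without_email()` selects rows by a condition on a non-key column; Python's `None` stands in for the SQL null value.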
What’s wrong with RDBMS? The development of RDBMS didn’t keep pace with the expansion of IT; nowadays we have huge systems absorbing huge amounts of data every day. Doing so with technology from the 1970s can’t be effective, because of limited scalability and performance that degrades as the amount of data grows. Also, the world is not ideal and hardware fails, so a system needs to be fault-tolerant and scalable, with no single point of failure.
Cassandra addresses the problem of building distributed and scalable systems, and it’s built to cope with the data management challenges of modern business.
Cassandra is a decentralized system. There is no single point of failure as long as the minimum required cluster setup is present: every node in the cluster has the same role, and every node can service any request. Replication strategies can be configured. Adding new nodes to the cluster is very easy. Also, if one node fails, data can be retrieved from one of the other nodes (redundancy can be tuned). Cassandra is especially suitable for multi-data-center deployments, redundancy, failover and disaster recovery, with the possibility of replicating across multiple data centers.
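To make the idea of equal nodes and tunable redundancy more concrete, here is a toy sketch in Python. It is not how Cassandra is implemented: the `ToyCluster` class, its methods, the hash function and the way replicas are chosen are all assumptions made for illustration. It only shows that when every key is stored on several peer nodes, a read can still succeed after one of them fails.

```python
import hashlib
from bisect import bisect_right


class ToyCluster:
    """Toy model of a cluster of equal peers (hypothetical, not Cassandra's code)."""

    def __init__(self, node_names, replication_factor=2):
        if replication_factor > len(node_names):
            raise ValueError("replication factor exceeds number of nodes")
        self.rf = replication_factor
        # Place every node on a ring at a position derived from its name.
        self.ring = sorted((self._token(name), name) for name in node_names)
        self.storage = {name: {} for name in node_names}
        self.down = set()  # names of nodes currently marked as failed

    @staticmethod
    def _token(value):
        # Assumed hash; any stable hash works for this illustration.
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def replicas(self, key):
        """Return the rf nodes that follow the key's position on the ring."""
        tokens = [token for token, _ in self.ring]
        start = bisect_right(tokens, self._token(key))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def write(self, key, value):
        """Store the value on every replica that is currently up."""
        for node in self.replicas(key):
            if node not in self.down:
                self.storage[node][key] = value

    def read(self, key):
        """Return the value from the first live replica that holds the key."""
        for node in self.replicas(key):
            if node not in self.down and key in self.storage[node]:
                return self.storage[node][key]
        raise KeyError(key)
```

With three nodes and a replication factor of 2, a value written once is held by two nodes, so marking one of them as down (adding its name to `cluster.down`) still leaves `read` able to return it. Raising the replication factor trades extra storage for tolerance of more simultaneous failures.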
This level of flexibility has its price.
NoSQL database models won't and can't completely replace RDBMS technology, but the importance of NoSQL will grow because of its scale, flexibility and ease of use. We are dealing with more and more data; we want durable and fault-tolerant applications; we want apps that scale and apps that are fast. Because of all this, NoSQL will be around us more and more, and it's definitely a technology worth exploring.