Figure [1]
“Big data expertise increased 89.9% in the last
twelve months”, “Big data, Big pay” and “Data is as important to organisations as
labor and capital” are a few of the headlines from leading newspapers and
websites that I came across recently. There is no doubt that demand for data
analysts has risen rapidly, because millions of terabytes of data sit in clouds, local desktops, government databases and application logs.
That data can be processed to yield far more
significant insights than ad suggestions alone.
Figure [2]
In this post, I will discuss some of the technologies ranging from SQL to NoSQL and highlight the key features and drawbacks of each. Before touching upon the latest technologies, I would first like to go back to the days when databases were not very common.
Recently, I was watching a series, “The Bletchley Circle”, about four women who were code-breakers in World War 2 and went on to live normal lives after the war. One of them had an incredible photographic memory. In one episode she is given a chart of trains with the source, destination, arrival and departure timings of each train. She reads the sheet once and stores the data in her head. Later, the others, who are good at strategy or at guessing who the culprit might be, use her photographic memory to query the data and find a pattern. Analysing patterns or querying data is not just the need of the hour today; the need was realised long ago.
When database systems were not very technically advanced and computers were not complex enough to support such technologies, simple text or flat files were used to store the data.
Later, with the advent of the RDBMS (relational database management system), SQL came into the picture. To understand NoSQL, it is important to first understand the architecture of an RDBMS. The idea is to keep data in tables, each of which can be visualised as a matrix. For example, if we stored each user’s FirstName, LastName, Brand purchased and so on in one single table, the same values would be repeated across many rows. To handle this, the concept of a foreign key was introduced. In this example, we assign a unique ID to each brand, store [BrandID, Brand] in TableA, and store [FirstName, LastName, BrandID] in TableB.
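The two-table layout can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names follow the example above, while the sample rows and the choice of SQLite are assumptions made for the sketch.

```python
import sqlite3

# In-memory database; table and column names follow the example in the text.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE TableA (BrandID INTEGER PRIMARY KEY, Brand TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE TableB (
           FirstName TEXT,
           LastName  TEXT,
           BrandID   INTEGER REFERENCES TableA(BrandID)
       )"""
)

# Each brand name is stored once; purchases refer to it by ID.
# The sample rows below are invented for illustration.
conn.executemany("INSERT INTO TableA VALUES (?, ?)", [(1, "BrandOne"), (2, "BrandTwo")])
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?, ?)",
    [("XYZ", "PQR", 1), ("ABC", "DEF", 1), ("GHI", "JKL", 2)],
)

# A join puts the two tables back together at query time.
rows = conn.execute(
    """SELECT b.FirstName, b.LastName, a.Brand
       FROM TableB AS b JOIN TableA AS a ON a.BrandID = b.BrandID
       ORDER BY b.FirstName"""
).fetchall()
print(rows)
```

The brand name lives in one place, so renaming a brand touches a single row in TableA, but every read that needs the name has to pay for the join.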
This simple design is easy to implement and visualise, but as the number of joins between tables increases, so does the time taken to return a response to the front end. Sometimes, in a company’s private network where the servers are kept in different countries, the front end may even time out before the response from the DB arrives, however efficient the indexing. Hence a need was felt to avoid joins.
Another major concern with an RDBMS is its strict definition of columns. Say the administrator, while creating the DB, assumes that the firm will need 10 columns. A developer then cannot push data with 11 attributes unless either the table is altered to add a new column (which can be a time-consuming process if the data runs into GBs), or the first table is copied into a new one with the extra column set to None and the old one is then deleted.
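A small sqlite3 sketch shows the fixed-column constraint in action. The `users` table and its rows are hypothetical, and two columns stand in for the administrator's ten; other database engines report the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A table fixed at two columns (standing in for the administrator's ten).
conn.execute("CREATE TABLE users (FirstName TEXT, LastName TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("ABC", "DEF"))

try:
    # One attribute more than the schema allows.
    conn.execute("INSERT INTO users VALUES (?, ?, ?)", ("XYZ", "PQR", "BrandOne"))
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# First option from the text: alter the table to add the new column.
conn.execute("ALTER TABLE users ADD COLUMN Brand TEXT")
conn.execute("INSERT INTO users VALUES (?, ?, ?)", ("XYZ", "PQR", "BrandOne"))

# The row inserted before the change now carries an empty value (None in Python).
rows = conn.execute("SELECT * FROM users ORDER BY FirstName").fetchall()
print(rejected, rows)
```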
As the saying goes, necessity is the mother of invention. To overcome these constraints of SQL databases, NoSQL technologies were brought into the picture.
The major difference between SQL and NoSQL technologies is the data model. Many NoSQL databases, MongoDB among them, use a document data model in which everything is stored in key-value format. For example, a NoSQL DB like MongoDB will store data as {"FirstName": "XYZ", "LastName": "PQR"} and so on. A value can itself be a complex block of keys and values.
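The document idea can be pictured with plain Python dictionaries. This sketch does not use MongoDB or its driver; the `Purchases` and `Items` fields and the sample values are invented for illustration.

```python
import json

# Two "documents" in one collection; the second has fields the first lacks.
collection = [
    {"FirstName": "XYZ", "LastName": "PQR"},
    {
        "FirstName": "ABC",
        "LastName": "DEF",
        # A value that is itself a block of keys and values.
        "Purchases": [{"Brand": "BrandOne", "Items": 2}],
    },
]

# No join is needed: the brand is read from inside the document itself.
brands = [p["Brand"] for doc in collection for p in doc.get("Purchases", [])]

print(json.dumps(collection[1], indent=2))
print(brands)
```

Because the brand sits inside the user's document, reading it needs no second table, at the cost of repeating the brand name in every document that mentions it.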
These NoSQL technologies bring added advantages. For example, MongoDB provides automatic “sharding”, which distributes data across a cluster of machines. Moreover, unlike the rigid column definitions discussed earlier, NoSQL databases support dynamic schemas. But that does not mean there is nothing wrong with these technologies, or that a tech team should opt for NoSQL by default. Since many of these technologies are relatively young open-source projects, they can lack the maturity of established RDBMS products. These databases were primarily designed for storage and offer very little functionality beyond that. [3]
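To picture the sharding mentioned above, here is a toy sketch of hash-based distribution. It is not MongoDB's implementation: the shard names, the `shard_for` helper and the use of MD5 are all assumptions chosen to keep the example short.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical machine names


def shard_for(key: str) -> str:
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# Route each document to one shard based on its key.
cluster = {name: [] for name in SHARDS}
for user_id in ["user-1", "user-2", "user-3", "user-4", "user-5", "user-6"]:
    cluster[shard_for(user_id)].append({"_id": user_id})

for name, docs in cluster.items():
    print(name, [d["_id"] for d in docs])
```

The same key always maps to the same shard, so a lookup by key only has to contact one machine.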
There is no doubt that NoSQL technologies have become really popular with the advent of the cloud, and they can help where a firm has to deal with unstructured data. They are nevertheless limited in certain areas, so a tech team or an individual must carefully weigh the pros and cons of these technologies.
References:
[1] http://www.sbp-romania.com/Blog/2014/03/05/sql-vs-nosql.aspx
[2] https://keefcode.wordpress.com/2013/12/04/nosql-databases-how-to-choose/
[3] http://greendatacenterconference.com/blog/the-five-key-advantages-and-disadvantages-of-nosql/