Code Shiksha: Invincible World of Data

Figure [1]

“Big data expertise increased 89.9% in the last twelve months”, “Big data, Big pay”, and “Data is as important to organisations as labor and capital” are a few of the headlines from leading newspapers and websites that I came across recently. There is no doubt that demand for data analysts has risen rapidly, because millions of terabytes of data are lying in clouds, on local desktops, in government databases, and in application logs, and that data can be processed to yield far more significant insights than ad suggestions alone.

Figure [2]

In this post, I will discuss some of the technologies ranging from SQL to NoSQL and highlight their key features and drawbacks. Before touching upon the latest technologies, I would first like to go back to the days when databases were not very common.

Recently, I was watching a series, “The Bletchley Circle”, about four women who were code-breakers in World War 2 and who went on to live ordinary lives after the war. One of them has an incredible photographic memory. In one episode she is given a chart of trains listing the source, destination, and arrival and departure times of each train. She reads the sheet once and stores the data in her head. Later, the others, who are good at strategy or at guessing who the culprit might be, use her memory to query the data and find a pattern. Analysing patterns and querying data are not needs that arose only today; they were recognised long ago. When database systems were not technically advanced and computers were not powerful enough to support them, simple text or flat files were used to store data.

Later, with the advent of the RDBMS (relational database management system), SQL came into the picture. To understand NoSQL, it is important to understand the architecture of an RDBMS first. The idea is to keep data in tables, each of which can be visualised as a matrix. For example, if we stored each user's FirstName, LastName, the brand purchased, and so on in a single table, the same values would be repeated across many rows. The concept of a foreign key was introduced to handle this. In our example, we assign a unique ID to each brand and store [BrandID, Brand] in TableA, while TableB stores [FirstName, LastName, BrandID]; the BrandID column in TableB is the foreign key that points back to TableA.
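The two-table layout above can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite is used because it ships with Python, and the brand and user names are made-up sample values, not data from any real system.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# TableA: each brand name is stored exactly once.
conn.execute("CREATE TABLE TableA (BrandID INTEGER PRIMARY KEY, Brand TEXT NOT NULL)")
# TableB: each user row points at a brand through the BrandID foreign key.
conn.execute(
    "CREATE TABLE TableB ("
    " FirstName TEXT, LastName TEXT,"
    " BrandID INTEGER REFERENCES TableA(BrandID))"
)

# Hypothetical sample data.
conn.executemany("INSERT INTO TableA VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?, ?)",
    [("Asha", "Rao", 1), ("Ravi", "Kumar", 1), ("Meera", "Shah", 2)],
)

# A join reassembles the full picture at query time.
rows = conn.execute(
    "SELECT b.FirstName, b.LastName, a.Brand"
    " FROM TableB AS b JOIN TableA AS a ON a.BrandID = b.BrandID"
    " ORDER BY b.FirstName"
).fetchall()
print(rows)
```

The brand name "Acme" is stored once in TableA even though two users bought it; the price of removing that repetition is the join needed to read the data back.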

This simple design is easy to implement and visualise, but as the number of joins between tables increases, so does the time taken for a response to reach the front end. In a company's private network where servers are located in different countries, the front end may even time out before the response from the DB arrives, regardless of how efficient the indexing is. Hence a need was felt to avoid joins.

Another major concern with an RDBMS is its strict definition of columns. If the administrator assumes, while creating the DB, that the firm will need 10 columns, a developer cannot push data with 11 attributes unless either the table is altered to add a new column (which can be time-consuming when the data runs into gigabytes) or the data is copied into a new table that has the extra column set to NULL, after which the old table is dropped.
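A small sketch of this rigidity, again assuming SQLite through Python's sqlite3 module and a hypothetical `purchases` table with two columns instead of ten. How expensive the ALTER step is depends on the database engine and the table size; the sketch only shows that the extra attribute is rejected until the schema changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the administrator planned for two columns only.
conn.execute("CREATE TABLE purchases (FirstName TEXT, LastName TEXT)")

# Pushing a third attribute into a two-column table is rejected.
try:
    conn.execute("INSERT INTO purchases VALUES (?, ?, ?)", ("Asha", "Rao", "Acme"))
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# The schema has to be altered before the extra attribute can be stored.
conn.execute("ALTER TABLE purchases ADD COLUMN Brand TEXT")
conn.execute("INSERT INTO purchases VALUES (?, ?, ?)", ("Asha", "Rao", "Acme"))
stored = conn.execute("SELECT * FROM purchases").fetchall()
print(rejected, stored)
```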

As the saying goes, necessity is the mother of invention. To overcome these constraints of SQL databases, NoSQL technologies were brought into the picture.

A major difference between SQL and NoSQL technologies is the data model. Many NoSQL databases use a document data model, in which everything is stored as key-value pairs. For example, a NoSQL DB like MongoDB will store data like {"FirstName": "XYZ", "LastName": "PQR"} and so on. A value can itself be a nested block of keys and values.
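The document shape can be imitated with plain Python dictionaries and the json module. This is a sketch of the data model only, not of MongoDB's API; the `Purchases` and `Email` fields are hypothetical additions for illustration.

```python
import json

# A document: keys map to values, and a value may itself be nested.
user_doc = {
    "FirstName": "XYZ",
    "LastName": "PQR",
    "Purchases": [  # hypothetical nested field
        {"Brand": "Acme", "Items": 2},
        {"Brand": "Globex", "Items": 1},
    ],
}

# A second document in the same collection may carry different keys.
another_doc = {"FirstName": "ABC", "Email": "abc@example.com"}
collection = [user_doc, another_doc]

# Nested values are reached without a join.
brands = [purchase["Brand"] for purchase in user_doc["Purchases"]]

# Documents serialise naturally to JSON text and back.
serialized = json.dumps(user_doc)
restored = json.loads(serialized)
print(brands)
```

Because the purchases live inside the user document, reading them needs no second table, and the second document shows that the set of keys is not fixed in advance.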

NoSQL technologies bring further advantages. For example, MongoDB provides automatic “sharding”, which distributes data across a cluster of machines. Moreover, in contrast to the fixed columns of an RDBMS discussed earlier, NoSQL databases support dynamic schemas. That does not mean there is nothing wrong with these technologies, or that a tech team should opt for NoSQL by default. Many of them are relatively young open-source projects and are less mature than established relational systems. They were also designed primarily for storage and offer very little functionality beyond it. [3]
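To make the sharding idea mentioned above a little more concrete, here is a minimal sketch of hash-based routing, where a key decides which machine holds a record. It is not how MongoDB implements sharding internally; the shard names, the `shard_for` helper, and the user IDs are all hypothetical.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical machine names


def shard_for(key: str) -> str:
    """Map a key to one shard by hashing it (illustrative helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# Route a handful of made-up user IDs to their shards.
cluster = {name: [] for name in SHARDS}
for user_id in ["u1", "u2", "u3", "u4", "u5", "u6"]:
    cluster[shard_for(user_id)].append(user_id)

for name, keys in cluster.items():
    print(name, keys)
```

The same key always hashes to the same shard, so a lookup only has to contact one machine. A real system also has to handle adding or removing machines, which this sketch ignores.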

There is no doubt that NoSQL technologies have become very popular with the advent of the cloud, and they can help where a firm has to deal with unstructured data. They are nevertheless limited in certain areas, so a tech team or an individual must carefully weigh the pros and cons of these technologies.

References:
[1] http://www.sbp-romania.com/Blog/2014/03/05/sql-vs-nosql.aspx
[2] https://keefcode.wordpress.com/2013/12/04/nosql-databases-how-to-choose/
[3] http://greendatacenterconference.com/blog/the-five-key-advantages-and-disadvantages-of-nosql/

Tuesday, February 10, 2015
