This ZL Tech Connect entry is part two of a two-part blog post discussing SQL and NoSQL. Click here for part one.
So, where were we? Right, NoSQL.
Flashback to college days.
Your college keeps a record of each of its students. If they were just to keep basic information on each student, they might create a table like this:
|Student ID||First Name||Last Name||Grad Year|
Table 3: Student Information
If only college were that simple.
But actually, colleges collect all types of information about a student’s academic career, including the organizations they are involved in, the classes they take, their athletic career, their academic advisors, their housing information, and so much more.
Let’s just take a small piece of the pie and try to record general student information and their athletic participation in a table:
|Student ID||First Name||Last Name||Grad Year||Football||Gymnastics||Swimming||Track and Field|
Table 4: Student and Athletic Information
This table doesn’t seem too awful. But colleges don’t just have three students or just four sports.
What about student organizations? Let’s try to record that along with general student information:
|Student ID||First Name||Last Name||Grad Year||Big Brother Tutoring||De Vinimus||Christian Students Organization||Negotiation Association||Off the Beat Acapella|
Table 5: Student and Organizations Information
Again, we won’t only have three students and five student clubs. And as new organizations are born each year, the number of columns will continue to increase. This data is evolving and indeterminate.
The tables are progressively getting more and more difficult to keep track of. (Look back at (link to first post) Table 1: Book Table to see the comparison!) This is due to the dataset – this type of information is not suitable for a SQL database.
Is there a better way to store this information?
You probably already know where I’m going with this.
NoSQL is an alternative. So, how does NoSQL store data? Rather than through tables, NoSQL stores data in a way that can be comparable to a deck of flashcards (more flashbacks to college days?).
An example of how NoSQL would store data about Kristina Harris:
|athletic_org: Gymnastics, Track and Field|
|student_org: De Vinimius, Off the Beat Acapella|
We can easily add information as we go, such as her major, her class list for Spring 2017, her on-campus address, and more. Much easier!
And any changes to her “flashcard” are independent, meaning they don’t affect those of John or Michael or any other student. Like flashcards in a deck, each student’s information doesn’t have to relate to the other.
NoSQL databases are more like the car. They aren’t as rigid as the train – they can stop anywhere on their journey and pick up (or drop off) new people. They can go off track and create their own path. They operate independent of each other, unlike trains that have to communicate with each other.
However, with its advantages, NoSQL has its drawbacks too. NoSQL is more flexible and forgiving, but being able to store and add any data anywhere, anytime can lead to consistency issues.
Compare and Contrast
The biggest controversy is that NoSQL is quoted to be faster than SQL. Not surprising. Going back to New Table 1, if we needed to find the publisher contact information to make purchases of the “NoSQL for Mere Mortals” book, we would have to run a query (aka write a code) that looks for the “NoSQL for Mere Mortals” book, then looks for the Publisher ID, then looks at the Publisher Table (i.e. Table 2), then retrieves the Publisher Name, Contact Name and Contact Email.
Phew, gotta catch my breath.
The query then returns the Book Title, Publisher Name, Contact Name and Contact Email.
Comparatively, if we want to find which athletic and student organizations Kristina is involved in, our query would look for Kristina’s “card” and pull her athletic and organization information.
See how that can be faster?
Data grows. Our bookstore will acquire more books; colleges will accept more students.
As our book collection increases, we will have to add more and more rows to New Table 1. The data cannot be partitioned, as in split up amongst multiple databases, because the tables are relational (there’s that word again). Therefore scaling in SQL is done by adding more power to the server. This is called vertical scaling. Vertical scaling is therefore limited to the capacity of a single machine. Scaling beyond that capacity involves downtime (as we upgrade and move from a small server to a bigger one) and still comes with an upper limit.
Whereas an increase in college students is like adding more flashcards to the deck. The data is non-relational and so, the database can be partitioned. We can add more machines into the existing pool, rather than adding more power to a single server. This is called horizontal scaling. Horizontal scaling is easier, since it essentially means just adding more servers to the mix.
For these and other reasons, the war continues between SQL vs. NoSQL.
Summary – because you made it this far
SQL and NoSQL do the same thing: store data. Just in different ways. It’s possible to choose one and switch to the other, but a little planning now can save you a lot of time and money later.
Projects where SQL is more ideal:
- logical, related, discrete data requirements which can be identified upfront
- data integrity is essential
- standards-based proven technology with good developer experience and support
- Examples: medical records, employee information.
Projects where NoSQL is more ideal:
- unrelated, indeterminate or evolving data requirements
- simpler or looser project objectives, able to start coding immediately
- speed and scalability is imperative
- Examples: social media (Facebook created and employs a NoSQL database, called Apache Cassandra)
We’ve only dipped our toes in the water here – there’s so much more to learn about SQL vs. NoSQL! And now that you know the difference between the two (remember, think train vs. car) you can sound just a little bit smarter when chatting with engineers around the water cooler.