Aside: Big Data

Big Data refers to (a lot of) data of different types that can be analysed to provide insights.

The three V’s of Big Data are:

  • Volume - The amount
  • Velocity - The rate
  • Variety - The types

Originally - to store data, the common SQL approach was adopted. However when requiring to store datatypes of different and complex types, other standards such as NoSQL (Not-Only SQL) came to arise

Aside: Databases

DBMS - Database Management System
DDL - Data Definition Language [schemas]
DML - Data Manipulation Language [transactions]

Types of Databases:

  • OO Model - OODBMS
  • Relational Model - RDBMS - ie Oracle, MySQL, SQLite
  • DOM Model - ie JSON, XML - ie MongoDB

Relational Databases

Terms

Each group of data is called a record
Each attribute in the record is known as a field

A Primary Key is a unique identifier
A Foreign Key is a field that references a primary key of another table

Constraints

Constraints protect the data being inserted into a table, to prevent invalid data from being stored

Domain Constraint

Values must match their type, must be present, etc

Referential Integrity Constraint

“A restraint on the integrity of the reference”

When a record is created, if a foreign key field is present, the value of that field must exist as a primary key of the referenced table.
(X must exist as a primary key in a table if another table)

Key Constraint

No two records can have the same value for a specific field unless they are NULL

Primary Key Constraint

No two records can have the same value for a specific field, and NULL is not permissible

Effective Database Design Principles

Multi-valued Attributes

In SQL databases, there exists no such multi-value attribute type, which makes holding arrays of data slightly more difficult.
(Other databases like MongoDB and NoSQL have multi-value attributes, so crisis averted!)

We might originally have a table like

IDNameHobby 1Hobby 2Hobby 3Hobby 4
1AndrewSportsPhotographyProgrammingMusic

But this is bad as it will use alot of data for each record. Consider if someone else only had one hobby, their Hobby 2, 3, 4, … would be empty - but space will still be reserved for those fields!

Or something like

IDNameHobby
1AndrewSports,Photography,Programming,Music

This is slightly better, but still not the best, as data of the field Hobby needs to be parsed outside of the database.
Consequently, we are unable to search for people who like “Sports” by using just the database

A better approach would be to create a second table (many-to-many), which links each person with their hobbies

IDName
1Andrew
IDHobby
1Sports
1Photography
1Programming
1Music