Database Design


Fundamental Principles of Database Design

This article gives a moderate level of detail about database design principles. If you need a deep dive, go take a class.


Databases

So what exactly is a database? Quickly defined it is a technology product that stores data. Databases are the primary data storage mechanism for most technology products you use every day.

They store the data for your bank, your airline and hotel reservations to the settings on your preschooler's favorite app.

Use Cases

The most common use cases for databases are the following.

  • Websites
  • SaaS Applications
  • Mobile Apps
  • Data Warehouses
  • Machine Learning Models and AI(Artificial Intelligence).

Types of Databases

There are two primary flavors of databases. They are relational and NoSQL.

Relational

When most people outside of the tech world think of a database, they are thinking of a relational database. Relational databases do exactly what they sound like, they utilize relationships between the data.

Relational databases are highly organized with a set structure containing rows and columns spread across multiple tables. The relationships are established using keys. The primary key is the parent in the relationship and foreign key is the child in relationship.

With relationships you can do things like find all of a customer's orders, where the customer and the orders are stored in separate tables.

There are 3 types of relationships:

  • 1:1 (read as 1 to 1) - A 1:1 relationship could be settings for a user.
  • 1:many (aka 1:m) - A 1:many relationship could be orders for a customer.
  • Many:many - A many:many relationship could be classes that have many students and student enrolled in many classes. Many:many relationships typically use a bridge table that has a 1:many relationships on both sides, thus bridging the tables together.

NoSQL

NoSQL stands for Not Only SQL. SQL is the database query language which is short for Structured Query Language. It can be pronounced phonetically like "sequel", my personal preference, or alphabetically as "S-Q-L".

NoSQL databases are not structured with rows and columns. They are used heavily in the world of Big Data to hold data that doesn't have a good structure to it.

Design Philosophies

There are two common of design philosophies.

  • Online Transactional Processing or OLTP, is used for applications and E-Commerce. It is optimized to reduce redundancy and is concerned with transaction processing.
  • Online Analytical Processing or OLAP is designed for analytics. It is used in data warehouses and is highly aggregated, but doesn’t update data.

Schemas

Two primary types of schemas exist. These are star and snowflake.

  • Star schema is more generalized and uses less normalization (fewer relationships).
  • Snowflake schema is more normalized and uses more joins (process to connect tables in a query).

Optimization

Many methods exist to optimize database performance. It really depends on the project specifics on how you would do it. For example, is your database for an E-Commerce application or a Data Warehouse for analytics? Here are some of the common methods:

  • Normalization uses relationships to cut down on redundancy. It uses a join to establish the relationship during query(matching foreign keys with primary keys). Normalization is more common in the OLTP design philosophy and snowflake schema.
  • Field types control what type of data can be entered into a field. For example, the decimal field type won't accept "abc", it will throw an error.
  • Constraints add some sort of control to a field. It may be a default value or weather the field can be null (no value) or not.
  • Indexes increase speed or queries. It's like the ordering of the old phone book back in the day. If you are looking for "Thompson" and the page you open to is on the letter "M" you need to keep going.

An Empty Value

So why would you want a null (empty) value, shouldn't everything have a value? Well, going back to the school example, you many have a field for graduation data. Well if a student is in 10th grade they haven't graduated yet and the value should be null.