Relational Database Design: ER Modeling, Normalization, Storage Engine, Charset, Data Types, and Index Design
The article provides a comprehensive guide to relational database design using MySQL, covering ER modeling, normalization (BCNF and 3NF), storage engine selection, character set and data type choices, as well as index design principles illustrated with a student course selection example.
Abstract
This article discusses relational database design, covering the relational model, table structure design, and using a student course selection example to illustrate the design process while simplifying content for clarity.
It is based on MySQL and is suitable for readers with some relational database fundamentals.
Entity-Relationship Model (E‑R)
First, understand what an E‑R data model is and its purpose.
The E‑R model is useful for mapping real‑world facts and relationships to a conceptual schema; many database design tools rely on it. The three main concepts are entity sets, relationship sets, and attributes.
Entity : an object or event distinguishable from others, e.g., each student or each course in a school.
Attribute : descriptive property of each entity member, e.g., a student's name or ID.
Entity Set : collection of entities of the same type and attributes, e.g., all students or all courses.
Relationship : association between multiple entities, e.g., a student selects a course.
Relationship Set : collection of similar relationships, e.g., all student‑course selections.
Now let's draw an E‑R diagram for the student‑course selection scenario.
In the diagram, (student_id, name, age, gender) are attributes of Student; (grade) is an attribute of the selection relationship; (course_id, course_name, credit) are attributes of Course. The relationship between Student and Course is many‑to‑many.
Relation Table Design
From the E‑R diagram we can see the connections; how should we design the relational schema?
The goal of relational database design is to produce a set of relation schemas that avoid unnecessary redundancy while allowing easy data retrieval. Normalization helps achieve this.
Boyce‑Codd Normal Form
One satisfactory normal form is Boyce‑Codd Normal Form (BCNF). For every functional dependency α→β in F⁺ where α⊆R and β⊆R, at least one of the following holds:
α→β is a trivial functional dependency (β ⊂ α).
α is a superkey of schema R.
Consider the following schemas and their functional dependencies:
Student = (student_id, name, age, gender) with student_id → name age gender
Course = (course_id, course_name, credit) with course_id → course_name credit
Enrollment = (student_id, course_id, grade) with student_id course_id → grade
All these schemas satisfy BCNF because the left side of each non‑trivial dependency is a candidate key.
However, not every BCNF preserves all functional dependencies. For example, the Banker schema (branch_name, customer_name, banker_name) has functional dependencies banker_name → branch_name and branch_name customer_name → banker_name. Since banker_name is not a superkey, the schema is not in BCNF. Decomposing it into Banker‑branch (banker_name, branch_name) and Customer‑banker (customer_name, banker_name) yields BCNF, though the second dependency is not preserved.
Third Normal Form
When we cannot simultaneously satisfy BCNF, lossless join, and dependency preservation, we may accept the weaker Third Normal Form (3NF), which always allows a lossless, dependency‑preserving decomposition.
A relation R with functional dependencies F is in 3NF if for every α→β in F⁺, at least one of the following holds:
α→β is trivial.
α is a superkey of R.
Each attribute in β‑α is part of a candidate key.
Returning to the Banker example, the schema is in 3NF because the only candidate key is {branch_name, customer_name}, and the non‑trivial dependency banker_name → branch_name has its right‑hand side attribute (branch_name) belonging to a candidate key.
Every BCNF is also a 3NF because BCNF's constraints are stricter.
Storage Engine Selection
Once the relational schema is determined, the detailed table design begins, starting with the storage engine. MySQL provides various engines for different use cases.
The table below summarizes some MySQL storage engines.
The two most commonly used engines are MyISAM and InnoDB.
MyISAM: default engine before MySQL 5.5.5.
InnoDB: default engine from MySQL 5.5.5 onward.
How to Choose?
Selection criteria: choose the engine that matches application characteristics; complex systems may combine multiple engines, but be aware of drawbacks such as backup complexity.
Using only InnoDB simplifies backup and recovery; MySQL Enterprise Backup can perform hot backups of InnoDB tables.
Typical use cases:
InnoDB: transactional workloads.
MyISAM: most non‑transactional workloads.
Memory: data stored in memory for ultra‑fast access (Redis or other NoSQL may be alternatives).
Character Set Selection
After choosing the storage engine, the character set must be determined. Selecting the wrong charset can incur high costs when changing later.
How to Choose?
Prefer the smallest charset that fully meets current and near‑future application needs to save space, reduce network traffic, and improve performance.
Databases support charset at server, database, table, and column levels; using database‑ or table‑level charset is recommended for consistency.
Data Type Selection
Selection Principles
Prerequisite: use an appropriate storage engine.
Principle: always try to use the most precise type for optimal storage.
Fixed‑Length vs Variable‑Length
char vs varchar
The following example illustrates the difference:
For InnoDB tables, there is no distinction between fixed‑length and variable‑length columns at the row format level; all rows use pointers to column values.
Since storage size is the main performance factor, CHAR always occupies at least as much space as VARCHAR, so VARCHAR is preferred to minimize row size and disk I/O.
text and blob
When using TEXT or BLOB columns, keep in mind:
Frequent deletes/updates leave holes; run OPTIMIZE TABLE regularly.
Avoid querying large TEXT/BLOB values as they increase I/O pressure.
Separate TEXT/BLOB data into their own tables to keep the main table rows short.
Floating‑Point and Fixed‑Point Numbers
In MySQL, FLOAT and DOUBLE are floating‑point; DECIMAL is fixed‑point.
Floating‑point numbers have a larger range for a given length but suffer from precision issues; avoid using them for monetary values.
Integers
MySQL supports standard integer types INTEGER (or INT) and SMALLINT, plus extensions TINYINT, MEDIUMINT, and BIGINT. The table below shows storage requirements and ranges.
Index Design
Design Principles
Index columns should appear in WHERE clauses or join conditions, not necessarily in SELECT lists.
Use unique indexes where possible; they are most effective.
Use short indexes; for long string columns, index a prefix.
Leverage leftmost prefix of composite indexes.
Avoid over‑indexing.
Consider the type of comparisons; indexing columns used in functions is ineffective.
Example
Based on the student‑course selection E‑R diagram, the resulting tables are:
Table 1‑1 Student Information (Student)
Table 1‑2 Course Information (Course)
Table 1‑3 Enrollment and Grade (SC)
Student name length is 40 to accommodate foreign names.
Gender is defined as an ENUM for simplicity.
Store birthdate instead of age so the age can be calculated dynamically.
Grade uses DOUBLE rather than DECIMAL because high precision is unnecessary.
(student_id, course_id) is a composite primary key.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.