
Database normalization



Database normalization is a series of steps followed to obtain a database design that allows for consistent storage and efficient access of data in a relational database. These steps reduce data redundancy and the risk of data becoming inconsistent.

However, many relational DBMSs lack sufficient separation between the logical database design and the physical implementation of the data store, such that queries against a fully normalized database often perform poorly. In this case denormalization is sometimes used to improve performance, at the cost of reduced consistency guarantees.

1 Informal Overview

A table in a relational database is said to be in a certain normal form if it satisfies certain constraints. Edgar F. Codd's original work defined three such forms, but other normal forms are now generally accepted as well. We give here a short informal overview of the most common ones. Each normal form listed below is a stronger condition than the one before it. For most practical purposes, databases are considered normalized if they adhere to third normal form.

First Normal Form (or 1NF) requires that all column values in a table are atomic (e.g., a number is an atomic value, while a list or a set is not). For example, normalization eliminates repeating groups by putting each group into a separate table and connecting the tables with a primary key-foreign key relationship.
Second Normal Form (or 2NF) requires that there are no non-trivial functional dependencies of a non-key attribute on a part of a candidate key.
Third Normal Form (or 3NF) requires that there are no non-trivial functional dependencies of non-key attributes on anything other than a superset of a candidate key.
Boyce-Codd Normal Form (or BCNF) requires that there are no non-trivial functional dependencies of attributes on anything other than a superset of a candidate key. At this stage, every attribute depends on a key, the whole key, and nothing but the key (excluding trivial dependencies, like A->A).
Fourth Normal Form (or 4NF) requires that there are no non-trivial multi-valued dependencies of attribute sets on anything other than a superset of a candidate key.
Fifth Normal Form (or 5NF or PJ/NF) requires that there are no non-trivial join dependencies that do not follow from the key constraints.
Domain-Key Normal Form (or DK/NF) requires that all constraints follow from the domain and the key constraints.
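
As an illustration of the BCNF condition above, the following is a minimal Python sketch that computes the closure of an attribute set under a list of functional dependencies and reports the dependencies whose left-hand side is not a superset of a candidate key. The names closure and bcnf_violations, and the enrollment example with the attributes student, course and instructor, are hypothetical and chosen only for this illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A functional dependency is a pair (left-hand side, right-hand side).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(header: Set[str], fds: List[FD]) -> List[FD]:
    """List the given non-trivial dependencies whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not header <= closure(lhs, fds)
    ]


# Hypothetical example: each instructor teaches exactly one course.
header = {"student", "course", "instructor"}
fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]
violations = bcnf_violations(header, fds)
```

In this example the dependency from instructor to course is reported, because instructor alone does not determine student and is therefore not a superset of a candidate key. The table still meets the 3NF condition, since course is part of the candidate key {student, course}.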

2 Formal Treatment

Before we can talk about normalization we first need to fix some terms from the relational model and define them in set theory. These definitions will sometimes be simplifications of their proper definitions in this model because normalization only concerns certain aspects of the relational model.

Basic notions in the relational model are relation names and attribute names. We will represent these as strings such as "Person" and "name" and we will usually use the variables r, s, t, ... and a, b, c to range over them. Another basic notion is the set of atomic values that contains values such as numbers and strings.

Our first definition concerns the notion of tuple, which formalizes the notion of row or record in a table:

Def. A tuple is a partial function from attribute names to atomic values.
Def. A header is a finite set of attribute names.
Def. The projection of a tuple t on a finite set of attributes A is t[A] = { (a, v) : (a, v) ∈ t, a ∈ A }.
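
These definitions can be sketched in Python, assuming that a tuple is modelled as a dict from attribute names to atomic values. The function name project and the sample attributes are assumptions made for this illustration only.

```python
from typing import Dict, Set, Union

Atomic = Union[int, float, str]
# A tuple in the sense above: a partial function from attribute names to values.
TupleT = Dict[str, Atomic]


def project(t: TupleT, attrs: Set[str]) -> TupleT:
    """Return t[A]: the pairs (a, v) of t whose attribute a lies in attrs."""
    return {a: v for a, v in t.items() if a in attrs}


person = {"name": "Ada", "age": 36, "city": "London"}
name_and_city = project(person, {"name", "city"})
# Attributes outside the domain of the tuple are simply absent from the result.
name_only = project(person, {"name", "email"})
```

Because a tuple is a partial function, projecting on an attribute that the tuple does not define is not an error; the attribute simply does not appear in the projection.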

The next definition introduces the notion of relation, which formalizes the contents of a table as it is defined in the relational model.

Def. A relation is a pair (H, B) where H, the header, is a header and B, the body, is a set of tuples that all have the domain H.
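
A minimal Python sketch of this definition follows, assuming each tuple is stored as a frozenset of (attribute, value) pairs so that tuples can be members of a set. The helper names row, domain and is_relation are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# A tuple stored as a hashable set of (attribute, value) pairs.
Row = FrozenSet[Tuple[str, object]]


def row(mapping: Dict[str, object]) -> Row:
    """Build a hashable tuple from a dict of attribute values."""
    return frozenset(mapping.items())


def domain(r: Row) -> Set[str]:
    """Return the set of attribute names on which the tuple is defined."""
    return {a for a, _ in r}


def is_relation(header: Set[str], body: Iterable[Row]) -> bool:
    """Check that every tuple in the body has exactly the header as its domain."""
    # The length check rejects rows that assign two values to one attribute.
    return all(domain(r) == header and len(r) == len(header) for r in body)


header = {"name", "age"}
body = {row({"name": "Ada", "age": 36}), row({"name": "Alan", "age": 41})}
ragged = body | {row({"name": "Grace"})}
```

Here (header, body) is a relation, while (header, ragged) is not, because one of its tuples is undefined on age. An empty body is also accepted, which matches the definition.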

Such a relation closely corresponds to what is usually called the extension of a predicate in first-order logic, except that here we identify the places in the predicate with attribute names. Usually in the relational model a database schema is said to consist of a set of relation names, the headers that are associated with these names, and the constraints that should hold for every instance of the database schema. For normalization we will concentrate on the constraints that hold for individual relations, i.e., the relation constraints. The purpose of these constraints is to describe the relation universe, i.e., the set of all relations that are allowed to be associated with a certain relation name.

Def. A relation universe U over a header H is a non-empty set of relations with header H.
Def. A relation schema (H, C) consists of a header H and a predicate C(R) that is defined for all relations R with header H.
Def. A relation satisfies the relation schema (H, C) if it has header H and satisfies C.
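
The last two definitions can be sketched in Python as follows, assuming the predicate C is modelled as a function from relations to booleans. The key constraint used as the example predicate, and the names row, key_constraint and satisfies, are assumptions made for this illustration and are not part of the definitions.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Row = FrozenSet[Tuple[str, object]]
Relation = Tuple[FrozenSet[str], FrozenSet[Row]]        # (H, B)
Schema = Tuple[FrozenSet[str], Callable[[Relation], bool]]  # (H, C)


def row(mapping: Dict[str, object]) -> Row:
    """Build a hashable tuple from a dict of attribute values."""
    return frozenset(mapping.items())


def key_constraint(key: FrozenSet[str]) -> Callable[[Relation], bool]:
    """Example predicate C: no two tuples agree on all attributes in key."""
    def holds(relation: Relation) -> bool:
        _, body = relation
        projections = {frozenset((a, v) for a, v in r if a in key) for r in body}
        return len(projections) == len(body)
    return holds


def satisfies(relation: Relation, schema: Schema) -> bool:
    """A relation satisfies (H, C) if it has header H and C holds for it."""
    header, _ = relation
    schema_header, constraint = schema
    return header == schema_header and constraint(relation)


H = frozenset({"id", "name"})
person_schema = (H, key_constraint(frozenset({"id"})))
ok = (H, frozenset({row({"id": 1, "name": "Ada"}), row({"id": 2, "name": "Alan"})}))
bad = (H, frozenset({row({"id": 1, "name": "Ada"}), row({"id": 1, "name": "Alan"})}))
```

With these values, ok satisfies person_schema, while bad does not, because two of its tuples share the same id. A relation with a different header is rejected before the predicate is evaluated.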



