Data modelling: an important early stage of database application development (aka "database engineering").
Aim of data modelling:
Data modelling is analogous to the design phase of software engineering, but deals only with data structures.
Inputs to data modelling:
Data models are either:
Object-based data models
Formal mappings exist between the different kinds of models.
A useful strategy:
The initial elements of the database design process:
Consider the following while doing data modelling:
Consider the database system for a savings/loan bank ...
What kind of information does it need to maintain?
In general, there is no single "best" design for a given application.
Also, limitations of modelling framework may force compromises.
However, we may describe a proposed design as "inadequate" if it:
Customer
Account
Initial requirements statements are typically vague
The world is viewed as a collection of inter-related objects.
Examples: a person, a car, a bank account, ...
Each object is described by:
A class is a set of objects with similar properties and methods.
Examples: all students (at UNSW), all cars (in Australia), ...
All student objects have the same data and operations,
e.g. student.name
student.graduates(2003)
An alternative view of classes
ODL is a proposed standard for object-oriented data modelling:
An ODL definition consists of a collection of classes.
ODL describes classes using three kinds of properties:
Class declarations have the following syntax:
interface ClassName : ParentClass { List of Properties }
Example:
interface Person { attribute string name; attribute integer height; attribute Enum Gender gender; attribute Struct Date birthday; }; interface Employee : Person { attribute float salary; relationship Company WorksFor inverse Company::Employs; };
Attributes describe the data values associated with an object.
Each attribute is defined by a type and a name.
ODL defines a variety of atomic, user-defined, and structured data types.
Attributes can have any type except types involving class types.
Names have global scope, and are used in context of object (e.g. p.name
Attributes are inherited by sub-classes.
Atomic built-in data types:
enum
Enum Gender {male, female} Enum AcctType {savings, cheque, credit} Enum Colour {red, orange, yellow, green, blue}
Structured data types are built using a set of type constructors:
Set< > |
unordered collections of distinct elements of type T | |
Bag< > |
unordered collections of elements of type T | |
List< > |
sequences of elements of type T | |
Array< > |
fixed-length vectors of i elements of type T |
The base types T's can be arbitrary types, including class types.
A structured type for an attribute may not contain any class types.
These structured types are ODL's collection types (important for relationships).
Differences between Sets, Bags, Lists:
{1,2,3}={1,2,3} | {1,2,3}={3,2,1} | {1,1,2}={1,2,1} | {1,1,2}={1,2} | |||||
Yes | Yes | Yes | Yes | |||||
Yes | Yes | Yes | No | |||||
Yes | No | No | No |
Arrays and Lists are similar (e.g. could view Array as fixed-size List)
Examples of structured type definitions:
// Can be used as types for attributes List<real> Array<string, 20> Set<integer> Bag<Enum Colour> List<Array<Set<integer>, 10>>> // Cannot be used as types for attributes List<People> Set<Company> List<Set<People>>>
One other structured data type builds tuples (cf. C's struct
Struct Name { Type1 FieldName1, Type2 FieldName2, ... Typen FieldNamen, }
Examples:
Struct Date {integer day, integer month, integer year} Struct Name {string family, string given} Struct Address {integer number, string street, ...}
Relationships describe connections between objects (classes).
A relationship is defined in a class C by:
relationship Collection(D) RelName inverse D::InverseRelName
Example:
- employees work in a company; may work on many projects
- each project requires several employees to get it completed
interface Employee { ... relationship Company WorksFor; relationship Set<Project> WorksOn; }; interface Company { ... relationship Set<Employee> Employs; }; interface Project { ... relationship Set<Employee> Involves; };
Example (with inverses included):
interface Employee { ... relationship Company WorksFor inverse Company::Employs; relationship Set<Project> WorksOn inverse Project::Involves; }; interface Company { ... relationship Set<Employee> Employs inverse Employee::WorksFor; }; interface Project { ... relationship Set<Employee> Involves inverse Employee::WorksOn; };
Example (family trees):
interface Person { attribute string name; attribute Date birthday; ... relationship Person FatherIs inverse Person::FatherOf; relationship Set<Person> FatherOf inverse Person::FatherIs; ... relationship Set<Person> Children inverse Person::BiologicalParent; relationship Array<Person,2> BiologicalParents inverse Person::Children; ... relationship Set<Person> Raised inverse Person::RaisedBy; relationship Set<Person> RaisedBy inverse Person::Raised; ... relationship Person BestFriend inverse Person::BestFriend; };
Notes:
An important property of relationships:
FatherIs
BiologicalParents
Children
Consider classes C, D, and relationships R, R-1 between them.
The most common relationship multiplicities:
C → R → D is a set of objects D → R-1 → C is a set of objects |
||
C → R → D is a single object D → R-1 → C is a set of objects |
||
C → R → D is a single object D → R-1 → C is a single object |
Kinds of objects:
Struct Name { string family, string given }; Struct Address { integer number; string street; string suburb; integer postcode; }; Enum Activity { deposit, withdrawal }; Enum AcctType { savings, cheque, credit };
Modelling people:
interface Person { attribute Struct Name name; attribute Struct Address address; }; interface Employee : Person { attribute int employeeNo; attribute float salary; relationship Branch WorskAt inverse Branch::Staff; }; interface Manager : Employee { relationship Branch Manages inverse Branch::ManagedBy; };
Modelling branches:
interface Branch { attribute integer branchNo; attribute string name; attribute Struct Address address; attribute float assets; ... relationship Set<Customer> Customers inverse Customer::HomeBranch; relationship Set<Account> Accounts inverse Account::HeldAt; };
Modelling accounts:
interface Account { attribute integer accountNo; attribute float balance; ... relationship Branch HeldAt inverse Branch:Accounts; relationship Set<Customer> Owners inverse Customers::Accounts; relationship List<Transaction> Transactions inverse Transaction::TransAcct; };
Modelling transactions:
interface Transaction { attribute Enum Activity type; attribute Date when; attribute float amount; ... relationship Account TransAcct inverse Account::Transactions; relationship Customer MadeBy inverse Customer::Transactions; };
Modelling people (cont):
interface Customer : Person { attribute int customerNo; attribute Date memberFrom; ... relationship Branch HomeBranch inverse Branch::Customers; relationship Set<Account> Holds inverse Account::HeldBy; relationship List<Transaction> Transactions inverse Transaction::MadeBy; };
The world is viewed as a collection of inter-related entities.
Goal of ER modelling:
Example of relation: John Shepherd owns his car
Each entity is described by:
Analogy between ER and OO models:
The entity-relationship model has existed for almost 30 years.
(Original description: Chen, ACM Transactions on Database Systems, 1(1), 1976)
It was never standardised, but has been well-used ...
Other texts (EN, GUW, RG) use slightly different notation.
Choose whichever one you like, and use it consistenly.
ER diagrams are a graphical tool for data modelling.
An ER diagram consists of:
Specific visual symbols indicate different ER design elements:
Example ER diagram (details explained below):
Each attribute in an ER diagram:
Simple attributes are drawn from atomic value domains.
(cf. ODL's atomic data types)
Composite attributes consist of a hierarchy of attributes.
(cf. ODL's Struct
Single-valued attributes have one value for each entity.
(cf. ODL's atomic or Struct
Multivalued attributes have a set of values for each entity.
(cf. ODL's collection types)
Null attributes may contain a distinguished null
(e.g. a person's age may be computed given their date of birth)
Example of attribute notations:
An entity set can be viewed as either:
(e.g. a person may be both a Customer
Employee
The "raw data" in a database may be viewed as a collection of entity sets.
ER model does not have a notion of OIDs, so how to distinguish entities?
What if two entities have the same set of attribute values?
They're regarded as the same entity.
So, each entity must have a distinct set of attribute values.
Implications:
One approach to ensuring that attribute value sets are distinct:
This approach is used commonly in practice e.g.
A superkey is
The notion of superkey is a property of the whole entity set (extension).
During design, we typically consider keys relative to all possible extensions.
Example (bank customer entities):
Customer=(custNo,name,address,taxFileNo)
Definite superkeys:
custNo
taxFileNo
(name,address)
(name)
(address)
The entire set of attributes is always a superkey.
However, most entity sets have several superkeys.
E.g. (custNo,name)
(custNo,address)
It would be convenient to have just one key to identify entities.
Can we identify a minimal set of attributes to be the key?
In examples with artificial identifiers (e.g. SSN), use identifier.
In other examples ... ?
A candidate key is any superkey such that
(custNo)
(taxFileNo)
(name,address)
A primary key:
Keys are indicated in ER diagrams by underlining the key attributes.
A relationship is an association among several entities.
E.g. Customer(9876) is the owner of Account(12345)
A relationship set is a collection of relationships of the same type.
E.g. the "is the owner of" relationship set
Customer(9876) is the owner of Account(12345)
Customer(9426) is the owner of Account(54321)
Customer(9511) is the owner of Account(88888)
Customer(9303) is the owner of Account(78787)
etc. etc.
Stated more formally:
In an ER diagram, relationship sets are assigned names.
The role of each entity in the relationship is usually implicit.
Roles can be explicitly named if needed (in red below)
Examples:
Mapping cardinalities describe the number of entities that a given entity can be associated with via a relationship.
A binary relationship set R between entity sets A and B may be (assuming a ∈ A, b ∈ B):
each a is associated with at most one b each b is associated with at most one a |
||
each a is associated with zero or more b each b is associated with at most one a |
||
each a is associated with at most one b each b is associated with zero or more a |
||
each a is associated with zero or more b each b is associated with zero or more a |
Examples:
An arrow from A to B indicates that there is at most one b ∈ B for each a ∈ A.
An alternative notation makes cardinalities explicit:
Level of participation is another type of relationship constraint.
Participation in relationship set R by entity set A may be:
every a ∈ A participates in ≥1 relationship in R | ||
only some a ∈ A participate in relationships in R |
Example:
In ER diagrams:
In the x-to-one and one-to-x relationships above, we noted that e.g.
Sometimes, we require that there must be exactly one associated entity.
E.g. a manager must have a branch to manage, but a branch may (temporarily) have no manager.
This is known as a referential integrity constraint.
In some cases, a relationship needs associated attributes.
Example:
(Price and quantity are related to products in a particular store)
Weak entities exist only because of association with other entities.
Examples:
While weak entities do not have a primary key
E.g. employees/family, with name as a "key" ...
Employees = {Anne, John, Liam} Families = {Alice, Bob, David, Jane, Mary, Tim, Tim} John's family = {Alice, Bob, Mary, Tim} Anne's family = {David, Tim} Liam's family = {Jane}
E.g. loans/payments, with loan ids and payment sequence ...
Loan = {(1234,John,100000),(4321,Arun,70000)} Payments = {(1,1500),(2,1500),(3,1500),(1,750),(2,1500)} John's payments = {(1,1500),(2,1500),(3,1500)} Arun's payments = {(1,750),(2,1500)}
In ER diagrams:
Extensions to the "standard" ER model include inheritance.
A subclass of an entity set A is a set of entities:
Some texts use the term lower level entity set as synonym for "subclass".
If an entity set has multiple subclasses, they may be:
ISA
'd'
'o'
Example of subclasses:
The ER model provides a powerful, general set of data modelling tools.
Some considerations in using these tools to create a design:
Attribute vs Entity Example (v1)
Employees can work for several departments, but cannot work for the same department over two different time periods.
Attribute vs Entity Example (v2)
Employees can work for the same department over two different time periods.
Design is initially somewhat fluid; no single "correct" answer.
Some design refinements:
Other texts discuss design in more detail (e.g. Conolly/Begg)
To some extent, it's a case of "learn by practising"
Kinds of information in model:
ER diagrams are typically too large to fit on a single screen.
(or a single sheet of paper, if printing)
One commonly used strategy:
Modelling people:
Modelling people (cont):
Modelling branches, accounts, transactions:
Putting it all together with relationships ...
There are some design aspects that ER does not deal with:
Conceptual design (data modelling) follows requirements analysis.
ER model is popular for doing coneptual design
Additional constructs: weak entities, ISA hierarchies
Many notational variants of ER exist
(especially in the expression of constraints on relationships)
Produced: 13 Sep 2020