Data modelling and OO/ER data models


∮ Data Modelling


Data Modelling

Data modelling: an important early stage of database application development (aka "database engineering").

  1. requirements analysis   (identify data and operations)
  2. data modelling   (high-level, abstract)
  3. database schema design   (detailed, relational/tables)
  4. database physical design   (based on expected workload)
  5. database implementation   (create instance of schema)
  6. build operations/interface   (SQL, stored procedures, GUI)
  7. performance tuning   (physical re-design)
  8. schema evolution   (logical schema re-design)

Data Modelling (cont)

Aim of data modelling:

Data modelling is analogous to the design phase of software engineering, but deals only with data structures.


Data Modelling (cont)

Inputs to data modelling:

Outputs from data modelling: Many languages/methodologies have been developed to assist, e.g.

Data Models

Data models are either:

Two main groups of logical data models:

Object-based vs. Record-based

Object-based data models

Record-based data models

Object-based vs. Record-based (cont)

Formal mappings exist between the different kinds of models.

A useful strategy:

We adopt this (very common) strategy in this course.

Database Design

The initial elements of the database design process:

[Diagram:Pic/er-rel/process.png]


Some Design Ideas

Consider the following while doing data modelling:


Case Study: Banking System

Consider the database system for a savings/loan bank ...

What kind of information does it need to maintain?

How are all of these data items related? We illustrate data models by:

Correctness of Designs

In general, there is no single "best" design for a given application.

Also, limitations of modelling framework may force compromises.

However, we may describe a proposed design as "inadequate" if it:


Correctness of Designs (cont)

Initial requirements statements are typically vague

Most important aspects of design documents:

∮ Object-Oriented Data Modelling


Object-oriented Data Modelling

The world is viewed as a collection of inter-related objects.

Examples:   a person,   a car,   a bank account, ...

Each object is described by:


Object-oriented Data Modelling (cont)

A class is a set of objects with similar properties and methods.

Examples:   all students (at UNSW),   all cars (in Australia), ...

All student objects have the same data and operations,

e.g. student.name,   student.graduates(2003)

An alternative view of classes


Object Definition Language (ODL)

ODL is a proposed standard for object-oriented data modelling:

For object-oriented database management systems:

ODL Definitions

An ODL definition consists of a collection of classes.

ODL describes classes using three kinds of properties:

Since we are concerned with data modelling here, we ignore methods.

Classes

Class declarations have the following syntax:

interface ClassName : ParentClass { List of Properties }

Example:

interface Person {
	attribute string  name;
	attribute integer height;
	attribute Enum Gender gender;
	attribute Struct Date birthday;
};

interface Employee : Person {
	attribute float   salary;
	relationship Company WorksFor
	     inverse Company::Employs;
};


Attributes

Attributes describe the data values associated with an object.

Each attribute is defined by a type and a name.

ODL defines a variety of atomic, user-defined, and structured data types.

Attributes can have any type except types involving class types.

Names have global scope, and are used in context of object (e.g. p.name).

Attributes are inherited by sub-classes.


ODL Types

Atomic built-in data types:

Atomic user-defined data types: Examples:

Enum Gender   {male, female}
Enum AcctType {savings, cheque, credit}
Enum Colour   {red, orange, yellow, green, blue}


ODL Types (cont)

Structured data types are built using a set of type constructors:

Set<T> unordered collections of distinct elements of type T
Bag<T> unordered collections of elements of type T
List<T> sequences of elements of type T
Array<T,i> fixed-length vectors of i elements of type T

The base types T's can be arbitrary types, including class types.

A structured type for an attribute may not contain any class types.

These structured types are ODL's collection types   (important for relationships).


ODL Types (cont)

Differences between Sets, Bags, Lists:

  {1,2,3}={1,2,3} {1,2,3}={3,2,1} {1,1,2}={1,2,1} {1,1,2}={1,2}
Set Yes Yes Yes Yes
Bag Yes Yes Yes No
List Yes No No No

Arrays and Lists are similar   (e.g. could view Array as fixed-size List)


ODL Types (cont)

Examples of structured type definitions:

// Can be used as types for attributes

List<real>
Array<string, 20>
Set<integer>
Bag<Enum Colour>
List<Array<Set<integer>, 10>>>

// Cannot be used as types for attributes

List<People>
Set<Company>
List<Set<People>>>


ODL Types (cont)

One other structured data type builds tuples   (cf. C's struct).

Struct Name {
  Type1 FieldName1,
  Type2 FieldName2,
  ...
  Typen FieldNamen,
}

Examples:

Struct Date {integer day, integer month, integer year}
Struct Name {string family, string given}
Struct Address {integer number, string street, ...}


Relationships

Relationships describe connections between objects (classes).

A relationship is defined in a class C by:

Syntax for relationship declarations:

relationship Collection(D) RelName inverse D::InverseRelName


Relationships (cont)

Example:
- employees work in a company; may work on many projects
- each project requires several employees to get it completed

interface Employee { ...
  relationship Company WorksFor;
  relationship Set<Project> WorksOn;
};
interface Company { ...
  relationship Set<Employee> Employs;
};
interface Project { ...
  relationship Set<Employee> Involves;
};


Relationships (cont)

Example (with inverses included):

interface Employee { ...
  relationship Company WorksFor
    inverse Company::Employs;
  relationship Set<Project> WorksOn
    inverse Project::Involves;
};
interface Company { ...
  relationship Set<Employee> Employs
    inverse Employee::WorksFor;
};
interface Project { ...
  relationship Set<Employee> Involves
    inverse Employee::WorksOn;
};


Relationships (cont)

Example (family trees):

interface Person {
  attribute string name;
  attribute Date   birthday;
  ...
  relationship Person FatherIs
    inverse Person::FatherOf;
  relationship Set<Person> FatherOf
    inverse Person::FatherIs;
  ...
  relationship Set<Person> Children
    inverse Person::BiologicalParent;
  relationship Array<Person,2> BiologicalParents
    inverse Person::Children;
  ...
  relationship Set<Person> Raised
    inverse Person::RaisedBy;
  relationship Set<Person> RaisedBy
    inverse Person::Raised;
  ...
  relationship Person BestFriend
    inverse Person::BestFriend;
};

Notes:


Relationship Multiplicity

An important property of relationships:

Possibilities: Similarly for R's inverse relationship R-1.

Relationship Multiplicity (cont)

Consider classes C, D, and relationships R, R-1 between them.

The most common relationship multiplicities:

many-to-many C → R → D   is a set of objects
D → R-1 → C   is a set of objects
many-to-one C → R → D   is a single object
D → R-1 → C   is a set of objects
one-to-one C → R → D   is a single object
D → R-1 → C   is a single object


ODL Model for Bank

Kinds of objects:

Some we can model via user-defined types:

Struct Name { string family, string given };
Struct Address {
  integer number;
  string  street;
  string  suburb;
  integer postcode;
};
Enum Activity { deposit, withdrawal };
Enum AcctType { savings, cheque, credit }; 


ODL Model for Bank (cont)

Modelling people:

interface Person {
  attribute Struct Name name;
  attribute Struct Address address;
};

interface Employee : Person {
  attribute int employeeNo;
  attribute float salary;
  relationship Branch WorskAt
    inverse Branch::Staff;
};

interface Manager : Employee {
  relationship Branch Manages
    inverse Branch::ManagedBy;
};


ODL Model for Bank (cont)

Modelling branches:

interface Branch {
  attribute integer branchNo;
  attribute string name;
  attribute Struct Address address;
  attribute float assets;
  ...
  relationship Set<Customer> Customers
    inverse Customer::HomeBranch;
  relationship Set<Account> Accounts
    inverse Account::HeldAt;
};


ODL Model for Bank (cont)

Modelling accounts:

interface Account {
  attribute integer accountNo; 
  attribute float balance;
  ...
  relationship Branch HeldAt
    inverse Branch:Accounts;
  relationship Set<Customer> Owners
    inverse Customers::Accounts;
  relationship List<Transaction> Transactions
    inverse Transaction::TransAcct;
};


ODL Model for Bank (cont)

Modelling transactions:

interface Transaction {
  attribute Enum Activity type;
  attribute Date when;
  attribute float amount;
  ...
  relationship Account TransAcct
    inverse Account::Transactions;
  relationship Customer MadeBy
    inverse Customer::Transactions;
};


ODL Model for Bank (cont)

Modelling people (cont):

interface Customer : Person {
  attribute int customerNo;
  attribute Date memberFrom;
  ...
  relationship Branch HomeBranch
    inverse Branch::Customers;
  relationship Set<Account> Holds
    inverse Account::HeldBy;
  relationship List<Transaction> Transactions
    inverse Transaction::MadeBy;
};


∮ Entity-Relationship (ER) Model


Entity-Relationship Data Modelling

The world is viewed as a collection of inter-related entities.

Goal of ER modelling:

Examples of entities:   John Shepherd, his car, K17, ...

Example of relation:   John Shepherd owns his car


Entity-Relationship Data Modelling (cont)

Each entity is described by:

Examples: An entity set is a collection of entities with similar properties.

Entity-Relationship Data Modelling (cont)

Analogy between ER and OO models:

Differences between ER and OO models:

Entity-Relationship Data Modelling (cont)

The entity-relationship model has existed for almost 30 years.

(Original description: Chen, ACM Transactions on Database Systems, 1(1), 1976)

It was never standardised, but has been well-used ...

In lectures, we use the notation from the KSS book.

Other texts (EN, GUW, RG) use slightly different notation.

Choose whichever one you like, and use it consistenly.


Entity-Relationship (ER) Diagrams

ER diagrams are a graphical tool for data modelling.

An ER diagram consists of:

Warning: 99% of the time ... If we want to refer to a specific entity, we generally say "entity instance".

Entity-Relationship (ER) Diagrams (cont)

Specific visual symbols indicate different ER design elements:

[Diagram:Pic/er-rel/er-symbols.png]


Entity-Relationship (ER) Diagrams (cont)

Example ER diagram (details explained below):

[Diagram:Pic/er-rel/er1.png]


Attributes

Each attribute in an ER diagram:

Information about attribute domains

Attributes (cont)

Simple attributes are drawn from atomic value domains.

(cf. ODL's atomic data types)

Composite attributes consist of a hierarchy of attributes.

(cf. ODL's Struct types)

Single-valued attributes have one value for each entity.

(cf. ODL's atomic or Struct types)

Multivalued attributes have a set of values for each entity.

(cf. ODL's collection types)


Attributes (cont)

Null attributes may contain a distinguished null value to indicate:

Derived attributes contain values that are calculated from other attributes.

(e.g. a person's age may be computed given their date of birth)


Attributes (cont)

Example of attribute notations:

[Diagram:Pic/er-rel/erattrs.png]


Entity Sets

An entity set can be viewed as either:

Entity sets are not necessarily disjoint.

(e.g. a person may be both a Customer and Employee of a bank)

The "raw data" in a database may be viewed as a collection of entity sets.


Keys

ER model does not have a notion of OIDs, so how to distinguish entities?

What if two entities have the same set of attribute values?

They're regarded as the same entity.

So, each entity must have a distinct set of attribute values.

Implications:


Keys (cont)

One approach to ensuring that attribute value sets are distinct:

(like re-introducing OIDs, but they only need to be distinct within an entity set)

This approach is used commonly in practice e.g.

This is a specific example of the more general notion of a key.

Keys (cont)

A superkey is

(i.e. a set of values for superkey attributes identifies at most one entity in the set)

The notion of superkey is a property of the whole entity set (extension).

During design, we typically consider keys relative to all possible extensions.


Keys (cont)

Example (bank customer entities):

Customer=(custNo,name,address,taxFileNo)

Definite superkeys:

Possible superkeys: Unlikely superkeys:

Keys (cont)

The entire set of attributes is always a superkey.

However, most entity sets have several superkeys.

E.g. (custNo,name), (custNo,address), ...

It would be convenient to have just one key to identify entities.

Can we identify a minimal set of attributes to be the key?

In examples with artificial identifiers (e.g. SSN), use identifier.

In other examples ... ?


Keys (cont)

A candidate key is any superkey such that

E.g. (custNo), (taxFileNo), (name,address), ...

A primary key:

A composite key is a key with two or more attributes.

Keys (cont)

Keys are indicated in ER diagrams by underlining the key attributes.

[Diagram:Pic/er-rel/keys.png]


Relationship Sets

A relationship is an association among several entities.

E.g. Customer(9876)   is the owner of   Account(12345)

A relationship set is a collection of relationships of the same type.

E.g. the "is the owner of" relationship set

Customer(9876)   is the owner of   Account(12345)
Customer(9426)   is the owner of   Account(54321)
Customer(9511)   is the owner of   Account(88888)
Customer(9303)   is the owner of   Account(78787)
etc. etc.


Relationship Sets (cont)

Stated more formally:

ER relationships' degree ≥ 2

Relationship Sets (cont)

In an ER diagram, relationship sets are assigned names.

The role of each entity in the relationship is usually implicit.

Roles can be explicitly named if needed   (in red below)

Examples:

[Diagram:Pic/er-rel/erroles.png]


Relationship Sets (cont)

Mapping cardinalities describe the number of entities that a given entity can be associated with via a relationship.

A binary relationship set R between entity sets A and B may be (assuming   a ∈ A,   b ∈ B):

one-to-one each a is associated with at most one b
each b is associated with at most one a
one-to-many each a is associated with zero or more b
each b is associated with at most one a
many-to-one each a is associated with at most one b
each b is associated with zero or more a
many-to-many each a is associated with zero or more b
each b is associated with zero or more a


Relationship Sets (cont)

Examples:

[Diagram:Pic/er-rel/cardinal.png]

An arrow from A to B indicates that there is at most one b ∈ B for each a ∈ A.


Relationship Sets (cont)

An alternative notation makes cardinalities explicit:

[Diagram:Pic/er-rel/cardinal2.png]


Relationship Sets (cont)

Level of participation is another type of relationship constraint.

Participation in relationship set R by entity set A may be:

total every a ∈ A participates in ≥1 relationship in R
partial only some a ∈ A participate in relationships in R

Example:


Relationship Sets (cont)

In ER diagrams:

Example:

[Diagram:Pic/er-rel/particip.png]


Relationship Sets (cont)

In the x-to-one and one-to-x relationships above, we noted that e.g.

"Each a ∈ A is associated with at most one b ∈ B"

Sometimes, we require that there must be exactly one associated entity.

E.g. a manager must have a branch to manage, but a branch may (temporarily) have no manager.

[Diagram:Pic/er-rel/refinteg.png]

This is known as a referential integrity constraint.


Relationship Sets (cont)

In some cases, a relationship needs associated attributes.

Example:

[Diagram:Pic/er-rel/relattr.png]

(Price and quantity are related to products in a particular store)


Weak Entity Sets

Weak entities exist only because of association with other entities.

Examples:

Weak entities

Weak Entity Sets (cont)

While weak entities do not have a primary key

We can can form a primary key by taking a combination of Example:

Weak Entity Sets (cont)

E.g. employees/family, with name as a "key" ...


Employees = {Anne, John, Liam}
Families  = {Alice, Bob, David, Jane, Mary, Tim, Tim}
John's family = {Alice, Bob, Mary, Tim}
Anne's family = {David, Tim}
Liam's family = {Jane}

E.g. loans/payments, with loan ids and payment sequence ...


Loan        = {(1234,John,100000),(4321,Arun,70000)}
Payments    = {(1,1500),(2,1500),(3,1500),(1,750),(2,1500)}
John's payments = {(1,1500),(2,1500),(3,1500)}
Arun's payments = {(1,750),(2,1500)}


Weak Entity Sets (cont)

In ER diagrams:

Example:

[Diagram:Pic/er-rel/weakent.png]


Subclasses and Inheritance

Extensions to the "standard" ER model include inheritance.

A subclass of an entity set A is a set of entities:

In other words, the subclass inherits the attributes and relationships of A.

Some texts use the term lower level entity set as synonym for "subclass".


Subclasses and Inheritance (cont)

If an entity set has multiple subclasses, they may be:

An orthogonal property is the completeness contraint: ER diagrams use the following notation:

Subclasses and Inheritance (cont)

Example of subclasses:

[Diagram:Pic/er-rel/inherit.png]


Design Using the ER Model

The ER model provides a powerful, general set of data modelling tools.

Some considerations in using these tools to create a design:


Design Using the ER Model (cont)

Attribute vs Entity Example (v1)

[Diagram:Pic/er-rel/attr-ent1.png]

Employees can work for several departments, but cannot work for the same department over two different time periods.


Design Using the ER Model (cont)

Attribute vs Entity Example (v2)

[Diagram:Pic/er-rel/attr-ent2.png]

Employees can work for the same department over two different time periods.


Design Using the ER Model (cont)

Design is initially somewhat fluid; no single "correct" answer.

Some design refinements:

The texts discuss design issues relatively briefly.

Other texts discuss design in more detail   (e.g. Conolly/Begg)

To some extent, it's a case of "learn by practising"


ER Model for Bank

Kinds of information in model:

How to give a model?

ER Model for Bank (cont)

ER diagrams are typically too large to fit on a single screen.
(or a single sheet of paper, if printing)

One commonly used strategy:


ER Model for Bank (cont)

Modelling people:

[Diagram:Pic/er-rel/bank1.png]


ER Model for Bank (cont)

Modelling people (cont):

[Diagram:Pic/er-rel/bank2.png]


ER Model for Bank (cont)

Modelling branches, accounts, transactions:

[Diagram:Pic/er-rel/bank3.png]


ER Model for Bank (cont)

Putting it all together with relationships ...

[Diagram:Pic/er-rel/bank4.png]


Limitations of ER Model

There are some design aspects that ER does not deal with:

Some of these are handled later in the relational model.

Summary

Conceptual design (data modelling) follows requirements analysis.

ER model is popular for doing coneptual design

Basic constructs: entities, relationships, attributes

Additional constructs: weak entities, ISA hierarchies

Many notational variants of ER exist
(especially in the expression of constraints on relationships)


Produced: 13 Sep 2020