SQL: Schemas, Queries, Updates, Views

SQL

SQL = Structured Query Language (sometimes called "sequel").

SQL is an ANSI/ISO standard language for querying and manipulating relational DBMSs.

Designed to be a "human readable" language comprising:

data definition facilities
database modification operations
database query operations, including:
- relational algebra, set operations, aggregation, grouping, ...

SQL (cont)

SQL was developed at IBM (San Jose Lab) during the 1970s, and standardised in 1986.

DBMSs typically implement the SQL2 standard (aka SQL-92).

Unfortunately, they typically:

implement a (large) subset of the standard
extend the standard in various "useful" ways

SQL (in some form) looks likely to survive in the next generation of database systems.

In these slides, we try to use only standard (portable) SQL2.

SQL (cont)

Since SQL2, there have been three new proposed standards:

SQL:1999 added e.g.

boolean and BLOB types, arrays/rows, ...
procedural programming constructs, triggers
recursive queries
OO-like objects, inheritance, ...

SQL:2003 ...

standardised some SQL:1999 extensions
added a standard for meta-data (catalogues)
standardised stored procedures (SQL/PSM)
added a new MERGE statement ("upsert")
defined interfaces to C, Java, XML, object systems, ...

SQL:2008 added additional support for XML.

SQL (cont)

Major DBMSs (Oracle, DB2, SQL Server, PostgreSQL, MySQL):

implement most/all of SQL2
implement much of SQL:1999
implement some of SQL:2003
omit difficult-to-implement features e.g. assertions

PostgreSQL ...

implements almost all of SQL2 (see documentation)
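does not implement: assertions
supports recursive queries (WITH RECURSIVE, since version 8.4)
supports simple updatable views directly (since version 9.3); other views need rules or INSTEAD OF triggers
provides PLpgSQL as its procedural language, rather than SQL/PSM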

SQL (cont)

SQL provides high-level, declarative access to data.

However, SQL is not a Turing-complete programming language.

Applications typically embed evaluation of SQL queries in programming languages (PLs):

Java and the JDBC API
PHP/Perl/Tcl and their various DBMS bindings
RDBMS-specific programming languages
(e.g. Oracle's PL/SQL, PostgreSQL's PLpgSQL)
C and low-level library interfaces to DBMS engine
(e.g. Oracle's OCI, PostgreSQL's libpq)

SQL (cont)

SQL's query sub-language is based on relational algebra.

Relational algebra:

formal language of expressions mapping tables→tables
comprising three basic operations ...
- select: filter table rows via a condition on attributes
- project: filter table columns by name
- join: combines two tables via a condition
along with set operations (union, intersection, difference)
and a variety of aggregates (including min(), max(), count(), etc)
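
As a rough sketch of how these operations surface in SQL, the queries below run against the bank database instance shown in these slides. The mapping given in the comments is the usual informal one; the table and column names are those of the example instance.

```sql
-- select: keep only the rows that satisfy a condition
SELECT * FROM Account WHERE balance > 10000;

-- project: keep only the named columns
SELECT name, address FROM Customer;

-- join: combine two tables via a condition
SELECT c.name, b.address
FROM   Customer c JOIN Branch b ON c.homeBranch = b.branchName;

-- set operation: branch names that are a home branch or hold an account
SELECT homeBranch FROM Customer
UNION
SELECT branchName FROM Account;
```

On the example instance, the first query returns the accounts R-245, C-123, C-124 and M-225.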

Example Databases

In order to demonstrate aspects of SQL, we use two databases:

bank: customers, accounts, branches, ...
beer: beers, bars, drinkers, ...

These databases are available for you to play with.

Example Database #1 (cont)

We will use the following instance of this schema:

Branch relation/table instance:


 branchName |    address     | assets 
------------+----------------+--------
 Clovelly   | Clovelly Rd.   |   1000
 Coogee     | Coogee Bay Rd. |  40000
 Maroubra   | Anzac Pde.     |  17000
 Randwick   | Alison Rd.     |  20000
 UNSW       | near Library   |   3000

Customer relation/table instance:


  name  |    address     | customerNo | homebranch 
--------+----------------+------------+------------
 Adam   | Belmore Rd.    |      12345 | Randwick
 Bob    | Rainbow St.    |      32451 | Coogee
 Chuck  | Clovelly Rd.   |      76543 | Clovelly
 David  | Anzac Pde.     |      82199 | UNSW
 George | Anzac Pde.     |      81244 | Maroubra
 Graham | Malabar Rd.    |      92754 | Maroubra
 Greg   | Coogee Bay Rd. |      22735 | Coogee
 Jack   | High St.       |      12666 | Randwick

Example Database #1 (cont)

Account relation/table instance:


 branchName | accountNo | balance 
------------+-----------+---------
 UNSW       | U-245     |    1000
 UNSW       | U-291     |    2000
 Randwick   | R-245     |   20000
 Coogee     | C-123     |   15000
 Coogee     | C-124     |   25000
 Clovelly   | Y-123     |    1000
 Maroubra   | M-222     |    5000
 Maroubra   | M-225     |   12000

Owner relation/table instance:


 account | customer 
---------+----------
 U-245   |    12345
 U-291   |    12345
 U-291   |    12666
 R-245   |    12666
 C-123   |    32451
 C-124   |    22735
 Y-123   |    76543
 M-222   |    92754
 M-225   |    12345

Example Database #2 (cont)

We will use the following instance of this schema:

Bars relation/table instance:


       name       |   addr    | license 
------------------+-----------+---------
 Australia Hotel  | The Rocks |  123456
 Coogee Bay Hotel | Coogee    |  966500
 Lord Nelson      | The Rocks |  123888
 Marble Bar       | Sydney    |  122123
 Regent Hotel     | Kingsford |  987654
 Royal Hotel      | Randwick  |  938500

Drinkers relation/table instance:


  name  |   addr   |   phone    
--------+----------+------------
 Adam   | Randwick | 9385-4444 
 Gernot | Newtown  | 9415-3378 
 John   | Clovelly | 9665-1234 
 Justin | Mosman   | 9845-4321

Example Database #2 (cont)

Beers relation/table instance:


        name         |     manf      
---------------------+---------------
 80/-                | Caledonian
 Bigfoot Barley Wine | Sierra Nevada
 Burragorang Bock    | George IV Inn
 Crown Lager         | Carlton
 Fosters Lager       | Carlton
 Invalid Stout       | Carlton
 Melbourne Bitter    | Carlton
 New                 | Toohey's
 Old                 | Toohey's
 Old Admiral         | Lord Nelson
 Pale Ale            | Sierra Nevada
 Premium Lager       | Cascade
 Red                 | Toohey's
 Sheaf Stout         | Toohey's
 Sparkling Ale       | Cooper's
 Stout               | Cooper's
 Three Sheets        | Lord Nelson
 Victoria Bitter     | Carlton

Example Database #2 (cont)

Frequents relation/table instance:


 drinker |       bar        
---------+------------------
 Adam    | Coogee Bay Hotel
 Gernot  | Lord Nelson
 John    | Coogee Bay Hotel
 John    | Lord Nelson
 John    | Australia Hotel
 Justin  | Regent Hotel
 Justin  | Marble Bar

Example Database #2 (cont)

Likes relation/table instance:


 drinker |        beer         
---------+---------------------
 Adam    | Crown Lager
 Adam    | Fosters Lager
 Adam    | New
 Gernot  | Premium Lager
 Gernot  | Sparkling Ale
 John    | 80/-
 John    | Bigfoot Barley Wine
 John    | Pale Ale
 John    | Three Sheets
 Justin  | Sparkling Ale
 Justin  | Victoria Bitter

Example Database #2 (cont)

Sells relation/table instance:


       bar        |       beer       | price 
------------------+------------------+-------
 Australia Hotel  | Burragorang Bock |  3.50
 Coogee Bay Hotel | New              |  2.25
 Coogee Bay Hotel | Old              |  2.50
 Coogee Bay Hotel | Sparkling Ale    |  2.80
 Coogee Bay Hotel | Victoria Bitter  |  2.30
 Lord Nelson      | Three Sheets     |  3.75
 Lord Nelson      | Old Admiral      |  3.75
 Marble Bar       | New              |  2.80
 Marble Bar       | Old              |  2.80
 Marble Bar       | Victoria Bitter  |  2.80
 Regent Hotel     | New              |  2.20
 Regent Hotel     | Victoria Bitter  |  2.20
 Royal Hotel      | New              |  2.30
 Royal Hotel      | Old              |  2.30
 Royal Hotel      | Victoria Bitter  |  2.30

SQL Syntax

SQL definitions, queries and statements are composed of:

comments ... -- comments to end of line
identifiers ... similar to regular programming languages
keywords ... a large set (e.g. CREATE, SELECT, TABLE)
data types ... a small set (e.g. integer, varchar, date)
operators ... similar to regular programming languages
constants ... similar to regular programming languages

Similar means "often the same, but not always" ...

'John', 'blue', 'it''s' are strings
"Students", "Really Silly!" are identifiers

SQL Syntax (cont)

While SQL identifiers and keywords are case-insensitive, we generally:

write keywords in upper case (until it becomes annoying)
e.g. SELECT, FROM, WHERE, CREATE, ...
write relation names with an initial upper-case letter
e.g. Customers, Students, Owns, EnrolledIn
write attribute names in all lower-case
e.g. id, name, partNumber, isActive

We follow the above conventions when writing programs.

We ignore the above conventions when typing in lectures.

SQL Keywords

A categorised list of frequently-used SQL92 keywords:


Querying        Defining Data   Changing Data
SELECT          CREATE          INSERT
FROM            TABLE           INTO
WHERE           INTEGER         VALUES
GROUP BY        REAL            UPDATE
HAVING          VARCHAR         SET
ORDER BY        CHAR            DELETE
DESC            KEY             DROP
EXISTS          PRIMARY         ALTER
IS NULL         FOREIGN
NOT NULL        REFERENCES
IN              CONSTRAINT
DISTINCT        CHECK
AS

There are 225 reserved words in SQL92 ... not a small language.

SQL Keywords (cont)

A list of PostgreSQL's SQL keywords:


ALL           DEFERRABLE    IS           OVERLAPS
ANALYSE       DESC          ISNULL       PRIMARY
ANALYZE       DISTINCT      JOIN         PUBLIC
AND           DO            LEADING      REFERENCES
ANY           ELSE          LEFT         RIGHT
AS            END           LIKE         SELECT
ASC           EXCEPT        LIMIT        SESSION_USER
BETWEEN       FALSE         NATURAL      SOME
BINARY        FOR           NEW          TABLE
BOTH          FOREIGN       NOT          THEN
CASE          FREEZE        NOTNULL      TO
CAST          FROM          NULL         TRAILING
CHECK         FULL          OFF          TRUE
COLLATE       GROUP         OFFSET       UNION
COLUMN        HAVING        OLD          UNIQUE
CONSTRAINT    ILIKE         ON           USER
CROSS         IN            ONLY         USING
CURRENT_DATE  INITIALLY     OR           VERBOSE
CURRENT_TIME  INNER         ORDER        WHEN
CURRENT_USER  INTERSECT     OUTER        WHERE
DEFAULT       INTO

Note that some SQL92 reserved words are not reserved words in PostgreSQL.

SQL Identifiers

Names are used to identify

database objects such as tables, attributes, views, ...
meta-objects such as types, functions, constraints, ...

Identifiers in SQL use similar conventions to programming languages, i.e. a sequence of alphanumerics and underscores, starting with an alphabetic.

Can create arbitrary identifiers by enclosing in "..."

Example identifiers:

employee    student   Courses
last_name   "That's a Great Name!"

Oracle SQL also allows unquoted hash (#) and dollar ($) in identifiers.

SQL Identifiers (cont)

Since SQL does not distinguish case in unquoted identifiers, the following are all treated as being the same identifier:

employee   Employee   EmPlOyEe
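
A small sketch of the case rule, assuming PostgreSQL (which folds unquoted identifiers to lower case, whereas the SQL standard folds them to upper case). The table definition is hypothetical and exists only for this illustration.

```sql
CREATE TABLE employee (id integer);

-- All three refer to the table created above.
SELECT * FROM employee;
SELECT * FROM Employee;
SELECT * FROM EmPlOyEe;

-- A quoted identifier keeps its case, so this names a different
-- (here non-existent) table and fails in PostgreSQL.
SELECT * FROM "Employee";
```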

Most RDBMSs will let you give the same name to different kinds of objects (e.g. a table called Beer and an attribute called Beer).

Some common naming conventions:

name tables representing entities via plural nouns
(e.g. Drinkers, TheDrinkers, AllDrinkers, ...)
name foreign key attributes after the table they refer to
(e.g. beer in the Sells relation)

Constants in SQL

Numeric constants have same syntax as programming languages, e.g.

10    3.14159    2e-5    6.022e23

String constants are written in single quotes, e.g.

'John'   'some text'   '!%#%!$'   'O''Brien'
'"'   '[A-Z]{4}\d{4}'   'a VeRy! LoNg String'

PostgreSQL provides extended strings containing \ escapes, e.g.

E'\n'   E'O\'Brien'   E'[A-Z]{4}\\d{4}'   E'John'

Boolean constants: TRUE and FALSE

PostgreSQL also allows 't', 'true', 'yes', 'f', 'false', 'no'

Constants in SQL (cont)

Other kinds of constants are typically written as strings.

Dates: '2008-04-13', Times: '13:30:15'

Timestamps: '2004-10-19 10:23:54'

PostgreSQL also recognises: 'January 26 11:05:10 1988 EST'

Time intervals: '10 minutes', '5 days, 6 hours'

PostgreSQL also has IP address, XML, etc. data types.
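
One way to make the intended type of such a string explicit is the typed-literal form (a type name followed by the string), sketched below. PostgreSQL accepts all four; the interval string shown is in PostgreSQL's own input format rather than the stricter standard one.

```sql
SELECT DATE '2008-04-13'               AS a_date,
       TIME '13:30:15'                 AS a_time,
       TIMESTAMP '2004-10-19 10:23:54' AS a_timestamp,
       INTERVAL '10 minutes'           AS an_interval;
```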

SQL Data Types

All attributes in SQL relations are typed (i.e. have domain specified)

SQL supports a small set of useful built-in data types:
text string, number (integer,real), date, boolean, binary

Various type conversions are available (e.g. date to string, string to date, integer to real) and applied automatically "where they make sense".

Basic domain (type) checking is performed automatically.

The NULL value is treated as a member of all data types.

No structured data types are available (in SQL2).

SQL Data Types (cont)

Various kinds of number types are available:

INTEGER (or INT), SMALLINT ... 32/16-bit integers
REAL, DOUBLE PRECISION ... 32/64-bit floating point
NUMERIC(d,p) ... fixed-point reals (d digits, p after dec.pt.)

PostgreSQL also provides ...

serial: auto-generated integer values for primary keys
money: fixed-point reals, displayed as strings $1,000.00

SQL Data Types (cont)

Two string types are available:

CHAR(n) ... holds exactly n characters, left-justified, blank-padded
VARCHAR(n) ... holds 0..n characters, no padding

String types can be coerced by blank-padding or truncation.

'abc'::CHAR(2) = 'ab'     'abc'::CHAR(4) = 'abc '

PostgreSQL also provides TEXT for arbitrary strings

convenient; no need to worry "how long is a name?"
efficient (different to some other DBMSs)
but not part of SQL standard

SQL Data Types (cont)

Dates are simply specially-formatted strings, with a range of operations to implement date semantics.

Format is typically YYYY-MM-DD , e.g. '1998-08-02'

Accepts other formats (and has format-conversion functions), but beware of two-digit years (year 2000)

Comparison operators implement before (<) and after (>).

Subtraction counts number of days between two dates.

Etc. etc. ... consult your local SQL Manual
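
A minimal sketch of the two date operations mentioned above, assuming PostgreSQL, where subtracting one date from another yields an integer number of days. The dates are arbitrary.

```sql
SELECT DATE '1998-08-02' < DATE '1998-08-12'  AS is_before,   -- true
       DATE '1998-08-12' - DATE '1998-08-02'  AS days_apart;  -- 10
```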

SQL Data Types (cont)

PostgreSQL also supports several non-standard data types.

generic text string data i.e. text
arbitrary binary data (BLOBs) i.e. bytea
geometric data types e.g. point, circle, polygon, ...

Also, extends relational model so that a single attribute can contain an array/matrix of values, e.g.


CREATE TABLE Employees (
       empid     integer primary key,
       name      text,
       pay_rate  float[]
);
INSERT INTO Employees VALUES
       (1234, 'John', '{35.00,45.00,60.00}');
SELECT pay_rate[2] FROM Employees ...

Tuple and Set Literals

Tuple and set constants are both written as:

( val₁, val₂, val₃, ... )

The correct interpretation is worked out from the context.

Examples:


INSERT INTO Student("stude#", name, course)
VALUES (2177364, 'Jack Smith', 'BSc');
       -- tuple literal

CREATE TABLE Academics (
       id   integer,
       name varchar(40),
       job  varchar(10) CHECK
               (job IN ('Lecturer', 'Tutor'))
               -- set literal
);

Tuple and Set Literals (cont)

SQL data types provide coarse-grained control over values.

If more fine-grained control over values is needed:

constraints can express more precise conditions
new "data types" can be defined

Examples:

CREATE DOMAIN PositiveInt AS INTEGER
   CHECK (VALUE > 0);
CREATE DOMAIN Colour AS VARCHAR(10)
   CHECK (VALUE IN ('red','yellow','green','blue','violet'));
CREATE TABLE T (
   x Colour,
   y PositiveInt,
   z INTEGER CHECK (z BETWEEN 10 AND 20)
);
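
To see a domain constraint being enforced, here is a self-contained sketch in the same style. The domain Percentage and the table Marks are hypothetical names introduced only for this illustration, and the behaviour described in the comments is that of PostgreSQL.

```sql
CREATE DOMAIN Percentage AS INTEGER
   CHECK (VALUE BETWEEN 0 AND 100);

CREATE TABLE Marks (
   student  INTEGER,
   mark     Percentage
);

INSERT INTO Marks VALUES (1001, 85);    -- accepted
INSERT INTO Marks VALUES (1002, 120);   -- rejected: violates the domain's CHECK
INSERT INTO Marks VALUES (1003, NULL);  -- accepted: a CHECK that evaluates to NULL passes
```

The last insert shows that a CHECK alone does not exclude NULL; add NOT NULL if that is required.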

SQL Operators

Comparison operators are defined on all types:

<   >   <=   >=   =   <>  (or !=)

Boolean operators AND, OR, NOT are also available

Note AND,OR are not "short-circuit" in the same way as C's &&,||

Most data types also have type-specific operations available

See PostgreSQL Documentation Chapter 8/9 for data types and operators

SQL Operators (cont)

String comparison:

str₁ < str₂ ... compare using dictionary order
str LIKE pattern ... matches string to pattern

Pattern-matching uses SQL-specific pattern expressions:

% matches anything (like .*)
_ matches any single char (like .)

SQL Operators (cont)

Examples (using SQL92 pattern matching):

Name LIKE 'Ja%' ... Name begins with 'Ja'

Name LIKE '_i%' ... Name has 'i' as 2nd letter

Name LIKE '%o%o%' ... Name contains at least two 'o's

Name LIKE '%ith' ... Name ends with 'ith'

Name LIKE 'John' ... Name is exactly 'John'

PostgreSQL also supports case-insensitive match: ILIKE
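
The same kind of pattern can be tried on the beer database instance from these slides. This is a sketch only; the results in the comments are read off the example tables, and ILIKE is PostgreSQL-specific.

```sql
-- Drinkers whose name begins with 'J'
SELECT name FROM Drinkers WHERE name LIKE 'J%';    -- John, Justin

-- Beers whose name ends with 'Lager'
SELECT name FROM Beers WHERE name LIKE '%Lager';   -- Crown Lager, Fosters Lager, Premium Lager

-- Case-insensitive match (PostgreSQL only)
SELECT name FROM Bars WHERE name ILIKE '%hotel';   -- the four bars whose name ends in 'Hotel'
```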

SQL Operators (cont)

Most Unix-based DBMSs utilise the regexp library to provide full POSIX regular expression matching.

PostgreSQL uses the ~ operator for this:

Attr ~ 'RegExp'

PostgreSQL also provides full-text searching (see doc)

SQL Operators (cont)

Examples (using POSIX regular expressions):

Name ~ '^Ja' ... Name begins with 'Ja'

Name ~ '^.i' ... Name has 'i' as 2nd letter

Name ~ '.*o.*o.*' ... Name contains at least two 'o's

Name ~ 'ith$' ... Name ends with 'ith'

Name ~ 'John' ... Name contains 'John' (use '^John$' for an exact match)
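
A short sketch of regular-expression matching on the beer database instance, assuming PostgreSQL. Note that an unanchored pattern matches anywhere in the string, so both ends are anchored when a whole-string match is wanted. The ~* operator is PostgreSQL's case-insensitive variant of ~.

```sql
-- Beers whose name ends with 'Stout'
SELECT name FROM Beers WHERE name ~ 'Stout$';   -- Invalid Stout, Sheaf Stout, Stout

-- Anchoring at both ends forces a whole-string match
SELECT name FROM Beers WHERE name ~ '^Old$';    -- Old (but not Old Admiral)

-- Case-insensitive regular expression match
SELECT name FROM Bars WHERE name ~* 'hotel$';
```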

SQL Operators (cont)

String manipulation:

str₁ || str₂ ... return concatenation of str₁ and str₂
lower(str) ... return lower-case version of str
substring(str,start,count) ... extract chars from str

Etc. etc. ... consult your local SQL Manual (e.g. PostgreSQL Sec 9.4)

Note that above operations are null-preserving (strict):

if any operand is NULL, result is NULL
beware of (a||' '||b||' '||c) ... NULL if any of a, b, c are null
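
The following sketch, assuming PostgreSQL, shows these string operations on literal values, including how a single NULL wipes out a concatenation. Wrapping the nullable operand in coalesce() is one common workaround, not the only one.

```sql
SELECT 'Coogee' || ' ' || 'Bay'             AS joined,   -- 'Coogee Bay'
       lower('Coogee Bay Hotel')            AS lowered,  -- 'coogee bay hotel'
       substring('Coogee Bay Hotel', 1, 6)  AS prefix;   -- 'Coogee'

-- NULL-preserving: one NULL operand makes the whole result NULL
SELECT 'a' || ' ' || NULL || ' ' || 'c'     AS broken;   -- NULL

-- Workaround: replace NULL by an empty string before concatenating
SELECT 'a' || ' ' || coalesce(NULL, '') || ' ' || 'c'  AS repaired;  -- 'a  c'
```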

SQL Operators (cont)

Arithmetic operations:


+  -  *  /  abs  ceil  floor  power  sqrt  sin

Aggregations apply to a column of numbers in a relation:

count(attr) ... number of non-NULL values in attr column
sum(attr) ... sum of values for attr
avg(attr) ... mean of values for attr
min/max(attr) ... min/max of values for attr

Note: count applies to columns of non-numbers as well.
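
As an illustration, these aggregates can be applied to the example instances from these slides. The expected values in the comments are computed from the tables shown earlier.

```sql
SELECT count(*)   AS n_offers,   -- 15
       min(price) AS cheapest,   -- 2.20
       max(price) AS dearest     -- 3.75
FROM   Sells;

SELECT sum(balance) AS total     -- 40000
FROM   Account
WHERE  branchName = 'Coogee';

-- count also works on a non-numeric column
SELECT count(manf) AS n_beers    -- 18
FROM   Beers;
```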

SQL Operators (cont)

NULL in arithmetic operation always yields NULL, e.g.

3 + NULL = NULL      1 / NULL = NULL

NULL in aggregations is ignored (treated as unknown), e.g.

sum(1,2,3,4,5,6)       = 21
sum(1,2,NULL,4,NULL,6) = 13
avg(1,2,3,4,5)         = 3
avg(NULL,2,NULL,4)     = 3
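
The notation above is informal; real aggregates take a column, not a list of values. A sketch of the same experiment as an actual query, using a VALUES list as a throwaway one-column table (the alias t(x) is an arbitrary name):

```sql
SELECT sum(x)   AS total,      -- 13
       avg(x)   AS mean,       -- 3.25 (13 divided by the 4 non-NULL values)
       count(x) AS n_values,   -- 4
       count(*) AS n_rows      -- 6
FROM   (VALUES (1), (2), (NULL), (4), (NULL), (6)) AS t(x);
```

Only count(*) sees the rows holding NULL; every aggregate over x skips them.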

The NULL Value

Expressions containing NULL generally yield NULL.

However, boolean expressions use three-valued logic:

   a   |   b   | a AND b | a OR b 
-------+-------+---------+--------
 TRUE  | TRUE  | TRUE    | TRUE
 TRUE  | FALSE | FALSE   | TRUE
 TRUE  | NULL  | NULL    | TRUE
 FALSE | FALSE | FALSE   | FALSE
 FALSE | NULL  | FALSE   | NULL
 NULL  | NULL  | NULL    | NULL

The NULL Value (cont)

Important consequence of NULL behaviour ...

These expressions do not work as (might be) expected:

x = NULL    x <> NULL

Both return NULL regardless of the value of x

Can only test for NULL using:

x IS NULL     x IS NOT NULL
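
A minimal sketch contrasting the two kinds of test, using a throwaway two-row table built from a VALUES list (the alias t(x) is arbitrary; behaviour is that of PostgreSQL with default settings):

```sql
SELECT x,
       x = NULL   AS eq_null,   -- NULL for every row
       x IS NULL  AS is_null    -- false for 1, true for NULL
FROM   (VALUES (1), (NULL)) AS t(x);

-- In a WHERE clause a NULL condition counts as "not true",
-- so this returns no rows at all:
SELECT x FROM (VALUES (1), (NULL)) AS t(x) WHERE x = NULL;
```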

The NULL Value (cont)

Other ways PostgreSQL provides for dealing with NULL:

coalesce(Val₁,Val₂,...Val_n)

returns first non-null value Val_i
useful for providing a "displayable" value for nulls

nullif(Val₁,Val₂)

returns null if Val₁ is equal to Val₂, otherwise returns Val₁
can be used to provide inverse of coalesce()
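
A short sketch of both functions on literal values. The placeholder string 'unknown' is an arbitrary choice for illustration.

```sql
-- coalesce: show a placeholder instead of NULL
SELECT coalesce(NULL, NULL, 'unknown');   -- 'unknown'

-- nullif: turn a placeholder value back into NULL
SELECT nullif('unknown', 'unknown');      -- NULL
SELECT nullif('Randwick', 'unknown');     -- 'Randwick'

-- nullif is often used to avoid division by zero:
-- 10 / NULL is NULL rather than an error
SELECT 10 / nullif(0, 0);                 -- NULL
```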

Relational Data Definition

In order to give a relational data model, we need to:

describe tables
describe attributes that comprise tables
describe any constraints on the data

A relation schema defines an individual table.

A database schema is a collection of relation schemas that defines the structure of and constraints on an entire database.

Relational Data Definition (cont)

So far, we have given relational schemas informally, e.g.

individual relation schemas


Account(accountNo, branchName, balance)
Branch(branchName, address, assets)
Customer(customerNo, name, address, homeBranch)
Owner(account, customer)

database schemas

SQL Data Definition Language

SQL is normally considered to be a query language.

However, it also has a data definition sub-language (DDL) for describing database schemas.

The SQL DDL allows us to specify:

names of tables
names and domains for attributes
various types of constraints (e.g. primary/foreign keys)

It also provides mechanisms for performance tuning (see later).

Defining a Database Schema

Relations (tables) are described using:

CREATE TABLE RelName (
    attribute₁   domain₁   constraints,
    attribute₂   domain₂   constraints,
    ...
    table-level constraints, ...
)

where constraints can include details about primary keys, foreign keys, default values, and constraints on attribute values.

This not only defines the table schema but also creates an empty instance of the table.

Tables are removed via DROP TABLE RelName;
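
As a sketch of the template above, here is how two of the bank relations might be defined. The attribute names come from the example instance; the column types, lengths, defaults and CHECK condition are assumptions made for illustration, not the definitions used for the course database.

```sql
CREATE TABLE Branch (
    branchName  varchar(30) PRIMARY KEY,
    address     varchar(50),
    assets      integer DEFAULT 0 CHECK (assets >= 0)
);

CREATE TABLE Account (
    accountNo   char(5) PRIMARY KEY,
    branchName  varchar(30) NOT NULL,
    balance     integer DEFAULT 0,
    -- table-level constraint
    FOREIGN KEY (branchName) REFERENCES Branch(branchName)
);

-- Remove tables in reverse order of dependency
DROP TABLE Account;
DROP TABLE Branch;
```

Account must be dropped before Branch because its foreign key refers to Branch.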