Database installations are typically ...
Tasks of the database administrator:
The DBMS itself assists DB administration by:
In the PostgreSQL installations on grieg you are the DBA
(aka PostgreSQL super-user, owner of postmaster process)
Data/files associated with server live under /srvr/
/pgsql903/
Some things that you could potentially do:
postgresql.conf
CREATE USER
CREATE ROLE
GRANT
REVOKE
pg_hba.conf
An RDBMS maintains a collection of relation instances.
To do this, it also needs information about relations:
This information is stored in the system catalog.
(The "system catalog" is also called "data dictionary" or "system view")
DBMSs typically use a hierarchy of namespaces to manage names:
Database (or Catalog)
Schema
Table
DBMSs store the catalog data in a collection of special tables:
(A small fragment of the meta-data tables in a typical RDBMS)
SQL:2003 standard view of metadata: INFORMATION_SCHEMA
The INFORMATION_SCHEMA
Schemata(catalog_name, schema_name, schema_owner, ...)
Tables(table_catalog, table_schema, table_name, table_type, ...)
Columns(table_catalog, table_schema, table_name, column_name,
ordinal_position, column_default, data_type, ...)
Views(table_catalog, table_schema, table_name, view_definition, ...)
Table_Constraints(..., constraint_name, ..., constraint_type, ...)
etc. etc. etc.
DBMS internal meta-data is often different to standard, e.g.
Users(id:int, name:string, ...)
Databases(id:int, name:string, owner:ref(User), ...)
Schemas(id:int, name:string, owner:ref(User), ...)
Types(id:int, name:string, defn:string, size:int, ...)
Tables(id:int, name:string, owner:ref(User),
inSchema:ref(Schema), ...)
Attributes(id:int, name:string, table:ref(Table),
type:ref(Type), pkey:bool, ...)
TableConstraints(id:int, name:string, table:ref(Table),
defn:string, ...)
AttrConstraints(id:int, name:string, attr:ref(Attribute),
defn:string, ...)
-- etc. etc. etc.
SQL DDL operations such as
create table Abc ( x integer primary key, y integer);
are implemented internally as operations on meta-data, e.g.
userID := current_user(); schemaID := current_schema(); tabID := nextval('tab_id_seq'); select into intID id from Types where name='integer'; insert into Tables(id,name,owner,inSchema,...) values (tabID, 'abc', userID, schema, ...) attrID := nextval('attr_id_seq'); insert into Attributes(id,name,table,type,pkey,...) values (attrID, 'x', tabID, intID, true, ...) attrID := nextval('attr_id_seq'); insert into Attributes(id,name,table,type,pkey,...) values (attrID, 'y', tabID, intID, false, ...)
Users typically have access to the system catalog via
\d
\df
select * from tab
PostgreSQL stores catalog information as regular tables.
The \d?
psql
\dt |
list information about tables | |
\dv |
list information about views | |
\df |
list information about functions | |
\dp |
list table access privileges | |
\dT |
list information about data types | |
\dd |
shows comments attached to DB objects |
A PostgreSQL installation typically has several databases.
Some catalog information is global, e.g.
pg_authid
oid |
integer key to reference user | |
rolname |
symbolic user name (e.g. jas |
|
rolpasswd |
md5-encrypted password | |
rolcreatedb |
can create new databases | |
rolsuper |
is a superuser (owns server process) | |
rolcatupdate |
can update system catalogs |
pg_database
datname |
database name (e.g. nssis |
|
datdba |
database owner (refs pg_auth.oid |
|
datpath |
where files for database are stored (if not in the PG_DATA directory) |
|
datacl |
access permissions |
pg_class
relname |
name of table (e.g. employee |
|
relnamespace |
schema in which table defined (refs pg_namespace.oid |
|
reltype |
data type corresponding to table (refs pg_type.oid |
|
relowner |
owner (refs pg_authid.oid |
|
reltuples |
# tuples in table | |
relacl |
access permissions |
Also holds info about objects other than tables, e.g. views, sequences, indexes.
pg_class
relkind |
what kind of object 'r' = ordinary table, 'i' = index, 'v' = view 'c' = composite type, 'S' = sequence, 's' = special |
|
relnatts |
# attributes in table (how many entries in pg_attribute |
|
relchecks |
# of constraints on table (how many entries in pg_constraint |
|
relhasindex |
table has/had an index? | |
relhaspkey |
table has/had a primary key? |
etc.
pg_type
typname |
name of type (e.g. integer |
|
typnamespace |
schema in which type defined (refs pg_namespace.oid |
|
typowner |
owner (refs pg_authid.oid |
|
typtype |
what kind of data type 'b' = base type, 'c' = complex (row) type, ... |
|
typlen |
how much storage used for type values (-1 for variable-length types, e.g. text |
|
typrelid |
table associated to complex type (refs pg_class.oid |
etc.
pg_attribute
attname |
name of attribute (e.g. id |
|
attrelid |
table this attribute belongs to (refs pg_class.oid |
|
atttypid |
data type of this attribute (refs pg_type.oid |
|
attlen |
storage space required by attribute (a copy of pg_type.typlen |
|
attnum |
attribute position (1..n, sys attrs are -ve) |
pg_attribute
attnotnull |
attribute may not be null? | |
atthasdef |
attribute has a default values (value is held in pg_attrdef |
|
attisdropped |
attribute has been dropped from table |
plus others related to strage properties of attribute.
pg_proc
proname |
name of function (e.g. substr |
|
pronamespace |
schema in which function defined (refs pg_namespace.oid |
|
proowner |
owner (refs pg_authid.oid |
|
prolang |
what language function written in | |
proacl |
access control |
etc.
pg_proc
pronargs |
how many arguments | |
prorettype |
return type (refs pg_type.oid |
|
proargtypes |
argument types (vector of refs pg_type.oid |
|
proisstrict |
returns null if any arg is null | |
prosrc |
source code if interpreted (e.g. PLpgSQL) |
pg_constraints
conname |
name of constraint (not unique) | |
connamespace |
schema containing this constraint | |
contype |
kind of constraint 'c' = check, 'u' = unique, 'p' = primary key, 'f' = foreign key |
|
conrelid |
which table (refs pg_class.oid |
|
conkey |
which attributes (vector of values from pg_attribute.attnum |
|
consrc |
check constraint expression |
For full details on PostgreSQL catalogs:
PostgreSQL 9.0.3 Developer's Guide
Part VII, Chapter 45, System Catalogs
Part IV, Chapter 34, The Information Schema
Database security has to meet the following objectives:
Security mechanisms operate at a number of levels
Access to DBMSs involves two aspects:
psql
Note: we don't need to supply username/password to psql
psql
Note: access to databases via the Web involves:
Database users are set up by the DBA via the command:
CREATE USER Name IDENTIFIED BY 'Password'
Various privileges can be assigned at user-creation time, e.g.
ALTER USER Name IDENTIFIED BY 'NewPassword'
This command is also used to change privileges, quotas, etc.
A user may be associated with a role or group
CREATE ROLE RoleName
Examples of roles:
In older versions of PostgreSQL ...
USER
GROUP
USER
CREATE USER
GROUP
CREATE GROUP
GROUP
ALTER GROUP ... ADD USER ...
USER
GROUP
ROLE
Older syntax is retained for backward compatibility.
PostgreSQL has two ways to create users ...
From the Unix command line, via the command
createuser [ -a | -d ] Name
(-a
-d
From SQL, via the statement:
CREATE ROLE UserName Options
-- where Options include ...
PASSWORD 'Password'
CREATEDB | NOCREATEDB
CREATEUSER | NOCREATEUSER
IN GROUP GroupName
VALID UNTIL 'TimeStamp'
Groups are created as ROLE
CREATE ROLE GroupName
--or--
CREATE ROLE GroupName
WITH USER UserName1, UserName2, ...
and may be subsequently modified by
GRANT GroupName TO UserName1, UserName2, ...
--and
REVOKE GroupName FROM UserName1, UserName2, ...
PostgreSQL stores user information in a table pg_authid
Each user is associated with a unique identifying number.
Every PostgreSQL is created with a default user (id=1, superuser).
Some fields in the pg_authid
oid
rolname
jas
rolpassword
PostgreSQL uses a file called pg_hba.conf
hba
This file contains a sequence of entries where each entry has:
all
all
Possible authentication methods:
md5
pg_shadow
krb5
reject
trust
ident
ident
ident
sameuser
Examples from pg_hba.conf
# Allow any user on the local system to connect to any database # as any user if accessing from local machine via Unix socket. # #TYPE DATABASE USER IP-ADDRESS IP-MASK METHOD local all all trust # Reject all connection from 192.168.54.1 # #TYPE DATABASE USER IP-ADDRESS IP-MASK METHOD host all all 192.168.54.1 255.255.255.255 reject # Allow any user from any host with IP address 192.168.93.x # to connect to database "mydb" as the same user name that # ident reports for the connection (typically Unix user name). # #TYPE DATABASE USER IP-ADDRESS IP-MASK METHOD host mydb all 192.168.93.0 255.255.255.0 ident sameuser # Allow a user from host 192.168.12.10 to connect to database # "mydb" only if the user's password is correctly supplied. # #TYPE DATABASE USER IP-ADDRESS IP-MASK METHOD host mydb all 192.168.12.10 255.255.255.255 md5
SQL access control deals with
ALTER
DROP
The owner of an object can assign privileges on that object to other users.
Accomplished via the command:
GRANT Privileges ON Object TO list of (Users|Roles) | PUBLIC [ WITH GRANT OPTION ]
Privileges can be ALL
ALTER
DROP
WITH GRANT OPTION
Effects of privilege granting
Privileges can be withdrawn via the command:
REVOKE Privileges ON Object FROM ListOf (Users|Roles) | PUBLIC CASCADE | RESTRICT
Normally withdraws Privileges from just specified users/roles.
CASCADE
E.g. revoking from U1 also revokes U5 and U6
RESTRICT
E.g. revoking from U1 fails, revoking U5 or U2 succeeds
Privileges available for users on database objects:
SELECT
ALTER TABLE
INSERT
INSERT(
)
More privileges available for users on database objects:
UPDATE
UPDATE(
)
DELETE
DROP
More privileges available for users on database objects:
REFERENCES(
)
EXECUTE
TRIGGER
PostgreSQL follows the above with some minor variations:
GROUP
RULE
Allowing users to assign privileges to others can be exploited.
Example: student S wants access to table of marks M
Above approach is called discretionary access control
Access control is described in terms of ...
Subjects and objects are assigned to security classes.
Impose these restrictions on every data access:
Statistical databases
Consider: anonymous surveys in an on-line survey system
How to solve this problem?
Schema design:
Tuning requires us to consider the following:
Performance can be considered at two times:
Normalisation structures data to minimise storage redundancy.
Example: Courses = Course Subject
Term
If we frequently need to refer to course "standard" name
courseName
Course
Course
Course
-- can now replace a query like: select s.code||t.year||t.sess, e.grade, e.mark from Course c, CourseEnrolment e, Subject s, Term t where e.course = c.id and c.subject = s.id and c.term = t.id -- by a query like: select c.courseName, e.grade, e.mark from Course c, CourseEnrolment e where e.course = c.id
Indexes provide efficient content-based access to tuples.
Can build indexes on any (combination of) attributes.
Definining indexes:
CREATE INDEX name ON table ( attr1, attr2, ... )
attri can be an arbitrary expression (e.g. upper(name)
CREATE INDEX
UNIQUE
USING
Indexes can make a huge difference to query processing cost.
On the other hand, they introduce overheads (storage, updates).
Creating indexes to maximise performance benefits:
select * from Employee where id = 12345 select * from Employee where age > 60 select * from Employee where salary between 10000 and 20000
Considerations in applying indexes:
-- use hashing for (unique) attributes in equality tests, e.g. select * from Employee where id = 12345 -- use B-tree for attributes in range tests, e.g. select * from Employee where age > 60
Sometimes, a query can be re-phrased to affect performance:
select name from Employee where salary/365 > 10.0 -- fix by re-phrasing condition to (salary > 3650) select name from Employee where name like '%ith%' select name from Employee where birthday is null -- above two are difficult to "fix" select name from Employee where dept in (select id from Dept where ...) -- fix by using Employee join Dept on (e.dept=d.id)
Other factors to consider in query tuning:
select distinct
distinct
select ... Employee join Customer on (s.name = p.name)
vs
select ... Employee join Customer on (s.ssn = p.ssn)
or
or
or
select name from Employee where dept=1 or dept=2
vs
(select name from Employee where dept=1)
union
(select name from Employee where dept=2)
PostgreSQL provides the explain
EXPLAIN [ANALYZE] Query
Without ANALYZE
EXPLAIN
With ANALYZE
EXPLAIN
Note that runtimes may show considerable variation due to buffering.
Example: Select on indexed attribute
ass2=# explain select * from student where id=100250; QUERY PLAN ----------------------------------------------------------------------------- Index Scan using student_pkey on student (cost=0.00..5.94 rows=1 width=17) Index Cond: (id = 100250) ass2=# explain analyze select * from student where id=100250; QUERY PLAN ----------------------------------------------------------------------------- Index Scan using student_pkey on student (cost=0.00..5.94 rows=1 width=17) (actual time=31.209..31.212 rows=1 loops=1) Index Cond: (id = 100250) Total runtime: 31.252 ms
Example: Select on non-indexed attribute
ass2=# explain select * from student where stype='local'; QUERY PLAN ---------------------------------------------------------- Seq Scan on student (cost=0.00..70.33 rows=18 width=17) Filter: ((stype)::text = 'local'::text) ass2=# explain analyze select * from student where stype='local'; QUERY PLAN -------------------------------------------------------------------- Seq Scan on student (cost=0.00..70.33 rows=18 width=17) (actual time=0.061..4.784 rows=2512 loops=1) Filter: ((stype)::text = 'local'::text) Total runtime: 7.554 ms
Example: Join on a primary key (indexed) attribute
ass2=# explain ass2-# select s.sid,p.name from Student s, Person p where s.id=p.id; QUERY PLAN ------------------------------------------------------------------------- Hash Join (cost=70.33..305.86 rows=3626 width=52) Hash Cond: ("outer".id = "inner".id) -> Seq Scan on person p (cost=0.00..153.01 rows=3701 width=52) -> Hash (cost=61.26..61.26 rows=3626 width=8) -> Seq Scan on student s (cost=0.00..61.26 rows=3626 width=8)
Join on a primary key (indexed) attribute:
ass2=# explain anaylze ass2-# select s.sid,p.name from Student s, Person p where s.id=p.id; QUERY PLAN ------------------------------------------------------------------------- Hash Join (cost=70.33..305.86 rows=3626 width=52) (actual time=11.680..28.242 rows=3626 loops=1) Hash Cond: ("outer".id = "inner".id) -> Seq Scan on person p (cost=0.00..153.01 rows=3701 width=52) (actual time=0.039..5.976 rows=3701 loops=1) -> Hash (cost=61.26..61.26 rows=3626 width=8) (actual time=11.615..11.615 rows=3626 loops=1) -> Seq Scan on student s (cost=0.00..61.26 rows=3626 width=8) (actual time=0.005..5.731 rows=3626 loops=1) Total runtime: 32.374 ms
Produced: 13 Sep 2020