COMP9315 24T1 |
Prac Exercise 03 PostgreSQL Server Config and File Structures |
DBMS Implementation |
This simple exercise aims to get you to:
You ought to do it before the end of week 3.
PostgreSQL has a wide range of configuration parameters which are described in Chapter 19 of the PostgreSQL documentation. For the purposes of this lab, we are most interested in the configuration parameters related to resource usage (described in Section 19.4).
Most configuration parameters can be set by modifying the
$PGDATA/postgresql.conf
file and restarting the server.
Many configuration parameters can also be set via command-line
arguments to the postgres
server when it is initially
invoked.
Note that you cannot set parameters if you invoke the server
via the pgs
script; pgs
aims to simplify
things by allowing few options and starting the server with the
configuration specified in postgresql.conf
.
The standard PostgreSQL mechanism for starting the server is yet
another script, called pg_ctl
(see the
pg_ctl
section of the PostgreSQL documentation).
The simplest way to invoke pg_ctl
is one of:
pg_ctl start server starting pg_ctl stop waiting for server to shut down.... done server stopped pg_ctl status pg_ctl: server is running (PID: nnnnnn) /localstorage/z5555555/pgsql/bin/postgres
The pgs
script simply invokes pg_ctl
to
start a server, with some extra options:
pg_ctl -w start -l /localstorage/$USER/pgsql/data/log waiting for server to start...... done server started
The -l
option tells the PostgreSQL server which file to use
to write its log messages.
The log file is important, not only because it is where PostgreSQL
writes error messages so that you can work out e.g. why your server
wouldn't start, but also because it is where PostgreSQL writes
statistical information about its performance (if requested).
The -w
option tells pg_ctl
to wait until the server
has actually started properly before returning. If the server does not
start properly, you will eventually receive a message like:
pg_ctl: could not start server Examine the log output.
If the server fails to start, you should check your environment
and the server setup (e.g. $PGDATA/postgresql.conf
).
Note that there are two aspects to consider for the environment:
the contents of /localstorage/$USER/env
and
the settings of the shell variables in your current window;
the two should be consistent.
A trouble-shooting guide for setting up your server appears at the bottom of
Prac Exercise P01.
The primary function of the pg_ctl
command is to invoke
the postgres
server.
It can perform additional functions such as specifying the location
of the log file (as we saw above)
or passing configuration parameters to the server.
To pass configuration parameters, you use the -o
option
and a single string containing all the server parameters.
For example, the -B
parameter to postgres
lets you say how many shared memory buffers the server should use,
and you could start postgres
and get it to use just 16
buffers as follows:
pg_ctl start -o '-B 16' -l /localstorage/$USER/pgsql/log server starting
As a warm-up exercise, work out how many shared buffers the PostgreSQL
server uses by default. (Hint: this is given in the
postgresql.conf
file in units of MB (not number of
buffers); each buffer is 8KB long).
Start your PostgreSQL server as normal (i.e. don't change any configuration parameters) before getting started with the exercises.
Under the COMP9315 Pracs directory you'll find a new testing database. Create a new database to hold it, and load it up. There are two representations of the database available:
The dump file is quicker to load, but not as "user-friendly" (i.e. not as readable) as the SQL files.
You create the database in the usual way:
createdb uni
I called the database uni
because it contains (fake)
data about a University. You can find out the database schema from
the
schema.sql
file.
To load the database, use the following commands:
psql uni -f /web/cs9315/24T1/pracs/p03/db.dump > load.out 2>&1 grep ERR load.out
The first command loads the dump file and ensures that all output is
written to a file called load.out
.
The second command checks for any error messages produced during the
load.
If there are any errors, you should not ignore them,
but instead try to work out what the problem is and fix it.
The first thing to do with any database is to ensure that you understand
what data is in it. Use psql
(or some GUI tool, if you're
using one) to explore the database. I've added a function that will give
you counts of the number of tuples in each table:
select * from pop(); table | ntuples -------------+--------- assessments | 14098 courses | 980 enrolments | 3506 items | 3931 people | 1980 (5 rows)
You can look at the definition of the pop()
(short for
"population") either in the
pop.sql
file, or via psql
's \df+
command.
If you're using \df+
, you'll find it useful to change how
psql
displays its results, otherwise the output from
\df+
is a mess. You can switch the output format in
psql
using the \x
command. This causes
psql
to show the value of each attribute on a seperate
line; useful if attributes values are large.
Don't forget to use \x
to change the output format
back before continuing.
Once you think you're familiar enough with the database, devise SQL queries to answer the following:
People.id
)
People.birthday
)
Items.maxmark
)
Courses.code,Items.name,Items.maxmarks
)
Courses.code,count(Enrolments.student)
)
Course.code,People.name,Enrolments.mark,sum(Assessment.marks)
)
For the first four queries above, think about and describe the patterns of access to the data in the tables that would be required to answer them.
Now that you've used the database, let's take a look at how the data is stored in the file system. All data is for a given database is stored under the directory (folder):
$PGDATA/base/OID
where $PGDATA
is the location of the PostgreSQL data directory
as set in your env
file, and the OID
is the unique
internal id of the database from the pg_database
table.
Work out, using the PostgreSQL catalog, which directory corresponds to
your newly-created database.
(Hint: the pg_database
table will help here. Also,
psql
's \dS
command will tell you the
names of all catalog tables).
Change into the relevant directory and run the ls
command.
This will show dozens of files. Most of these files contain local
data from system catalog tables, while others contain your uni
data.
Recall from lectures that data files associated with a table are
named after the OID
of that table.
Use the PostgreSQL catalog to work out which files
correspond to your tables.
All of the data files in this directory are in binary format, so you can't
read them with a text editor or the standard Unix file pagers (like more
and less
). Sometimes, however, you can get some information from a
binary file via the strings
command, which prints any text-strings
that it finds in the file. Try this on the file corresponding to the Courses
table and you should get a list of course codes and course titles, with a few
"junk" characters. Since this generates a lot of output, you might want to
use something like the following command:
strings OID_of_Courses_data_file | less
BENV2254;Theories of Colour and Light
BENV2228?C20 Arch:Modernity-Deconstruc.
BENV22241Architectural Studies 3
etc. etc. etc.
Note that you won't necessarily see exactly the output shown above. The order that tuples are inserted into a page depends on many factors that vary from system to system. What you are guaranteed to see are some strings containing data relevant to courses.
An alternative way to examine binary data files is via the Unix od
command (read the man
entry if you don't know what it does).
Examine the files corresponding to the People
table
and the Assessments
table
to see if you can observe the data they contain
and also to see if you can work out
how the data is laid out within the pages of the file.
You can can get assistance with
understanding the intra-page data layout
from the source code files:
/localstorage/$USER/postgresql-15.6/src/include/storage/bufpage.h /localstorage/$USER/postgresql-15.6/src/backend/storage/page/bufpage.c
You'll probably notice some other files with similar OID
s
to the data files, and other files with the same OID
s
but with added suffixes. Suggest what might be contained in these
files. (Searching for suffixes in the source code might help for those
files with suffixes).
While you're examining the data files, return to psql
and
write a query to print the number of data pages in each relation.
This is a simple modification of the query above to get the table OIDs.
Once you've got the page counts in the catalog, check that they're consistent
with the file sizes in the directory for the uni
database
(assuming an 8KB page size).
Let me know via the forums, or come to a consultation if you have any problems with this exercise ... jas