| COMP9315 24T1 |
Prac Exercise 03 PostgreSQL Server Config and File Structures |
DBMS Implementation |
This simple exercise aims to get you to:
You ought to do it before the end of week 3.
PostgreSQL has a wide range of configuration parameters which are described in Chapter 19 of the PostgreSQL documentation. For the purposes of this lab, we are most interested in the configuration parameters related to resource usage (described in Section 19.4).
Most configuration parameters can be set by modifying the
$PGDATA/postgresql.conf file and restarting the server.
Many configuration parameters can also be set via command-line
arguments to the postgres server when it is initially
invoked.
Note that you cannot set parameters if you invoke the server
via the pgs script; pgs aims to simplify
things by allowing few options and starting the server with the
configuration specified in postgresql.conf.
The standard PostgreSQL mechanism for starting the server is yet
another script, called pg_ctl (see the
pg_ctl
section of the PostgreSQL documentation).
The simplest way to invoke pg_ctl is one of:
pg_ctl start server starting pg_ctl stop waiting for server to shut down.... done server stopped pg_ctl status pg_ctl: server is running (PID: nnnnnn) /localstorage/z5555555/pgsql/bin/postgres
The pgs script simply invokes pg_ctl to
start a server, with some extra options:
pg_ctl -w start -l /localstorage/$USER/pgsql/data/log waiting for server to start...... done server started
The -l option tells the PostgreSQL server which file to use
to write its log messages.
The log file is important, not only because it is where PostgreSQL
writes error messages so that you can work out e.g. why your server
wouldn't start, but also because it is where PostgreSQL writes
statistical information about its performance (if requested).
The -w option tells pg_ctl to wait until the server
has actually started properly before returning. If the server does not
start properly, you will eventually receive a message like:
pg_ctl: could not start server Examine the log output.
If the server fails to start, you should check your environment
and the server setup (e.g. $PGDATA/postgresql.conf).
Note that there are two aspects to consider for the environment:
the contents of /localstorage/$USER/env and
the settings of the shell variables in your current window;
the two should be consistent.
A trouble-shooting guide for setting up your server appears at the bottom of
Prac Exercise P01.
The primary function of the pg_ctl command is to invoke
the postgres server.
It can perform additional functions such as specifying the location
of the log file (as we saw above)
or passing configuration parameters to the server.
To pass configuration parameters, you use the -o option
and a single string containing all the server parameters.
For example, the -B parameter to postgres
lets you say how many shared memory buffers the server should use,
and you could start postgres and get it to use just 16
buffers as follows:
pg_ctl start -o '-B 16' -l /localstorage/$USER/pgsql/log server starting
As a warm-up exercise, work out how many shared buffers the PostgreSQL
server uses by default. (Hint: this is given in the
postgresql.conf file in units of MB (not number of
buffers); each buffer is 8KB long).
Start your PostgreSQL server as normal (i.e. don't change any configuration parameters) before getting started with the exercises.
Under the COMP9315 Pracs directory you'll find a new testing database. Create a new database to hold it, and load it up. There are two representations of the database available:
The dump file is quicker to load, but not as "user-friendly" (i.e. not as readable) as the SQL files.
You create the database in the usual way:
createdb uni
I called the database uni because it contains (fake)
data about a University. You can find out the database schema from
the
schema.sql
file.
To load the database, use the following commands:
psql uni -f /web/cs9315/24T1/pracs/p03/db.dump > load.out 2>&1 grep ERR load.out
The first command loads the dump file and ensures that all output is
written to a file called load.out.
The second command checks for any error messages produced during the
load.
If there are any errors, you should not ignore them,
but instead try to work out what the problem is and fix it.
The first thing to do with any database is to ensure that you understand
what data is in it. Use psql (or some GUI tool, if you're
using one) to explore the database. I've added a function that will give
you counts of the number of tuples in each table:
select * from pop();
table | ntuples
-------------+---------
assessments | 14098
courses | 980
enrolments | 3506
items | 3931
people | 1980
(5 rows)
You can look at the definition of the pop() (short for
"population") either in the
pop.sql
file, or via psql's \df+ command.
If you're using \df+, you'll find it useful to change how
psql displays its results, otherwise the output from
\df+ is a mess. You can switch the output format in
psql using the \x command. This causes
psql to show the value of each attribute on a seperate
line; useful if attributes values are large.
Don't forget to use \x to change the output format
back before continuing.
Once you think you're familiar enough with the database, devise SQL queries to answer the following:
People.id)
People.birthday)
Items.maxmark)
Courses.code,Items.name,Items.maxmarks)
Courses.code,count(Enrolments.student))
Course.code,People.name,Enrolments.mark,sum(Assessment.marks))
For the first four queries above, think about and describe the patterns of access to the data in the tables that would be required to answer them.
Now that you've used the database, let's take a look at how the data is stored in the file system. All data is for a given database is stored under the directory (folder):
$PGDATA/base/OID
where $PGDATA is the location of the PostgreSQL data directory
as set in your env file, and the OID is the unique
internal id of the database from the pg_database table.
Work out, using the PostgreSQL catalog, which directory corresponds to
your newly-created database.
(Hint: the pg_database table will help here. Also,
psql's \dS command will tell you the
names of all catalog tables).
Change into the relevant directory and run the ls command.
This will show dozens of files. Most of these files contain local
data from system catalog tables, while others contain your uni
data.
Recall from lectures that data files associated with a table are
named after the OID of that table.
Use the PostgreSQL catalog to work out which files
correspond to your tables.
All of the data files in this directory are in binary format, so you can't
read them with a text editor or the standard Unix file pagers (like more
and less). Sometimes, however, you can get some information from a
binary file via the strings command, which prints any text-strings
that it finds in the file. Try this on the file corresponding to the Courses
table and you should get a list of course codes and course titles, with a few
"junk" characters. Since this generates a lot of output, you might want to
use something like the following command:
strings OID_of_Courses_data_file | less
BENV2254;Theories of Colour and Light
BENV2228?C20 Arch:Modernity-Deconstruc.
BENV22241Architectural Studies 3
etc. etc. etc.
Note that you won't necessarily see exactly the output shown above. The order that tuples are inserted into a page depends on many factors that vary from system to system. What you are guaranteed to see are some strings containing data relevant to courses.
An alternative way to examine binary data files is via the Unix od
command (read the man entry if you don't know what it does).
Examine the files corresponding to the People table
and the Assessments table
to see if you can observe the data they contain
and also to see if you can work out
how the data is laid out within the pages of the file.
You can can get assistance with
understanding the intra-page data layout
from the source code files:
/localstorage/$USER/postgresql-15.6/src/include/storage/bufpage.h /localstorage/$USER/postgresql-15.6/src/backend/storage/page/bufpage.c
You'll probably notice some other files with similar OIDs
to the data files, and other files with the same OIDs
but with added suffixes. Suggest what might be contained in these
files. (Searching for suffixes in the source code might help for those
files with suffixes).
While you're examining the data files, return to psql and
write a query to print the number of data pages in each relation.
This is a simple modification of the query above to get the table OIDs.
Once you've got the page counts in the catalog, check that they're consistent
with the file sizes in the directory for the uni database
(assuming an 8KB page size).
Let me know via the forums, or come to a consultation if you have any problems with this exercise ... jas