FieldDesc
❖ Things To Note |
complex.c
SET
run_test.py
give
❖ Assignment 1 |
pname.c
struct PersonName
pname_in()
PersonName
pname_out()
PersonName
pname_family()
pname_given()
pname_show()
pname_eq()
pname_lt()
pname.source
❖ Assignment 1 (cont) |
How it fits together:
Makefile
pname.c
pname.so
Makefile
pname.source
pname.sql
pname.sql
drop PersonName cascade
❖ Tuples |
Each page contains a collection of tuples
What do tuples contain? How are they structured internally?
❖ Records vs Tuples |
A table is defined by a collection of attributes (schema), e.g.
create table Employee ( id integer primary key, name varchar(20), job varchar(10), dept number(4) );
Tuple = collection of attribute values for such a schema, e.g.
(33357462, 'Neil Young', 'Musician', 0277)
Record = sequence of bytes, containing data for one tuple, e.g.
Bytes need to be interpreted relative to schema to get tuple
❖ Operations on Records |
Common operation on records ... access record via RecordId
Record get_record(Relation rel, RecordId rid) { (pid,tid) = rid; Page *buf = request_page(rel, pid); return get_bytes(buf, tid); }
Gives a sequence of bytes, which needs to be interpreted, e.g.
Relation rel = ... // relation schema
Record rec = get_record(rel,rid)
Tuple t = makeTuple(rel,rec)
Once we have a tuple, we can access individual attributes/fields
❖ Operations on Tuples |
Once we have a record, we need to interpret it as a tuple ...
Tuple t = makeTuple(Relation rel, Record rec)
rel
Once we have a tuple, we want to examine its contents ...
Typ getTypField(Tuple t, int fieldNum)
fno
Tuple
int x = getIntField(t,1)
char *s = getStrField(t,2)
❖ Scanning |
Access methods typically involve iterators, e.g.
Scan s = start_scan(Relation r, ...)
r
Scan
WHERE
Scan
Tuple next_tuple(Scan s)
Tuple
NULL
Tuples
❖ Example Query |
Example: simple scan of a table ...
select name from Employee
implemented as:
DB db = openDatabase("myDB");
Relation r = openRel(db,"Employee");
Scan s = start_scan(r);
Tuple t; // current tuple
while ((t = next_tuple(s)) != NULL)
{
char *name = getStrField(t,2);
printf("%s\n", name);
}
❖ Exercise: Implement next_tuple() |
Consider the following possible Scan
typedef struct { Relation rel; Page *curPage; // Page buffer int curPID; // current pid int curTID; // current tid } ScanData;
Assume tuples are indexed 0..nTuples(p)
Assume pages are indexed 0..nPages(rel)
Implement the Tuple next_tuple(Scan)
P.S. What's in a Relation
❖ Fixed-length Records |
Encoding scheme for fixed-length records:
Since record format is frequently used at query time, should be in memory.
❖ Variable-length Records |
Some encoding schemes for variable-length records:
❖ Converting Records to Tuples |
A Record
byte[]
Tuple
Tuple
struct
Information on how to interpret the bytes as typed values
❖ Converting Records to Tuples (cont) |
DBMSs typically define a fixed set of field types, e.g.
DATE
FLOAT
INTEGER
NUMBER(
)
VARCHAR(
)
This determines implementation-level data types:
DATE |
time_t |
|
FLOAT |
float,double |
|
INTEGER |
int,long |
|
NUMBER( ) |
int[] |
|
VARCHAR( ) |
char[] |
❖ Converting Records to Tuples (cont) |
A Tuple
FieldDesc
Record
typedef struct { ushort nfields; // number of fields/attrs ushort data_off; // offset in struct for data FieldDesc fields[]; // field descriptions Record data; // pointer to record in buffer } Tuple;
Fields are derived from relation descriptor + record instance data.
❖ Converting Records to Tuples (cont) |
Tuple data
❖ Converting Records to Tuples (cont) |
Or, tuple data
Tuple struct
❖ Exercise: How big is a FieldDesc |
FieldDesc
E.g.
❖ PostgreSQL Tuples |
Definitions: include/postgres.h
include/access/*tup*.h
Functions: backend/access/common/*tup*.c
HeapTuple heap_form_tuple(desc,values[],isnull[])
heap_deform_tuple(tuple,desc,values[],isnull[])
Datum
❖ PostgreSQL Tuples (cont) |
Tuple-related data types:
// representation of a data value
typedef uintptr_t Datum;
The actual data value:
Datum
int
❖ PostgreSQL Tuples (cont) |
Tuple-related data types: (cont)
// TupleDescData: schema-related information for HeapTuples typedef struct tupleDescData { int natts; // number of attributes in the tuple Oid tdtypeid; // composite type ID for tuple type int32 tdtypmod; // typmod for tuple type int tdrefcount; // reference count (-1 if not counting) TupleConstr *constr; // constraints, or NULL if none FormData_pg_attribute attrs[FLEXIBLE_ARRAY_MEMBER]; // attrs[N] is description of attribute number N+1 } *TupleDesc;
❖ PostgreSQL Tuples (cont) |
HeapTupleData
typedef HeapTupleData *HeapTuple; typedef struct HeapTupleData { uint32 t_len; // length of *t_data ItemPointerData t_self; // SelfItemPointer Oid t_tableOid; // table the tuple came from HeapTupleHeader t_data; // -> tuple header and data } HeapTupleData;
HeapTupleHeader
❖ PostgreSQL Tuples (cont) |
PostgreSQL stores a single block of data for tuple
byte[]
typedef struct HeapTupleHeaderData // simplified { HeapTupleFields t_heap; ItemPointerData t_ctid; // TID of this tuple or newer version uint16 t_infomask2; // #attributes + flags uint16 t_infomask; // flags e.g. has_null, has_varwidth uint8 t_hoff; // sizeof header incl. bitmap+padding // above is fixed size (23 bytes) for all heap tuples bits8 t_bits[1]; // bitmap of NULLs, variable length // OID goes here if HEAP_HASOID is set in t_infomask // actual data follows at end of struct } HeapTupleHeaderData;
❖ PostgreSQL Tuples (cont) |
Tuple-related data types: (cont)
typedef struct HeapTupleFields // simplified { TransactionId t_xmin; // inserting xact ID TransactionId t_xmax; // deleting or locking xact ID union { CommandId t_cid; // inserting or deleting command ID TransactionId t_xvac;// old-style VACUUM FULL xact ID } t_field3; } HeapTupleFields;
Note that not all system fields from stored tuple appear
oid
xmin
xmax
cmin
cmax
❖ Relational Operations |
DBMS core = relational engine, with implementations of
❖ Cost Analyses |
When showing the cost of operations, don't include Tr and Tw:
request_page()
release_page()
❖ Cost Analyses (cont) |
Two "dimensions of variation":
❖ Cost Analyses (cont) |
SQL vs DBMS engine
select ... from R where C
insert into R values(...)
delete from R where C
update R set ... where C
❖ Query Types |
SQL | RelAlg | a.k.a. | ||||
select * from R |
R | - | ||||
select from R |
Proj[x,y]R | - | ||||
select * from R order by |
Sort[x]R | ord | ||||
select * from R where id = |
Sel[id=k]R | one | ||||
select * from R where a = |
Sel[a=k]R | - | ||||
select * from R,S where R.id = S.r |
R Join[id=r] S | - |
❖ File Structures |
When describing file structures
❖ File Structures (cont) |
Consider three simple file structures:
Some records in each page may be marked as "deleted".
❖ Exercise: Operation Costs |
For each of the following file structures
❖ Scanning |
Consider the query:
select * from Rel;
Operational view:
for each page P in file of relation Rel { for each tuple t in page P { add tuple t to result set } }
Cost: read every data page once
Cost = b (b page reads + ≤b page writes)
❖ Scanning (cont) |
In this case, the implementation changes to:
for each page P in file of relation T { for each tuple t in page P { add tuple t to result set } for each overflow page V of page P { for each tuple t in page V { add tuple t to result set } } }
Cost: read each data and overflow page once
Cost = b + bOv
where bOv = total number of overflow pages
❖ Selection via Scanning |
Consider a one query like:
select * from Employee where id = 762288;
In an unordered file, search for matching tuple requires:
Guaranteed at most one answer; but could be in any page.
❖ Selection via Scanning (cont) |
Overview of scan process:
for each page P in relation Employee { for each tuple t in page P { if (t.id == 762288) return t } }
Cost analysis for one searching in unordered file
❖ Exercise: Cost of Search in Hashed File |
Consider the hashed file structure b = 10, c = 4, h(k) = k%10
Describe how the following queries
select * from R where k = 51; select * from R where k > 50;
might be solved in a file structure like the above (h(k) = k%b).
Estimate the minimum and maximum cost (as #pages read)
❖ Relation Copying |
Consider an SQL statement like:
create table T as (select * from S);
Effectively, copies data from one table to another.
Process:
s = start scan of S make empty relation T while (t = next_tuple(s)) { insert tuple t into relation T }
❖ Relation Copying (cont) |
Possible that T
S
S
T
❖ Relation Copying (cont) |
In terms of existing relation/page/tuple operations:
Relation in; // relation handle (incl. files) Relation out; // relation handle (incl. files) int ipid,opid,tid; // page and record indexes Record rec; // current record (tuple) Page ibuf,obuf; // input/output file buffers in = openRelation("S", READ); out = openRelation("T", NEW|WRITE); clear(obuf); opid = 0; for (ipid = 0; ipid < nPages(in); ipid++) { get_page(in, ipid, ibuf); for (tid = 0; tid < nTuples(ibuf); tid++) { rec = get_record(ibuf, tid); if (!hasSpace(obuf,rec)) { put_page(out, opid++, obuf); clear(obuf); } insert_record(obuf,rec); } } if (nTuples(obuf) > 0) put_page(out, opid, obuf);
❖ Exercise: Cost of Relation Copy |
Analyse cost for relation copying:
Give cost in terms of #pages read + #pages written
Produced: 6 May 2024