❖ Programming with Databases |
So far, we have seen ...
❖ Programming with Databases (cont) |
While SQL (+ PLpgSQL) gives a powerful data access mechanism
We need PL + DBMS connectivity.
❖ Programming with Databases (cont) |
Requirements of an interface between PL and RDBMS:
libpq
❖ PL/DB Interface |
Common DB access API used in programming languages
db = connect_to_dbms(DBname,User/Password);
query = build_SQL("SqlStatementTemplate",values);
results = execute_query(db,query);
while (more_tuples_in(results))
{
tuple = fetch_row_from(results);
// do something with values in tuple ...
}
This pattern is used in many different libraries:
❖ PL/DB Interface (cont) |
DB access libraries have similar overall structure.
But differ in the details:
❖ PL/DB Mismatch |
There is a tension between PLs and DBMSs
Note: relative costs of DB access operations from PL:
❖ PL/DB Mismatch (cont) |
Consider this (imaginary) PL/DBMS access method:
-- establish connection to DBMS db = dbAccess("DB"); query = "select a,b from R,S where ... "; -- invoke query and get handle to result set results = dbQuery(db, query); -- for each tuple in result set while (tuple = dbNext(results)) { -- process next tuple process(tuple['a'], tuple['b']); }
Estimated costs: dbAccess
dbQuery
dbNext
In later cost estimates, ignore dbAccess
❖ PL/DB Mismatch (cont) |
Example: find mature-age students (e.g. 10000 students, 500 over 40)
query = "select * from Student";
results = dbQuery(db, query);
while (tuple = dbNext(results)) {
if (tuple['age'] >= 40) {
-- process mature-age student
}
}
We transfer 10000 tuples from DB, 9500 are irrelevant
Cost = 1*200 + 10000*10 = 100200ms = 100s
❖ PL/DB Mismatch (cont) |
E.g. should be implemented as:
query = "select * from Student where age >= 40";
results = dbQuery(db, query);
while (tuple = dbNext(results)) {
-- process mature-age student
}
Transfers only the 500 tuples that are needed.
Cost = 1*200 + 500*10 = 5200ms = 5s
❖ PL/DB Mismatch (cont) |
Example: find info about all marks for all students
query1 = "select id,name from Student";
res1 = dbQuery(db, query1);
while (tuple1 = dbNext(res1)) {
query2 = "select course,mark from Marks"
+ " where student = " + tuple1['id'];
res2 = dbQuery(db,query2);
while (tuple2 = dbNext(res2)) {
-- process student/course/mark info
}
}
E.g. 10000 students, each with 8 marks, ⇒ run 10001 queries
Cost = 10001*200 + 80000*10 = 2800s = 46min
❖ PL/DB Mismatch (cont) |
E.g. should be implemented as:
query = "select id,name,course,mark"
+ " from Student s join Marks m "
+ " on (s.id=m.student)"
results = dbQuery(db, query);
while (tuple = dbNext(results)) {
-- process student/course/mark info
}
We invoke 1 query, and transfer same number of tuples.
Cost = 1*200ms + 80000*10ms = 800s = 13min
Produced: 25 Oct 2020