Directed Graphs

COMP2521 20T2 ♢ Directed Graphs [0/19]
❖ Directed Graphs (Digraphs)

Reminder: directed graphs are ...

Example digraph:

[Diagram:Pics/digraph0.png]

COMP2521 20T2 ♢ Directed Graphs [1/19]
❖ Digraph Applications

Potential application areas:

Domain Vertex Edge
Web web page hyperlink
scheduling task precedence
chess board position legal move
science journal article citation
dynamic data malloc'd object pointer
programs function function call
make file dependency
COMP2521 20T2 ♢ Directed Graphs [2/19]
❖ ... Digraph Applications


Problems to solve on digraphs:

COMP2521 20T2 ♢ Directed Graphs [3/19]
❖ Transitive Closure


Problem: computing reachability    (reachable(G,s,t))

Given a digraph G  it is potentially useful to know

Example applications:
COMP2521 20T2 ♢ Directed Graphs [4/19]
❖ ... Transitive Closure

One possibility to implement a reachability check:

What about applications that frequently check reachability?

Would be very convenient/efficient to have:

reachable(G,s,t)  ≡  G.tc[s][t]

tc[][] is called the transitive closure matrix

Of course, if V  is large, then this may not be feasible either.

COMP2521 20T2 ♢ Directed Graphs [5/19]
❖ ... Transitive Closure


The tc[][] matrix shows all directed paths in the graph

[Diagram:Pics/tc.png]


Question: how to build tc[][] from edges[][]?

COMP2521 20T2 ♢ Directed Graphs [6/19]
❖ ... Transitive Closure


Goal: produce a matrix of reachability values

Observations:

In other words
COMP2521 20T2 ♢ Directed Graphs [7/19]
❖ ... Transitive Closure

Extending the above observations gives ...

An algorithm to convert edges into a tc

makeTC(G):
|  tc[][] = edges[][]
|  for all i ∈ vertices(G) do
|  |  for all s ∈ vertices(G) do
|  |  |  for all t ∈ vertices(G) do
|  |  |  |  if tc[s][i]=1 ∧ tc[i][t]=1 then
|  |  |  |  |  tc[s][t]=1
|  |  |  |  end if
|  |  |  end for
|  |  end for
|  end for

This is known as Warshall's algorithm

COMP2521 20T2 ♢ Directed Graphs [8/19]
❖ ... Transitive Closure

How it works …

After copying edges[][],  tc[s][t] is 1 if s→t  exists

After first iteration (i=0),  tc[s][t] is 1 if

After second interation (i=1), tc[s][t] is 1 if any of After the Vth iteration, tc[s][t] is 1 if
COMP2521 20T2 ♢ Directed Graphs [9/19]
❖ ... Transitive Closure

Tracing Warshall's algorithm on a simple graph:

[Diagram:Pics/tc-example.png]

COMP2521 20T2 ♢ Directed Graphs [10/19]
❖ ... Transitive Closure

Cost analysis:

Amortisation: need many calls to reachable() to justify setup cost

Alternative: use DFS in each call to reachable()

Cost analysis:

COMP2521 20T2 ♢ Directed Graphs [11/19]
❖ Digraph Traversal

Same algorithms as for undirected graphs:

depthFirst(G,v):
|  mark v as visited
|  for each (v,w)edges(G) do
|  |  if w has not been visited then
|  |  |  depthFirst(w)
|  |  end if
|  end for

breadthFirst(G,v):
|  enqueue v
|  while queue not empty do
|  |  curr=dequeue
|  |  if curr not already visited then
|  |  |  mark curr as visited
|  |  |  enqueue each w where (curr,w)edges(G)
|  |  end if
|  end while

COMP2521 20T2 ♢ Directed Graphs [12/19]
❖ Example: Web Crawling

Goal: visit every page on the web
Solution: breadth-first search with "implicit" graph

webCrawl(startingURL):
|  mark startingURL as alreadySeen
|  enqueue(Q,startingURL)
|  while not isEmpty(Q) do
|  |  currPage=dequeue(Q)
|  |  visit currPage
|  |  for each hyperLink on currPage do
|  |  |  if hyperLink not alreadySeen then
|  |  |     mark hyperLink as alreadySeen
|  |  |     enqueue(Q,hyperLink)
|  |  |  end if
|  |  end for
|  end while

visit scans page and collects e.g. keywords and links

COMP2521 20T2 ♢ Directed Graphs [13/19]
❖ PageRank


Goal: determine which Web pages are "important"

Approach: ignore page contents; focus on hyperlinks

Problem: the Web is a very large graph
COMP2521 20T2 ♢ Directed Graphs [14/19]
❖ ... PageRank


Assume for the moment that we could build a graph …

Naive PageRank algorithm:

PageRank(myPage):
|  rank=0
|  for each page in the Web do
|  |  if linkExists(page,myPage) then
|  |     rank=rank+1
|  |  end if
|  end for


Note: requires inbound link check   (normally, we check outbound)

COMP2521 20T2 ♢ Directed Graphs [15/19]
❖ ... PageRank

V = # pages in Web,   E = # hyperlinks in Web

Costs for computing PageRank for each representation:

Representation  linkExists(v,w)  Cost
Adjacency matrix  edge[v][w]  1
Adjacency lists  inLL(list[v],w)  ≅ E/V

Not feasible ...

So how to really do it?
COMP2521 20T2 ♢ Directed Graphs [16/19]
❖ ... PageRank


The random web surfer strategy.

Each page typically has many outbound hyperlinks ...

If no visited check, need a way to (mostly) avoid loops

Important property of this strategy

COMP2521 20T2 ♢ Directed Graphs [17/19]
❖ ... PageRank

Random web surfer algorithm ...

curr=random page, prev=null
for a long time do
|  if curr not in array rank[] then
|     rank[curr]=0
|  end if
|  rank[curr]=rank[curr]+1
|  if random(0,100) < 85 then // with 85% chance ...
|     prev=curr               // ... keep crawling
|     curr=choose hyperlink from curr
|  else
|     curr=random page, not prev // avoid getting stuck
|     prev=null
|  end if
end for

COMP2521 20T2 ♢ Directed Graphs [18/19]
❖ ... PageRank


Above is a very rough approximation to reality.

If you want the details ...

And the background ...
COMP2521 20T2 ♢ Directed Graphs [19/19]


Produced: 16 Jul 2020