❖ Directed Graphs (Digraphs) |
Reminder: directed graphs are ...
❖ Digraph Applications |
Potential application areas:
Domain | Vertex | Edge | ||
Web | web page | hyperlink | ||
scheduling | task | precedence | ||
chess | board position | legal move | ||
science | journal article | citation | ||
dynamic data | malloc'd object | pointer | ||
programs | function | function call | ||
make | file | dependency |
❖ ... Digraph Applications |
Problems to solve on digraphs:
❖ Transitive Closure |
Problem: computing reachability (reachable(G,s,t)
Given a digraph G it is potentially useful to know
❖ ... Transitive Closure |
One possibility to implement a reachability check:
hasPath(G,s,t)
Would be very convenient/efficient to have:
reachable(G,s,t) ≡ G.tc[s][t]
tc[][]
tc[
][
]
Of course, if V is large, then this may not be feasible either.
❖ ... Transitive Closure |
The tc[][]
Question: how to build tc[][]
edges[][]
❖ ... Transitive Closure |
Goal: produce a matrix of reachability values
Observations:
tc[s][t]=1
tc[s][t]=1
❖ ... Transitive Closure |
Extending the above observations gives ...
An algorithm to convert edges
makeTC(G): | tc[][] = edges[][] | for all i ∈ vertices(G) do | | for all s ∈ vertices(G) do | | | for all t ∈ vertices(G) do | | | | if tc[s][i]=1 ∧ tc[i][t]=1 then | | | | | tc[s][t]=1 | | | | end if | | | end for | | end for | end for
This is known as Warshall's algorithm
❖ ... Transitive Closure |
How it works …
After copying edges[][]
tc[s][t]
After first iteration (i=0
tc[s][t]
i=1
tc[s][t]
tc[s][t]
❖ ... Transitive Closure |
Cost analysis:
reachable()
tc[][]
reachable()
Alternative: use DFS in each call to reachable()
Cost analysis:
reachable()
❖ Digraph Traversal |
Same algorithms as for undirected graphs:
depthFirst(G,v): | mark v as visited | for each (v,w) ∈ edges(G) do | | if w has not been visited then | | | depthFirst(w) | | end if | end for breadthFirst(G,v): | enqueue v | while queue not empty do | | curr=dequeue | | if curr not already visited then | | | mark curr as visited | | | enqueue each w where (curr,w) ∈ edges(G) | | end if | end while
❖ Example: Web Crawling |
Goal: visit every page on the web
Solution: breadth-first search with "implicit" graph
webCrawl(startingURL): | mark startingURL as alreadySeen | enqueue(Q,startingURL) | while not isEmpty(Q) do | | currPage=dequeue(Q) | | visit currPage | | for each hyperLink on currPage do | | | if hyperLink not alreadySeen then | | | mark hyperLink as alreadySeen | | | enqueue(Q,hyperLink) | | | end if | | end for | end while
visit
❖ PageRank |
Goal: determine which Web pages are "important"
Approach: ignore page contents; focus on hyperlinks
❖ ... PageRank |
Assume for the moment that we could build a graph …
Naive PageRank algorithm:
PageRank(myPage): | rank=0 | for each page in the Web do | | if linkExists(page,myPage) then | | rank=rank+1 | | end if | end for
Note: requires inbound link check (normally, we check outbound)
❖ ... PageRank |
V = # pages in Web, E = # hyperlinks in Web
Costs for computing PageRank for each representation:
Representation | linkExists(v,w) | Cost | ||
Adjacency matrix | edge[v][w] | 1 | ||
Adjacency lists | inLL(list[v],w) | ≅ E/V |
Not feasible ...
❖ ... PageRank |
The random web surfer strategy.
Each page typically has many outbound hyperlinks ...
visited[]
Important property of this strategy
❖ ... PageRank |
Random web surfer algorithm ...
curr=random page, prev=null for a long time do | if curr not in array rank[] then | rank[curr]=0 | end if | rank[curr]=rank[curr]+1 | if random(0,100) < 85 then // with 85% chance ... | prev=curr // ... keep crawling | curr=choose hyperlink from curr | else | curr=random page, not prev // avoid getting stuck | prev=null | end if end for
❖ ... PageRank |
Above is a very rough approximation to reality.
If you want the details ...
Produced: 16 Jul 2020