Coding Algorithms — Graphs and Networks

Abhijit Mondal
9 min readMar 29, 2022

--

source: quanta magazine

Problem 1:

You are given a friendship (undirected connected) Facebook graph for a social network. Remove as many edges as possible such that the graph is still connected.

The input is given in the form of a list of bidirectional edge pairs.

Example:

Input: [('A','B'),('A','C'),('B','C')]Output: [('A','B'),('A','C')] or [('A','B'),('B','C')] or [('A','C'),('B','C')]
Removed one edge.

Solution:

The maximum number of edges that we can remove is E-V+1, where E is the number of edges and V is number of vertices.

Because the minimum number of edges required for the graph to remain connected i.e. a spanning tree is V-1. Thus we need to find a spanning tree of the graph.

One simple approach is find BFS Tree of the graph:

Time complexity is O(V+E) and space complexity is O(E).

Problem 2:

Similar to the above problem but now instead of an undirected friendship graph, we have a directed graph of followers on Instagram.

Remove as many edges as possible such that if there was a path from X to Y in the original graph, there should be a path from X to Y in the modified graph.

The input is given in the form of a list of edge pairs (X, Y) where X follows Y.

Example:

Input: [['a', 'b'], ['a', 'c'], ['a', 'd'], ['a', 'e'], ['b', 'd'], ['c', 'd'], ['c', 'e'], ['d', 'e']]Output: [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd'), ('d', 'e')]

Solution:

This is an example of transitive reduction of a graph.

Starting from each node ‘u’ of the graph, do a DFS and if we visit a node ‘w’ (where w is at a depth of > 1 in the DFS tree) during DFS such that there is an edge (u, w) then remove the edge (u, w).

Time complexity is O(V*(V+E)). Space complexity is O(E).

Problem 3:

You are given a graph of connections for a website like LinkedIn. Find all users in this graph which connects two different communities i.e. if we remove these users then the graph will be split into multiple connected components.

For e.g. users who have worked in multiple companies, can connect the graph of each company thus making people in one company visible to the other.

Example:

Input: [(0,1), (0,5), (1,0), (1,2), (1,3), (2,1), (2,3), (2,4), (3,1), (3,2), (3,4), (4,2), (4,3), (5,0)]Output: [0,1]
Nodes 0 and 1 are articulation points

Solution:

We need to find the ariculation points in the LinkedIn graph.

A node U is an articulation point if there is at-least one descendant node in the DFS Tree, such that there is no edge (back-edge) from the descendant node to any ancestor node of U.

If there was an edge from all descendant nodes to any ancestor node of U, then even if we deleted U, the graph is still connected through this back-edges which were not taken during the DFS traversal.

How do we track this ?

For each node X let us denote tin[X] as the time/depth at which X was first discovered during DFS traversal and min_tin[X] is the minimum of tin[Y] in the DFS tree rooted at X for all descendant nodes Y of X.

Note that Y could also be an ancestor of X in cases where Z is a descendant of X but Z has a back edge to Y, then although Z is a descendant of X in DFS Tree but it could also have been ancestor if we had taken a different path during DFS traversal i.e. Y->Z->X instead of Y->X->Z.

If min_tin[X] ≥ tin[X] i.e. all descendant nodes can only be discovered after discovering X (by taking any path during DFS), then X is an articulation point because if there was a back edge, then min_tin[X] would be lower than tin[X] as for a node Y which is ancestor of X , tin[Y] < tin[X] thus min_tin[X] = tin[Y] < tin[X].

Also if a node is a root node and has more than one child then it is also an articulation point.

Time complexity is O(V+E) and space complexity is O(V).

Problem 4:

Someone has given to you an undirected graph of user-ad clicks. But they have missed out indicating which nodes are user nodes and which nodes are ad nodes. How would you verify whether this graph is a valid ad-click graph or not ?

Solution:

Check whether it is a bipartite graph or not.

In a bipartite graph we would have two sets of nodes: users and ads and there will be edges only across these 2 sets and not within these 2 sets i.e. an edge can exist between a user node and an ad node but not between 2 user nodes or 2 ad nodes.

There are multiple ways to solve this problem:

Approach 1:

Do a DFS/BFS traversal over all the connected components. If there is an edge between X and Y and they have not yet been assigned, then assign X to set 1 and Y to set 2.

If X is assigned and Y is not, then assign Y to set 2 if X is set 1 else assign Y to set 1 if X is set 2.

Similarly, if Y is assigned and X is not, then assign X to set 2 if Y is set 1 else assign X to set 1 if Y is set 2.

Else if both are assigned and if X and Y belong to different sets i.e. 1 and 2 or 2 and 1 respectively then its ok else if they belong to same set then it is not a valid graph because there cannot be an edge from user to user node or ad to ad node.

Time complexity is O(V+E). Space complexity is O(V).

Approach 2:

If there is a cycle of odd length then it is not a bipartite graph.

Time complexity is O(V+E). Space complexity is O(V).

Problem 5:

Given a road network, where nodes represents landmarks and edges represents roads between landmarks and each edge is weighted by the average amount of time it takes to traverse the road from landmark A to landmark B.

Find the fastest path between each pair of landmark.

Input is given in the form of adjacency list with weights.

Example:

Input: [[(1,3),(3,5)],[(0,2),(3,4)],[(1,1)],[(2,2)]]
Output: [[0, 3, 7, 5], [2, 0, 6, 4], [3, 1, 0, 5], [5, 3, 2, 0]]

Solution:

One approach is to run Djikstra’s algorithm from each landmark X to find the shortest path to all landmarks starting from X.

Let time[x][y] denote the fastest time to travel from landmark x to landmark y. In Djikstra’s algorithm we greedily pick the current fastest landmark Z reachable from ‘start’ landmark. Then for all neighbors W of Z, we update:

time[start][W] = min(time[start][W], time[start][Z]+edge weight Z->W).

Time complexity is O(V*(V+E)*logV). Space complexity is O(V²).

If E=O(V²) then time complexity is O(V³logV).

Another approach is to use Floyd Warshall Algorithm:

Time complexity is O(V³). Space complexity is O(V²).

Problem 6:

In the above road network problem, find the number of shortest paths between any 2 given landmarks.

Solution:

We use Djikstra’s algorithm to find the shortest path between src and dst.

In the Djikstra algorithm apart from time[x] i.e. the minimum time it takes to travel from src to x, we also maintain num_paths[x] i.e. number of paths with time=time[x].

Whenever we for an edge w->x that time[w]+weight(w,x) < time[x], we update time[x]=time[w]+weight(w,x) and set num_paths[x]=num_paths[w] because we can reach x through the same paths as w.

But if there is another path v->x with time[w]+weight(w,x) = time[x], then we know that apart from w, we can also reach x through v, thus num_paths[x] = num_paths[x] + num_paths[v].

Time complexity is O((V+E)*logV). Space complexity is O(V).

Problem 7:

In the above road network problem, the roads needs to be paved. But it is very inefficient and expensive to pave all roads between all pairs of landmarks. A cost effective way would be to pave the roads such that any body can travel from one landmark to another landmark through a network of paved roads.

Given that costs are proportional to the length of the roads (or distance between landmarks) and the distance between 2 landmarks is given by the Manhattan Distance.

Manhattan distance between points (xi, yi) and (xj, yj) is given as |xi - xj| + |yi - yj|.

Find the minimum cost to pave the roads such that any body can travel from one landmark to another landmark through a network of paved roads.

Example:

Input: landmarks = [[0,0],[2,2],[3,10],[5,2],[7,0]]
Output: 20
Explanation:
We can connect the landmarks as shown above to get the minimum cost of 20.
Notice that there is a unique path between every pair of landmarks.

Solution:

We can solve this problem by finding the Minimum Spanning Tree of the road network using Kruskal’s algorithm

Store the edges in a Min-Heap with the edge weight as the key.

One by one extract the root of the heap with an edge (X, Y). Each time assign the 2 nodes to a connected component.

If both X and Y are not assigned before to any component, then create a new component with id M+1, where M is the number of existing components. Assign both X and Y to M+1. Add this edge to the set of edges.

But if X is already assigned and Y is not assigned, then assign Y to the same component as X. Add this edge to the set of edges.

Similarly is Y is assigned but X is not assigned, then assign X to the same component as Y. Add this edge to the set of edges.

But if both are already assigned, then if they are part of same component it means addition of this edge will create a cycle hence we do not add this edge.

But if they are part of different components, then we add the edge and merge the 2 components s.t. all nodes in the higher numbered component are moved to the lower numbered component and and the higher numered component is deleted.

Time complexity is O(E*logE) which is same as O(E*logV) because E=O(V²). Space complexity is O(V).

Problem 8:

For an ecommerce website, we have built a click graph of product pages. Each node represents a product page with a product ID. There is a directed edge from product A to product B, if any user has visited B directly from A. The edge weight is the number of times B has been visited after A.

We have received a click sequence ‘target’ of length N of product page IDs.

Find the most similar sequence of product IDs of length N from the graph to the target.

Similarity between 2 sequences X and Y is given by the edit distance. If edit distance is small then sequences are more similar:

sum(x[i] != y[i] for i in range(N))

If there are multiple sequences with same edit distance then pick one with the maximum sum of edge weights.

Solution:

Let dp[i][j] be the minimum edit distance of a sequence of length ‘i’ ending with product ID ‘j’.

If target[i] != j: 
dp[i][j] = min(dp[i][j], dp[i-1][k]+1)
else:
dp[i][j] = min(dp[i][j], dp[i-1][k])
For all k which are parent of node j

Let G[i][j] = max(G[i][j], G[i-1][k]+edge(k, j)) be the maximum sum of edge weights for all k which gives the minimum dp[i][j] score from above. Note that there could be multiple k’s (parent of j) which gives the smallest dp[i][j].

Let path[i][j] = k, for which dp[i][j] is minimum and G[i][j] is maximum. i.e. we will have some k’s (parent of j) which will give smallest dp[i][j]. With these k’s find G[i][j]. Whichever among these, gives the highest G[i][j] select any one as path[i][j].

Time complexity is O(NK²) where N is the length of target sequence and K is the number of product IDs. Space complexity is O(NK).

--

--