Machine Learning Coding Problems

Abhijit Mondal
10 min readDec 25, 2021

--

source: quanta magazine

In many ML interview rounds, candidates are asked to demonstrate their coding skills w.r.t. some of the common and basic ML problems. These kind of rounds helps to identify both coding skills as well as ML skills required to be a top notch ML engineer in some of the top companies.

Let’s look at some of the common ML coding problems asked in such interviews:

Problem 1:

Given the API rand7() that generates a uniform random integer in the range [1, 7], write a function rand10() that generates a uniform random integer in the range [1, 10]. You can only call the API rand7(), and you shouldn't call any other API. Please do not use a language's built-in random API.

Solution:

rand7()-1 generates random integer between 0 and 6.

To generate random integers between [1, 10], we need to generate 10 numbers which are equally probable or a range which is equally divisible into 10 parts.

Let a=rand7()-1 and b= rand7(), then 7a+b will generate random integers between [1, 49] each with equal probability of 1/49.

If the value of 7a+b is in the range [1,40], then we can use these range to equally divide into 10 parts each of size 4 as below:

[1,4] [5,8] [9,12] [13,16] [17,20] [21,24] [25,28] [29,32] [33,36] [37,40]

Thus if the value of 7a+b lies between [1,4], we return 1, if the value lies between [5,8] return 2 and so on.

Thus return int((7a+b-1)/4)+1 if 1 ≤ 7a+b ≤ 40.

Probability that the while loop returns on each call is 40/49 because out of 49 possibilities for 7a+b, only 40 are valid. Thus the expected number of times the while loop executes for each call is 49/40.

For an arbitrary range [a, b] using rand7(), first normalize the range to [1, b-a+1], then find the smallest P such that b-a+1 ≤ 7^P. Then find the largest K such that K*(b-a+1) ≤ 7^P.

Let Q = 7^(P-1)*(rand7()-1) + 7^(P-2)*(rand7()-1) + … + 7*(rand7()-1) + rand7()

Q generates integers in the range [1, 7^P].

Consider only the range [1, K*(b-a+1)] from the values of Q. Among this range, if we divide it into b-a+1 equal parts each of size K, then for any integer in the 1st part i.e. [1, K] we return 1, for 2nd part [K+1, 2K] return 2 and so on.

Thus int((Q-1)/K)+1 will return random integer in the range [1, b-a+1]. To return in the range [a, b], add a-1 to the output i.e. a+int((Q-1)/K).

Expected number of times while loop executes is 7^P/K*(b-a+1).

Problem 2:

You are given a 0-indexed array of positive integers w where w[i]describes the weight of the ith index.

You need to implement the function pickIndex(), which randomly picks an index in the range [0, w.length - 1] (inclusive) and returns it. The probability of picking an index i is w[i] / sum(w).

Solution:

Store the cumulative sums of weights till index i for each index i in the array. Generate a random integer M between [1, sum(w)].

Find the smallest index j, such that cum_sum[j] ≥ M. Return j.

Since the cumulative sums array is a sorted array, we can do binary search to find the index j s.t. M ≤ cum_sum[j].

Time complexity is O(logN) for each call to pickIndex().

This problem have very wide applications such as:

  • Randomly selecting a server by load balancer based on the negative of current load as weights.
  • Randomly sampling data from a discrete probability distribution.
  • Generating artificial data for simulating traffic.

Problem 3:

You are given an integer n and an array of unique integers blacklist. Design an algorithm to pick a random integer in the range [0, n - 1]that is not in blacklist. Any integer that is in the mentioned range and not in blacklist should be equally likely to be returned.

Optimize your algorithm such that it minimizes the number of calls to the built-in random function of your language.

Solution:

Sort the integers in the blacklist.

Then generate intervals of the form [blacklist[i-1]+1, blacklist[i]-1], i.e. divide the entire range of integers such that it excludes the blacklisted integers

For example, if the range of integers is [1, 100] and sorted blacklisted integers are [10, 25, 40, 90], then the generated ranges are [1,9] [11,24] [26,39], [41,89] [91,100].

Then based on the size of each range, pick a range with probability = size of range/sum of sizes of all ranges.

This can be done using the algorithm provided in problem 2.

After picking a range, randomly pick an integer within that range.

Probability of picking a number = Probability of picking a range * Probability of picking a number in that range.

If size of range is S, then probability = S/sum(size of ranges)*1/S = 1/sum(size of ranges)

Since sum(size of ranges) = N-len(blacklisted), thus each number not in blacklist is picked uniformly.

Problem 4:

Given a method biased_toss() that generates 0 or 1. It generates 1 with some probability p (not necessarily 0.5). Write a method that generates 0 or 1 with probability 0.5.

Solution:

If we call biased_toss() method twice, then the possibilities are 00, 01, 10, 11. The probabilities of 00 is (1-p)², 01 is (1-p)*p, 10 is p*(1-p) and 11 is p².

Since the probabilities of 01 and 10 are equal i.e. p*(1-p), we can return 0 if we obtain 01 and 1 if we obtain 10.

Expected number of times while loop executes is 1/p*(1-p).

Problem 5:

Given an infinite stream of numbers. Write a method that would return K random samples from the stream without replacement.

Solution:

This can be solved using reservoir sampling.

For each number in the stream, we track its index.

If the size of the sample array is less than K, then append the number at the end of the array.

If the size of the sample array is more than equal to K, then generate a random integer between 0 and current index of the number in the stream. If the random integer is between 0 and K-1, then replace the index in the array corresponding to the random integer with the current number.

We have to prove that the probability of each number in the stream, to be in the sample is equal to K/N where K is the length of samples array and N is the length of stream.

Assuming the length of stream is N.

For numbers with index < K, it is part of the sample if it is not replaced by any number with index ≥ K.

Probability that number with index J < K is part of sample = Probability that random integer generated at index K is not equal to J and Probability that random integer generated at index K+1 is not equal to J and … Probability that random integer generated at index N-1 is not equal to J.

Probability that random integer generated at index K is not equal to J = K/K+1

Probability that random integer generated at index K+1 is not equal to J = K+1/K+2

Probability that random integer generated at index N-1 is not equal to J = N-1/N

Thus

Probability that number with index J < K is part of sample = K/(K+1)*(K+1)/(K+2)*…*(N-1)/N=K/N.

For numbers with index J ≥ K, it is part of the sample if the random integer X generated between 0 and J is < K and the random integers generated at index > J is not equal to X.

Probability that number with index J ≥ K is part of sample = Probability that random integer X generated between 0 and J is < K and Probability that random integer generated at index J+1 is not equal to X and Probability that random integer generated at index J+2 is not equal to X and … Probability that random integer generated at index N-1 is not equal to X.

Probability that random integer X generated between 0 and J is < K = K/(J+1)

Probability that random integer generated at index J+1 is not equal to X = J+1/(J+2)

Probability that random integer generated at index J+2 is not equal to X = J+2/(J+3)

Probability that random integer generated at index N-1 is not equal to X = N-1/N

Thus

Probability that number with index J ≥ K is part of sample = K/(J+1)*(J+1)/(J+2)*(J+2)/(J+3)*…*(N-1)/N=K/N.

Problem 6:

Randomly shuffle an array.

Solution:

Randomly sample an index j from i to Len(arr)-1. Then swap arr[j] with arr[i] and increment i by 1.

Problem 7:

Flip a fair coin repeatedly until you get two heads in a row (HH). What is the probability of getting HH in at-most N (N > 1) tosses ? Can you write a recursive function and implementation to calculate the probability ?

Solution:

Let F(n, H) be the number of sequences of length n ending with HH and beginning with H and F(n, T) be the number of sequences of length n ending with HH and beginning with T.

Thus F(n+1, T) = F(n, H) + F(n, T) because if the sequence begins with T then the remaining sequence can begin with either H or T.

F(n+1, H) = F(n, T) because if the sequence begins with H then the remaining sequence can begin only with T (HH is forbidden).

Let G(n+1) = F(n+1, H) + F(n+1, T) = F(n, T) + F(n, H) + F(n, T) = F(n-1, H) + F(n-1, T) + F(n, H) + F(n, T)

G(n+1) = G(n) + G(n-1)

G(1) = 0, G(2) = 1

Probability P(n) = (1/2²)*G(2) + (1/2³)*G(3) + (1/2⁴)*G(4) + …. + (1/2^n)*G(n)

Problem 8:

Compute nCr where n ≥ r and 1 ≤ n ≤ 1000, 1 ≤ r ≤ n.

Solution:

nCr = n-1Cr + n-1Cr-1

Thus we will use dynamic programming to compute the values:

Time and space complexity is O(N²).

Problem 9:

Implement Stratified sampling.

Given training data, class labels and the fraction of validation data per class ‘test_size’. Write a method to sample train data, test data, train labels and test labels such that each class has ‘test_size’ fraction of data/labels for validation and 1-test_size fraction of data/labels for training.

Solution:

One of the solution is the first group by based on class labels, then for each group call the sampling method to get the validation samples and then use the remaining as training samples.

Since the size of the data is fixed here, instead of using reservoir sampling as above (for streaming data), we will use another more cleaner method:

Generate a uniform random number between 0 and 1. If it is ≤ test_size, then add the current data and label in validation set else add it in the training set.

But do we really need to first group by the class labels ?

If we iterate over all data, then what is the probability of me selecting a point for adding in the validation set s.t. the probability per class is test_size.

Let P(y=1) be the probability that a data /label is selected and P(Ci) be the probability of class Ci or fraction of class Ci in the data.

Then P(y=1) = P(y=1|C1)*P(C1) + P(y=1|C2)*P(C2) + … + P(y=1|Cm)*P(Cm) where there are ‘m’ classes.

We know that P(y=1|C1) = P(y=1|C2) = …. = test_size i.e. the probability of selecting a data/label for validation given the class label is test_size.

Thus P(y=1) = test_size*(P(C1) + P(C2) + … + P(Cm)) = test_size, since P(C1) + P(C2) + … + P(Cm) = 1

Thus instead of grouping by class label, we can iterate over all data and select a data/label if uniform(0,1) ≤ test_size.

It may not be intuitive at first since it might appear that we are sampling test_size fraction over all data instead of per class, but observe that the distribution of the classes is already handled when I am looping over all data. Thus a class with more samples will have more chance.

Problem 10:

You are given the true labels and the predicted probabilities from logistic regression model for N test examples. Approximately compute the AUC scores for ROC and PR curves.

Solution:

For ROC we need the TPR and FPR values.

TPR = TP/(Number of all data with true label=1)

FPR = FP/(Number of all data with true label=0)

For each probability threshold, find the TPR and FPR values.

Given a threshold T, TP = number of data with probability ≥ T having true label = 1 and FP = number of data with probability ≥ T having true label = 0. Also when T decreases both TPR and FPR increases and vice versa.

We can solve this efficiently by sorting the data on probability.

After sorting on probability, starting from the end, if at index i, true label = 1, then we increment TP by 1 else we increment FP by 1, because we are assuming the threshold to be probability[i].

To compute the AUC, we make use of the trapezoid formula: 0.5*width*(height1+height2) where width is the difference is FPR between 2 consecutive thresholds and height1 is the TPR at threshold T(i) and height2 is TPR at threshold T(i+1).

Similarly we can compute the AUC for PR curve.

Precision = TP/(Number of data predicted to be positive i.e. count of points with probability ≥ threshold).

Recall = TP/(Number of all data with true label=1).

Problem 11:

Implement a basic binary logistic regression classifier. It should perform the following operations:

  • Train model
  • Predict on testing data

Assume the training and testing data is in the form of 2D matrix and the class labels is a 1-D array.

Solution:

Important methods to consider:

  • Computing sigmoid scores
  • Computing the logistic loss
  • Gradient descent (batch or stochastic)

Problem 12:

Implement a simple decision tree classifier. It should perform the following operations:

  • Train model
  • Predict on testing data

Assume the training and testing data is in the form of 2D matrix and the class labels is a 1-D array.

Solution:

Important methods to consider:

  • Computing information gain
  • Splitting a node on a feature and feature value.

--

--

Responses (1)