In my last post about Identity Resolution using Postgres, we saw how we can use Postgres Triggers and Functions to do Identity Resolution. But in practical scenarios the data about users profiles are stored in NoSQL databases such as Cassandra or DynamoDB because the profiles are schemaless (these profiles are created across multiple platforms and thus need not be consistent).

In this post, we are going to substitute triggers with aws lambda because the data is coming from a different aws service (dynamodb stream).

Along with membership queries, estimating the number of unique items are also one of the most common use cases in Big Data analytics

- Number of unique visitors on your app for the last 1 day, 1 week or 1 month.
- Number of unique users that signed up for your website.
- Number of unique products purchased through an e-commerce website in the last 1 week.
- Number of unique downloads of your app from the play store.
- Number of unique Twitter hashtags posted in the last 1 hour.
- Number of unique search queries, number of views on youtube videos, on your Quora…

In many CRM platforms, customers can connect multiple sources of data for different analytics and ML use cases such as Recommendations, Churn Prediction, Identity Resolution etc.

In this post we look at one of such use case which is Identity Resolution i.e. identifying which profile are the same. For e.g. users of a customer can come through multiple channels such as app, website, mobile browser, whatsapp, api etc. with different profiles.

Doing membership tests is one of the common use-cases required by many applications.

- In web crawling we want to know whether an URL has been crawled or not.
- In recommendations systems we want to know whether an item has already been recommended to a user or not.
- To prevent DDOS attacks and IP Spoofing we want to know whether an IP address has been blacklisted or not.
- In ad-words, we want to know whether there are any ads corresponding to the words and n-grams present in a query.
- In search we want to know whether a query term is indexed…

Collection of interesting and challenging Dynamic Programming problems and solutions (in Python).

**Problem 1:**

Given a string containing just the characters `'('`

and `')'`

, find the length of the longest valid (well-formed) parentheses substring. (https://leetcode.com/problems/longest-valid-parentheses/)

**Solution:**

Let dp[i] be the length of longest valid parenthesis ending at index i.

Note that dp[i] = 0 if s[i] = ‘(‘ because parenthesis cannot end with ‘(‘.

If s[i] = ‘)’ and s[i-1] = ‘(‘, then dp[i] = dp[i-2]+2, because valid parenthesis is of the form ‘A()’ where A is a valid parenthesis of length ≥ 0.

Else if s[i] = ‘)’ and…

Combinatorics problems are one of the most fascinating category of DS & Algo. Although the problems looks daunting at first but generally they have an elegant solution at the end.

But very few companies ask problems related to combinatorics mostly due to the fact that some of these problems could be very hard for a candidate to come up with a solution within 30 minutes if the candidates have not seen similar problems before. In this post I have tried to compile some of the ‘easier’ combinatorics problems.

The solution for most of these problems comes under the category of…

In this post we are going to solve problems related to one of my favourite data structures-Tries. This is probably going to be the longest post in this series as the variety of problems that can be solved using Tries are quite a lot. Apart from standard Trie implementation I will demonstrate Radix Tree, which is a compressed Trie (memory optimized Trie).

I will not get into explaining what a Trie data structure is, as that can be found on several websites. This post will mainly deal with some variety of problems that can be solved using Tries.

**Problem 1:**

…

In this post we are going to look at data structures for storing stream of numbers in a way such that the following operations can be efficiently performed (although the data structures we are going to visit here can do a lot more):

- Current minimum and maximum numbers.
- Median of all numbers seen so far.
- Smallest (Largest) number greater (smaller) than a given number.
- Count of all smaller (or larger) numbers given a number.
- Binary search on numbers seen so far.

For 1 and 2, we can use a Min-Heap and a Max-Heap. 3 and 4 are special cases of…

In the last post on binary search optimization, we found out that we can solve optimization problems using binary search under certain conditions. In this post we are going to look at another class of optimization problems that can again be solved using binary search technique. This class of problem is called the Longest Increasing Subsequence. I fondly call it the Russian Doll problem.

In its most simplest form, we are given an array A of size N. Each element of the array A is a real number. …

Binary Search is probably one of the most ‘interesting’ algorithm from our high school and sophomore college computer science course. But most of us have encountered binary search either in the context of searching in a sorted array or some tweaked versions of similar problem:

- Find an element in a rotated sorted array (with and without duplicates allowed).
- Find an element in the 2D sorted array, only rows sorted, only columns sorted and both row and column sorted versions.

While these problems are interesting enough to puzzle even the most well prepared interview candidates. But in this post I will…

ML Engineer @ Twilio