Recently in math Category

Choosing random elements from a set of items is a problem that appears regularly. If you are already given the array of items, this can be implemented trivially:

import random
def random_element( a, N ):
    return a[ int( random.random() * N ) ]

Pick a number between 0 to N - 1, and return the element with that index.

Before we go on, here are “Some Minor Disclaimers”:

I’m aware that Python’s random module has some of these implemented. You are more than encouraged to use those, as the first thing you should do is if someone else did it already! The N can easily be substituted with len(a). It is there to make the parameter explicit. This article is meant for teaching and the coding style aims to make the code as clear (and language-neutral) as possible, and intentionally does not take advantage of a certain Python conciseness!

The Perl Cookbook has an optimization that is subtle, but amazingly concise:

srand;
rand($.) < 1 && ($line = $_) while <>;
# $line is the random line

Rewritten in Python, with an iterator instead of a file:

import random
def random_element_iterator( iterator ):
    N = 0

    for item in iterator:
        N += 1
        # 1/N chance
        if random.random() * N < 1:
            element = item

    return element

For the N-th element of the iterator, there is a 1/N probability of keeping that line. When I first came upon it, it worked like magic. It doesn’t have to stay that way — you can prove this with induction.

The base case N = 1 is true: the first element has 100% chance of being picked. The second element has 50% chance of being picked (and in this case, the first element is kept with 50% probability as well).

Suppose for the first N items, the probability of choosing any of the N elements is equal — 1/N. When you process the item N+1, the probability of it being chosen is 1/(N+1), which is by definition above. The probability of choosing any previous item was 1/N, and with N/(N+1) probability of staying that way - (1/N) * (N /(N+1)) = 1/(N+1). Each of the N+1 items now has 1/(N+1) of being chosen.

This was my first exposure to an online algorithm — you don’t need to know how many elements we need to account for. This is particularly practical, as in the Perl Cookbook example, it means you do not have to keep all the data in the memory to retrieve a random item — you can only need enough memory to store one item.

It turns out that this was covered in Knuth’s “The Art of Computer Programming”, which I read a few years later:

Can we pick a random element from a set of items whose cardinality we do not know?


Let’s revisit the naive algorithm. It’s as simple as returning one element based on a randomly generated index.

Now let’s take a look at the Fisher-Yates algorithm:

import random
def shuffle( a, N ):
    for i in range( N - 1, 0, -1 ):
        j = int( random.random() * i )
        a[i], a[j] = a[j], a[i]
    return a

This is the canonical way to create an unbiased shuffled array, meaning each of the possible permutation is equally likely. This is O( N ) time, which is as good as it gets, with O( N ) space. On every step of the iteration, you’re getting one random item from the remaining items. Did you make the connection? The naive algorithm can be viewed as one step of this iteration, returning the first element.

Disclaimer: It is very easy to implement Fisher-Yates incorrectly, including Sattolo’s algorithm. Double check your implementation!


Now we’re at the general problem - picking K distinct elements randomly out of a set of N items. The keyword here is distinct — if you want to choose the same item multiple times, just run the above algorithm K times. That is essentially picking K elements out of a set of N elements, with replacement. What we want is an algorithm for picking K elements out of a set of N elements, without replacement.

A simple, naive solution for choosing distinct elements is to keep track of what elements we’ve already picked:

# Assumes K <= N, or this will never terminate
import random
def random_subset_track( a, N, K ):
    used = [ False ] * N
    result = []

    for i in range( 0, K ):
        # get a random element between 0 and N.
        j = int( random.random() * N )
        while ( used[j] ):
            j = int( random.random() * N )
        result.append( a[j] )
        used[j] = True

    return result

This might scream inefficient to you, as it should. Linear in the base case, it probably will take longer. A better solution, with some foreshadowing, is to simply use the Fisher-Yates shuffle! Get one of the N! permutations, and take the first K elements:

import random
# Use the slice a[:K]
def shuffle( a, N ):
    for i in range( N - 1, 0, -1 ):
        j = int( random.random() * i )
        a[i], a[j] = a[j], a[i]
    return a

Can we do better? Yes! Wait, didn’t I say this was O( N ), and thus optimal? How can you do better? Well, it turns out it is O( N ), and thus optimal in terms of N, but you can do better if you analyze its running time in terms of K, its output — for this problem, K can be much smaller than N, and thus we can do this problem in O(K) time with a small tweak — as we briefly mentioned before, each iteration of the Fisher-Yates algorithm will net you one item that is unused. Just stop early after K iterations:

import random
def random_subset_fy( a, N, K ):
    for i in range( N - 1, N - K - 1, -1 ):
        j = int( random.random() * i )
        a[i], a[j] = a[j], a[i]
    return a[ N - K : N ]

(since we update the end of array, we want the last K items)

This is a simple example of an output-sensitive algorithm.


Don’t have the memory to store all the items at the same time? If you know the number of items, then the probability of choosing an element, assuming K more element is needed out of N elements, is simply K/N:

import random
def random_subset_p( a, N, K ):
    result = []

    for i in range( N ):
        # We want to keep this element with probability K/N
        if ( random.random() * ( N - i ) < K ):
            K -= 1
            result.append( a[i] )

    return result

We can prove this, once again, with induction.

However, this still requires us to know how big the set is. To paraphrase Knuth:

Can we pick K random elements from a set of items whose cardinality we do not know?

Yes! Similar to the previous Perl solution, for the N-th item, keep it with K / N probability. If we keep it, replace one of K-th entries at random (with 1/K probability):

import random
def random_subset( iterator, K ):
    result = []
    N = 0

    for item in iterator:
        N += 1
        if len( result ) < K:
            result.append( item )
        else:
            s = int(random.random() * N)
            if s < K:
                result[ s ] = item

    return result

We will prove this with induction.

Base case: For the first K items, you have 100% chance of keeping the current items. For the K+1-th item, this item has a K/(K+1) probability of being chosen. Each previously chosen item has the probability of 1/(K+1) probability of the new item being ignored, plus a K/(K+1) probability that the new item will be chosen, with a (K-1)/K probability that another previously chosen item will be replaced (instead of this one). This is:

Now suppose for the first N items, the algorithm is correct, that is, each previous item had a K/N chance of being kept.

When you consider the item N+1, the probability of it being kept is K/(N+1), which is by definition.

The probability of keeping the previous items was K/N. With (N-K+1)/(N+1) probability that the current item is not kept, plus the probability that the current item is kept (K/(N+1)), but other items were chosen instead ((K-1)/K).

For the N+1-th item, it has K/(N+1) probability of being kept.

This is now still O( N ) time, but with a reasonable O( K ) space. Great for when you can’t just put everything in memory!

There is a difference between this and the Fisher-Yates version — while both return K items, this algorithm does not return the K uniformly in a random order, e.g., the first element is always found in the first position, if it is chosen in the K items. If the order of the K items matter, then simply perform a Fisher-Yates shuffle on the result. (Extra Credit: There is also an O( N log K ) algorithm that preserves this property, without the memory overhead of the Fisher-Yates version, using only O( K ) space.)

And here we go, from start to finish, the best algorithms (and a wrong one) for choosing K distinct elements out of N elements, even if we don’t necessarily know what N is in the beginning!

Algorithms are ubiquitous, and you shouldn’t need a Computer Science degree to learn them. Set up a problem in the right way, and you can solve them trivially, or at least know that you have to live with an approximation. They don’t have to stay under-appreciated until you need them: here are five basic algorithms that you must know, with some real-world applications. Try to learn the underlying structure — you’ll find that they are recurring problems! (Only provided brief overview - if you want to learn more about them, use the links!)


Sorting

This is the most fundamental and essential thing in algorithms, and often the first thing you do to create structure and order out of chaos. This means about the same in English as it does in Math. There are a bunch of crazy sorting algorithms out there: Quick sort is the one that might be the most well-known, while merge sort is useful in a few instances. Introspective sort might be the most important one of the bunch — its fancy name is a cover for using different sorting algorithms depending on different steps of the way, employing the strengths, while sidestepping the worst-case scenarios. It’s a great idea that can apply in other situations — you don’t need to find an algorithm that is the best for everything, use the best parts of each algorithm for each case!

Real World Applications — Sorting is everywhere you want to create structure for quick retrieval. Sort your bookshelf alphabetically so you can find things easier. Sort a deck of cards so you can rig the deck. Sort your friends and enemies.


binarysearch.png

Steps of Binary Search

Binary search

The classic example of Divide and Conquer. Incrementally make your problem a smaller one, until it is trivial. What it exploits is an inherent order, usually created by sorting, that allows you make observation about the answer on every query.

Know the game 20 questions? You get 20 questions to guess a word. Well, since the English Dictionary has about 500,000 words, the simple way to cheat is to cut it half way every time.

It’s boring, but you’ll win every time:

Does your word come before the word “marry”? Yes Does your word come before the word “gerrymander”?

You get the idea.

Binary search is quick: You can go through a dictionary with 2^64 entries in just 64 questions. 2^64 ~ 1.8 * 10^19, or a very big number.

Real World Applications — According to folklore, Professor Skiena would demonstrate binary search by asking for a name, and then looking it up in the phonebook by checking the middle of the book, does a comparison, and literally tear out half the book and throw it in the trash. You can probably come up with a less dramatic example.


Hashing

The legendary Udi Manber said that the three most important algorithms at Yahoo are “hashing, hashing, and hashing”.

Imagine you have a closet full of shirts, and it’s time consuming to find a particular shirt - you might have to go through all of them. What can you do?

Hash it. Take every shirt, one at a time, and put it in a different drawer depending on the color of the shirt. Label the drawer. Whenever you want a shirt of a particular color, just open the drawer with the corresponding color. That’s hashing in a nutshell.

Of course, you might (and likely) have more than one shirts of a given color. This is a hash collision. Dealing with hash collisions in a “perfect” way is mostly in the realm of research papers, but know that they exist.

No one ever got fired for using a hash table before!

This is conceptually easy, and many tend to use it as a black box without giving it much thought. Real hashing is hard, but using a hash table is relatively simple.

Real Life Application — You can pretty much hash anything, if the ordering is not important. You can then quickly retrieve items.

I have a box of Xbox 360 games, and a box with a Wii game. If I want a Wii game, I can easily reach for it in the box. If I want an Xbox 360 game, I can reach for it in that box. However, if I have many Xbox 360 games, and want a particular one, I would need a better method — perhaps sorting it by title within the box, or a few smaller boxes within the box, hashing on the game type: adventure, action, etc.


Dynamic Programming

Also sometimes referred to as memoization. (No, not memorization.) Think of this as caching- save the results so that you don’t have to do the work again! Dynamic Programming refers to finding some structure within the problem to abuse this cache. Be lazy, reuse results!

One of the canonical examples of dynamic programming is the Fibonacci sequence. I will ignore it here, and introduce the “Coin Change” problem instead.

You’re a cashier at a store. A customer pays for something, expecting change. You want to give the exact change back to the customer, while using the fewest number of coins. You can restructure the problem in terms of its subproblems:

What is the fewest number of coins to make change for N cents?

Well, that’s just the fewest number of coins to make change for N - 1 cents plus one coin, a penny or the fewest number of coins to make change for N - 5 cents plus one coin, a nickel or the fewest number of coins to make change for N - 10 cents plus one coin, a dime or the fewest number of coins to make change for N - 25 cents plus one coin, a quarter.

Notice a pattern? You answer a question with a question - a smaller, easier question. Another pattern? You might ask the same question over and over again. For example, you might get to N - 5 by 5 pennies, or a nickel. In fact, this grows rather quickly and is a literal exponential function. How to solve this? Remember the result, and you won’t have to do the work over and over again!

Note that for most modern currencies, the greedy algorithm works for this problem - simply take the largest coin that is smaller than the requested amount, subtract, and repeat.

Real Life Application — Want to know what is the probability of getting X heads out of N coin flips? Well, that’s just the probability of getting X - 1 heads out of N - 1 flips (if I flipped heads) and X heads out of the remaining N - 1 flips (if I flipped tails). Take the weighted (50% for a fair coin) average!


The breadth-first tree obtained when running BFS.

Image via Wikipedia

Search Algorithms

I want to put basic graph theory in general, but search algorithms are great places to start. The fundamental ones: Depth-First Search (DFS) and Breadth-First Search (BFS). Understanding these might lead you to more exotic sounding searching algorithms, like A*, but start with these basics. Many things can be modeled as a graph, and these are simple algorithms to play with by yourself. DFS generally worms its way through paths, while BFS branches out. Each algorithm has its own strengths and weakness, be sure to explore them!

Real Life Application — You need to know if you can get from point A to point B. Start driving from A, and mark off areas you’ve already been. If you reach a dead-end, turn around and come back until you find another road that leads to a place you haven’t been. It might take awhile, but eventually you’ll either run out of roads, or you’ll be at point B!


When it comes to Computer Science, in contrast with Software Engineering, there will always be people who will maintain that it does not come up in the “real world”. It might be true the majority of the time, but you can’t solve problems elegantly if you can’t solve problems at all. Just knowing that certain problems cannot be done in polynomial time frees you from searching or figuring out an exact solution. It’s about expanding your toolbox for whatever work or life might throw at you. They might not come up often, but when they do, you’ll be glad you know where to find the answer!

Larry on things worth learning.

About this Archive

This page is an archive of recent entries in the math category.

algorithms is the previous category.

tools is the next category.

Find recent content on the main index or look in the archives to find all content.

Pages