*Date: Oct 21, 2019*

If you search “rank numbers using numpy”, you will find various resources offering a mysterious solution:

Yes, that is it, and it works. But, why? As I could not find an answer to that, I will try to bring some clarity here.

Before we start, what is **ranking**? For the ascending case, ranking is assigning values to numbers such that each value denotes the index of its corresponding number in a would-be sorted list. For example, we want `rank([1, 2, 0, 3])`

to return `[1, 2, 0, 3]`

. In other words, we want the *order* of each number.

Quoting `numpy.argsort`

documentation, “[argsort] Returns the indices that would sort an array.”. For simplicity, let us assume that our array does not have duplicates.

```
>>> import numpy as np
>>> nums = np.array([3, 5, 0, 10])
>>> nums.argsort()
array([2, 0, 1, 3])
>>> nums.argsort().argsort()
array([1, 2, 0, 3])
```

The final result indeed corresponds to rank of each number.

Let *a**s* := *n**u**m**s*.*a**r**g**s**o**r**t*() and 2*a**s* := *a**s*.*a**r**g**s**o**r**t*().

**1)** Define *i**d**x*(*i*, *a**r**r**a**y*) to be the index of *i* in *a**r**r**a**y* e.g. *i**d**x*(2, [0, 2, 1]) = 1.

**2)** From the definition of `argsort`

, we know that ∀*i*, *j* ∈ *a**s*, *i**d**x*(*i*, *a**s*) > *i**d**x*(*j*, *a**s*) ⇒ *n**u**m**s*[*i*] > *n**u**m**s*[*j*].

∀*i*, *j* ∈ 2*a**s*, *i* > *j* ⇒ 2*a**s*[*i**d**x*(*i*, 2*a**s*)] > 2*a**s*[*i**d**x*(*j*, 2*a**s*)] - follows from (**1**)

⇒ *i**d**x*(*i**d**x*(*i*, 2*a**s*), *a**s*) > *i**d**x*(*i**d**x*(*j*, 2*a**s*), *a**s*) - see the lemma below

⇒ *n**u**m**s*[*i**d**x*(*i*, 2*a**s*)] > *n**u**m**s*[*i**d**x*(*j*, 2*a**s*)] - follows from (**2**)

What this tells us is higher value in 2*a**s* implies higher value for the corresponding number when *n**u**m**s* is indexed into. Combining this with the fact that `argsort`

outputs are a permutation of numbers from 0 to length of the array - 1, we conclude the proof.

This is the crux of the proof. Let’s consider the example above but this time make the indices more explicit.

```
nums -> (0, 3), (1, 5), (2, 0), (3, 10)
as -> (0, 2), (1, 0), (2, 1), (3, 3)
2as -> (0, 1), (1, 2), (2, 0), (3, 3)
```

Notice how the indices and values of *a**s* seem to swap as they map to 2*a**s*. For instance, (0, 2) in *a**s* is now (2, 0) in 2*a**s*, while the (index) value in 2*a**s* is pointing at the original value in *a**s*.

**Prove:** *A*.*a**r**g**s**o**r**t*()[*i*] = *i**d**x*(*i*, *A*) when *A* = *p**e**r**m**u**t**a**t**i**o**n*({0, ..., *l**e**n*(*A*) − 1}). In other words, each value in the argsorted list corresponds to the index of the same value in the original list.

Given *i* ∈ *A*, we know that there are exactly *i* other numbers smaller than *i* in *A*. Then by definition of `argsort`

, `A.argsort()[i]`

holds the index of *i* i.e. *i**d**x*(*i*, *A*).

While this is an interesting way to rank numbers, there are more efficient ways that sort once instead of twice (with additional linear time processing).

Another interesting question is what happens if we continue calling `argsort`

on the result itself. Continuing with our example above, we get:

```
>>> nums.argsort()
array([2, 0, 1, 3])
>>> nums.argsort().argsort()
array([1, 2, 0, 3])
>>> nums.argsort().argsort().argsort()
array([2, 0, 1, 3])
>>> nums.argsort().argsort().argsort().argsort()
array([1, 2, 0, 3])
```

A cycle? Yes, interestingly enough, we have already proven why. Our lemma above proves that as long as the `argsort`

input is a permutation of integers (in the array’s index domain), then the output will be a swap of the values and the indices; combining this with the fact that the prerequisite input property is an invariant under `argsort`

, we get the cyclic property.