Compressive sensing

Index


The problem

The idea of compressive sensing stems from the following observation: in many applicative contexts we first acquire a large amount of data which are immediatly compressed. Consider, for example, the case of the following video surveillance scene
First frame Second frame Difference
It is clear that only a very small fraction of the pixels are significantly different between the two images (around 1-2%, to be precise). This means that we are acquiring 100 times more data then the strictly necessary.

Another example is given by Dual-Tone Multi-Frequency (DTMF) signals (generated by telephone keyboards). DTMF, as well known, are the sum two sinusoids, identifying the row and the column of the pressed key. Since the highest frequency in DTMF is 1633 Hz, one must acquire at least 3266 samples per seconds in order to avoid aliasing. However, supposing a person could dial at most two digits per seconds, one could describe a DTMF by using only 4 values per seconds: the row index and the column index for every half seconds. The usual approach of first sampling, then compressing requires to acquire 800 times more the data then the minimum necessary.

The question should be obvious: can we acquire directly only the necessary data?

The solution

In the compressive sensing context the signal to be acquired is represented by a K-dimensional vector x ∈ RK which, by hypothesis, can be expressed as a linear combination of N < K vectors of a set B= {b1, ..., bM} with M ≥ K vectors. Typically M >> N .
The two examples given above can be modeled as follows
  • In the video surveillance example, M is equal to the number of pixels of the image, each vector bk is a image with a pixel equal to 1 and every thing else equal to zero and N = 0.02 M.
  • In the DTMF example, we have M=8, each bk is one of the 8 sinusoids used in DTMF and N=2
It is worth emphasizing that we know that at most N of the bk vectors will be used, but we do not know which ones.

Our goal is to be able to acquire x by doing (ideally) only N measures. What is a "measure"? In the context of compressive sensing a measurement act is usually modelled as the computation of the scalar product between x and "sensor vector" am ⟨ x, am ⟩ = ∑n xn am,n Note that if x represents any physical phenomenon we are interested in, every linear measurement done on x can be represented as a scalar product between x and a suitable "sensor vector".

It is immediate to check that if we knew which vectors bk are used in the linear combination for x, it would suffice to choose N linearly independent vectors am and the problem of computing x from the N measures above it would be a simple linear algebra problem. However, we do not know which vectors enter the linear combination and it is not obvious if the problem can be solved.

The surprising answer is that this problem can actually be solved, although we pay our loss of knowledge about which vectors must be selected, with the need of doing a number of measurements which is slightly larger than N. Although this result could seem surprising, it has a simple explanation which is worth seeing.

Let us consider the toy example where x is a three-dimensional vector (i.e., K=3) and suppose we know that x belongs to one of the coordinate axis. This is an example of the compressive sensing problem with N=1 and three vectors b1, b2, b3, corresponding to the three coordinate axes.

Suppose we first carry out a measure with a vector a1 = [a1,1, a1,3, a1,3] and obtain value v1 = x1 a1,1 + x3 a1,2 + x3 a1,3 As well known, the equation above is the (implicit) equation of a plane orthogonal to a1. Since we know that x belongs to one of the coordinate axes, we can deduce that x must be one of the three intersection of the plane above with the coordinate axes. It is clear that a single measure does not allows us to deduce x.

However, suppose we take a second measurement with a vector a2 = [a2,1, a2,3, a2,3] and obtain value v2 = x1 a2,1 + x3 a2,2 + x3 a2,3 Since we know that x belongs both to the plane associated with v1 and to the plane associated with v2, we can deduce that x must belong to the line resulting by the intersection of such planes. Such a line, unless a1 and a2 are chosen in a very special way, will intersect only one axis, allowing us to determine x.

This toy example can be easily generalized to larger spaces and also to "noisy" settings where x is modeled as the sum of a linear combination of vectors bk and a "noise vector" e whose components are "small". However, the description of such a more general settings is outside the scope of these introductory notes.

Applications

Compressive sensing techniques are interesting in those contexts where the signal of interest has a number of degrees of freedom much smaller than the number of samples one should take with more traditional approaches and low-complexity solution are sought. The example of the surveillance video is a standard one.

An interesting property of compressive sensing techniques is that the vectors am can be chosen in any way, even randomly, as long as they are linearly independent and known at the receiver. This can be exploited in order to construct a scheme which computes the scalar product ⟨ x, am ⟩ with a small amount of required memory. The figure shows how this can be done: the samples of x are multiplied with a pseudo-random sequence (which represents the vector am) and the resulting products are accumulated by using a first-order IIR filter. Of course, the receiver needs to know the initial state of the pseudo-random generator. Note that this scheme requires only the memory necessary to store the accumulator value and (possibly) the internal state of the pseudo-random generator. This memory requirements should be compared with, for example, the memory requirement of a standard video coder based on motion compensation which needs enough memory to store at least two video frames.