In psychophysics, we attempt to characterize the visual system by means of input-output relations. We present an image to the observer (input) and we study how the observer's response (output) changes as we change properties of the image. Linear systems are about the simplest input-output systems and although they are overly simplistic as a model for the entire visual system, they often provide useful descriptions of behaviour in very restricted tasks and with very restricted stimulus classes. Apart from that, many important experimental paradigms are derived from and inspired by methods that characterize linear systems.

This blog post presents the basics of linear systems analysis in the context of psychophysics. The characterizations in this post are theoretically useful but of limited practical applicability. Follow-up posts will present methods that are more applicable in specific experiments.

Simplifying assumptions

Formally, we can think of an input-output system as a function \(f\) that takes a collection of inputs \(x\) together with some external variables \(t\) (such as time and location) and maps them to a collection of outputs \(y = f(x, t)\). In a strict sense, \(x\) should be all the inputs that the system will ever encounter and \(y\) would then be all the outputs that the system will ever generate from these inputs. This allows for all kinds of interactions between inputs at different times and locations, which would let the system model behaviours such as learning. This is obviously quite complicated to study, which is why we will restrict our attention here to time-invariant systems, that is, systems that do not change over time. For such systems, we can write \(y = f(x)\). Note, however, that \(x\) and \(y\) are still vectors. For example, \(x\) could be a vector of the intensities of each pixel in an image. This allows the system to combine different pixel locations to derive the output at a single output location. However, because the system is independent of \(t\), any reference to other pixel locations can only be relative. In other words, the system could combine \(x_i\) with \(x_{i+1}\) for arbitrary \(i\), but it could not combine \(x_i\) with \(x_t\) for a fixed \(t\) that is independent of \(i\).
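To make the difference between relative and fixed references concrete, here is a minimal sketch. Both systems, the test signal and the circular boundary handling are assumptions made for illustration only. A system that only uses relative references gives a shifted output when its input is shifted; a system that refers to a fixed position does not.

```python
def shift(x, k):
    """Circularly shift the list x by k positions to the right."""
    n = len(x)
    return [x[(i - k) % n] for i in range(n)]


def relative_system(x):
    """Hypothetical system: output i combines x_i with its neighbour x_{i+1}."""
    n = len(x)
    return [x[i] + x[(i + 1) % n] for i in range(n)]


def fixed_system(x, t=2):
    """Hypothetical system: output i combines x_i with the fixed entry x_t."""
    return [x_i + x[t] for x_i in x]


x = [0.0, 1.0, 4.0, 2.0, 0.0, 3.0]  # arbitrary test signal
k = 2                               # arbitrary shift

# Does shifting the input simply shift the output?
relative_ok = relative_system(shift(x, k)) == shift(relative_system(x), k)
fixed_ok = fixed_system(shift(x, k)) == shift(fixed_system(x), k)
print(relative_ok, fixed_ok)
```

The circular shift is only a convenience that avoids special cases at the boundaries of the signal.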

The output from the linear system would also be a vector. We could, for example, think of it as a vector of neural responses to that image. In psychophysics, the responses are often button presses: the observer pressed either button 1 or button 2. We can usually derive such a button-press response from the vector \(y\) using methods from signal detection theory. However, that will be a topic for another post.

What does it mean if a system is linear?

When I ask people what they think makes a function linear, they often say something about a line. Although that's not entirely wrong, it doesn't capture the essential part. A function, such as \(f\) here, is linear if it obeys what is called the principle of superposition. Let \(x_1\) and \(x_2\) be two possible inputs and \(a\) and \(b\) two numbers, then \(f\) is linear if

$$ f(ax_1 + bx_2) = af(x_1) + bf(x_2). $$

In other words, the response to a weighted sum of inputs is the same weighted sum of the corresponding outputs.

To really appreciate this point, let's split the linearity definition into two statements: \(f\) is linear if, for every number \(a\) and all inputs \(x\), \(x_1\) and \(x_2\),

  1. \(f(ax) = af(x)\), and
  2. \(f(x_1 + x_2) = f(x_1) + f(x_2)\).

Can you prove that these two statements together are equivalent to the definition above?

One implication of this is that we can characterize a linear system with pretty sparse measurements. Once we know the response to \(x\), we know the response to any scaled version \(ax\) of \(x\). Furthermore, if we know the responses to two inputs \(x_1\) and \(x_2\), we know the responses to their sum \(x_1 + x_2\) through (2) and we even know the responses to any weighted sum \(ax_1 + bx_2\).
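The superposition principle is easy to check numerically. The sketch below is only an illustration: `blur` and `square` are hypothetical systems chosen for this example, and the inputs and weights are arbitrary. Passing the check for one pair of inputs does not prove that a system is linear, but a single failure shows that it is not.

```python
def blur(x):
    """Hypothetical linear system: mean of each sample and its two neighbours."""
    n = len(x)
    return [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3 for i in range(n)]


def square(x):
    """Hypothetical nonlinear system: squares every sample."""
    return [x_i ** 2 for x_i in x]


def weighted_sum(a, x1, b, x2):
    """Return a*x1 + b*x2, computed element by element."""
    return [a * u + b * v for u, v in zip(x1, x2)]


def obeys_superposition(f, x1, x2, a, b, tol=1e-9):
    """Check f(a*x1 + b*x2) == a*f(x1) + b*f(x2) up to rounding error."""
    lhs = f(weighted_sum(a, x1, b, x2))
    rhs = weighted_sum(a, f(x1), b, f(x2))
    return all(abs(l - r) <= tol for l, r in zip(lhs, rhs))


x1 = [1.0, 0.0, 2.0, 5.0]
x2 = [0.5, 3.0, 1.0, 0.0]

blur_is_linear = obeys_superposition(blur, x1, x2, a=2.0, b=-0.5)
square_is_linear = obeys_superposition(square, x1, x2, a=2.0, b=-0.5)
print(blur_is_linear, square_is_linear)
```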

The impulse response function as a universal description

Linearity becomes particularly powerful in combination with a set of test inputs \(x_t, t=1,...,N\) such that their weighted sums span all potentially relevant inputs. One such set of test inputs consists of the impulse functions

$$ \delta_t(i) = \begin{cases} 1, \mathrm{if}\; t=i\\ 0, \mathrm{otherwise.} \end{cases} $$

Note that the index \(t\) can run over time, space, or both. Thus, if we have a system that ignores the time dimension, \(t\) would be space and the \(\delta_t\) would be images that are zero everywhere except for the \(t\)-th pixel. We can write every possible image \(a\) as a sum of these single-pixel images \(\delta_t\) by just multiplying each \(\delta_t\) with the image's intensity at location \(t\), i.e.

$$ a = \sum_{t=1}^N a(t) \delta_t. $$

Note that we use different notations here: \(a\) is a whole image and \(a(t)\) is the number that represents the intensity of that image at location \(t\). On the other hand, \(\delta_t\) is a whole image and \(\delta_t(t)\) would be the number 1, while \(\delta_t(s)\) would be the number 0 if \(t\neq s\). Thus, by linearity, it is sufficient to know the responses of a linear system to the \(\delta_t\) in order to know its response to every possible image.
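The following sketch illustrates this for a small one-dimensional "image". The system `linear_system` is a hypothetical example that is linear but deliberately not shift-invariant, and the code counts positions from 0 rather than from 1. We measure the response to every impulse once and then predict the response to an arbitrary input from those measurements alone.

```python
def linear_system(x):
    """Hypothetical linear system with a position-dependent gain."""
    n = len(x)
    return [(i + 1) * x[i] + 0.5 * x[(i + 1) % n] for i in range(n)]


def impulse(t, n):
    """Single-pixel input delta_t: 1 at position t, 0 everywhere else."""
    return [1.0 if i == t else 0.0 for i in range(n)]


a = [3.0, 0.0, 1.5, 2.0]  # arbitrary input
n = len(a)

# Measure the response to each impulse once.
impulse_responses = [linear_system(impulse(t, n)) for t in range(n)]

# Predict the response to a as the weighted sum of the impulse responses.
predicted = [
    sum(a[t] * impulse_responses[t][i] for t in range(n)) for i in range(n)
]
measured = linear_system(a)
print(predicted)
print(measured)
```

Note that this requires one measured response per input position, which is why the additional assumption of time invariance is so useful.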

For one-dimensional input signals \(x\) and a time-invariant linear system, we further find that

$$ f\left(\sum_{t=1}^N a(t) \delta_t\right)(i) = \sum_{t=1}^N a(t) f(\delta_t)(i). $$

Note that the \(\delta_t\) are simply shifted versions of \(\delta_0\), i.e.

$$ \delta_t(s) = \delta_0(s-t). $$

We will therefore write \(\delta:=\delta_0\) and because \(f\) is time-invariant, the response \(f(\delta_t)\) will simply be a shifted version of \(f(\delta)\). Let's call \(f(\delta)=:h\) the impulse response of the linear system \(f\), then we write

$$ f\left(\sum_{t=1}^N a(t) \delta_t\right)(i) = \sum_{t=1}^N a(t) h(i-t) = (a*h)(i). $$

The operation \(*\) is called convolution. Thus, we have found that the response of a time-invariant linear system is given by the convolution of the input with the system's impulse response. This statement applies to higher-dimensional inputs as well, although the convolution is a little more complicated.
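As a minimal sketch of this result, the code below measures the impulse response of a hypothetical time-invariant linear system and then predicts the response to another input by convolution. The system, its coefficients and the zero padding outside the signal are assumptions made for this example.

```python
def convolve(a, h):
    """Full discrete convolution: (a*h)(i) = sum_t a(t) h(i-t)."""
    out = [0.0] * (len(a) + len(h) - 1)
    for t, a_t in enumerate(a):
        for s, h_s in enumerate(h):
            out[t + s] += a_t * h_s
    return out


def system(x, n_out):
    """Hypothetical system: y(i) = 0.5 x(i) + 0.3 x(i-1) + 0.2 x(i-2).

    The input is treated as zero outside its support.
    """
    padded = list(x) + [0.0] * (n_out - len(x))

    def at(i):
        return padded[i] if i >= 0 else 0.0

    return [0.5 * at(i) + 0.3 * at(i - 1) + 0.2 * at(i - 2)
            for i in range(n_out)]


# Measure the impulse response h = f(delta).
h = system([1.0], n_out=3)

# Predict the response to an arbitrary input from h alone.
a = [1.0, 2.0, 0.0, -1.0]
predicted = convolve(a, h)
measured = system(a, n_out=len(a) + len(h) - 1)
print(predicted)
print(measured)
```

Up to rounding error, the two printed lists agree, although `predicted` only uses the single measured impulse response.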

The important point here is that we can fully characterize a time-invariant linear system if we know its impulse response. Moreover, the impulse response is a single measurement, which is far easier to obtain than the responses to all possible stimuli.

For example, the point spread function of the eye, which describes how a point stimulus at position 0 is spread out by the optics of the eye, is a measured impulse response. We can therefore model the optical processes of the eye by convolving the input image with the point spread function.
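A two-dimensional sketch of this idea is given below. The 3×3 point spread function is a made-up smoothing kernel for illustration, not a measured point spread function of the eye. Convolving a single-pixel image with it reproduces the point spread function, centred on the position of that pixel.

```python
def convolve2d(image, psf):
    """Full two-dimensional convolution of two lists of rows."""
    rows, cols = len(image), len(image[0])
    p_rows, p_cols = len(psf), len(psf[0])
    out = [[0.0] * (cols + p_cols - 1) for _ in range(rows + p_rows - 1)]
    for r in range(rows):
        for c in range(cols):
            for pr in range(p_rows):
                for pc in range(p_cols):
                    out[r + pr][c + pc] += image[r][c] * psf[pr][pc]
    return out


# Hypothetical point spread function; the entries sum to 1,
# so the total amount of light is preserved.
psf = [
    [0.0625, 0.125, 0.0625],
    [0.125, 0.25, 0.125],
    [0.0625, 0.125, 0.0625],
]

# A single bright pixel in the centre of an otherwise dark image.
point = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]

blurred = convolve2d(point, psf)
for row in blurred:
    print(row)
```

The output image is larger than the input because light from pixels near the border is spread beyond the original image area.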