Dual spaces, transposes, and adjoints

What’s the dual space of a vector space?

Let $V$ be an $n$-dimensional vector space, with basis
$$ B = \{ v_1, \ldots, v_n \}. $$

Now consider an arbitrary linear function $f$ that takes a vector $v \in V$ and maps it to a scalar in, for example, $\mathbb{R}$. Then express $v$ as a linear combination of the basis vectors:
$$ \begin{align*}
f(v)
&= f(c_1v_1 + \cdots + c_nv_n) \\
&= c_1 f(v_1) + \cdots + c_n f(v_n).
\end{align*} $$

This means that if we specify $f(v_1)$ through $f(v_n)$, we’ve completely characterized $f$. In other words, if you tell me what $f(v_1)$ through $f(v_n)$ equal, I know what $f(v)$ equals for any vector $v$.

So we can specify any linear function that maps from $V$ to $\mathbb{R}$ (called a “linear functional”) as a list of $n$ scalars. This makes it clear that a linear functional is an $n$-dimensional vector itself, and so linear functionals form an $n$-dimensional vector space as well:
$$V^* = \{ f_{[a_1, \ldots, a_n]} \mid a_1, \ldots, a_n \in \mathbb{R} \}.$$ This is the dual space of $V$, where I’ve used $f_{[a_1, \ldots, a_n]}$ to denote the linear functional such that $f(v_1) = a_1$, etc.

Since $V$ and its dual space $V^*$ have the same dimension, each element in $V$ has a corresponding element in $V^*$. To see a natural correspondence, write $v = c_1v_1 + \cdots + c_nv_n$ as before, so that
$$
f_{[a_1, \ldots, a_n]}(v) = a_1 c_1 + \cdots + a_n c_n.
$$Look at the right-hand side — if the vectors we’re dealing with are Euclidean vectors and $B$ is the standard basis (so the $c_i$ are just the components of $\vecb{v}$), then we could write $\vecb{a} = (a_1, \ldots, a_n)$ and express any functional as a dot product:
$$
f_{\vecb{a}}(\vecb{v}) = \vecb{a} \cdot \vecb{v}.
$$ We can extend this idea to any inner product space. If $V$ has an inner product, then we can very naturally associate a vector $w \in V$ with its corresponding functional in $V^*$, defined as $$f(v) = \langle v, w \rangle.$$
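As a concrete, made-up example in $\mathbb{R}^3$ with the usual dot product (a quick sketch, assuming NumPy), a linear functional is the same thing as “dot with some fixed vector”:

```python
import numpy as np

# A linear functional on R^3, written out explicitly...
f = lambda v: 2 * v[0] - v[1] + 5 * v[2]

# ...is the same as taking the inner product with w = (2, -1, 5).
w = np.array([2.0, -1.0, 5.0])

v = np.array([0.3, 1.7, -2.0])          # any test vector
print(np.isclose(f(v), np.dot(v, w)))   # True
```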

Now let’s throw linear transformations into the mix. Let $T$ be a linear transformation from $U$ to $W$, and let $f \in W^*$ be a linear functional on $W$.

Then if $u \in U$, observe that $f(T(u))$ gives us a scalar. Since $f$ composed with $T$ maps an element in $U$ to a scalar, the composition $f \circ T$ is a linear functional itself, an element of $U^*$!

In this way, given a functional in $W^*$ and a transformation $T$, we can generate functionals in $U^*$. So we can define another linear transformation from $W^*$ to $U^*$ that gives us this functional: $$ T^\intercal(f) = f \circ T.$$

$T^\intercal$ is called the transpose of $T$, and it’s not a coincidence that the matrix of $T^\intercal$ is the transpose of the matrix of $T$ in certain bases.
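Here’s a small numerical sanity check of that claim (a sketch with made-up matrices, assuming NumPy): if $A$ is the matrix of $T$, and the functional $f$ on $W$ is represented by the coefficient vector $\vecb{a}$ (so $f(\vecb{w}) = \vecb{a} \cdot \vecb{w}$), then $f \circ T$ is represented by $A^\intercal \vecb{a}$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))   # matrix of T : U (dim 2) -> W (dim 3)
a = rng.standard_normal(3)        # coefficients of a functional f on W: f(w) = a . w
u = rng.standard_normal(2)        # an arbitrary vector in U

lhs = a @ (A @ u)                 # (f ∘ T)(u), i.e. T^T(f) applied to u
rhs = (A.T @ a) @ u               # the functional represented by A^T a, applied to u
print(np.allclose(lhs, rhs))      # True: the matrix of T^T is the transpose of A
```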

Finally, let’s talk about inner product spaces again. If we have inner products for $U$ and $W$, we can, like above, associate vectors $u \in U$ and $w \in W$ with functionals in $U^*$ and $W^*$ respectively:
$$\begin{align*}
u \quad&\longleftrightarrow\quad f_u(v) = \langle v, u \rangle \\
w \quad&\longleftrightarrow\quad f_w(v) = \langle v, w \rangle.
\end{align*}$$

Recall that $T^\intercal$ takes a functional in $W^*$ and returns a functional in $U^*$. It’d be nice if we could deal with vectors in $W$ and $U$ instead of functionals in $W^*$ and $U^*$. Can we replace the transpose with a transformation that takes a vector in $W$, converts it into its corresponding functional in $W^*$, applies $T^\intercal$ to get a functional in $U^*$, and finally converts that back into its corresponding vector in $U$?

It turns out we can: it’s called the adjoint of $T$, denoted $T^*$.


To see how $T^*$ must be defined, let’s take the definition of the transpose and write $f \in W^*$ in its inner product form, $f(v) = \langle v, w\rangle$, where $w$ is the vector in $W$ associated with $f$ and $v$ is the functional’s argument. Applying $T^\intercal(f)$ to a vector $v \in U$:
$$\begin{align*}
T^\intercal(f)(v)
&= (f \circ T)(v) \\
&= f(T(v)) \\
&= \langle T(v), w\rangle.
\end{align*}$$

Then asking what vector in $U$ corresponds to this functional amounts to asking what vector satisfies
$$ \langle T(v), w\rangle = \langle v, ?\rangle. $$

Thus, $T^*$ is defined to be the linear transformation that makes the following equality true:
$$ \langle T(v), w\rangle = \langle v, T^*(w)\rangle. $$

Once we do that, we can go on to define important operators like normal, unitary, and Hermitian operators!
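To make this concrete, here’s a quick numerical check (a sketch with a made-up complex matrix, assuming NumPy and the convention that $\langle x, y \rangle$ is linear in the first slot and conjugate-linear in the second): over $\mathbb{C}^n$ with the standard inner product, the matrix of $T^*$ is the conjugate transpose of the matrix of $T$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))  # matrix of T
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

inner = lambda x, y: np.vdot(y, x)   # <x, y> = sum_i x_i * conj(y_i)

A_star = A.conj().T                  # candidate adjoint: the conjugate transpose
print(np.allclose(inner(A @ v, w),
                  inner(v, A_star @ w)))   # True: <Tv, w> = <v, T*(w)>
```

(Over $\mathbb{R}$ the conjugation does nothing, so the adjoint matrix is just the transpose, which is why the transpose and the adjoint line up so neatly.)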

March 7, 2014, 1:18pm by Casey
Categories: Math

Proof of separation of variables

Let’s say we have a differential equation:
$$f(y) \dod{y}{x} = g(x)$$

Separation of variables says that we can simply “split” the derivative $\dod{y}{x}$ and write
$$f(y) \dif{y} = g(x) \dif{x}$$

Then, we can integrate both sides: $$\begin{gather*}
\int f(y) \dif{y} = \int g(x) \dif{x} \\
F(y) = G(x) + C
\end{gather*}$$ where $F(y)$ and $G(x)$ are antiderivatives of $f(y)$ and $g(x)$ respectively. And voila, we can use algebra to solve the differential equation.

But the “splitting” part always bugged me. Now my professor has explained why it’s okay to do. Consider $F(y)$, an antiderivative of $f(y)$ with respect to $y$, and differentiate it with respect to $x$.

We know that
$$\begin{align*}
\dod{}{x}F(y) &= \dod{}{y}F(y) \cdot \dod{y}{x} \text{ by the chain rule} \\
&= f(y) \dod{y}{x}
\end{align*}$$because $F(y)$ is the antiderivative of $f(y)$, so the derivative of $F(y)$ must be the function $f(y)$ itself.

Now, substituting into the differential equation for $f(y) \dod{y}{x}$, we have
$$\dod{}{x}F(y) = g(x)$$

Now, if we integrate the function on the left ($\dod{}{x}F(y)$) and the function on the right ($g(x)$) with respect to $x$, we should get the same antiderivative, up to a constant, so
$$ \int \left[\dod{}{x}F(y)\right] \dif{x} = G(x) + C$$

The antiderivative of a derivative of a function is just the function itself (up to a constant), so
$$ F(y) = G(x) + C$$ which is the result we get from doing it the “intuitive” way.
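As a sanity check on a made-up separable equation (a sketch, assuming SymPy): take $f(y) = y$ and $g(x) = x$, so the equation is $y \, \dod{y}{x} = x$. Separating and integrating gives $\tfrac{1}{2}y^2 = \tfrac{1}{2}x^2 + C$, which matches what dsolve returns.

```python
import sympy as sp

x = sp.symbols("x")
y = sp.Function("y")

# y * dy/dx = x, i.e. f(y) = y and g(x) = x
ode = sp.Eq(y(x) * y(x).diff(x), x)

print(sp.dsolve(ode, y(x)))
# Something like: [Eq(y(x), -sqrt(C1 + x**2)), Eq(y(x), sqrt(C1 + x**2))]
# i.e. y^2/2 = x^2/2 + C, rearranged and solved for y.
```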

March 15, 2013, 1:45am by Casey
Categories: Math

Why eigenvectors and eigenvalues?

I took linear algebra in high school, and one thing that really confused me was eigenvectors and eigenvalues. Why are they so important? I’m taking linear algebra again, and my professor made it very clear why.

Let’s say we have a linear transformation $T$, and we want to find $T(\vecb{x})$. But what if transforming $\vecb{x}$ is difficult? Then, we’ll say that $T(\vecb{x}) = A\vecb{x}$, where $A$ is the associated matrix, and we’ll find its eigenvectors and eigenvalues, the $\vecb{v}$’s and $\lambda$’s such that $A\vecb{v} = \lambda\vecb{v}$.

If these eigenvectors happen to form a basis, then $\vecb{x}$ can be written as a linear combination of the eigenvectors, $c_1\vecb{v}_1 + \ldots + c_n\vecb{v}_n$, and then we can plug it back into our transformation.

$$\begin{align}
T(\vecb{x}) &= A\vecb{x} \\
&= A(c_1\vecb{v}_1 + \ldots + c_n\vecb{v}_n) \\
&= c_1A\vecb{v}_1 + \ldots + c_nA\vecb{v}_n \\
&= c_1\lambda_1\vecb{v}_1 + \ldots + c_n\lambda_n\vecb{v}_n \\
\end{align}$$
Depending on the transformation we’re working with, this new calculation could turn out to be a lot easier!

Update: But that’s kind of clunky. Let’s try to express more simply the sum
$$ T(\vecb{x}) = c_1\lambda_1\vecb{v}_1 + \ldots + c_n\lambda_n\vecb{v}_n$$

Matrices are built for exactly this kind of bookkeeping, so let’s define $P$ to be the matrix whose columns are the eigenvectors and $D$ to be the matrix with the eigenvalues along the main diagonal (and zeros elsewhere):
$$
P = \begin{bmatrix}
| & & | \\
\vecb{v}_1 & \ldots & \vecb{v}_n \\
| & & | \\
\end{bmatrix} \qquad
D = \begin{bmatrix}
\lambda_1 & \\
& \ddots \\
&& \lambda_n
\end{bmatrix}$$

If we multiply the matrix $P$, the matrix $D$, and the coefficient vector $\vecb{c}$, notice that we get
$$
\begin{align*}
PD\vecb{c} &= \begin{bmatrix}
| & & | \\
\vecb{v}_1 & \ldots & \vecb{v}_n \\
| & & | \\
\end{bmatrix}
\begin{bmatrix}
\lambda_1 & \\
& \ddots \\
&& \lambda_n
\end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} \\
&= \begin{bmatrix}
| & & | \\
\vecb{v}_1 & \ldots & \vecb{v}_n \\
| & & | \\
\end{bmatrix}\begin{bmatrix} \lambda_1 c_1 \\ \vdots \\ \lambda_n c_n \end{bmatrix} \\
&=c_1\lambda_1\vecb{v}_1 + \ldots + c_n\lambda_n\vecb{v}_n \\
\end{align*}
$$
which is the original sum.

But we still have those $c_k$’s in there. We know that
$$
\begin{align*}
\vecb{x} &= \begin{bmatrix}
| & & | \\
\vecb{v}_1 & \ldots & \vecb{v}_n \\
| & & | \\
\end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} \\
&= P\vecb{c}
\end{align*}
$$
so $\vecb{c} = P^{-1}\vecb{x}$ (provided $P$ is invertible).

Cool! Since we know that $T(\vecb{x}) = PD\vecb{c}$, we have our final expression:
$$ T(\vecb{x}) = A\vecb{x} = PDP^{-1} \vecb{x} $$

This is the diagonalization of $A$, where $D$ is the diagonalized matrix and $P$ is the change-of-basis matrix. Why would we do this? Working with diagonal matrices tends to be a lot easier than working with non-diagonal matrices. Again, depending on the transformation, this new calculation could be a lot easier than simply applying the original transformation.
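Here’s a quick numerical illustration with a made-up matrix (a sketch, assuming NumPy): np.linalg.eig hands us $P$ and the eigenvalues, and $PDP^{-1}\vecb{x}$ agrees with $A\vecb{x}$. The classic payoff is computing powers of $A$, since powers of a diagonal matrix are trivial.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # made-up matrix with a full set of eigenvectors
x = np.array([1.0, -2.0])

eigvals, P = np.linalg.eig(A)         # columns of P are the eigenvectors
D = np.diag(eigvals)
P_inv = np.linalg.inv(P)

# T(x) = A x computed directly and via the diagonalization
print(np.allclose(A @ x, P @ D @ P_inv @ x))                # True

# Powers of A are easy once A = P D P^-1
print(np.allclose(np.linalg.matrix_power(A, 5),
                  P @ np.diag(eigvals**5) @ P_inv))         # True
```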

March 5, 2013, 11:08am by Casey
Categories: Meta

Beyond forces: work, kinetic energy, and potential energy

I learned about stuff like work and kinetic energy in high school, but I never really understood why these concepts were introduced — it feels like they just fell out of the sky. That changed today in physics lecture, and I think it’s so cool to understand where exactly these concepts come from.

So let’s say we’re working with forces in physics. If you look at some common forces, like gravity and the electromagnetic force, you’ll see that they tend to depend on position:

$$
\begin{gather}
\vecb{F}(\vecb{r}) = -\frac{GMm}{r^2} \hatb{r} \\
\vecb{F}(\vecb{r}) = \frac{kq_1q_2}{r^2} \hatb{r}
\end{gather}
$$

But force is defined using a time derivative of velocity:
$$ \vecb{F}(t) = m\vecb{a} = m\dod{\vecb{v}}{t} $$

This is kind of inconvenient. The definition of force depends on time, but most forces are expressed with respect to position. Meanwhile, the positions of particles tend to vary with time (i.e., they move), making things more complicated. In order to work with moving particles experiencing forces, we would need to convert all of our times into positions or vice-versa.

It turns out that there’s an easier way. Someone clever decided to integrate both sides of Newton’s second law along the position to get these line integrals along a path $C$:
$$\begin{align}
\int_C \vecb{F} \cdot \dif{\vecb{r}} &= m \int_C \dod{\vecb{v}}{t} \cdot \dif{\vecb{r}} \\
&= m \int_{t_1}^{t_2} \dod{\vecb{v}}{t} \cdot \vecb{v}\dif{t} \\
&= m \int_C \vecb{v} \cdot \dif{\vecb{v}} \\
&= m \left( \int_{v_{x1}}^{v_{x2}} v_x \dif{v_x} + \int_{v_{y1}}^{v_{y2}} v_y \dif{v_y} + \int_{v_{z1}}^{v_{z2}} v_z \dif{v_z} \right) \\
&= \frac{1}{2}m \left( \left(v_{x2}^2 - v_{x1}^2\right) + \left(v_{y2}^2 - v_{y1}^2\right) + \left(v_{z2}^2 - v_{z1}^2\right) \right) \\
&= \frac{1}{2}mv_2^2 - \frac{1}{2}mv_1^2
\end{align}$$

And we’re left with the Work-Energy Theorem:
$$\int_C \vecb{F} \cdot \dif{\vecb{r}} = \frac{1}{2}mv_2^2 - \frac{1}{2}mv_1^2$$

Now, if we define the quantity $\frac{1}{2}mv^2$ to be the “kinetic energy” and the quantity $\int \vecb{F} \cdot \dif{\vecb{r}}$ to be the “work” (the “total” force applied to a system over a distance), then we can express the theorem intuitively: work applied to a system increases the kinetic energy. And now we can use it to solve problems without having to deal with times — only distances.

Let’s try a quick example. If gravity applies a constant force of 10 N to a box, initially at rest, over a distance of 10 m, what will its velocity be? In this case, $F = 10 \text{ N}$, $r_2 = 10 \text{ m}$, and $r_1 = 0 \text{ m}$. We can now plug in (replacing the line integral with a regular integral, since we’re moving in a straight line along the force):
$$\begin{gather}
\int_{r_1}^{r_2} F \dif{r} = \frac{1}{2}mv_2^2 - \frac{1}{2}mv_1^2 \\
F(r_2 - r_1) = \frac{1}{2}mv_2^2 - \frac{1}{2}mv_1^2
\end{gather}$$
And from there, we can solve for velocity. Again, notice that we don’t care about the time it takes, just the distance over which a force acts. Cool! (What if we cared about the time it takes rather than the distance? Then we can use impulse, which comes from the definition of force.)
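For instance, filling in a hypothetical mass for the example above (the problem doesn’t give one, so take $m = 2$ kg as an assumed value), the arithmetic is just:

```python
import math

F = 10.0     # N, constant force along the direction of motion
d = 10.0     # m, distance over which the force acts
m = 2.0      # kg, assumed mass (not given in the original problem)
v1 = 0.0     # m/s, starts from rest

work = F * d                              # W = F (r2 - r1) for a constant, aligned force
v2 = math.sqrt(v1**2 + 2.0 * work / m)    # from W = (1/2) m v2^2 - (1/2) m v1^2
print(v2)                                 # 10.0 m/s
```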

There’s one problem with the Work-Energy Theorem: evaluating the line integral is annoying. Converting that line integral into a regular integral is fine for the example I just gave, but it won’t work if the path is not in the same direction as the force.

It turns out, though, that a lot of the forces we care about (gravity, springs, electromagnetic force) are conservative forces. This means that the line integral is path-independent: its value doesn’t depend on which path you take from point A to point B; it only matters where point A and point B are.

One analogy is with regular integrals. If you’re integrating from $x = 0$ to $x = 5$, you’ll always get the same answer as long as the ultimate endpoints are the same.
$$ \int_0^5 f(x) \dif{x} = \int_0^{-10} f(x) \dif{x} + \int_{-10}^7 f(x) \dif{x} + \int_7^5 f(x) \dif{x}$$
Along the same lines, for line integrals of conservative forces, you’ll always get the same answer no matter which path you take.

The path-independence of regular integrals is what makes the fundamental theorem of calculus possible:
$$ \int_a^b f(x) \dif{x} = F(b) - F(a)$$

It turns out that there’s an analogous version of the fundamental theorem for line integrals of conservative forces:
$$ \int_C \vecb{F}(\vecb{r}) \cdot \dif{\vecb{r}} = U(\vecb{r}_1) - U(\vecb{r}_2)$$ where $U$ is the “antiderivative” — the scalar potential of the vector field $\vecb{F}$, defined so that $\vecb{F} = -\nabla U$ (the minus sign is the usual convention, and it’s what makes energy conservation come out nicely below). In other words, for every conservative force, we can define a potential function that we can use to evaluate its line integrals. This potential function has a value at every point in space. (Note that the value is a scalar and not a vector.) For example, for the force of gravity, $\vecb{F}(\vecb{r}) = -\frac{GMm}{r^2}\hatb{r}$, and $U(\vecb{r}) = -\frac{GMm}{r} + C$, where $C$ is the constant of integration, here usually taken so that $U(\infty) = 0$. Notice that $-\dod{U}{r} = F$ (or the multidimensional equivalent, $\vecb{F} = -\nabla{U}$)!

So for a conservative force, the Work-Energy Theorem becomes:
$$U(\vecb{r}_1) - U(\vecb{r}_2) = \frac{1}{2}mv_2^2 - \frac{1}{2}mv_1^2$$ or
$$\frac{1}{2}mv_1^2 + U(\vecb{r}_1) = \frac{1}{2}mv_2^2 + U(\vecb{r}_2)$$
Let’s call $U$ the potential energy, and now we’ve obtained an expression for the conservation of mechanical energy. As long as only conservative forces like gravity are doing work, the sum of the kinetic and potential energy stays the same no matter what you do.
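As a quick numerical check with made-up numbers (a sketch; this drops a 1 kg mass radially toward Earth using $U(r) = -GMm/r$ and verifies that $\tfrac{1}{2}mv^2 + U(r)$ comes out the same at both radii):

```python
import math

G = 6.674e-11        # m^3 kg^-1 s^-2
M = 5.972e24         # kg, Earth's mass
m = 1.0              # kg, test mass
r1 = 7.0e6           # m, starting radius (released from rest)
r2 = 6.6e6           # m, ending radius

U = lambda r: -G * M * m / r                  # gravitational potential energy
v2 = math.sqrt(2.0 * (U(r1) - U(r2)) / m)     # from (1/2) m v2^2 + U(r2) = 0 + U(r1)

E1 = 0.0 + U(r1)
E2 = 0.5 * m * v2**2 + U(r2)
print(E1, E2)                                 # the two total energies agree
```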

So that’s how you get work, kinetic energy, and potential energy from forces. I hope they seem more connected now and less like a jumble of concepts that physics teachers throw at you! Leave a comment if it was helpful!

February 26, 2013, 5:01pm by Casey
Categories: Physics

Equilibrium simulator

I’m taking a chemistry class right now on acid-base equilibria, and one type of problem keeps getting assigned: you’re given some initial concentrations, and you have to find the equilibrium concentrations.

For example, here’s one: Consider 1.00 L of a 0.082 M solution of aqueous formic acid (HCO2H), where $K_a = 1.78 \times 10^{-4}$. What are the equilibrium concentrations of HCO2H, HCO2-, H3O+, and OH-?

There are two relevant reactions (we know their equilibrium constants): the dissociation of formic acid and the autoionization of water.
$$\begin{gather*}
\mathrm{HCO_2H + H_2O \rightleftharpoons HCO_2^- + H_3O^+} \qquad K_a = 1.78 \times 10^{-4} \\
\mathrm{2\,H_2O \rightleftharpoons H_3O^+ + OH^-} \qquad K_w = 1.0 \times 10^{-14}
\end{gather*}$$

We can set up an ICE table with the initial, change in, and equilibrium concentrations ([X] denotes concentration of chemical X):

              [HCO2H]        [HCO2-]    [H3O+]                [OH-]
initial       $0.082$        $0$        $10^{-7}$             $10^{-7}$
change        $-x$           $+x$       $+x + y$              $+y$
equilibrium   $0.082 - x$    $x$        $10^{-7} + x + y$     $10^{-7} + y$

And then we can plug these final concentrations into our two equilibrium constant formulas:

$$K_a = \frac{[\mathrm{H_3O^+}][\mathrm{HCO_2^-}]}{[\mathrm{HCO_2H}]} = \frac{(10^{-7}+x+y)\,x}{0.082-x}$$
$$K_w = [\mathrm{H_3O^+}][\mathrm{OH^-}] = (10^{-7}+x+y)(10^{-7}+y)$$

We can solve this system by either making some approximations or shoving it all into something like Mathematica. If we do that, we get [HCO2H] = 0.0783 M, [HCO2-] = 0.00373 M, [H3O+] = 0.00373 M, and [OH-] = $2.68 \times 10^{-12}$ M.

But that’s annoying. Being a programmer, I thought it would be a good idea to try and write a program that numerically solves these problems. It was a fun distraction from my actual homework! It works by pushing each reaction in the right direction until its reaction quotient becomes its equilibrium constant. Below is the result (Chrome is best — Firefox doesn’t support the HTML5 <input type="range"> element!):

[Interactive equilibrium simulator embedded here, with controls for the reactions and concentrations, and a link to the code.]
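If you’d rather see the idea in a few lines of Python, here’s a minimal sketch (not the embedded simulator itself, which nudges each reaction toward equilibrium; this one just hands the two equations above to SciPy’s fsolve):

```python
from scipy.optimize import fsolve

Ka = 1.78e-4
Kw = 1.0e-14
C0 = 0.082      # initial formic acid concentration, M
w0 = 1e-7       # initial H3O+ and OH- from pure water, M

def residuals(vars):
    x, y = vars
    hco2h = C0 - x          # [HCO2H]
    hco2m = x               # [HCO2-]
    h3o   = w0 + x + y      # [H3O+]
    oh    = w0 + y          # [OH-]
    # Both equations normalized to O(1) so fsolve isn't bothered by the scales
    return [h3o * hco2m / (Ka * hco2h) - 1.0,
            h3o * oh / Kw - 1.0]

x, y = fsolve(residuals, [1e-3, -1e-7])
print(f"[HCO2H] = {C0 - x:.3g} M, [HCO2-] = {x:.3g} M, "
      f"[H3O+] = {w0 + x + y:.3g} M, [OH-] = {w0 + y:.3g} M")
# Should reproduce the values above: 0.0783, 0.00373, 0.00373, 2.68e-12 M
```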

    February 13, 2013, 8:49pm by Casey
    Categories: Programming
