Dual Number and Auto Derivative of Special Orthogonal Group

Posted on 2022-03-20 Edited on 2022-06-14 Disqus:

Background

Finding the closed-form Jacobian matrix of a residual function is one of the most time consuming works in solving a nonlinear optimization problem. A work around is to use numerical Jacobian. i.e. manually give a small increment in every variable and find out the increment in residual. However the selection of the increment depends on the data magnitude. An inappropriate increment can lead to fluctuation before convergence.

In google ceres-solver, an auto derivative method is introduced to liberate developers from complicated Jacobian debugging. This blog introduces the principle behind the auto derivative and explains a Perspective-N-Point problem demo based on auto derivative and nonlinear optimization.

Dual Number Algebra

The basic idea of auto derivative is to extend the input of a single variable function. Consider function \(y = f(x)\), the input \(x\) is no longer a variable, but an out put of another function with first order slop \( b\varepsilon \). Where \(\varepsilon\) is the basis of the increment, real number \(b\) measures the magnitude of the increment. Therefore, variable \(x\) is expressed as \(x = a + b \varepsilon\).

Define \(\varepsilon ^2 = 0\) to drop infinitesimals with order higher than one.

Addition

\[ (a + b \varepsilon) + (c + d \varepsilon) = (a + c) + (b + d) \varepsilon \]

Identity of Addition:

\[ (a + b \varepsilon) + (0 + 0 \varepsilon) = a + b \varepsilon \]

For the C++ implementation of dual number addition, see mxm/linalg_dual_number.h:93.

Multiplication

\[ (a + b \varepsilon)(c + d \varepsilon) = ac + (ad + bc) \varepsilon \]

Conjugate

Denote \(x = a + b \varepsilon\), its conjugate \(\bar x\) is denote as:

\[ \bar{x} = a - b \varepsilon \]

Then, \(\bar{x} x = a^2 - b^2 \varepsilon^2 = a^2\).

Identity of Multiplication:

\[ (a + b \varepsilon)(1 + 0 \varepsilon) = a + b \varepsilon \]

Inversion of Multiplication:

\[ \begin{align} \frac{1}{a + b \varepsilon} &= \frac{a - b \varepsilon}{(a + b \varepsilon) (a - b \varepsilon)}\\ &= \frac{a - b \varepsilon}{a^2} \\ \end{align} \]

Where \(a \neq 0\). Dual number division uses conjugate property to eliminate the denominator.

For the C++ implementation of dual number multiplication, see mxm/linalg_dual_number.h:22.

Derivative

For all functions that can be expanded as Taylor Series at point \(x_0\) :

\[ \begin{align} f(x) &= f(x_0) + \frac{f^ {\prime} (x_0)}{1!}(x-x_0) + \frac{f^ {\prime \prime} (x_0)}{2!} (x-x_0)^2 + \frac{f^ {\prime \prime \prime} (x_0)}{3!} (x-x_0)^3 + \cdots \\ &= \sum_{\infty }^{n=0} \frac{f^{(n)}(x_0)}{n!} (x-x_0)^n \\ \end{align} \]

Set \(x = a + b \varepsilon\) and expand the series at \(x_0 = a\),

\[ \begin{align} f(a + b \varepsilon) &= f(a) + \frac{f^ {\prime} (a)}{1!} (b\varepsilon) + \frac{f^ {\prime \prime} (a)}{2!} (b\varepsilon) ^2 + \frac{f^ {\prime \prime \prime} (a)}{3!} (b\varepsilon) ^3 + \cdots \\ &= f(a) + f^ {\prime} (a) (b\varepsilon) \tag{2-5}\\ \end{align} \]

i.e.

\[ f^{\prime }(a) \varepsilon = \frac{ f(a + b \varepsilon) - f(a) }{b} \tag{2-6} \]

Dual Number Elementary Functions

In order to use dual number in C++ project, simply overloading operators like + - * / is not enough. C++ built in functions such as sin, cos, exp, log don't handle dual number input. The dual number class have to support all these functions.

Luckily, most elementary functions of dual numbers can be converted to the combination of C++ built in functions. For example:

By applying formula: \(\sin(u + v) = \sin(u) \cos(v) + \cos(u) \sin(v)\) and \(\lim_{x \to 0} \frac{\sin(x)}{x} = 1 \)

\[ \begin{align} \sin(a + b \varepsilon) &= \sin(a) \cos(b \varepsilon) + \cos(a) \sin(b \varepsilon) \\ &= \sin(a) \cdot 1 + \cos(a) b \varepsilon \\ &= \sin(a) + b \cos(a) \varepsilon \end{align} \]

Or by applying formula (2-5): \(f(a + b \varepsilon) = f(a) + f'(a) (b \varepsilon)\)

\[ \sin(a + b \varepsilon) = \sin(a) + b \cos(a) \varepsilon \tag{3-1} \]

Thus the C++ implementation of the sin function for dual number is:

    template <typename DType> DualNumber <DType>
    sin(const DualNumber < DType>& val)
    {
        return {std::sin(val(0)), std::cos(val(0)) * val(1)};
    }

Mostly Used Elementary Functions

\[ \begin{align} \sin(a + b \varepsilon) &= \sin(a) + b \cos(a) \varepsilon \\ \arcsin(a + b \varepsilon) &= \arcsin(a) + \frac{b \varepsilon}{\sqrt{1-a^2}} \\ \cos(a + b \varepsilon) &= \cos(a) - b \sin(a) \varepsilon \\ \arccos(a + b \varepsilon) &= \arccos(a) - \frac{b \varepsilon}{\sqrt{1-a^2}} \\ \exp(a + b \varepsilon) &= \exp(a) + b \exp(a) \varepsilon \\ \ln(a + b \varepsilon) &= \ln(a) + \frac{a}{b} \varepsilon \\ (a + b \varepsilon)^ \alpha &= a^ \alpha + (\alpha b) a^{\alpha - 1} \varepsilon \\ \end{align} \]

For the C++ implementation of the elementary functions above, see mxm/linalg_dual_number.h:138.

Derivative of 3D Special Orthogonal Group

Consider a rotation axis \(\phi \in \mathbb{R}^{3} \) and \(\| \phi \| = 1 \), a rotation along axis \(\phi\) with angle \(\theta\) can be expressed as \(f(\theta) = \exp (\theta \phi^ \wedge) \). The \((\cdot )^ \wedge \) operation converts a 3D vector into a skew-symmetric matrix.

Meanwhile, \(f'(\theta) = \phi^ \wedge \exp(\theta \phi^ \wedge)\) and \(f^{(n)}(\theta) = (\phi^ \wedge)^n \exp(\theta \phi^ \wedge)\)

The Taylor Series of \(f(\theta)\) at point \(\theta_0\) is:

\[ \begin{align} f(\theta) &= f(\theta_0) + \frac{f'(\theta_0)}{1!}(\theta - \theta_0) + \frac{f''(\theta_0)}{2!}(\theta - \theta_0)^2 + \cdots \\ &= ( I + \phi^ \wedge (\theta - \theta_0) + \frac{(\phi^ \wedge)^2}{2!}(\theta - \theta_0)^2 + \cdots ) f(\theta_0)\\ &= \left( \sum_{n=0}^{\infty } \frac{(\phi^ \wedge)^n}{n!} (\theta - \theta_0)^n \right) f(\theta_0) \end{align} \]

Let \(\theta = a + b \varepsilon\) and \(\theta_0 = a\), then

\[ f(a + b \varepsilon) = (I + \phi^ \wedge b \varepsilon) f(a) \] \[ \frac{f(a + b \varepsilon) - f(a)}{b} = \phi^ \wedge f(a) \varepsilon \]

The DoF of a 3D rotation is 3, therefore three orthogonal rotation vectors are required to find all derivative of a rotation function. Normally the standard orthogonal basis \([0,0,1]^\top, [0,1,0]^\top, [1,0,0]^\top\) are used as \(\phi\) .

Rodrigues Formula

Rodrigues formula is essential the exponential map from \(\mathfrak{so}(3)\) to \(SO(3)\).

Denote \(\varphi = \theta \phi\)

\[ \exp(\varphi) = I + \frac{\sin(\theta)}{\theta} \varphi^ \wedge + \frac{1-\cos (\theta)}{\theta^2} \varphi^ \wedge \varphi^ \wedge \]

The numerical accuracy decrease when \(\theta \rightarrow 0\). In normal C++ implementation, an if expression is used to eliminate the singularity:

\[ \| \varphi \| < \epsilon \Rightarrow \exp(\varphi) = I \tag{4-3} \]

Where \(\epsilon\) is the machine accuracy. However the expression is incorrect when dealing with dual number input.

Denote

\[ \varphi = \left[ \begin{matrix} a_1 + b_1 \varepsilon \\ a_2 + b_2 \varepsilon \\ a_3 + b_3 \varepsilon \\ \end{matrix} \right] \]

Denote \(a = [a_1, a_2, a_3]^\top\). \(\theta = 0 \Rightarrow \|a\| = 0 \Leftrightarrow \varphi ^ \wedge \varphi ^ \wedge = 0 \) but \( \not \Rightarrow \varphi ^ \wedge = 0\). Therefore, the expression (4-3) should be corrected to

\[ \| \varphi \| < \epsilon \Rightarrow \exp(\varphi) = I + \varphi^ \wedge \tag{4-4} \]

Here is the C++ implementation of the Rodrigues formula that supports dual number calculation.

A Demo with Rotation Auto Derivative

Perspective N Point(PnP) problem tends to find the camera pose by minimizing the reprojection error between 2D points. It is a typical nonlinear optimization problem with the state vector constrained on a 6 DoF manifold. The demo use dual number auto derivative to find the Jacobian matrix and the method converged correctly.

See mxm/cv_3d.h:317 for Jacobian matrix calculation and tests/test_cv_basic.cpp:537 for PnP problem verification pipeline.

Limitations

Consider real number expression \(c = \frac{ab}{b}\). It is easy to find \(c = a\) when \(b \neq 0\) but hard to define the value of \(c\) when \(b = 0\). Similarly for dual number: \(x \neq \sqrt[]{x^2}\) when \(x\) is a purely nonreal dual number, i.e. \(x = b \varepsilon\).

References

Michael Penn, "The strange cousin of the complex numbers - the dual numbers." YouTube, 2022
Wikipedia, "Dual number."

Numerical Inversion of Radial Tangential Distortion Model

Posted on 2022-01-21 Edited on 2022-03-22 Disqus:

Background

Radial tangential distortion model is the most commonly used model in pinhole camera calibration. Its mathematical representation is shown as formula (1-1). By observing the formula, it is fair to say that the coordinate mapping from undistorted to distorted is natural. But the reverse of the process is difficult because the inverse function is in implicit form.

\[ \left[ \begin{matrix} x_d \\ y_d \\ \end{matrix} \right] = \left[ \begin{matrix} x \\ y \\ \end{matrix} \right]+ \underbrace{ \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} \left[ \begin{matrix} x \\ y \\ \end{matrix} \right] }_{\text{radial}} + \underbrace{ \left[ \begin{matrix} 2xy & r^2 + 2x^2 \\ r^2 + 2 y^2 & 2xy \\ \end{matrix} \right] \left[ \begin{matrix} p_1 \\ p_2 \\ \end{matrix} \right] }_{\text{tangential}} \tag{1-1} \]

Here \(r^2 = x^2 + y^2\)

In engineering application, both directions of the mapping are used. Therefore, the implementation in OpenCV may have some indications. Here is a description that copied from OpenCV specification:

where undistort is an approximate iterative algorithm that estimates the normalized original point coordinates out of the normalized distorted point coordinates ("normalized" means that the coordinates do not depend on the camera matrix) ^[1].

An approximate iterative algorithm proves the difficulty to find the close form of the undistort mapping. The implementation ^[2] in OpenCV seems to call the distortion function several times and finally get the solution of undistort mapping. The workflow looks like a Newton Method but without any derivative calculation.

Mathematical Representation

Given vector-valued function \(y = f(x)\) and vector \(y_0\), find vector \(x_0\) that satisfies \(f(x_0) = y_0\) .

The solution would be quite clear if \(x = f^{-1}(y)\) is an explicit function. However, for the radial tangential model that mentioned above, it is an implicit function. Therefore, an numerical method is required to find the solution.

\[ x_{n+1} = x_{n} + (x_0 - f(x_{n})) \]

Denote \(g(x) = x + x_0 - f(x)\),

\[ x_{n+1} = g(x_n) \]

A Controller's Approach

Recall the the distort function moves a pixel a bit in a specific direction. So the undistort function is to find the original coordinate of a distorted point. The approach used in this blog is to add the error back to the original input and finally find the numerical solution. It sounds like an incremental PID controller.

Notations

\(P_d\): distorted point coordinate
\(P_n\): point coordinate in n-th iteration
\(K_I\): coefficient of the integral feedback
\(f(x)\): distort function
\(e_n\): error in n-th iteration, \(e_n = P_d - f(P_n)\)

Iterative Formula

Fig.1 - Controller Diagram of the Integral Controller

\[ P_{n+1} = P_n + K_I e_n \] \[ P_{n+1} = P_n + K_I(P_d - f(P_n)) \tag{1-2} \]

Set \(K_I = 1.0\) to simplify the problem. here is the C++ implementation in mxm/model_camera.h:

    virtual Matrix undistort(const Matrix& homo_pts) const override
    {
        Matrix ret(homo_pts);
        const size_t iter_max = 5;
        Matrix tmp;
        for(size_t i = 0; i <iter_max; i++)
        {
            tmp = distort(ret);
            ret = (homo_pts - tmp) + ret;
        }
        return ret;
    }

If the controller works, then \( \lim_{n \to \infty} e_n = 0\), i.e. \( \lim_{n \to \infty} P_d = f(P_n) \).

Test Result

The undistort accuracy turned out to be good.

Fig. 2-1: undistorted image

Fig. 2-2: distorted

Fig. 2-3: original image

Fig. 2-4: diff between original image and distorted image

Know-How and Know-Why

We have found out how the iteration method works, but still don't know why. There are several questions left to be answered.

What's the condition for convergence of the iteration method?
What's the convergence rate?
What can we do if the iteration diverged?

With a great amount of searching on the Internet I have found the standard name of the iteration method used by OpenCV is called: Fixed-Point Iteration.

Fixed-point Iteration

In general, an iterative system has the form

\[ u_{n+1} = g(u_n) \]

Where \(g: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n} \) is a vector-valued function.

A fixed point or equilibrium of a discrete dynamical system is a vector \(u^{\star} \in \mathbb{R}^{n}\) such that^[3]

\[ g(u^\star ) = u^ \star \]

Convergence rate of the iteration: \(\rho(g'(x))\). The method is convergent only if \(\rho(g'(x)) < 1\). If \(\rho(g'(x)) > 1\), the procedure diverges. If \(\rho(g'(x)) < 1\) but is close to \(1.0\), convergence is quite slow ^[4]. Here \(\rho(\cdot)\) is the spectral radial of a matrix. \(g'(x)\) is the jacobian matrix of vector-valued function \(g(x)\).

References

OpenCV, undistortPoints
OpenCV, undistort implementation
Peter J. Olver, Numerical Solution of Scalar Equations, Numerical Analysis Lecture Notes
Peter J. Olver, Vector-Valued Iteration, Numerical Analysis Lecture Notes
Joe D. Hoffman, Fixed-Point Iteration Numerical Methods for Engineers and Scientists

Ray Tracing on 4-Dimensional Scene

Posted on 2021-03-01 Edited on 2022-06-09 Disqus:

N-1 Dimensional Objects in N Dimensional World

In this chapter we are going to explore the perspective of a 4D creature when he looks down to our 3D world.

N Dimensional Objects have 0 thickness in the (N+1)th dimension. When calculating the intersection between N+1 Ray and N-Simplex, then equation is under constrained.

One solution is to remesh to geometry with N+1 Simplex. For example a 3D ray is hard to cannot intersect with edges of a rectangle. But after triangulate the rectangle to two triangles, the intersection will be possible. Likewise, to intersect a 3D triangle mesh with a 4D ray, tetrahedralization is required.

Estimate the Number of Simplices

Surface of a regular triangle with edge length \(a\): \(\frac{\sqrt{3}}{4}a^2\)
Volume of a regular tetrahedron^[1] with edge length \(a\): \(\frac{\sqrt{2}}{12}a^3\)

For a sphere with radial \(r\), one need \(N_2\) triangle to triangulate the surface.

\[ \begin{align} N_2 &= \frac{S_{sphere}}{S_{triangle}} \\ &= \frac{4 \pi r^2}{ \frac{\sqrt{3}}{4}a^2 } \\ & \leq O \left( \left( \frac{r}{a} \right)^2 \right) \end{align} \]

But \(N_3\) tetrahedrons to tetrahedralizate the volume:

\[ \begin{align} N_3 &= \frac{V_{sphere}}{V_{tetrahedron}} \\ &= \frac{ \frac{3}{4} \pi r^3}{ \frac{\sqrt{2}}{12}a^3 } \\ & \leq O \left( \left( \frac{r}{a} \right)^3 \right) \end{align} \]

The estimation gives an upper bound because the sphere is the geometry with smallest surface area-volum-ratio. But it still shows the computational complexity which changes from \(O(n^2)\) to \(O(n^3)\) when using a tretraheron to mesh the geometry instead of triangle.

Prism: A Better Solution

Convex Geometry in N-1 D Subspace

As shown in the figure above, we decompose the N-Ray to two rays. A (N-1) D ray for the subspace and a 1-D ray for axis direction of the prism. When the geometry in (N-1) D subspace is convex, each ray produces a time interval in intersection denote as \(T_s, T_h\). The actual intersection time interval can be calculated as \(T = T_s \cap T_h\).

General Geometry in N-1 D Subspace

For concave geometry case, the intersection of N-1 D subspace generates more than one time interval.

Dimensional Escalation

Now consider a 4D scene. A 3D geometry is viewed by a 4D camera.

Fig.2-1 Geometry in 3D Subspace

Fig.2-2: Image from 4D Camera

Reference

mathworld.wolfram.com, Regular Tetrahedron

Two Reflections Form A Single Rotation

Posted on 2021-01-28 Edited on 2022-03-22 Disqus:

The proof ^[1] of the title will not be present here. It is used as a conclusion in this article to solve practical computer graphics and robotics problem.

Basic Concepts

Single Rotation

A rotation matrix \(R\) in n-dimensional space: \(\{ R \in \mathbb{R}^{n \times n} | RR ^\top = I, \text{det}(R) = 1\}\). A single rotation matrix is a rotation matrix \(R\) with only one pair of complex eigenvalues.

A single rotation has only one rotation plane \(B\) and rotation angle \(\theta\). The exponential form of a single rotation is \(R = e ^{B \theta}\).

Householder Transform

Householder transform is a famous orthogonal transform in basic linear algebra. For example, QR decomposition use Householder transform to trianglize a matrix.

\[ P = I - 2 v v ^\top \tag{1-1} \]

Here \(P\) is the Householder matrix, \(I\) is the identity matrix, \(v\) is an n-dimensional unit vector used for reflection.

There are some properties of \(P\):

Involutory: \(PP = I\)
Orthogonal: \(P^\top P = I\)
Eigenvalues: \( \lambda_i = \begin{cases} -1 & \text{i = 1}\\ 1 & i = 2, \cdots , n\\ \end{cases}\)
Eigenvectors: \( \begin{cases} e_i = v & i = 1\\ e_i \perp v & i = 2, \cdots , n\\ \end{cases} \)
Determinant: \(\text{det}(P) = -1\)

\(\forall x \in \mathbb{R}^{n}, x = c_1 e_1 + c_2 e_2 + \cdots c_n e_n\). Then,

\[ \begin{align} Px &= \lambda_1 c_1 e_1 + \lambda_2 c_2 e_2 + \cdots \lambda_n c_n e_n \\ &= -c_1 e_1 + c_2 e_2 + \cdots c_n e_n \tag{1-2} \end{align} \]

Formula (1-2) shows that matrix \(P\) reflect a vector \(x\) along vector \(v\).

With the definition of reflection, the title of this article can be expressed as follow:

A single rotation \(R\) in plane \(B\) with rotation angle \(\theta\) can be composed by two reflections \(P_1 = I - 2 u u ^\top, P_2 = I - 2 v v ^\top\), with \(u\) and \(v\) lie in plane \(B\) and form a angle of \(\frac{\theta}{2}\).

Composition of a Single Rotation

An n-dimensional single rotation \(R = e ^{B \theta}\) can be composed by two matrix \(P_1 = I - 2u u ^\top, P_2= I - 2v v ^\top\)

\[ \begin{align} R &= P_1 P_2 \\ \cos \frac{\theta}{2} &= u ^\top v \\ B &= \text{span}\{u, v\} \end{align} \]

Decomposition of a Single Rotation

The decomposition from a single rotation to two reflections are not unique. Any two reflections \(P_1, P_2\) with reflect vector \(u, v\) in rotation plane \(B\) form a angle \(\theta\) can compose a single rotation.

\[ \begin{align} R &= P_1 P_2 \\ &= (I - 2u u ^\top) (I - 2v v ^\top) \\ &= I - 2u u ^\top - 2v v ^\top + 4(u ^\top v) u v ^\top\\ \end{align} \] \[ \frac{R - I}{2} = 2(u ^\top v) u v ^\top - (u u ^\top + v v ^\top) \tag{2-1} \]

Find the Rotation Angle

Because \(\text{tr}(v v ^\top) = \text{tr}(u u ^\top) = 1\), \(\text{tr}(u v ^\top) = u ^\top v\)

\[ \text{tr}\left(\frac{R - I}{2}\right) = \frac{1}{2} \left(\text{tr}(R) - n\right) = 2 (u ^\top v)^2 - 2 \] \[ \begin{align} \cos ^2 \frac{\theta}{2} &= \frac{1}{4} \left(\text{tr}(R) - n\right) + 1 \\ \sin ^2 \frac{\theta}{2} &= -\frac{1}{4} \left(\text{tr}(R) - n\right) \end{align} \]

Because

\[ \cos \theta = \cos ^2 \frac{\theta}{2} - \sin ^2 \frac{\theta}{2} \] \[ \cos \theta = \frac{\text{tr}(R) - n}{2} + 1 \tag{2-2} \]

It provides a way to calculate rotation angle of a n-dimensional single rotation. Specifically, when \(n = 3\), it degenerates to the famous Rodrigous formula.

Find the Rotation Plane

By subtracting (2-1) with its transposition,

\[ \frac{R ^\top - I}{2} = 2(u ^\top v) v u ^\top - (u u ^\top + v v ^\top) \] \[ \frac{1}{4 (u ^\top v)} (R - R ^\top) = u v ^\top - v u ^\top \tag{2-3} \]

Denote \(A = u v ^\top - v u ^\top\), \(A ^\top = -A\) and is therefore a skew symmetric matrix.

Lemma 1. For vectors \(u \in \mathbb{R}^{n}, v \in \mathbb{R}^{n}\). \(u v ^\top - v u ^\top = A\) forms a directional plane \(B\). And for any \(x \in \mathbb{R}^{n}\), applying \(A\) on \(x\) finds a vector \(y\) in plane \(B\) that \(y \perp x\).

Proof: we use \(y\) and \(u\) to form another plane \(B'\) to see whether plane \(B\) and \(B'\) are the same directional plane with different magnitude.

\[ A x = y = u v ^\top x - v u ^\top x \] \[ \begin{align} y u ^\top &= u v ^\top x u ^\top - v u ^\top x u ^\top = (v ^\top x) u u ^\top - (u ^\top x) v u ^\top \\ u y ^\top &= u x^\top v u ^\top - u x^\top u v ^\top = (x ^\top v) u u ^\top - (x ^\top u) u v ^\top \end{align} \] \[ y u ^\top + u y ^\top = (u ^\top x) (u v ^\top - v u ^\top) = k A \]

If \(u\) is not perpendicular to \(x\), \(u ^\top x = k \neq 0\). Therefore the planes generated by \(y u ^\top + u y ^\top\) and \(u v ^\top - v u ^\top\) are the same directional plane with different magnitude. So, vector \(y\) lies in plane \(B\).

Besides:

\[ x ^\top y = x ^\top A x \]

Transpose both sides.

\[ \begin{align} x ^\top y &= (x ^\top A x)^\top \\ &= -x ^\top A x \\ \end{align} \] \[ x ^\top y = -(x ^\top y) = 0 \implies x \perp y \]

With Lemma1, we can find two vectors in plane \(B\) by applying \(A\) to two vectors that is not perpendicular to plane \(B\). An identity matrix \(I\) is consisted of \(n\) orthogonal vectors. By apply \(A\) to \(I\), \(A I = A\). Therefore all column vectors of \(A\) lie in the plane \(B\). According to (2-3), \(R - R^\top = 4 u^\top v A = kA\). Thus, pick up two columns of none zero vector of matrix \(R - R^\top\), one can define the rotation plane of single rotation \(R\) when \(R \neq I\).

Find the Direction of Rotation Plane

One rotation angle \(\theta\) and two vectors \(x\) and \(y\) are still not enough to define a single rotation. The ambiguity is the rotation can be either from \(x\) to \(y\) or from \(y\) to \(x\). The rotation direction can be defined as follow:

Lemma2. Given rotation \(R\) in plane \(B\) and vector \(x \in B\). Denote \(y = (R - R^\top)x\). According to Lemma1, \(y\) is also in plane \(B\) and \(y \perp x\), then the direction of rotation is from \(x\) to \(y\) with rotation angle \(\theta \in [0, \pi)\).

Lemma2. will not be proved here. Instead, the following image explains how it works:

Figure 2.1

Because \(x\) is a vector in plane \(B\), the N-Dimensional single rotations is equivalent to a rotation in a 2D plane. The direction of vector \(y\) is the direction of vector \(x\) with \(90^ \circ\) clockwise rotation if \(\theta \in [0, \pi)\). i.e.

\[ (R - R^\top) x = k \exp \left( \frac{\pi}{2 \theta} \ln(R) \right) x \]

where \(k > 0, k \in \mathbb{R}\).

In summary, to fully define the single rotation plane, one can use the column vector with largest norm in matrix \((R - R^\top)\) as vector \(x\), and vector \(y = (R - R^\top)x\).

The Next Step

A general rotation in higher dimensional space (\(n > 3\)) may be consists of multiple single rotations. A method for decomposing a general rotation to several orthogonal single rotations was proposed ^[3]. By combining those methods, developers may be able to solve higher dimensional geometry problem just as in 3D world. Maybe do ray tracing or SLAM in 4D world.

References

Mathoma, Geometric algebra in 2D - Two Reflection is a Rotation, YouTube
Wikipedia, Householder transformation
Richard, Aurélie, et al. "Decomposition of nD-rotations: classification, properties and algorithm." Graphical models 73.6 (2011): 346-353.
Muphrid, "Geometric interpretation to Skew symmetric coefficient matrix." Math stack exchange.

Geometries for N-Dimensional Ray Tracing

Posted on 2021-01-07 Edited on 2022-03-22 Disqus:

Background

Geometries explained in this blog serve the ray_tracing_4d project.

Ray

Consider a ray in N-Dimensional space, a parametric equation of ray can be described as:

\[ r(t) = c + d t \] \[ c \in \mathbb{R}^{n}, d \in \mathbb{R}^{n}, \| d \| = 1, t \in \mathbb{R} \]

Where \(c\) is the origin point of the ray with N-Dimension and \(d\) is the direction of the ray.

C++ implementation: [geometry_ray.h]

Simplex

In geometry, a simplex (plural: simplexes or simplices) is a generalization of the notion of a triangle or tetrahedron to arbitrary dimensions ^[1].

Intersect N-1-Simplex with N-Ray

An N-1-Simplex in N-Dimensional Space is composed of \(n\) vertices \(u_1, \cdots , u_{n} \in \mathbb{R}^{n}\).

Approach 1. Basis from edges

A vector space \(B \in \mathbb{R}^{n \times (n-1)} \)can be generated from \(u_2 - u_1, u_3 - u_1, \cdots , u_n - u_1 \). One can find the intersection of the ray and the vector space by solve the equation:

\[ c + d t = u_1 + B k \] \[ \left[ \begin{matrix} B & -d \end{matrix} \right] \left[ \begin{matrix} k \\ t \end{matrix} \right] = c - u_1 \] \[ \left[ \begin{matrix} k \\ t \end{matrix} \right] = \left[ \begin{matrix} B & -d \end{matrix} \right]^{-1} (c - u_1) \]

Whether the intersection point is within the simplex can be determined by:

\[ \sum_{i=1}^{n-1} k_i < 1 \ \text{and } 0 \leq k_i \text{ for } i = 1, \cdots , n-1 \]

Approach 2. Barycentric Coordinate System

Directly use \(u_1, \cdots, u_n \) as a basis \(U\) . \(\theta\) is the weighted vector of all vertices.

\[ \begin{cases} c + dt = U \theta \\ \| \theta \| = 1 \\ \end{cases} \]

In matrix form:

\[ \left[ \begin{matrix} U & -d \\ \mathbf{1} & 0 \\ \end{matrix} \right] \left[ \begin{matrix} \theta \\ t \\ \end{matrix} \right] = \left[ \begin{matrix} c \\ 1 \\ \end{matrix} \right] \] \[ \left[ \begin{matrix} \theta \\ t \\ \end{matrix} \right] = \left[ \begin{matrix} U & -d \\ \mathbf{1} & 0 \\ \end{matrix} \right] ^{-1} \left[ \begin{matrix} c \\ 1 \\ \end{matrix} \right] \]

Whether the intersection point is within the simplex can be determined by:

\[ 0 \leq \theta_i \leq 1 \text{ for } i = 1, \cdots , n \]

Connection Between Two Approaches

\[ \begin{cases} \theta_1 &= 1 - \|k\| \\ \theta_{i+1} &= k_i, \text{ for } i = 1, \cdots , n-1\\ \end{cases} \]

Current solution uses approach 1. C++ implementation: [geometry_primitive.h]

Rotation

This blog uses Special Orthogonal Group to deal with N-Dimensional rigidbody rotation problem.

\[ SO(n) = \left\{ R \in \mathbb{R}^{n \times n} \middle| \det R = 1, R^\top R = I \right\} \]

A single rotation in high dimensional space is defined by a rotation plane and a rotation angle. A general rotation can be composed by multiplying several single rotations. Details are explained in a later blog ^[2].

C++ implementation: [full_dimensional_rotation.h]

Pinhole Camera Model

In N-Dimensional Space, a Pinhole Camera projects N-Dimensional points to (N-1) dimensional points. The camera matrix \(K\) is 3D space is:

\[ K = \left[ \begin{matrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{matrix} \right] \]

Where \(f_x, f_y\) are the focal lengths and \(c_x, c_y\) are offsets of principle point.

By extending the matrix to N-Dimension, the model can be described as:

\[ K = \left[ \begin{matrix} \text{diag}(f) & c \\ 0 & 1 \\ \end{matrix} \right] \]

Likely, \(f\) is the focal length vector, \(f \in \mathbb{R}^{n-1}\). \(c\) is the principle offset vector, \(c \in \mathbb{R}^{n-1}\).

Screen Ray Cast

In ray tracing pipeline, pixel color calculation were started by casting rays to the scene. The transform from a pixel coordinate point \(v \in \mathbb{R}^{n-1}\) to ray expression \(r(t)\) is described as follow:

\[ r(t) = p_c + R_c \frac{d_v}{\|d_v\|} t \] \[ d_v = K ^{-1} \left[ \begin{matrix} v \\ 1 \\ \end{matrix} \right] \]

Where \(p_c, R_c\) is the position and orientation of the camera. \(d_v\) is the direction vector of pixel point, which is not normalized. \(K\) is the intrinsic matrix of the N-Dimensional pinhole camera model.

C++ implementation: [model_camera.h]

Axis Aligned Bounding Box(AABB)

An AABB in N-Dimensional Space can be described by either a center point and a half extent or a minimum point \(p\) and a maximum point \(q\) . This blog uses the later description.

\[ p \in \mathbb{R}^n; q \in \mathbb{R}^n; p_i < q_i \text{ for}\ i = 1, \cdots, n \]

The open region \(B\) described by the bounding box is

\[ B = \left\{x \in \mathbb{R}^{n} \middle| p_i <x_i < q_i \text{ for}\ i = 1, \cdots, n \right\} \]

Intersect with N-Ray

For a specific axis \(i\), the inequality equation of the intersection problem is:

\[ p_i < c_i + d_i t < q_i \]

The solution in axis \(i\) is:

\[ T = \left\{ t \in \mathbb{R} \middle| t \in \left(\frac{p_i - c_i}{d_i}, \frac{q_i - c_i}{d_i} \right) \right\} \]

Specifically, when \(d_i = 0\), there are three cases:

\(c_i \leqslant p_i < q_i \Leftrightarrow t \in (+\infty , +\infty ) \Leftrightarrow T = \emptyset \)
\(p_i < c_i < q_i \Leftrightarrow t \in (-\infty , +\infty ) \Leftrightarrow T = \mathbb{U} \)
\( p_i < q_i \leqslant c_i \Leftrightarrow t \in (-\infty , -\infty ) \Leftrightarrow T = \emptyset \)

Consider all axis, the solution is:

\[ T = \left\{ t \in \mathbb{R} \middle| t \in \bigcap_{i=1}^{n} \left(\frac{p_i - c_i}{d_i}, \frac{q_i - c_i}{d_i} \right) \right\} \]

C++ implementation: [spatial_aabb.h]

Bounding Volume Hierarchy(BVH) Tree

N-Dimensional BVH Tree preserves the logic in low dimensional space. Details will not be explained here.

C++ implementation: [spatial_bvh.h, spatial_bvh_inl.h ]

References

en.wikipedia.org, Simplex
Xiaoxing Chen, Two Reflections Form A Single Rotation

Particle Filter with GPU Implementation

Posted on 2020-10-04 Edited on 2022-03-22 Disqus:

Document

A toy example helps understand OpenCL and particle filter [source code].

Notations

state vector: \(\mathrm{x} \in \mathfrak{se}(2)\)
motion noise: \(\mathrm{w}\)
motion measurement: \(\mathrm{v}\)
observation: \(\mathrm{y}\)
motion model: \(\mathrm{x}_k = f(\mathrm{x}_{k-1}, \mathrm{v}_k, \mathrm{w}_k)\), \(k = 1, \dots, K\)
observation model: \(\mathrm{y}_k = g(\mathrm{x}_k, \mathrm{n}_k)\), \(k = 0, \dots, K\)
Number of particles: \(N\)
Weight of particles: \(\omega\)

Estimation

1. Prediction

Unlike steps in the textbook^[1], I do prediction before sampling particles.

\[ \hat{\mathrm{x}}_{k} = f(\mathrm{x}_{k-1}, \mathrm{v}_k, \mathrm{w}_{k}) \tag{2-1} \]

2. Particle sampling

\[ \left[ \begin{matrix} \check{\mathrm{x}}_{k,m} \\ \mathrm{w}_{k,m} \end{matrix} \right] \leftarrow p(\hat{\mathrm{x}}_{k}) p(\mathrm{w}_{k-1}) \tag{2-2} \]

3. Likelihood calculation

Here I use the likelihood between the simulated LiDAR measurement to actual LiDAR sensor data as the particle weight.

\[ \omega_{k,m} \propto p(y_k | \check{\mathrm{x}}_{k,m}) = p(y_k | \check{\mathrm{y}}_{k,m}) \tag{2-3} \]

LiDAR measurement model:

\[ p(y_k|\check{\mathrm{y}}_{k,m}) = \begin{cases} \text{PDF}(y_k - \check{\mathrm{y}}_{k,m}) && \text{ if } \ y_k, \check{\mathrm{y}}_{k,m} \le L_{max}), \\ 1 - \text{CDF}(y_k - \check{\mathrm{y}}_{k,m}) && \text{ if } \ y_k > L_{max} \text{ or } \check{\mathrm{y}}_{k,m} > L_{max}, \\ \epsilon && \text{else} \end{cases} \tag{2-4} \]

\(L_{max}\) means the maximum measure distance of the LiDAR sensor. \(\text{PDF}\) is the probabilistic distribution function of the sensor model, a gaussian model is used in this project. \(\text{CDF}\) is the cumulative distribution function of the sensor model. \(epsilon\) is a constant small value in code implementation which minimizes the impact of overranged LiDAR beam.

4. Resampling

I use the method in this paper^[2] to do resampling. Leetcode 528 is also helpful for understanding.

Parallelization

In this project, I use line segments to represent objects in map without any acceleration structure. For a single line LiDAR with resolution \(H\), working in a map with \(S\) segments, a particle filter with \(N\) particles has to calculate line intersection for with complexity \(O(H \times S \times N)\). While those calculations are totally independent, it is available to do parallelization. The determination of line segment intersection is presented here^[4].

After eliminating the LiDAR beams that never intersect with objects, the rest beams' measure distances were calculated. Parallel reduction was used to find the "close hit" among all intersections of a single LiDAR beam.

Besides, the likelihood between simulation and actual measurement are also calculated in GPU. To avoid float point overflow, the calculation was done in logarithmic space. For there is no built-in cumulative gaussian function in OpenCL, it is approximated by error function:

\[ \text{CDF}(x) = \frac{1}{2} \left[ 1 + \text(erf)(\frac{x - \mu}{\sigma \sqrt 2}) \right] \]

Experiments

Ice World

Fig.1 - LiDAR resolution: 32, LiDAR measure distance: 50 meters, particle number: 512.

Featureless Corridor

Fig.2 - LiDAR measure distance: 20 meters. As shown in the image, particle filter can easily lost in feature less scenario. The covariance in longitudinal direction increase drastically while in lateral direction it is till stable. The localization converge immediately when LiDAR beam detects corner.

Circular Corridor

Fig.3 - The localization is stable the LiDAR beam matches well to the wall. However the estimated position deviated from the actual position. In this situation, one dimension of the localization is unobservable(need confirm).

References

Barfoot, Timothy D. "State estimation for robotics." Cambridge University Press, 2017.
Terejanu, Gabriel A. "Tutorial on Monte Carlo Techniques." Department of Computer Science and Engineering. University at Buffalo (2009).
Atsushi, Sakai. "PythonRobotics."
Xiaoxing, Chen. "Determine Intersection of Two Planar Line Segments"

Least L1-norm Problem

Posted on 2020-05-07 Edited on 2022-03-22 Disqus:

Background

This article came from a Leetcode problem: 462. Minimum Moves to Equal Array Elements II. For the sake of reading experience, I paste the problem description here:

Given a non-empty integer array, find the minimum number of moves required to make all array elements equal, where a move is incrementing a selected element by 1 or decrementing a selected element by 1.

You may assume the array's length is at most 10,000.

Example:
Input:
        [1,2,3]
        
        Output:
        2
        
        Explanation:
        Only two moves are needed (remember each move increments or decrements one element):
        
        [1,2,3]  =>  [2,2,3]  =>  [2,2,2]
        

My first intuition was to sum the distances between each element and the mean(average) of the list, which failed to pass the test. Here we discuss the problem in a mathematical way.

Problem Statement

The description:

find the minimum number of moves required to make all array elements equal.

means we need to find a \(m\) which satisfies:

\[ \arg\min_m \sum_{i=1}^n |x_i - m| \tag{1-1} \]

\(x\) is an \(n\) dimensional vector.

In other words, the \(m\) we need is to minimize the L1-norm of the residual. While the average of a list minimize the L2-norm of the residual.

Derivation

In order to find \(m\), we can borrow the idea of how the average \(\mu\) was found:

\[ \arg\min_\mu (x - \mu)^\top(x - \mu) \tag{2-1} \]

The minimum can be found by setting its derivative to 0:

\[ \frac{\text{d}}{\text{d}\mu} (x - \mu)^\top(x - \mu) = \frac{\text{d}}{\text{d}\mu} (x^\top x - 2\mu\sum_{i=1}^n x_i + n\mu ^2) = 0 \tag{2-2} \] \[ 2n\mu - 2\sum_{i=1}^n x_i = 0 \] \[ \mu = \frac{\sum_{i=1}^n x_i}{n} \tag{2-3} \]

We write formulation (1-1) as:

\[ \arg\min_m \sum_{i=1}^n \sqrt{(x_i - m)^2} \tag{2-4} \]

Set the derivative to 0:

\[ \frac{\text{d}}{\text{d}m} \sum_{i=1}^n \sqrt{(x_i - m)^2} = \sum_{i=1}^n \frac{x_i - m}{\sqrt{(x_i - m)^2}} \\ = \sum_{i=1}^n \text{sign} (x_i - m) = 0 \tag{2-5} \]

Formula (2-5) indicates that when \(m\) lies on the coordinate such that the number of \(x_i\) greater than \(m\) equals to the number of \(x_i\) less than \(m\), \(m\) minimizes the L1-norm of the residual. It is worth noting that there are more than one solution. The median of the vector \(x\) can be a solution.

Given 10 dimensional vector \(x\), plot \(m\) with respect to the L2-norm of residual. It can be seen that the \(m\) minimizing L1-norm is not unique. [source code]

Summary

A clean solution of the leetcode problem can be:

    class Solution:
        def minMoves2(self, nums: List[int]) -> int:
            nums.sort()
            m = nums[len(nums) // 2]
            return sum([abs(n - m) for n in nums])

References

log0, Differences between the L1-norm and the L2-norm (Least Absolute Deviations and Least Squares)
hattenn, Royi The Median Minimizes the Sum of Absolute Deviations (The L1 Norm) StackExchange

Solve Epipolar Constraint by Optimization over Unit Sphere Manifold

Posted on 2020-03-14 Edited on 2022-03-22 Disqus:

Motivation

While estimating the transform between two images, one has to solve the Essential Matrix according to Epipolar Constraint:

\[ x_2^\top E x_1 = 0 \tag{1-1} \]

Five-point-algorithm and eight-point-algorithm are both applicable. Here I take eight-point algorithm as example. The essential matrix is vectorized, and equation (1-1) is transformed to a homogeneous equation in form \(Ax = 0\). The homogeneous equation can be solved by SVD, and the essential matrix will be recovered from vector \(x\).

However, according to the definition of essential matrix:

\[ E = \{ S R | S = t^\wedge \in \mathfrak{so}(3), R \in SO(3)\} \tag{1-2} \]

Singular values of a essential matrix are \([1,1,0]\) ^[1]. In order to make the solution "essential", a general solution^[2] is to SVD the result and replace its singular value with \([1,1,0]\). The matrix \(\hat E\) is the projection of matrix \(E\) on essential manifold. By decomposing motion from essential matrix, one will get four solutions. Only one of the four solutions is available, whose feature points have position depth in camera. Finally, the rotation matrix and translation vector are recovered.

\[ \hat E = U_E \text{diag}(1,1,0) V_E^\top \tag{1-3} \]

In this article, I am going to implement an optimization based method to find the essential matrix.

Problem Formulation

Parameterization

Few papers talk about properties of essential manifold. Let's see what will happen if I solve the problem with only constraint the rotation matrix. I parameterize the essential matrix to a 6-dimensional vector: 3-dimensional translation vector and 3-dimension rotation. But it was denied after I found the equation will always converge at the point where translation vector \(t = [0, 0, 0]^\top\). This is where "pure rotation" happens, the constraint "degenerated". Therefore, I finally added a constraint \(\|t\| = 1\), to restrict the translation vector on a unit sphere. If I can turn the unit sphere manifold to a Lie Group, the the solution can be solve as what one do to the rotation part. Unluckily, the answer was "NO": \(S^2\) manifold cannot be given a Lie Group structure.

Anyway, the idea of constrained optimization is to iterate the state vector without breaking the constraint. The translation vector on unit sphere has degree of 2, the essential matrix was then vectorized to \(E = [\tau_x, \tau_y, \psi_x, \psi_y, \psi_z]^\top\). Every step I calculate the jacobian of the translation vector in two orthogonal directions. In order to preserve the incremental direction, the translation vector is stored in a orthonormal matrix \(T = [t_x, t_y, t_z], T \in SO(3)\). The first two columns of the matrix \(t_x, t_y\) indicates the incremental directions, the last column \(t_z\) is the vector itself.

Cost Function

Residual:

\[ f(x) = x_1^\top t_z^\wedge R x_2 \tag{2-1} \]

The problem then comes to:

\[ \text{arg}\ \text{min} F(t_z, R) = \frac{1}{2} \sum_{i = 1} ^ n \|f(t_z, R)\|^2 = \frac{1}{2} \sum_{i = 1} ^ n \left( (x_{1,i}^\top t_z^\wedge R x_{2,i})^\top (x_{1,i}^\top t_z^\wedge R x_{2,i}) \right) \tag{2-2} \]

Jacobian

I use numerical method to calculate the jacobian.

\[ J = \left[ \begin{matrix} \frac{\partial f}{\partial \tau_x} & \frac{\partial f}{\partial \tau_y} & \frac{\partial f}{\partial \psi_x} & \frac{\partial f}{\partial \psi_y} & \frac{\partial f}{\partial \psi_z} \end{matrix} \right] \tag{2-3} \]

Update

\[ \begin{cases} \Delta \tau = [\Delta \tau_x, \Delta \tau_y, 0]^\top\\ T_k = \text{exp} (\Delta \tau ^ \wedge) T_{k-1} \\ t_{z,k} = T_{k[:,3]} \\ R_k = \text{exp} (\Delta \psi ^ \wedge) R_{k-1} \end{cases} \tag{2-4} \]

Simulation

Two cameras were set at position \((0.5, -0.8, 0.5)m\) and \((0.6, -0.8, 0.5)m\), their orientations were presented in rotation vectors: \((-\frac{\pi}{2}, 0, 0)rad\), \((-\frac{\pi}{2}, 0.1, 0)rad\). Totally 60 points were generated.

Fig.1 - Generate point cloud in 3D space, the red arrows indicate the pose of cameras.

Fig.2 - Project points to normalized camera coordinate. The point IDs are known to algorithm so that there is no outliers in matching. [source code]

The Gauss-Newton method was used to calculate the update. Given initial value \(t = (\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0)\), \(\psi = (0, -0.1, 0)\) The cost converged within 10 iterations.


        it: 0, cost: 32027813.76362674
        it: 1, cost: 77838.5880822376
        it: 2, cost: 6281.828095169881
        it: 3, cost: 112.93046933952343
        it: 4, cost: 6.927671506086043
        it: 5, cost: 1.0246290629301655e-07
        it: 6, cost: 1.2263197696799394e-15
        it: 7, cost: 3.9916308419887783e-25
        it: 8, cost: 4.27369517787316e-25
        it: 9, cost: 2.6192914664154937e-25
        t: [ 9.9984608e-01 -7.6998377e-17 -1.7341515e-16], r: [-0.0011568  0.0636558  0.0636558]
        groundtruth: t01: [0.1 0.  0. ], r01: [-0.0011568  0.0636558  0.0636558]

Fig.3 - Cost in iteration.

Conclusion

Compared with eight-point-algorithm, there is no "linear space to manifold" projection required in optimization based method. Every iteration was restricted on manifold. Besides, optimization method does not need essential matrix decomposition. The optimization targets are rotation and translation direction. However, with different iteration initial values, the method may converge to any of the four possible solutions. Developer has to recover four solutions from the converged one and verify all of them. The accuracy of both methods are reliable, errors come from numerical computation.

References

Xiaoxing. Singular Values of Essential Matrix
Zhang, Zhengyou. "Determining the epipolar geometry and its uncertainty: A review." International journal of computer vision 27.2 (1998): 161-195.

B-Spline in Space I - Euclidean Space

Posted on 2020-03-02 Edited on 2022-03-22 Disqus:

Motivation

In a robot system, tens of sensors collecting data in different frequencies and phases. When intermediate values between samples are required, linear interpolation can be used to approximate these values according to their surrounding samples. In some cases, however, linear interpolation is not qualified due to its lack of continuity in high order derivatives.

Moreover, in robotics, computer graphics and computer vision, physical values are not only expressed in Euclidean Space. The orientation of a digital object, the pose of a automatic robot, are described by Quaternion or Lie Group. Some of the readers may have heard about Quaternion SLERP (Spherical Linear Interpolation), which is a general method used in Quaternion interpolation. But SLERP only keeps the intermediate quaternion unit, the derivative continuity before and after a knot is not guaranteed.

With but not limited to the reasons above, B-Spline is widely used in industry and research. Here is a introduction to the principle of B-Spline, both in Euclidean Space and Lie Group.

Terminologies of B-Spline

Terminologies	Notation	Explanation
Independent Variable	\(t\)	In state estimation, the independent variable is time \(t\).
Order	\(n\)	Same as the order of final polynomial, the highest exponent.For instance, \(f(x) = 3x^3\) has an order of 4.
Degree	\(p\)	Same as the degree of final polynomial, the highest exponent.For instance, \(f(x) = 3x^3\) has a degree of 3.
Segment Index	\(i\)	A B-Spline is combined with piecewise polynomial, each piece is a segment.
Knot Vector	\(k\)	The connect knot, or joint, between every piece of polynomial segment. Knot vector lies in the axis of independent variable \(t\).
Basis Function	\(B_{i,p}(t)\)	In the expression, \(i\) is the index of polynomial segment, \(p\) is the degree of the basis function.
Control Points	\(C_i\)	Also named as coefficients. Control points are used to describe the weight of each piece of polynomial segment.

Intuitive Understanding from Figures

Fig.1 - Iterative composition of basis functions. Each segment of basis function \(B_{i,p}(t)\) is composed by its adjacent basis functions in lower degree, which are \(B_{i,p-1}(t)\) and \(B_{i + 1,p-1}(t)\). While basis functions in degree 0 are unit square wave signals. [source code]

Fig.2 - The point \(P_{0,2}\) of basis function \(B_{0,2}(t)\) (degree 2 and segment 0), is evaluated by the weighted combination of \(B_{0,1}(t)\) and \(B_{1,1}(t)\). The weight of each component is defined by the distance between \(P_{0,2}\) and its correspondent knot points. As shown in the figure, the knot vector is \([0, 3, 6, 9]^\top\). [source code]

Fig.3 - A B-Spline with degree 2, named \(S_2(t)\) is generated by weighted combinations of all its basis functions. The weight comes from control points, which are \([1.2, 0.8, 1.1, 1.2, 1.1]\) in the figure. The brown dash lines show how control point affect the weight of basis function, these lines finally compose the brown spline. The knot vector is \([0, 1, 2, 3, 4, 5, 6, 7]^\top\). [source code]

Cox-de Boor Recursion Formula

A mathematical representation of B-Spline is necessary after the intuitive concept.

Let \(K\) be a set of \(m\) non-decreasing numbers, \(k_0 \le k_1 \le \cdots \le k_{m-1} \). The \(k_i\)s are called knots, the set \(K\) the knot vector, and the half-open interval \([k_i, k_{i+1})\) the i-th knot span. Note that since some \(k_i\)s may be equal, some knot spans may not exist. If the knots are equally spaced (i.e., \(k_{i+1} - k_i\) is a constant for \((0 \le i \le m - 1)\), the knot vector or the knot sequence is said uniform; otherwise, it is non-uniform^[1].

The i-th B-spline basis function of degree \(p\), written as \(B_{i,p}(t)\), is defined recursively as follows:

\[ B_{i,0}(t) = \begin{cases} 1 & \text{if} \ \ t_i \le t < t_{i+1} \\ 0 & \text{otherwise} \end{cases} \tag{2-1} \] \[ B_{i,p}(t) = \frac{t - t_i}{t_{i+p} - t_i} B_{i,p-1}(t) + \frac{t_{i+p+1} - t}{t_{i+p+1} - t_{i+1}} B_{i+1,p-1}(t) \tag{2-2} \]

References

Shene, B-spline Basis Functions: Definition
Kim, M.J., Kim, M.S. and Shin, S.Y., 1995, September. A general construction scheme for unit quaternion curves with simple high order derivatives. In Proceedings of the 22nd annual conference on Computer graphics and interactive techniques (pp. 369-376).
Qin, Kaihuai, General matrix representations for B-splines , The Visual Computer (2000) 16:177–186

Why Here

Posted on 2020-02-29 Edited on 2022-03-22 Disqus:

The original blog locates at CSDN XiaoxingChen's Blog. Due to its terrible support to mathematical equation and version control, I turned to log my daily thinkings and notes in private git repositories. So far I start to make some of the contents public, here is the new site for them.