Exploring the Transformation Matrix

Transforming shapes in 2D graphics can be a complex task, involving multiple operations such as scaling, rotation, and translation. However, by harnessing the power of transformation matrices, we can simplify these processes. In this blog post, we’ll explore the concept of transformation matrices and how they revolutionize shape manipulation. Get ready to unleash the potential of transformation matrices and take your 2D graphics to new heights.

Table of contents

Introduction
- Shapes are just set of points
The First Attemp
Can we do better?
The transformation matrix
Perspective Matrix
- Create the perspective matrix
- Find a point

Introduction

Imagine you have a shape within a scene, whether it’s a simple rectangle or a more intricate design like a star. Now, suppose you want to make changes to this shape by applying various transformations. For example, you might want to rotate the shape to a different angle, resize it to make it larger or smaller, or move it to a new position within the scene. These transformations allow you to alter the shape’s appearance and location, adding versatility and flexibility to your visual compositions.

in your code you could write something like this to define each transformation component.

let scaleX = 1;
let scaleY = 1;
let rotation = 0;
let x = 0;
let y = 0;

// Translation
element.x += x;
element.y += y;

/// Scaling
element.width *= scaleX;
element.height *= scaleY;

// Rotation
element.transform = "rotate(rotation)";

Shapes are just set of points

Primitive shapes are typically a collection of points offered by platforms or programming languages to facilitate simplicity. For instance, a rectangle or square can be defined by connecting four points.

In contrast, a star shape consists a central point from which multiple lines or curves extend outward, while a triangle is formed by three points. By incorporating additional points, various shapes can be created.

Thus, in this article, our focus will be on transforming individual points rather than entire shapes. This approach aligns with the principles of mathematics.

SVG Usage

To ensure simplicity in the examples, SVG Paths will be used for visual representation. but the underlying concept applies to any set of points.

First Attemp

How would you scale a line by a factor of 2x and rotate it by 45 degrees from its origin, and move it by 50 units on x axis and 50 units on y axis.

Origin Point

origin point typically refers to the point (0, 0) on a coordinate system. it’s the starting point from which measuring distances and determining the positions of other points are taken along the x and y axes.

const lineStart = { x: 0, y: 0 };
const lineEnd = { x: 100, y: 100 };

Scaling

scaling a point can be done by multiplying the scale factor by the point.

lineStart = { x: 0 * 2, y: 0 * 2 };
lineEnd = { x: 100 * 2, y: 100 * 2 };

Rotation

The rotation affects the element orientation, unlike the translation and scale operations, the rotation requires the use of sine and cosine functions to calculate new positon of element from origin.

Rotation formula

x’ = x cos(θ) - y sin(θ)

y’ = x sin(θ) + y cos(θ)

Wikipedia

Converting the equation formulas into JavaScript code leads to…

const theta = 45 * (Math.PI / 180);
const cos = Math.cos(theta);
const sin = Math.sin(theta);
lineStart = { x: 0 * cos - 0 * sin, y: 0 * sin + 0 * cos };
lineEnd = { x: 100 * cos - 100 * sin, y: 100 * sin + 100 * cos };

Translation

You can achieve this by adding the mount of movement to the point position

lineStart = { x: 0 + 50, y: 0 + 50 };
lineEnd = { x: 100 + 50, y: 100 + 50 };

and by combining the operations all together, you get this

Apply the same on square

Transformation from specific origin

To apply a transformation based on a specific origin, the following steps are necessary:

Firstly, translate the point to the desired origin point.
Next, apply the desired transformation.
Finally, translate the point back to its original position.

This example applies rotation from center

Operations order Matters

The order in which transformation operations are applied does matter. When multiple transformations, such as rotation, scaling, and translation, are involved, the sequence in which they are performed can produce different outcomes. The transformations should be applied in the desired order to achieve the intended result.

For instance, rotating an object and then translating it will yield a different outcome compared to translating it first and then rotating it.

Is this approach scalable?

Although this approach may be functional, it does come with several significant drawbacks.

Increased complexity: Managing multiple variables for each transformation parameter can lead to increased complexity in your code. It requires you to track and synchronize the values of these variables, potentially leading to more code and increased chances of errors.
Code duplication: there is a risk of code duplication. If you have multiple objects or entities that require transformations, you may end up duplicating similar code logic for each object. This can lead to larger codebases and increased maintenance overhead.
Lack of compositionality: When using separate variables, it becomes more challenging to compose multiple transformations together. You need to manually apply each transformation in the correct order and handle potential dependencies between transformations. This can result in less modular and less reusable code, making it harder to maintain and modify.

Can we do better?

Absolutely! Let’s talk about the Transformation Matrix, a powerful tool that has been around for over a hundred years, even before computers were a thing. It plays a vital role in implementing graphics operations.

The Transformation Matrix is a mathematical concept that allows us to manipulate and transform objects.

So, what makes the Transformation Matrix so special? Well, it helps us simplify complex transformations by composing them into a single equation. This makes our code neater and more efficient, giving us greater control over objects.

Let’s code a transformation matrix.

Think of the matrix as a table where each number represents a specific piece of information related to a transformation. in a 2D Transformation Matrix, we have a 3x3 matrix with elements representing scaling factors, rotation angles, and translation distances.

Translation Matrix

const matrix = [
  [1, 0, tx],
  [0, 1, ty],
  [0, 0, 1],
];

Scaling Matrix

const matrix = [
  [sx, 0, 0],
  [0, sy, 0],
  [0, 0, 1],
];

Rotation Matrix

const matrix = [
  [cos(θ), -sin(θ), 0],
  [sin(θ), cos(θ), 0],
  [0, 0, 1],
];

To improve readability, we can exract the previous definitions into their own functions, so we don’t write the whole matrix every time we use it.

const rotate = (angle) => [
  [Math.cos(angle), -Math.sin(angle), 0],
  [Math.sin(angle), Math.cos(angle), 0],
  [0, 0, 1],
];

const translate = (tx, ty) => [
  [1, 0, tx],
  [0, 1, ty],
  [0, 0, 1],
];

const scale = (sx, sy) => [
  [sx, 0, 0],
  [0, sy, 0],
  [0, 0, 1],
];

You know, by having three matrices, we’re essentially creating a problem instead of finding a solution. So, we need to compose these matrices into a single matrix that can be used in any context and at any time.

Homogeneous coordinates

In 2D transformations, a 2x2 matrix is sufficient to represent scale and rotation transformations. But, when we introduce translation into the transformation, a 3x3 matrix is required. The reason for this lies in the concept of Homogeneous Coordinates.

Homogeneous coordinates involve introducing an additional dimension (usually represented as Z) to the matrix to account for translation. Although this additional row vector is not directly used in 2D transformations themselves, it is necessary for proper matrix composition.

Composing Multiple Operations

To compose mulitple transformations into single transformation matrix, we need to use Matrix Multiplication, it’s simply achieved by multiplying each value in a row of the first matrix with each value in a column of the second matrix. This means that the resulting element in the product matrix is obtained by taking the dot product of the corresponding row and column.

To calculate the elements of the resulting matrix, you should perform the dot product of each row of the first matrix with the corresponding column of the second matrix. Each dot product will yield one element of the resulting matrix. Repeat this process for all elements in the resulting matrix, ensuring the dimensions and order of operations are maintained.

Dot Product

The dot product , also known as the inner product or scalar product, calculates the dot product of two vectors. For two vectors A and B of the same dimension (n), the dot product is calculated as follows: A · B = A₁B₁ + A₂B₂ + … + AₙBₙ

Vector

A vector, can be either a column or a row of a matrix.

have a look at the following function that implements two matrices multiplication.

const multiply = (matrix1, matrix2) => {
  const result = [];
  cost rows = matrix1.length;
  const cols = matrix2[0].length;

  for (const i = 0; i < rows; i++) {
    result[i] = [];
    for (const j = 0; j < cols; j++) {
      let sum = 0;
      for (const k = 0; k < cols; k++) {
        sum += matrix1[i][k] * matrix2[k][j];
      }
      result[i][j] = sum;
    }
  }

  return result;
};

or you can rewrite it to multiply an infinite number of matrices using the ES6 fancy syntax

const dotProduct = (vec1, vec2) =>
  vec1.map((_, i) => vec1[i] * vec2[i]).reduce((m, n) => m + n);

// flip matrix rows with columns
const transpose = (a) => a[0].map((_, i) => a.map((y) => y[i]));

const multiply = (...matrices) =>
  matrices.reduce((matrix1, matrix2) => {
    matrix2 = transpose(matrix2);
    return matrix1.map((x) => matrix2.map((y) => dotProduct(x, y)));
  });

What happened there?

the multiply function will take any number of matrices and return single matrix that contains all the operations to be used later.

we basically looped over all rows in the first matrix, applied the dot product with the columns on the other matrix, resulting in a single matrix.

notice that in the second approach we needed to trapsonse the second matrix so it’s easier to apply dot product with row to row instead of row to column inside the map function.

Matrix Transpose

The transpose of a matrix is an operator which flips a matrix over its diagonal; that is, it switches the row and column indices of the matrix A by producing another matrix.

wikipedia

It’s important to note again that the transformation operations are not commutative, meaning the order in which they are applied can produce different results

Translate, Rotate, Scale

Translate, Scale, Rotate

Scale, Rotate, Translate

Around the origin

to transform (scale/rotate) around a given point, you need first to translate to that point, multiply transformation then translate back the point

for example if you want to rotate a square that is positioned on (0,0) and has width and height of 100 by 45 degrees around it’s center, you can do the following

const matrix = multiply(
  translate(width / 2, height / 2),
  rotate(45 * (Math.PI / 180)),
  translate(-width / 2, -height / 2)
);

and for sure, you can scale by 2 from center as well:

const matrix = multiply(
  translate(width / 2, height / 2),
  rotate(45 * (Math.PI / 180)),
  scale(2, 2),
  translate(-width / 2, -height / 2)
);

The Second Attemp

How would you scale a line by a factor of 2x and rotate it by 45 degrees from its origin, and move it by 50 units on x axis and 50 units on y axis, using transformation matrix?

we first start translation, rotation and last scale.

const matrix = multiply(
  translate(50, 50)
  scale(2, 2),
  rotate(45 * (Math.PI / 180)),
);

Rotation angle

That rotation transformations uses angles that are measured in radians.

radians = degrees * (π / 180)

it’s important to have the correct order of operations to avoid unwanted results.

so we start by moving the point to the new position, then we rotate , and lastly we scale it, that’s because if you rotate rectangle first and then scale it, the rotation will occur around the origin of the coordinate system, and then the scaling will occur along the axes of the rotated object. This means the object rotates in place and then expands.

on the other side, If you scale the rectangle first and then rotate it, the scaling will happen along the original axes of the object, and the rotation will occur around the origin of the coordinate system, taking the scaled object as a whole, As a result, your rotation is no longer occurring in a standard, “unscaled” coordinate system, and the result can look like a shear transformation because the rotation is applied with respect to the differentially scaled axes.

since we are using uniform scaling, the shape will resize accordingly along different axes, but using non-uniform scaling factors, shearing effect will occure

Rotate, Scale

Scale, Rotate

Finding the new point

now after we calculated the transformation matrix of all operations, we can apply it to find any point in space.

given this general form of transformation matrix.

  [ a  b  tx ]
  [ c  d  ty ]
  [ 0  0   1 ]

The affine transformation formula will be:

x' = a*x + b*y + tx
y' = c*x + d*y + ty

the previous formluas can be converted to javascript to be like.

const x_ = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2];
const y_ = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2];

in this example, we use the affine transformation formula to find the transformed point when you hover over the gray shape.

Affine transformation

an affine transformation is a type transformation matrices that preserves lines and parallelism. That means if you have a set of points that form a line, and you apply an affine transformation to these points, they will still form a line after the transformation. Also, if you had two lines that were parallel before the transformation, they will stay parallel after the transformation.

Shearing

Imagine you have a square.
Now, if you hold the bottom line steady and push the top line to the right or left, what you’re left with is a parallelogram. This is the effect of a shear transformation.

Shearing formula

The formula for shearing can be represented as follows:

x’ = x + shearX * y

y’ = y + shearY * x

and using the shearing matrix, can be represented as

const shear = (rx, ry) => [
  [1, rx, 0],
  [ry, 1, 0],
  [0, 0, 1],
];

Applying matrix to whole shapes

The process of transforming each point of a shape individually can be quite tedious. Therefore, most programming languages offer built-in support for applying transformation matrices to entire shapes at once. Examples of this feature can be found in the CSS ‘transform’ property, the SVG ‘transform’ attribute, and the Canvas API.

This is the general form for transformation matrix

  [ a  b  tx ]
  [ c  d  ty ]
  [ 0  0   1 ]

I think you can figure out the rest.

transform: matrix(a, b, c, d, tx, ty);

transform="matrix(<a> <b> <c> <d> <tx> <ty>)"

ctx.transform(a, b, c, d, tx, ty);

Matrix Inversion

Let’s say you have a point with coordinates (x, y), and this point has undergone a transformation. If you’re tasked with figuring out the original position of the point before the transformation was applied, how would you go about it?

Well, by finding the inverse matrix, we can reverse the effects of the transformation and determine the original position of the point. and by using the affine transformation formula on the inversed matrix, the resulting coordinates will represent the original location of the point before the transformation was applied.

Essentially, the inverse matrix is a standard matrix that, when multiplied with the original matrix, yields the identity matrix.

The Identity Matrix

An identity matrix is like the number 1 in multiplication. Just like when you multiply any number by 1, you get the same number back, when you multiply any matrix by the identity matrix, you get the original matrix back.

The identity matrix has a special pattern. It’s a square grid filled with zeroes, with a line of ones going diagonally from the top left to the bottom right.

const identity = [
  [1, 0, 0],
  [0, 1, 0],
  [0, 0, 1],
];

in this example, if you hover over the red shape, you will see the original point before transformation on the gray shape.

How to find the inverse of matrix?

The affine transformation is a composition of rotation, scale, shear, and translation transformations

  [ a  b  tx ]
  [ c  d  ty ]
  [ 0  0   1 ]

Where [a,b,c,d] is the 2x2 matrix representing linear transformations (like rotation, scaling, and shearing) and [tx,ty] is the translation vector.

To invert an affine transformation, you first invert the linear part ([a,b,c,d]), and then apply this inverted linear part to the translation part ([tx, ty]).

so we start by computing the determinant of the linear part ([a,b,c,d])

const determinant = a * d - b * c;

and then we invert the linear part of (scale, rotation and shear)

const inversedA = d / determinant;
const inversedB = -b / determinant;
const inversedC = -c / determinant;
const inversedD = a / determinant;

What happened there?

we simply divided the Adjugate matrix by the original matrix determinant.

What is the determinant?

The determinant is a special number that we can associate with a square matrix, It is like a signature of the matrix that can reveal certain properties like whether the matrix has an inverse or not.

The matrix is not invertible

there are many cases where a matrix is not invertable, this can be found when the determinant is zero, like when you scale by zero.

and finally, we apply the linear part to the translation

const inversedTx = (b * ty - d * tx) / determinant;
const inversedTy = (c * tx - a * ty) / determinant;

You can’t simply use the negative values of the translation vector to inverse it, This is because the inversion process depends not only on the translation component but also on other matrix components such as scale and rotation.

Perspective Matrix

In the context of affine transformations, the preservation of lines and parallelism is the main characteristic. However, there are situations where you may want break this concept and introduce depth and a sense of 3D perspective to certain shapes.

This is where the perspective matrix comes into play and serves a valuable purpose. The perspective matrix allows you to apply transformations that create a realistic 3D perspective effect, adding depth and a sense of distance to objects in your 2D scene.

It’s the time to use the unsed part of our 3x3 matrix.

the 3rd row of our transformation matrix will represent the perspective transformation in the homogeneous coordinate, that is responsible in mapping our 2D points into a new dimension

so our new matrix form will look like this

[
  [a, b, e],
  [c, d, f],
  [j, h, i],
];

The Homography matrix

Its’ also called the Homography matrix. The homography is a transformation that expresses how points from one plane are mapped onto another plane using projective geometry. It retains the ability to perform the simpler affine transformations like translation, rotation, and scaling, but also includes perspective transformations

3x3 VS 4x4 matrix

The Perspective matrix is commonly represented as a 4x4 matrix, where the perspective vector corresponds to the 4th row. To simplify the matrix, We use the Z vector to represent it, thereby saving space on unused rows.

let’s start coding.

given a square of four points, the square bottom right point has shifted down by 20 units, find the perspective matrix that represents the transformation.

const square = [
  [0, 0],
  [100, 0],
  [100, 100],
  [0, 100],
];

const squarePerspective = [
  [0, 0],
  [100, 0],
  [100, 120],
  [0, 100],
];

Now, we need to calculate the perspective matrix that is responsible to map square points to the squarePerspective, you need to solve a system of linear equations 🌚 for

A * h = b

first we need to construct the coefficient matrix (A) 8x8 matrix, this can be obtained using the Direct Linear Transform (DLT) algorithm

const A = square.reduce((acc, square, index) => {
  return [
    ...acc,
    [
      square[0],
      square[1],
      1,
      0,
      0,
      0,
      -square[0] * squarePerspective[index][0],
      -square[1] * squarePerspective[index][0],
    ],
    [
      0,
      0,
      0,
      square[0],
      square[1],
      1,
      -square[0] * squarePerspective[index][1],
      -square[1] * squarePerspective[index][1],
    ],
  ];
}, []);

then we construct h which is column vector of unknown coefficients of squarePerspective points.

const h = squarePerspective.reduce((acc, point) => [...acc, ...point], []);

We can use any math library that supports LU decomposition to slove this equation.

here I’m using mathjs library.

const b = math.lusolve(A, h);

Now, the solution of the lusolve will be a vector column that can be used create our final perspective matrix.

const perspectiveMatrix = [
  [b[0], b[1], b[2]],
  [b[3], b[4], b[5]],
  [b[6], b[7], 1],
];

Find a point using the perspective matrix

Similiar to the affine transformation formula you can find any point using the perspective matrix

w = j * x + h * y + i
x = (a * x + b * y + tx) / w
x = (c * x + d * y + ty)/ w

  const x = matrix[0][0] * point[0] + matrix[0][1] * point[1] + matrix[0][2];
  const y = matrix[1][0] * point[0] + matrix[1][1] * point[1] + matrix[1][2];
  const w = matrix[2][0] * point[0] + matrix[2][1] * point[1] + matrix[2][2];

  return {
    x: x / w
    y: y / w
  }

when Dividing x/w and y/w , you are normalizing the homogeneous coordinate, thus, homogeneous representation will be converted to the Cartesian coordinates (2D plane).

in this example if you hover over the gray shape, you will see the mapped point on the projected shape, and so, if you hover over the blue shape, will you see the mapped point on the original shape, As previously mentioned, you can retrieve the original point by applying the inverse of the matrix.

Perspective matrix Inverse

In contrast to an affine transformation matrix, inverting a perspective matrix involves more steps and requires more mathematical involvement. Consequently, it is more feasible to utilize a library or a pre-existing implementation to achieve the inversion of the perspective matrix.

the this example, I’m using mathjs inv function to inverse the matrix.

And this is how you create a perspective matrix to map points from 2D to achieve 3d depth.

Conclusion

Matrix transformations are powerful tools due to their characteristics including ability to be easily composed, their straightforward syntax, their computational efficiency, and their invertibility, Unlike other techniques, such as manual point manipulation or individual transformation functions, matrices offer a unified approach to handle complex transformations.

Personally, I believe that understanding transformation matrices is essential. It grants me the ability to exert precise control over element positioning and appearance on web pages, while working on some of my side projects that require efficient and straightforward element transformations.