A Simple And Intuitive Explanation of How Gizmos Work In 3D Space

If you've ever used a 3D editor, you've most likely used something called a "Gizmo". Gizmos are essentially transformation modifiers that live within a world; they let you modify an object's position, orientation and scale. The implementation of a Gizmo is actually fairly straightforward for the most part (it may not be, depending on how your application handles things internally, but at the fundamental level it's simple).

This article only covers translations and rotations; scaling is very easy to implement once you understand how the first two work. It may also give you a hint of how Blender's robust Gizmo implementation works!

Translation

Let's get straight to translation as our first mode. The important thing to observe here is that each of the primary axes of space can be thought of as a line that extends indefinitely, and any point on a plane that contains that line can be projected onto the line with a simple dot product like so:
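Written out, the projection looks roughly like this. The notation is an assumption made for this sketch: $\mathbf{o}$ is a point on the axis line (for a gizmo, the object's position), $\hat{\mathbf{a}}$ is the unit direction of the axis, and $\mathbf{p}$ is the point being projected.

```latex
\mathbf{p}_{\text{proj}} = \mathbf{o} + \big((\mathbf{p} - \mathbf{o}) \cdot \hat{\mathbf{a}}\big)\,\hat{\mathbf{a}}
```

For example, with $\mathbf{o} = (1, 0, 0)$, $\hat{\mathbf{a}} = (1, 0, 0)$ and $\mathbf{p} = (3, 2, 0)$, the dot product is $2$ and the projected point is $(3, 0, 0)$: the part of $\mathbf{p}$ that points away from the axis is simply dropped.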

So let's suppose we have a ray that comes from the camera and hits the plane that contains this infinitely long axis line. We can then take the difference between the hit point projected onto the line and the position of the object (which lies on the axis line!) and move the object by that difference, as shown here:

This is actually a fairly stable method. I'm currently doing this and it snaps perfectly (if we ignore floating point errors!).

And believe it or not, that's essentially all there is to translations.

So to do this in practice, you want to somehow be able to interact with one of the primary coordinate axes. One way to achieve this is to cast a ray against an axis-aligned bounding box (AABB) that covers each axis in a volume like so:

To keep things short I'm not going to explain the implementation behind the collision code, but if you're interested, here's a link to it: https://gist.github.com/jakubtomsu/2acd84731d3c2613c91e40c2e064ffe6#file-realtime_collision_detection-odin-L491. You can also use any sort of collider you want (cylinders, regular meshes or whatever).

Now since we'll be dragging things around over multiple frames, you need to keep track of the state somehow. One way to achieve this is to create an object that tracks the state and lives throughout the lifetime of your editor so it can be accessed anytime:
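If you'd rather not follow the link, here's a minimal sketch of the classic slab test, which is one common way to intersect a ray with an AABB. It is not the linked implementation: `Ray`, `AABB` and their field names are assumptions made for this sketch, and only the return shape (hit flag, `tmin`, `tmax`) mirrors how `ray_aabb_intersect` is called later in this article.

```odin
package gizmo_sketch

Vec3 :: [3]f32

// Hypothetical types, the article does not show its own definitions.
Ray :: struct {
  origin:    Vec3,
  direction: Vec3, // expected to be normalized
}

AABB :: struct {
  lo: Vec3, // minimum corner
  hi: Vec3, // maximum corner
}

// Slab test: clip the ray against the three pairs of axis-aligned planes
// and keep the overlapping parameter range [tmin, tmax].
ray_aabb_intersect :: proc(ray: Ray, box: AABB) -> (hit: bool, tmin: f32, tmax: f32)
{
  tmin = 0.0
  tmax = max(f32)

  for i in 0..<3 {
    if abs(ray.direction[i]) < 1e-8 {
      // the ray is parallel to this slab: it misses unless the origin is inside it
      if ray.origin[i] < box.lo[i] || ray.origin[i] > box.hi[i] {
        return false, 0, 0
      }
    } else {
      inv_d := 1.0 / ray.direction[i]
      t1 := (box.lo[i] - ray.origin[i]) * inv_d
      t2 := (box.hi[i] - ray.origin[i]) * inv_d
      if t1 > t2 {
        t1, t2 = t2, t1
      }
      tmin = max(tmin, t1)
      tmax = min(tmax, t2)
      if tmin > tmax {
        return false, 0, 0
      }
    }
  }
  return true, tmin, tmax
}
```

`tmin` is the distance along the ray to the entry point, which is what the picking code needs in order to choose the closest handle.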

Gizmo_State :: struct
{
  transform_mode:  enum {translation, rotation},
  old_intersect:   Vec3, // intersection point from the previous frame
  first_intersect: Vec3, // the first intersection since started translating/rotating an object
  was_holding:     enum {nothing, red, green, blue, all},
}

Here the was_holding variable stores which specific translational/rotational axis we have been holding during the drag.

Now the first thing you need to be able to do is cast a ray from the camera into the world. How you do that is up to you, but I cast from screen space. I'm not going to explain how this works either; you can learn more about it here: https://gdbooks.gitbooks.io/3dcollisions/content/Chapter5/picking.html

The code is pretty simple anyways, so here it is:

import "core:math" // needed for math.F32_EPSILON below
import "core:math/linalg"

Vec2 :: [2]f32
Vec3 :: [3]f32
Vec4 :: [4]f32

un_project_point_from_view :: proc(mouse_screen_point: Vec2, view, projection: matrix[4,4]f32, x_off, y_off, screen_width, screen_height: int) -> Vec3
{
  normalized_coords := Vec4 {
    0 /* x-axis */ = ((mouse_screen_point.x - cast(f32)x_off) / cast(f32)screen_width)  * 2 - 1,
    1 /* y-axis */ = ((mouse_screen_point.y - cast(f32)y_off) / cast(f32)screen_height) * 2 - 1,
    3 /* w-axis */ = 1.0,
  }

  eye_coords   := linalg.inverse(projection) * normalized_coords
  world_coords := linalg.inverse(view) * eye_coords

  if abs(0.0 - world_coords.w) > math.F32_EPSILON {
    world_coords *= 1.0 / world_coords.w
  }

  return world_coords.xyz
}

// example usage of the `un_project_point_from_view` procedure
trace_ray_into_near_plane :: proc(
  view_matrix:      matrix[4,4]f32,
  persp_matrix:     matrix[4,4]f32,
  screen_width:     int,
  screen_height:    int,
  screen_mouse_pos: [2]f32) -> [3]f32
{
  return un_project_point_from_view(screen_mouse_pos, view_matrix, persp_matrix, 0, 0, screen_width, screen_height)
}

This un-projects the mouse position and returns a world-space point under the cursor (on the near plane of the camera frustum if your projection maps the near plane to an NDC depth of 0, since the z component is left at zero). You can use the camera's position to get the normalized direction vector like so: ray_direction := linalg.normalize(ray_hit_point - camera_pos), and then construct a ray from that direction and the camera's position.

The following code is an example of how you would detect which of the primary coordinate axes was hit and record the one that was picked:
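As a small sketch, building the ray could look like the following. The `Ray` struct and the `make_camera_ray` name are hypothetical (the article never shows its ray type); only the `linalg.normalize(ray_hit_point - camera_pos)` expression comes from the text above.

```odin
package gizmo_sketch

import "core:math/linalg"

Vec3 :: [3]f32

// Hypothetical ray type used only for this sketch.
Ray :: struct {
  origin:    Vec3,
  direction: Vec3, // unit length
}

// Builds a picking ray that starts at the camera and passes through
// the un-projected mouse point.
make_camera_ray :: proc(camera_pos, ray_hit_point: Vec3) -> Ray
{
  return Ray{
    origin    = camera_pos,
    direction = linalg.normalize(ray_hit_point - camera_pos),
  }
}
```

The two points must not coincide, otherwise the normalization divides by zero.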

state: Gizmo_State // stored in a heap or globally!

// ...

if is_mouse_left_button_triggered(controllers_state)
{
  // reset the stored intersection; it is set below once we know which axis
  // was hit, so it can be used later to get the proper difference
  state.old_intersect = {}

  // hold nothing for now
  state.was_holding = .nothing

  // initialize the minimum distances to the largest f32 value, the goal here is
  // to get the closest box we've hit (as a case of depth test)
  red_min, blue_min, green_min := f32(math.F32_MAX), f32(math.F32_MAX), f32(math.F32_MAX)

  if did_intersect, tmin, _ := ray_aabb_intersect(ray, red_rect); did_intersect {
    red_min = tmin
  }
  if did_intersect, tmin, _ := ray_aabb_intersect(ray, green_rect); did_intersect {
    green_min = tmin
  }
  if did_intersect, tmin, _ := ray_aabb_intersect(ray, blue_rect); did_intersect {
    blue_min = tmin
  }
  if red_min != math.F32_MAX && (red_min < green_min && red_min < blue_min) {
    state.was_holding = .red
    state.old_intersect = ray_plane(ray, red_plane)
  }
  if green_min != math.F32_MAX && (green_min < blue_min && green_min < red_min) {
    state.was_holding = .green
    state.old_intersect = ray_plane(ray, green_plane)
  }
  if blue_min != math.F32_MAX && (blue_min < green_min && blue_min < red_min) {
    state.was_holding = .blue
    state.old_intersect = ray_plane(ray, blue_plane)
  }
}

All of this should be enough. After that, on the next frame you cast a ray into the plane that contains the translational axis like we mentioned earlier, take the difference between the previous intersection point and the new one from the current frame, and add that difference (projected onto the axis) to the object's position. That's pretty much it:

new_intersected_point: Vec3
#partial switch state.was_holding // #partial: .nothing and .all are not handled here
{
  case .red:   new_intersected_point = ray_plane(ray, red_plane)
  case .green: new_intersected_point = ray_plane(ray, green_plane)
  case .blue:  new_intersected_point = ray_plane(ray, blue_plane)
}

// ...

#partial switch state.was_holding // #partial: .nothing and .all are not handled here
{
  case .red:   object_pos += (Vec3{1, 0, 0} * linalg.dot(new_intersected_point - state.old_intersect, Vec3{1, 0, 0}))
  case .green: object_pos += (Vec3{0, 1, 0} * linalg.dot(new_intersected_point - state.old_intersect, Vec3{0, 1, 0}))
  case .blue:  object_pos += (Vec3{0, 0, 1} * linalg.dot(new_intersected_point - state.old_intersect, Vec3{0, 0, 1}))
}

// remember this frame's hit so the next frame measures its difference from here
state.old_intersect = new_intersected_point
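The `ray_plane` helper used in the snippets above isn't shown, so here's a minimal sketch of what such a helper could look like. The `Ray` and `Plane` types, their fields, and the fallback for a ray parallel to the plane are all assumptions made for this sketch, not the editor's actual code.

```odin
package gizmo_sketch

import "core:math/linalg"

Vec3 :: [3]f32

// Hypothetical types used only for this sketch.
Ray :: struct {
  origin:    Vec3,
  direction: Vec3, // unit length
}

// All points p that satisfy dot(normal, p - point) == 0.
Plane :: struct {
  point:  Vec3,
  normal: Vec3,
}

// Returns the point where the (infinite) line through the ray meets the plane.
ray_plane :: proc(ray: Ray, plane: Plane) -> Vec3
{
  denom := linalg.dot(plane.normal, ray.direction)
  if abs(denom) < 1e-6 {
    // the ray is (nearly) parallel to the plane, so there is no single hit point;
    // returning the plane's own point is an arbitrary choice for this sketch
    return plane.point
  }
  t := linalg.dot(plane.normal, plane.point - ray.origin) / denom
  return ray.origin + ray.direction * t
}
```

For a translation axis, the plane should contain the axis line; picking, among those planes, one that faces the camera as much as possible keeps `denom` away from zero.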

And here's an active demo from the editor:

Rotation

Rotation is a little more interesting but still straightforward. The thing to observe here is that rotations happen within a plane (both in 2D and in 3D)!

But first, let's suppose we want to rotate from one point to another through a circular manifold like so:

The angle here is obtained by taking the inverse cosine of the dot product of the normalized vectors. If this seems nonsensical, here's a proof of why that's the case:
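One standard way to see it (not necessarily the derivation from the original figure) is to compare the law of cosines with the expanded dot product, where $\theta$ is the angle between $\mathbf{A}$ and $\mathbf{B}$ and hats denote normalized vectors:

```latex
\begin{aligned}
\lVert \mathbf{A} - \mathbf{B} \rVert^2 &= \lVert \mathbf{A} \rVert^2 + \lVert \mathbf{B} \rVert^2 - 2\,\lVert \mathbf{A} \rVert \lVert \mathbf{B} \rVert \cos\theta && \text{(law of cosines)} \\
\lVert \mathbf{A} - \mathbf{B} \rVert^2 &= (\mathbf{A} - \mathbf{B}) \cdot (\mathbf{A} - \mathbf{B}) = \lVert \mathbf{A} \rVert^2 + \lVert \mathbf{B} \rVert^2 - 2\,\mathbf{A} \cdot \mathbf{B} \\
\Rightarrow\; \mathbf{A} \cdot \mathbf{B} &= \lVert \mathbf{A} \rVert \lVert \mathbf{B} \rVert \cos\theta
\;\Rightarrow\; \theta = \arccos\!\left(\hat{\mathbf{A}} \cdot \hat{\mathbf{B}}\right)
\end{aligned}
```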

If A is exactly perpendicular to B, the rotation angle is 90 degrees, and if A is exactly parallel to B, the rotation angle is 0; you can see the pattern here. As A rotates, the angle changes as well, but there's one caveat: the rotation always takes the shortest path. That is, if you have the following:

The blue region is the rotation magnitude you would get out of the dot product (the angle is constrained to [0, 180] in degrees or [0, π] in radians). However, it doesn't matter whether you take the blue or the green path; you still end up with the same orientation since both arrive at the same point on the circular manifold!

So to finish it all: what we've been doing so far is rotating things within a plane. As long as you have two vectors and you rotate from one to the other, both are always contained within a plane, which means that in 3D you can have planes in any orientation and rotate vectors within them as much as you like!

And to prove this, let's suppose we have an arbitrarily oriented plane. By also including the definition of the cross product (which I'll go over in a bit more detail later), we can combine the dot and cross products into a single equation to produce the familiar Euler's Formula e^(i * theta) = cos(theta) + i * sin(theta), which describes rotations within a plane:

So yes, rotations indeed happen within a plane. Now you may be wondering: what if a point is NOT within the oriented plane? Well, that is the reasoning behind the general quaternion vector product qvq^-1 that everyone is familiar with, and it's all explained in Appendix A at the bottom of this article if you're interested.

OK, so why did we go over all of this? Mainly because we want to construct a rotation from two vectors/points, which also involves an angle relative to an origin. If you start from one vector and rotate to another, you can construct a rotation from these two points alone, which is straightforward as shown before. But that's not all, because we also want to know which direction to rotate in (clockwise or counterclockwise?), and that's where the cross product helps.

If you move from P to A, the rotation is counterclockwise and the resulting vector points upwards. The other way around is also true: if you move from P to B, the rotation is clockwise and the resulting vector points downwards. This tells you which direction you need to rotate in (or, to make it more obvious, which shortest path to take given our angle constraint [0, π] mentioned before), so hopefully this is all clear.
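As a rough sketch of that combination (the symbols are assumptions: $\hat{\mathbf{A}}$ and $\hat{\mathbf{B}}$ are unit vectors in the plane, $\hat{\mathbf{n}}$ is the plane's unit normal following the right-hand rule, $\theta \in [0, \pi]$ is the angle from $\hat{\mathbf{A}}$ to $\hat{\mathbf{B}}$, and $i$ stands for a quarter turn within that plane):

```latex
\begin{aligned}
\hat{\mathbf{A}} \cdot \hat{\mathbf{B}} &= \cos\theta \\
\hat{\mathbf{A}} \times \hat{\mathbf{B}} &= \sin\theta\,\hat{\mathbf{n}} \\
e^{i\theta} &= \cos\theta + i\sin\theta = \hat{\mathbf{A}} \cdot \hat{\mathbf{B}} + i\,\lVert \hat{\mathbf{A}} \times \hat{\mathbf{B}} \rVert
\end{aligned}
```

The dot product supplies the cosine part, the length of the cross product supplies the sine part, and the direction of the cross product tells you which plane the rotation lives in.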
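To make the direction test concrete, here's a minimal sketch that folds the dot and cross products into one signed angle. The `signed_angle` name and its parameters are hypothetical; it assumes `a` and `b` are unit vectors lying in a plane whose unit normal is `plane_normal`.

```odin
package gizmo_sketch

import "core:math"
import "core:math/linalg"

Vec3 :: [3]f32

// Returns the angle from `a` to `b` in radians, in (-π, π].
// Positive means counterclockwise when looking against `plane_normal`
// (the normal points towards the viewer), negative means clockwise.
signed_angle :: proc(a, b, plane_normal: Vec3) -> f32
{
  cos_part := linalg.dot(a, b)                              // cos(theta)
  sin_part := linalg.dot(linalg.cross(a, b), plane_normal)  // +/- sin(theta)
  return math.atan2(sin_part, cos_part)
}
```

When the cross product points along the normal ("upwards") the result is positive, and when it points the opposite way ("downwards") the result is negative, which matches the P to A and P to B cases described above.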

Now to rotate things, you need two points/vectors relative to the rotational origin: the initial one and the second one. Once you have both, you construct the exponential (or rotator) from the dot and cross products to turn it into the familiar Euler's Formula, and that's pretty much it:

There's also another thing we can do: instead of relying on each specific axis, we can rotate about all axes at the same time! Suppose you have a plane that's exactly tangent to a specific point on the circular manifold like so:

If you have any vector that sits within that plane, you can exponentiate it to turn it into a rotational transform operator using Euler's Formula, and this applies to 3D too with oriented planes! So nothing here really differs, it's the same operation regardless:

Also believe it or not, that's all there is to rotations as well.

Now just like with translations, we want to select one of the rotational coordinate axes. Instead of using a box on each infinitely long axis line, this time we want to select a circle that's contained within the plane. This is fairly straightforward to implement and to explain too.

A circle can be described with a position and a radius. However, the circle's manifold is just an infinitely thin circular line with no area, so it's pretty much impossible to test against. To fix this, we add a threshold to the manifold to give it some area along its circular line like so:

That's all there is to it really. Here's the point-circle intersection code, which should explain it all:

point_circle_intersection :: proc(point: Vec3, circle_pos: Vec3, circle_radius: f32, threshold: f32) -> bool
{
  closest_point_to_circle := linalg.normalize(point - circle_pos) * circle_radius
  delta := linalg.length(point - (closest_point_to_circle + circle_pos))
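  // the point counts as "on" the circle when it lies within `threshold` of the circular line
  // (this assumes `point` already lies in the circle's plane, e.g. a ray-plane hit)
  return delta <= threshold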
}

Testing against each rotational axis is pretty much the same as in the translational case.

Now to rotate we need two points just like in the translation case. However, the points must now be relative to the object's origin so that the angle between them is measured correctly, as mentioned before:

if (is_mouse_left_button_down(controllers_state) == true) && (is_mouse_left_button_triggered(controllers_state) == false)
{
  // the intersection points must be relative to the object's origin
  A := linalg.normalize(state.old_intersect - object_pos)
  B := linalg.normalize(new_intersected_point - object_pos)

  // in case both points are roughly the same, don't calculate the rotation since we don't
  // want to cause things to go chaos mode by infinities!
  // (length2 is never negative, so a single upper-bound check is enough)
  if linalg.length2(A - B) <= math.F32_EPSILON {
    state.old_intersect = new_intersected_point
  }
  else
  {
    axis_of_rotation := linalg.normalize(linalg.cross(A, B)) // unit axis of rotation vector

    // we are taking the math.acos(...) here since the quaternion_angle_axis_f32(...) procedure
    // takes in the angle in radians; the dot product is clamped because rounding can push it
    // slightly outside [-1, 1], which would make acos return NaN
    angle := math.acos(clamp(linalg.dot(A, B), -1.0, 1.0))

    // orient is a quaternion128 type
    orient = orient * linalg.quaternion_angle_axis_f32(angle, axis_of_rotation)

    state.old_intersect = new_intersected_point
  }
}

And here's a live demo involving rotations:

Conclusion

That's all there is to the fundamentals of gizmo modifiers really. It's not as complicated as it may have seemed. This kind of implementation may not be perfect for your needs, but you can always adjust it to be whatever you want; it's mainly meant to give you guidance on the most basic version you could imagine so you can expand on it.

Thank you for reading the article!

Appendix A - The Mystery Behind `qvq^-1`

So what's with the idea of rotating things only in the plane exactly? My earlier proof (for 2D/3D at most) shows that's the case, but you may be wondering: what if the point/vector is not within the plane? Well, an additional term appears alongside Euler's Formula to account for that, and we can also prove that rotating a vector with Euler's Formula plus this extra term equals the general quaternion vector product formula qvq^-1, but this requires us to think a little differently.

Suppose that we have a vector V that's decomposed into components parallel and perpendicular to the rotation axis vector N like so:

What you'll notice is that the component parallel to the rotation axis vector does not change, whilst the perpendicular component does.

We could prove this using Rodrigues' rotation formula:
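For reference, Rodrigues' rotation formula in terms of that decomposition can be written as follows. The notation is assumed for this sketch: $\hat{\mathbf{n}}$ is the unit rotation axis (N above), $\mathbf{v}$ is the vector V, and $\theta$ is the rotation angle.

```latex
\begin{aligned}
\mathbf{v}_{\parallel} &= (\mathbf{v} \cdot \hat{\mathbf{n}})\,\hat{\mathbf{n}}, \qquad \mathbf{v}_{\perp} = \mathbf{v} - \mathbf{v}_{\parallel} \\
\mathbf{v}_{\text{rot}} &= \mathbf{v}_{\parallel} + \cos\theta\,\mathbf{v}_{\perp} + \sin\theta\,(\hat{\mathbf{n}} \times \mathbf{v}_{\perp}) \\
&= \cos\theta\,\mathbf{v} + \sin\theta\,(\hat{\mathbf{n}} \times \mathbf{v}) + (1 - \cos\theta)(\hat{\mathbf{n}} \cdot \mathbf{v})\,\hat{\mathbf{n}}
\end{aligned}
```

The second line is the useful one for intuition: $\mathbf{v}_{\parallel}$ passes through untouched, while $\mathbf{v}_{\perp}$ is turned by $\theta$ inside the plane perpendicular to $\hat{\mathbf{n}}$. The third line follows because $\hat{\mathbf{n}} \times \mathbf{v}_{\parallel} = \mathbf{0}$.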

You can see the final equation shows that only the perpendicular vector is rotated, and it rotates within the plane! However, also notice that in the last part of the full derivation there's an additional term alongside Euler's Formula:

Next is a proof of the connection between the general quaternion vector product formula and Rodrigues' rotation formula (referenced from [1] and [3]):

So hopefully this shows that it's not really a big mystery. Rodrigues' rotation formula makes it quite intuitive how this works, and it's the best way to build intuition for the general quaternion vector product.
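A condensed version of that connection, treating $\hat{\mathbf{n}}$, $\mathbf{v}_{\parallel}$ and $\mathbf{v}_{\perp}$ as pure quaternions, might look like this (a sketch in assumed notation, not a reproduction of the derivations in [1] or [3]):

```latex
\begin{aligned}
q &= \cos\tfrac{\theta}{2} + \hat{\mathbf{n}}\sin\tfrac{\theta}{2}, \qquad q^{-1} = \cos\tfrac{\theta}{2} - \hat{\mathbf{n}}\sin\tfrac{\theta}{2} \\
\hat{\mathbf{n}}\,\mathbf{v}_{\parallel} &= \mathbf{v}_{\parallel}\,\hat{\mathbf{n}} \;\Rightarrow\; q\,\mathbf{v}_{\parallel}\,q^{-1} = \mathbf{v}_{\parallel} \\
\hat{\mathbf{n}}\,\mathbf{v}_{\perp} &= -\mathbf{v}_{\perp}\,\hat{\mathbf{n}} \;\Rightarrow\; \mathbf{v}_{\perp}\,q^{-1} = q\,\mathbf{v}_{\perp} \;\Rightarrow\; q\,\mathbf{v}_{\perp}\,q^{-1} = q^{2}\,\mathbf{v}_{\perp} \\
q^{2}\,\mathbf{v}_{\perp} &= (\cos\theta + \hat{\mathbf{n}}\sin\theta)\,\mathbf{v}_{\perp} = \cos\theta\,\mathbf{v}_{\perp} + \sin\theta\,(\hat{\mathbf{n}} \times \mathbf{v}_{\perp}) \\
q\,\mathbf{v}\,q^{-1} &= \mathbf{v}_{\parallel} + \cos\theta\,\mathbf{v}_{\perp} + \sin\theta\,(\hat{\mathbf{n}} \times \mathbf{v}_{\perp})
\end{aligned}
```

Parallel pure quaternions commute and perpendicular ones anticommute, which is why the half-angle sandwich leaves $\mathbf{v}_{\parallel}$ alone and applies the full angle $\theta$ to $\mathbf{v}_{\perp}$. The last line is exactly Rodrigues' rotation formula.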

And here's a short demo to demonstrate the effect in action:

package main

import "core:fmt"
import "core:math"
import "core:math/linalg"

Vec3 :: [3]f32
Vec4 :: [4]f32

main :: proc()
{
  axis_rot := linalg.normalize(Vec3{1, 1, 1}) // oriented plane's unit normal
  v := Vec3{1, 0, 0} // a unit vector

  v_parallel := axis_rot * linalg.dot(v, axis_rot) // component of `v` along the axis vector `axis_rot`

  v_parallel_q := quaternion128(1) // becomes a pure (w = 0) quaternion holding `v_parallel`
  v_parallel_q.w = 0
  v_parallel_q.x = v_parallel.x
  v_parallel_q.y = v_parallel.y
  v_parallel_q.z = v_parallel.z

  v_perp := v - v_parallel // project the vector `v` onto the oriented plane whose normal is `axis_rot`

  n := quaternion128(1)
  n.w = 0

  // 1: rotate only the perpendicular part within the plane, then add the parallel part back
  {
    n.x = v_perp.x
    n.y = v_perp.y
    n.z = v_perp.z
    // "* 2.0" to take `0.5` angle multiplication inside the quaternion_angle_axis(...) procedure into account
    q := linalg.quaternion_angle_axis(math.to_radians_f32(90.0) * 2.0, axis_rot)
    fmt.println((q * n) + v_parallel_q)
  }
  // 2: the general quaternion vector product on the whole vector
  {
    n.x = v.x
    n.y = v.y
    n.z = v.z
    q := linalg.quaternion_angle_axis(math.to_radians_f32(90.0), axis_rot)
    fmt.println(q * n * conj(q))
  }
}

//
// running the above code prints the following:
//
//           w              i            j           k
// 1 - -2.9802322e-08 +0.33333328i +0.91068363j -0.244017k
// 2 -              0 +0.33333325i +0.91068363j -0.2440169k
//
// looking at the i, j, k components, both represent the same rotated vector! (if we ignore the floating point errors as usual)

References & Further Readings

[1] https://arxiv.org/pdf/1711.02508 (Quaternion kinematics for the error-state Kalman filter - Joan Solà, highly recommend reading this)
[2] https://www.ethaneade.com/ (Various Lie theory resources and notes on rotations and robotics)
[3] https://en.wikipedia.org/w/index.php?title=Euler%E2%80%93Rodrigues_formula&oldid=1278581650#Connection_with_quaternions (Rodrigues' rotation formula connection with the general quaternion vector product formula)
[4] Blender's Gizmo implementation

Memresable/gizmo.md