My Journal and some practical toy examples
Introduction
My previous blog post, Gaussian Splatting - Camera Poses, covers the pertinent basic concepts, including Gaussian Splatting, Splatting, Unity, Blender, the C-arm machine, and Neural Radiance Fields.
As a biomedical engineer working in the orthopedic domain, one very useful imaging tool we use heavily is the intra-operative imaging system: the C-arm for 2D imaging, or the O-arm for 3D scans, of a patient's anatomical regions (such as the spine). Its most flexible feature is the arm, which can rotate along its curvature to provide different viewing angles of various organs. For the spine in particular, when the receiver (the cylinder) is at the bottom and the source (the rectangular block) is at the top, the pose is good for an Anterior-Posterior view. Conversely, when the receiver and the source are at the sides, the pose is good for a Medial-Lateral view.
The system is portable, flexible, fast, and safe (the X-ray dose is quite low), and it can be used in the operating theater. It is extremely useful for navigation during minimally invasive surgeries and for checking the result after implants or guide tools are inserted.
Note: There is a lead metal shield in place so that, when the C-arm is in use, the shield prevents any leakage of X-rays.
Rationale
Neural Radiance Field (NeRF) is a breakthrough. It was first published in 2020 and caused a revolution in the computer vision community. The paper is NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.
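The term neural radiance field has three keywords: neural, radiance, and field. Neural stands for neural network. Radiance refers to the light traveling along a ray, which is the quantity a ray tracing technique computes when rendering an image. Field is a mathematical term for a function that assigns a value to every point in space; here it serves as the model of the scene.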
Together, a neural radiance field uses a neural network (an MLP) to “memorize” the color and opacity of a scene in three-dimensional space (the field). Once the network has been trained, the information is encoded implicitly in the network. We can then extract it by querying the network at any location. To train the network, the neural radiance field technique needs only a stack of RGB images with their camera poses.
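To make "querying the network at a location" concrete, here is a minimal sketch, assuming a tiny untrained MLP with random weights. The names query_field, W1, W2 and the layer sizes are hypothetical, chosen only for illustration. The network in the NeRF paper is larger and also takes the viewing direction and a positional encoding of the input, both of which are left out here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny MLP: (x, y, z) -> 32 hidden units -> (r, g, b, density).
# The weights are random; a real field would learn them from images.
W1 = rng.normal(scale=0.5, size=(3, 32))
b1 = np.zeros(32)
W2 = rng.normal(scale=0.5, size=(32, 4))
b2 = np.zeros(4)

def query_field(points):
    """Return (rgb, density) for an (N, 3) array of 3D points."""
    hidden = np.maximum(points @ W1 + b1, 0.0)   # ReLU activation
    raw = hidden @ W2 + b2
    rgb = 1.0 / (1.0 + np.exp(-raw[:, :3]))      # sigmoid keeps colors in [0, 1]
    density = np.maximum(raw[:, 3], 0.0)         # density must be non-negative
    return rgb, density

# Query the field at two arbitrary locations.
points = np.array([[0.0, 0.0, 0.0], [0.1, -0.2, 0.3]])
rgb, density = query_field(points)
```

The point of the sketch is the interface: a location goes in, a color and a density come out, and nothing about the scene is stored explicitly outside the weights.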
Ray tracing is also included because training needs feedback on its accuracy. This accuracy is measured by comparing the images rendered by the network with the actual ground-truth images, which are usually obtained before training starts.
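One common way to measure that similarity is the mean squared error between pixel values. The sketch below assumes images stored as NumPy arrays with values in [0, 1]; the function name photometric_loss and the toy images are hypothetical.

```python
import numpy as np

def photometric_loss(rendered, ground_truth):
    """Mean squared error between two images with values in [0, 1]."""
    rendered = np.asarray(rendered, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    return float(np.mean((rendered - ground_truth) ** 2))

# Toy ground truth: a pure red 4x4 image.
ground_truth = np.zeros((4, 4, 3))
ground_truth[..., 0] = 1.0

perfect = ground_truth.copy()      # a rendering that matches exactly
too_dark = 0.5 * ground_truth      # a rendering that is half as bright

loss_perfect = photometric_loss(perfect, ground_truth)
loss_dark = photometric_loss(too_dark, ground_truth)
```

A lower value means the rendered image is closer to the ground truth, so training tries to push this number down.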
Ray tracing is a method that keeps track of how light intensity is attenuated (like a physics model) as it passes through the model in space. The path or trajectory the light takes before hitting the image screen determines the final color and intensity we ultimately see. To obtain an RGB image, we point a camera at the scene center (with a known camera pose, or camera extrinsics), follow the camera rays (one for each pixel on the screen), query the network for color and density at each sample point along the ray's trajectory, and then sum up the attenuated contributions to estimate the final pixel value (RGB) on screen.
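The summation along one ray can be sketched as follows. This is a minimal illustration, assuming evenly spaced samples and a hand-written toy_field (a red ball at the origin) standing in for the trained network; toy_field, render_ray, and all parameter values are hypothetical. Each sample contributes its color weighted by its own opacity and by how much light survives the samples in front of it.

```python
import numpy as np

def toy_field(points):
    """Hypothetical stand-in for the network: a red ball of radius 0.5 at the origin."""
    inside = np.linalg.norm(points, axis=1) < 0.5
    density = np.where(inside, 10.0, 0.0)
    rgb = np.tile(np.array([1.0, 0.0, 0.0]), (len(points), 1))
    return rgb, density

def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64):
    """Composite color along one ray by summing attenuated sample contributions."""
    direction = direction / np.linalg.norm(direction)
    t = np.linspace(near, far, n_samples)             # distances along the ray
    points = origin + t[:, None] * direction          # (n_samples, 3) sample points
    rgb, density = toy_field(points)

    delta = (far - near) / (n_samples - 1)            # spacing between samples
    alpha = 1.0 - np.exp(-density * delta)            # opacity of each segment
    # Transmittance: fraction of light that survives all earlier samples.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = transmittance * alpha
    return (weights[:, None] * rgb).sum(axis=0)

# One ray that passes through the ball and one that misses it.
hit = render_ray(np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0]))
miss = render_ray(np.array([2.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0]))
```

The ray through the ball comes out almost fully red, while the ray that misses stays black. Rendering a full image repeats this for one ray per pixel.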
The really amazing idea is that ray tracing and its summation can be designed in a way that fits into the training framework of a neural network. Put simply, these operations are differentiable, so automatic differentiation can handle them, and they are numerically stable.
While the neural radiance field results look impressive (see the original publication and subsequent work), the training and inference processes are very slow. Much of the time is spent querying the MLP for spatial information while moving along the ray trajectories.
Instead of ray tracing, what about splatting?
Instead of ray tracing, Gaussian splatting uses splatting to render an image, a process that is much faster once the model has been trained.
Splatting is fast because the relative positions of the objects in the scene are worked out during training. During rendering, we draw only the objects that are visible. For example, we draw only a red ball that is in front of a green ball, because the green ball is hidden behind the red one and is invisible.
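The red ball and green ball example can be sketched as follows. This is a minimal illustration, not a full Gaussian splatting renderer: it assumes the splats have already been projected to the screen as round 2D Gaussians, and it resolves visibility by sorting them by depth and blending front to back. The splats list, its field names, and the render function are hypothetical.

```python
import numpy as np

# Hypothetical splats already projected onto a 32x32 screen.
splats = [
    {"center": (16.0, 16.0), "sigma": 4.0, "rgb": (0.0, 1.0, 0.0),
     "opacity": 0.99, "depth": 5.0},   # green ball, farther away
    {"center": (16.0, 16.0), "sigma": 4.0, "rgb": (1.0, 0.0, 0.0),
     "opacity": 0.99, "depth": 2.0},   # red ball, closer to the camera
]

def render(splats, height=32, width=32):
    """Blend screen-space Gaussians front to back into an RGB image."""
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    image = np.zeros((height, width, 3))
    transmittance = np.ones((height, width))    # light not yet blocked per pixel
    for s in sorted(splats, key=lambda s: s["depth"]):   # nearest splat first
        cx, cy = s["center"]
        dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
        alpha = s["opacity"] * np.exp(-0.5 * dist2 / s["sigma"] ** 2)
        image += (transmittance * alpha)[..., None] * np.array(s["rgb"])
        transmittance *= 1.0 - alpha            # what is behind gets dimmer
    return image

image = render(splats)
center_pixel = image[16, 16]
```

At the center pixel the red splat absorbs almost all of the light, so the green splat behind it contributes close to nothing, which matches the intuition that the hidden ball does not need to be drawn.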
Of course, our clever readers would point out that this way we cannot model reflections or other lighting effects. It turns out that we don't need to include every lighting effect or trace every light ray to render reasonably photorealistic images.
Following up next is Gaussian Splatting - Camera Poses