Video tutorial: [A New Breakthrough in AI Scene Generation] A Beginner's Guide to 3D Gaussian Splatting – bilibili

The following content is from: A Beginner's Guide to 3D Gaussian Splatting – bilibili (bilibili.com). Only an excerpt is reproduced below; please follow the link for the full article.

A New Breakthrough in AI Scene Generation: An Introduction to 3D Gaussian Splatting and a Beginner's Training Tutorial

3D Gaussian Splatting is a method for creating a 3D scene from a set of 2D images. All you need is a video or a set of photos of a scene to obtain a high-quality 3D representation of it, which you can then render from any angle. It belongs to the class of radiance field methods (like NeRF), but it trains faster (at equal quality), renders faster, and reaches better or similar quality. 3D Gaussian Splatting enables high-quality, real-time (≥ 100 fps) view synthesis of unbounded, complete scenes at 1080p resolution.

https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

The paper won a Best Paper award at SIGGRAPH 2023.

Project GitHub repository: https://github.com/graphdeco-inria/gaussian-splatting

How 3D Gaussian Splatting Works (in Brief)

In a traditional photogrammetry pipeline, a set of 2D images can be converted into a point cloud.

3D Gaussian Splatting goes a step further and turns the point cloud into ellipsoids in 3D space. Each ellipsoid has an optimized position, scale, rotation, color, and opacity. When blended together, they produce a complete model that can be rendered from any angle.

As you can see, 3D Gaussian Splatting preserves the fuzzy texture of the plush toy very well, something that both photogrammetry and earlier NeRFs struggle to do.

This article draws on: https://www.reshot.ai/3d-gaussian-splatting (see the section following this one). Author: 分形噪波, https://www.bilibili.com/read/cv26465887/ Source: bilibili


The following content is from: 3D Gaussian Splatting: A beginner friendly introduction and tutorial on how to train them (reshot.ai) https://www.reshot.ai/3d-gaussian-splatting

3D Gaussian Splatting

A beginner friendly introduction to 3D Gaussian Splats and tutorial on how to train them.

3D Gaussian Splatting is a new method for novel-view synthesis of scenes captured with a set of photos or videos. They are a class of Radiance Field methods (like NeRFs) but are simultaneously faster to train (at equal quality), faster to render, and reach better or similar quality. They are also easier to understand and to postprocess (more on that later). This is a beginner friendly introduction to 3D Gaussian Splats and how to train them.

What are 3D Gaussian Splats?

At a high level, 3D Gaussian splats, like NeRFs or photogrammetry methods, are a way to create a 3D scene using a set of 2D images. Practically, this means that all you need is a video or a set of photos of a scene, to obtain a 3D representation of it — enabling you to reshoot it, or render it from any angle.

Here’s an example of a capture I made. As input, I used 750 images of a plush toy that I recorded with my phone from different angles.

Once trained, the model is a pointcloud of 3D Gaussians. Here is the pointcloud visualized as simple points.

But what are 3D Gaussians? They are a generalization of 1D Gaussians (the bell curve) to 3D. Essentially they are ellipsoids in 3D space, with a center, a scale, a rotation, and “softened edges”.

Each 3D Gaussian is optimized along with a (view-dependent) color and opacity. When blended together, here’s the visualization of the full model, rendered from ANY angle. As you can see, 3D Gaussian Splatting captures the fuzzy and soft nature of the plush toy extremely well, something that photogrammetry-based methods struggle to do.
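For readers who want a bit more precision, here is a compact sketch of the formulation from the original paper, written in LaTeX notation and limited to the key equations. Each splat is a 3D Gaussian with mean \mu and covariance \Sigma; the covariance is factored into a rotation R (stored as a quaternion) and a per-axis scale S so that it stays a valid covariance during optimization; and the color C of a pixel is the front-to-back alpha blend of the N Gaussians overlapping it, each contributing a view-dependent color c_i and opacity \alpha_i:

G(\mathbf{x}) = \exp\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)
\Sigma = R\,S\,S^\top R^\top
C = \sum_{i=1}^{N} c_i\,\alpha_i \prod_{j=1}^{i-1} (1-\alpha_j)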

How to train your own models? (Tutorial)

Important: before starting, check the requirements (about your OS & GPU) to train 3D Gaussian Splats here. In particular, this will require a CUDA-ready GPU with 24 GB of VRAM.
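A quick way to check which GPU you have and how much VRAM it offers (assuming the NVIDIA driver is installed) is:

nvidia-smi

The first table of the output lists the GPU model and its total memory.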

Step 1: Record the scene

If you want to use the same model as me for testing (the plush toy), I have made all images, intermediate files and outputs available, so you can skip to step 2.

Recording the scene is one of the most important steps because that’s what the model will be trained on. You can either record a video (and extract the frames afterwards) or take individual photos. Be sure to move around the scene, and to capture it from different angles. Generally, the more images you have, the better the model will be. A few tips to keep in mind to get the best results:

  • Avoid moving too fast, as it can cause blurry frames (which 3D Gaussian Splats will try to reproduce)
  • Try to aim for 200-1000 images. Fewer than 200 images will result in a low-quality model, and more than 1000 images will take a long time to process in step 2.
  • Lock the exposure of your camera. If it’s not consistent between frames, it will cause flickering in the final model.

Just for reference, I recorded the plush toy using a turntable and a fixed camera. You can find cheap ones on Amazon, like here. But you can also record the scene just by moving around it.

Once you’re done, place your images in a folder called input, like this:

📦 $FOLDER_PATH
┣ 📂 input
┃ ┣ 📜 000000.jpg
┃ ┣ 📜 000001.jpg
┃ ┣ 📜 ...
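If you recorded a video instead of taking photos, you still need to turn it into individual frames inside that input folder. A minimal sketch using ffmpeg (assuming ffmpeg is installed; my_capture.mp4 stands in for your own video, and the frame rate and quality values are just starting points to tune so you end up with roughly 200-1000 sharp frames):

# Extract about 2 frames per second as high-quality JPEGs
mkdir -p $FOLDER_PATH/input
ffmpeg -i my_capture.mp4 -qscale:v 2 -r 2 $FOLDER_PATH/input/%06d.jpg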

Step 2: Obtain camera poses

Obtaining camera poses is probably the most finicky step of the entire process for inexperienced users. The goal is to obtain the position and orientation of the camera for each frame. This is called the camera pose. There are several ways to do so:

  • Use COLMAP. COLMAP is a free and open-source Structure-from-Motion (SfM) software. It will take your images as input, and output the camera poses. It comes with a GUI and is available on Windows, Mac, and Linux.
  • Use desktop software. Options include RealityCapture and Metashape (both commercial).
  • Use mobile apps, including Polycam and Record3D. They take advantage of the LiDAR sensor on recent iPhones to obtain the camera poses. Unfortunately, this option is only available on iOS with an iPhone 12 or newer.

Again, if you want to use the same model for testing, download the sample “sparse.zip” and skip to step 3.

Because it is free and open-source, we will show how to use COLMAP to obtain the camera poses.

First, install COLMAP: follow the instructions of the official installation guide.
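On recent Ubuntu releases, for example, a packaged build is usually available directly from the package manager (a sketch; the packaged version may lag behind the latest release, and a CUDA-enabled build may still require compiling from source as described in the guide):

sudo apt install colmap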

From now on, we suggest two ways to obtain the camera poses: with an automated script, or manually with the GUI.

Download the code from the official repo. Make sure to clone it recursively to get the submodules, like this:

git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive

Then run the following script:

python convert.py -s $FOLDER_PATH

This will automatically run COLMAP and extract the camera poses for you. Be patient, as this can take a few minutes to a few hours depending on the number of images. The camera poses will be saved in a folder called sparse, and the undistorted images in a folder called images.
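For reference, convert.py is essentially a thin wrapper around the standard COLMAP pipeline. If you prefer to run the steps by hand (or need to debug a failed run), they look roughly like this; this is only a sketch with illustrative paths and options, so check convert.py for the exact flags it passes:

# Feature extraction (a single camera, OPENCV distortion model)
mkdir -p $FOLDER_PATH/distorted/sparse
colmap feature_extractor --database_path $FOLDER_PATH/distorted/database.db --image_path $FOLDER_PATH/input --ImageReader.camera_model OPENCV --ImageReader.single_camera 1
# Feature matching between image pairs
colmap exhaustive_matcher --database_path $FOLDER_PATH/distorted/database.db
# Incremental Structure-from-Motion: recovers camera poses and a sparse point cloud
colmap mapper --database_path $FOLDER_PATH/distorted/database.db --image_path $FOLDER_PATH/input --output_path $FOLDER_PATH/distorted/sparse
# Undistort the images so they match a simple pinhole camera model
colmap image_undistorter --image_path $FOLDER_PATH/input --input_path $FOLDER_PATH/distorted/sparse/0 --output_path $FOLDER_PATH --output_type COLMAP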

To visualize the camera poses, you can open the COLMAP GUI. On Linux, you can run colmap gui in a terminal. On Windows and Mac, you can open the COLMAP application.

Then select File > Import model and choose the path to the folder $FOLDER_PATH/sparse/0.

The folder structure of your model dataset should now look like this:

📦 $FOLDER_PATH
┣ 📂 (input)
┣ 📂 (distorted)
┣ 📂 images
┣ 📂 sparse
┃ ┣ 📂 0
┃ ┃ ┣ 📜 points3D.bin
┃ ┃ ┣ 📜 images.bin
┃ ┃ ┗ 📜 cameras.bin
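Before moving on, it can be worth sanity-checking the reconstruction from the command line: how many images were registered, how many 3D points were triangulated, and the mean reprojection error. Assuming your COLMAP build includes the model analyzer, this prints those statistics:

colmap model_analyzer --path $FOLDER_PATH/sparse/0

If only a small fraction of your input images were registered, the capture probably needs more overlap between neighboring views.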

Step 3: Train the 3D Gaussian Splatting model

If you want to visualize my model, simply download the sample “output.zip” and skip to step 4.

If not already done, download the code from the official repo. Make sure to clone it recursively to get the submodules, like this:

git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive

Installation is extremely easy as the codebase has almost no dependencies. Just follow the instructions in the README. If you already have a Python environment with PyTorch, you can simply run:

pip install plyfile tqdm
pip install submodules/diff-gaussian-rasterization
pip install submodules/simple-knn
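Alternatively, if you'd rather not assemble the environment yourself, the official repo also ships a conda environment file. As a sketch (the environment name below is the one defined in the repo's environment.yml at the time of writing; check the README if it differs):

conda env create --file environment.yml
conda activate gaussian_splatting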

Once installed, you can train the model by running:

python train.py -s $FOLDER_PATH -m $FOLDER_PATH/output

Since my scene has a white background, I’m adding the -w option. This tells the training script that the base background color should be white (instead of the default black).

python train.py -s $FOLDER_PATH -m $FOLDER_PATH/output -w

This will save the model in the $FOLDER_PATH/output folder.

The entire training (30,000 steps) will take about 30-40 minutes, but an intermediate model will be saved after 7,000 steps, which already looks great. You can visualize that model right away by following step 4.
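Optionally, if you want quantitative quality numbers (PSNR/SSIM/LPIPS) rather than just a visual check, the repo also provides rendering and metrics scripts. A sketch using the commands documented in the official README; note that --eval holds out a subset of images as a test set, so it has to be set when training starts for the metrics to be meaningful:

python train.py -s $FOLDER_PATH -m $FOLDER_PATH/output -w --eval
python render.py -m $FOLDER_PATH/output
python metrics.py -m $FOLDER_PATH/output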

Step 4: Visualize the model

The folder structure of your model dataset should now look like this:

📦 $FOLDER_PATH
┣ 📂 images
┣ 📂 sparse
┣ 📂 output
┃ ┣ 📜 cameras.json
┃ ┣ 📜 cfg_args
┃ ┣ 📜 input.ply
┃ ┗ 📂 point_cloud
┃ ┃ ┣ 📂 iteration_7000
┃ ┃ ┃ ┗ 📜 point_cloud.ply
┃ ┃ ┗ 📂 iteration_30000
┃ ┃ ┃ ┗ 📜 point_cloud.ply
To view the trained model, you need the SIBR viewer that ships with the official repo:

  • If you’re on Windows, download the pre-built binaries for the visualizer here.
  • On Ubuntu 22.04, you can build the visualizer yourself by running:
    # Dependencies
    sudo apt install -y libglew-dev libassimp-dev libboost-all-dev libgtk-3-dev libopencv-dev libglfw3-dev libavdevice-dev libavcodec-dev libeigen3-dev libxxf86vm-dev libembree-dev
    # Project setup
    cd SIBR_viewers
    cmake -Bbuild . -DCMAKE_BUILD_TYPE=Release # add -G Ninja to build faster
    cmake --build build -j24 --target install

Once installed, find the SIBR_gaussianViewer_app binary and run it with the path to the model as argument:

SIBR_gaussianViewer_app -m $FOLDER_PATH/output

You get a beautiful visualizer of your trained model! Make sure to select Trackball mode for a better interactive experience.
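As a side note, the same build also produces a network viewer, SIBR_remoteGaussian_app, which can connect to a training run that is still in progress so you can watch the model improve live. A sketch, assuming training and the viewer run on the same machine (check the README for the exact options if you need to connect over the network):

SIBR_remoteGaussian_app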
