Yanis

Depthmap-based 2D–3D Reconstruction of the Hand Bones from a Single-View Radiograph for the Diagnosis of Deformity and Treatment Planning

Focus


This project focuses on the reconstruction of a metrically accurate three-dimensional representation of hand bones from a single X-ray image, which is a fundamentally ill-posed problem due to the loss of depth information inherent to 2D radiographic projection. In a standard X-ray, multiple 3D structures are compressed along the projection rays into a single image, making it impossible to directly recover the original geometry without additional assumptions or learned priors. Nevertheless, solving this problem has strong clinical relevance, particularly for deformity assessment and surgical planning, where access to 3D information is crucial but CT acquisition may be costly, time-consuming, or associated with higher radiation exposure.

The approach proposed in this project is inspired by and builds upon methods such as 3DDX, which aim to recover 3D anatomical structures from radiographic data using learning-based strategies. In contrast to approaches that directly predict a full 3D volume or mesh, this project relies on an intermediate and more structured representation: depth maps. More specifically, the model is trained to predict two depth maps from a single X-ray image, corresponding to the front and back surfaces of the bone along each projection ray. This representation is particularly well-suited to X-ray imaging geometry, as it explicitly models the entry and exit points of the bone along each ray, effectively encoding local thickness and spatial extent.
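The front/back representation can be illustrated on a single projection ray. The following is a minimal sketch, not the project's implementation: it assumes the ray has been sampled at a fixed step, and the function name `front_back_depth` and the parameter `step_mm` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def front_back_depth(
    occupancy: Sequence[bool], step_mm: float
) -> Optional[Tuple[float, float]]:
    """Return (front, back) depth in mm for one ray, or None if it misses bone.

    occupancy[i] is True when the i-th sample along the ray lies inside bone.
    Sample i is assumed to cover the interval [i * step_mm, (i + 1) * step_mm].
    """
    hits = [i for i, inside in enumerate(occupancy) if inside]
    if not hits:
        return None
    front = hits[0] * step_mm        # entry point: near face of the first bone sample
    back = (hits[-1] + 1) * step_mm  # exit point: far face of the last bone sample
    return front, back


# One ray crossing a 1.5 mm thick piece of bone, sampled every 0.5 mm.
ray = [False, False, True, True, True, False]
front, back = front_back_depth(ray, step_mm=0.5)
thickness = back - front  # local bone thickness along this ray
print(front, back, thickness)  # 1.0 2.5 1.5
```

Note that two numbers per ray record only the first entry and the last exit. If several bones lie along the same ray, this sketch merges them into one interval.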

A key challenge in this framework is the absence of ground-truth depth information in real X-ray images. To address this, the project introduces a data generation pipeline based on CT volumes. High-resolution CT scans with accurate bone segmentations serve as the ground-truth 3D reference. These CT volumes are first registered to the X-ray coordinate system, ensuring geometric consistency between the two modalities. Once aligned, digitally reconstructed radiographs (DRRs) are generated by simulating the X-ray acquisition process. From the same aligned volumetric data, front and back depth maps are computed for each pixel by intersecting the projection rays with the segmented bone surfaces.
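The data generation step can be sketched on a toy volume. The example below is a simplified illustration under stated assumptions, not the project's pipeline: it uses parallel rays along the z axis (a real radiograph has diverging rays from a point source), assumes registration has already been done, and renders the simulated radiograph with a Beer-Lambert line integral. The function name `render_drr_and_depths` and all values are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def render_drr_and_depths(
    mu: List[List[List[float]]],
    bone: List[List[List[bool]]],
    dz_mm: float,
) -> Tuple[Grid, Grid, Grid]:
    """Simulate a parallel-beam radiograph plus front/back bone depth maps.

    mu   : attenuation coefficients in 1/mm, indexed [z][y][x] (z = ray direction)
    bone : bone segmentation with the same shape
    Pixels whose ray never meets bone get NaN in both depth maps.
    """
    nz, ny, nx = len(mu), len(mu[0]), len(mu[0][0])
    drr = [[0.0] * nx for _ in range(ny)]
    front = [[math.nan] * nx for _ in range(ny)]
    back = [[math.nan] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            # Beer-Lambert: the transmitted fraction is exp(-integral of mu).
            # Storing the absorbed fraction makes bone bright, as on film.
            line_integral = sum(mu[z][y][x] for z in range(nz)) * dz_mm
            drr[y][x] = 1.0 - math.exp(-line_integral)
            hits = [z for z in range(nz) if bone[z][y][x]]
            if hits:
                front[y][x] = hits[0] * dz_mm        # first bone voxel, near face
                back[y][x] = (hits[-1] + 1) * dz_mm  # last bone voxel, far face
    return drr, front, back


# Toy volume: 4 slices of 1 x 2 pixels; bone fills slices 1 and 2 of column 0.
bone = [[[z in (1, 2), False]] for z in range(4)]
mu = [[[0.05 if b else 0.0 for b in row] for row in plane] for plane in bone]
drr, front, back = render_drr_and_depths(mu, bone, dz_mm=1.0)
print(front[0][0], back[0][0])  # 1.0 3.0
```

Because the radiograph and the depth maps come from the same aligned volume, each simulated pixel has an exactly matching depth label, which is what real X-ray images lack.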

The overall learning framework is composed of two main stages. First, a segmentation network extracts the bone regions from the input X-ray. This step constrains subsequent predictions to anatomically relevant areas and limits the influence of surrounding soft tissues. Then, a second neural network predicts the front and back depth maps from the X-ray, optionally guided by the segmentation output. Once the depth maps are predicted, reconstruction of the 3D geometry is performed using the known X-ray imaging parameters, back-projecting front and back depth values into 3D space for all pixels to produce a dense bone volume.
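The final back-projection step can be sketched as follows. This is a minimal illustration under assumptions the text does not specify: a point X-ray source at the origin, a flat detector perpendicular to the z axis, and depth values stored as distance from the source along each ray. The names `backproject`, `reconstruct_segments`, `sdd_mm` (source-to-detector distance), `spacing_mm`, `cu` and `cv` (principal point in pixels) are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]
DepthMap = List[List[Optional[float]]]


def backproject(
    u: float, v: float, depth_mm: float,
    sdd_mm: float, spacing_mm: float, cu: float, cv: float,
) -> Point:
    """Map detector pixel (u, v) and a depth along its ray to a 3D point in mm."""
    # Position of the pixel on the detector, relative to the principal point.
    dx = (u - cu) * spacing_mm
    dy = (v - cv) * spacing_mm
    # Length of the full ray from the source to that pixel.
    norm = math.sqrt(dx * dx + dy * dy + sdd_mm * sdd_mm)
    scale = depth_mm / norm
    return (dx * scale, dy * scale, sdd_mm * scale)


def reconstruct_segments(
    front: DepthMap, back: DepthMap,
    sdd_mm: float, spacing_mm: float, cu: float, cv: float,
) -> List[Tuple[Point, Point]]:
    """Return one (entry point, exit point) pair per pixel that sees bone."""
    segments = []
    for v, (f_row, b_row) in enumerate(zip(front, back)):
        for u, (f, b) in enumerate(zip(f_row, b_row)):
            if f is None or b is None:
                continue  # background pixel: no bone along this ray
            args = (sdd_mm, spacing_mm, cu, cv)
            segments.append((backproject(u, v, f, *args), backproject(u, v, b, *args)))
    return segments


# A pixel 300 mm off-centre on a detector 400 mm from the source lies on a
# 500 mm ray, so a depth of 250 mm lands halfway along it.
print(backproject(300.0, 0.0, 250.0, sdd_mm=400.0, spacing_mm=1.0, cu=0.0, cv=0.0))
# (150.0, 0.0, 200.0)
```

Filling the space between each entry and exit point, for example by voxelizing the segments, would then give a dense bone volume.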

The predicted depth maps closely approximate the ground-truth depth derived from CT data, both qualitatively and quantitatively. The reconstructed 3D shapes exhibit realistic anatomical structures and capture important geometric features of the bones. Compared to reference approaches such as 3DDX, this method emphasizes a physically interpretable intermediate representation and a reconstruction process grounded in imaging geometry, which may improve robustness and interpretability. Remaining challenges include the ill-posed nature of the problem, the accuracy of CT-to-X-ray registration, and low-contrast regions of the radiograph where bones overlap.
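The text does not name the metric used for the quantitative comparison. As one plausible choice, the sketch below computes the mean absolute depth error restricted to bone pixels. The metric, the function name `masked_mae`, and the numbers are assumptions for illustration, not reported results.

```python
from typing import List


def masked_mae(
    pred: List[List[float]],
    target: List[List[float]],
    mask: List[List[bool]],
) -> float:
    """Mean absolute error in mm between two depth maps, over masked pixels only."""
    errors = [
        abs(p - t)
        for p_row, t_row, m_row in zip(pred, target, mask)
        for p, t, m in zip(p_row, t_row, m_row)
        if m  # background pixels carry no depth label and are skipped
    ]
    if not errors:
        raise ValueError("mask selects no pixels")
    return sum(errors) / len(errors)


# Made-up 2 x 2 depth maps; the bottom-left pixel is background.
pred = [[10.0, 12.0], [0.0, 9.0]]
target = [[11.0, 12.0], [5.0, 7.0]]
mask = [[True, True], [False, True]]
print(masked_mae(pred, target, mask))  # 1.0
```

The same function could be applied separately to the front and back maps, so that errors on the entry and exit surfaces are reported independently.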


Project 11
A Unified Framework for 2D-3D and 3D-3D Image Registration
Image Registration
2D-3D Registration
3D-3D Registration
CMA-ES Optimization
GPU Rendering
