The decoder consists of three main components:
1) a face transformer; 2) a classification head; and 3) a fusion network.
It performs point cloud completion by predicting the 3D coordinate offset and opacity value for each candidate point in $P^S$.
Based on these predictions, the missing 3D point cloud $P^{\text{out}}$ is estimated.
Importantly, the proposed decoder focuses solely on reconstructing the missing regions, rather than regenerating the entire object point cloud.
Moreover, thanks to its transformer-based design, the decoder can handle input point clouds with varying densities.
We define the offset $o_m = (o_x, o_y, o_z)$ to represent the $(x, y, z)$ displacements of the $m$-th candidate point $p_m$.
For each candidate point $p_m$, a local coordinate system is defined with $p_m$ as the origin, where the sampling ray is aligned with the $z$-axis, and the two axes of the corresponding face define the $x$- and $y$-axes.
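To make this convention concrete, the following minimal sketch (in Python/NumPy; all function and variable names are our own illustration, not part of the method) shows how an offset predicted in the local frame could be mapped back to global coordinates, assuming the face axes and the ray direction are available as unit vectors:
\begin{verbatim}
import numpy as np

def local_to_global_offset(o_local, face_x, face_y, ray_dir):
    # Columns of R are the local axes (face x-axis, face y-axis,
    # sampling ray = local z-axis) expressed in global coordinates,
    # so R maps local-frame offsets to global-frame displacements.
    R = np.stack([face_x, face_y, ray_dir], axis=-1)  # (3, 3)
    return R @ o_local

# Example: displace a candidate point p_m by its predicted offset.
p_m     = np.array([0.1, 0.2, 0.3])
o_local = np.array([0.01, -0.02, 0.05])          # (o_x, o_y, o_z)
face_x  = np.array([1.0, 0.0, 0.0])
face_y  = np.array([0.0, 1.0, 0.0])
ray_dir = np.array([0.0, 0.0, 1.0])
p_refined = p_m + local_to_global_offset(o_local, face_x, face_y, ray_dir)
\end{verbatim}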
The opacity value $\sigma_m$ represents the influence of the $m$-th point, similar to its role in NeRF.
Only points with $\sigma_m \geq 0.5$ are retained as meaningful, while the rest are filtered out.
Since this threshold determines how many candidate points survive, the decoder design allows the model to adjust the output point cloud density.
Then, the missing 3D point cloud $P^{\text{out}}$ is defined as follows:
\begin{equation}
\label{eq:p_out}
P^{\text{out}} = \{ p_m + o_m \mid \sigma_m \geq 0.5,\; m = 1, \ldots, M \}.
\end{equation}
The final completion result is obtained by combining the predicted points $P^{\text{out}}$ with the incomplete input $P^I$, yielding $P^{\text{pred}} = P^I \cup P^{\text{out}}$.
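Computationally, Eq.~\eqref{eq:p_out} and the final union reduce to a single masking step; the sketch below (array names and shapes are our assumptions) illustrates the assembly:
\begin{verbatim}
import numpy as np

def complete_point_cloud(P_I, P_S, offsets, sigma, threshold=0.5):
    # P_I:     (N, 3) incomplete input point cloud
    # P_S:     (M, 3) candidate points
    # offsets: (M, 3) predicted offsets o_m (global frame)
    # sigma:   (M,)   predicted opacity values
    keep = sigma >= threshold            # opacity filtering
    P_out = P_S[keep] + offsets[keep]    # Eq. (p_out)
    # Final prediction: union of the input and the estimated missing part.
    return np.concatenate([P_I, P_out], axis=0)
\end{verbatim}
Raising or lowering the threshold (0.5 above) trades off density against confidence, which is the mechanism behind the density-adjustment property noted earlier.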