Vision and Robotics

3D Vision

3D reconstruction, NeRF, Gaussian Splatting, point clouds, and spatial intelligence.

Today / current date: 2 papers indexed. Signal sources: cs.CV, cs.GR, cs.RO

2605.17131 2026-06-18 cs.CV cs.AI cs.LG Version update 95%

A Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation

Minhas Kamal, Hiranya Garbha Kumar, Balakrishnan Prabhakaran

Affiliation: State University of New York at Albany

Topic match (point cloud): a systematic survey of deep learning architectures for point cloud classification and segmentation.

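AI summary: This paper systematically surveys deep learning architectures for point cloud classification and segmentation. It analyzes the structural properties of point cloud data, categorizes prior work by architecture, evaluates performance on mainstream benchmarks, and identifies open challenges and future directions.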

Comments We review a decade of advancements in point cloud processing: we trace the evolution of the field from its foundational roots to the modern SOTA, analyze how diverse architectures overcome the inherent geometric challenges of 3D data, and map out critical research gaps alongside promising future directions. GitHub: https://github.com/MinhasKamal/DeepLearningForPointCloud

Journal ref ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2026

DOI: 10.1145/3815180

Abstract

Point cloud stands as the most widely adopted format for representing 3D shapes and scenes due to its simplicity and geometric fidelity. However, its inherent unordered and irregular nature, exacerbated by sensor noise and occlusions, introduces unique challenges for machine learning based methodologies. To combat these issues, diverse strategies have been developed, including converting to a format that has orderliness, extracting local geometry, and permutation-invariant or self-attention-based processing. In this paper, our focus is directed towards deep learning models for three fundamental tasks in 3D vision: point cloud classification, part segmentation, and semantic segmentation. We begin by formally defining point cloud data, followed by an in-depth discussion on its structural characteristics. Then, we categorize notable works based on their backbone structure and evaluate their performance on popular benchmarks. Beyond empirical comparison, we offer insights into architectural innovations and limitations. We also outline open challenges and promising future directions for 3D point cloud understanding.
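The "permutation-invariant" processing mentioned in the abstract can be illustrated with a small sketch. It is not taken from the survey: `point_features` is a hypothetical hand-written stand-in for a learned per-point network, and max pooling is only one possible symmetric aggregation. The point is that applying the same function to every point and then pooling with a symmetric operator yields a descriptor that does not depend on point order.

```python
import random


def point_features(point):
    """Per-point feature map shared by all points.

    Hypothetical stand-in for a learned MLP: coordinates plus squared norm.
    """
    x, y, z = point
    return (x, y, z, x * x + y * y + z * z)


def global_descriptor(points):
    """Max-pool each feature channel over all points.

    max() is symmetric in its arguments, so reordering the points
    cannot change the result.
    """
    features = [point_features(p) for p in points]
    return tuple(max(channel) for channel in zip(*features))


cloud = [(0.0, 1.0, 2.0), (1.5, -0.5, 0.0), (-1.0, 0.25, 3.0)]

# Shuffle a copy with a fixed seed to simulate an arbitrary point ordering.
shuffled = cloud[:]
random.Random(0).shuffle(shuffled)

descriptor = global_descriptor(cloud)
descriptor_shuffled = global_descriptor(shuffled)
```

Both descriptors are identical, which is the property an order-free input format requires of the network; an aggregation such as concatenation would not have it.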


2601.01200 2026-06-18 cs.CV eess.IV Version update 85%

Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity

Zhang Chen, Shuai Wan, Yuezhe Zhang, Siyu Ren, Fuzheng Yang, Junhui Hou

Affiliations: School of Electronics and Information, Northwestern Polytechnical University; Department of Computer Science, City University of Hong Kong; School of Telecommunication Engineering, Xidian University

Topic match (point cloud): point cloud quality assessment, multi-scale implicit structural similarity.

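AI summary: To address the difficulty of matching irregular data in point cloud quality assessment, the paper proposes the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM), which represents local features continuously with radial basis functions and compares the implicit function coefficients. Combined with a ResGrouped-MLP network, it outperforms existing methods on multiple benchmarks.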

Comments IEEE TMM Accepted

Abstract

The unstructured and irregular nature of points poses a significant challenge for accurate point cloud quality assessment (PCQA), particularly in establishing accurate perceptual feature correspondence. To tackle this, we propose the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM). Unlike traditional point-to-point matching, MS-ISSM utilizes radial basis function (RBF) to represent local features continuously, transforming distortion measurement into a comparison of implicit function coefficients. This approach effectively circumvents matching errors inherent in irregular data. Additionally, we propose a ResGrouped-MLP quality assessment network, which robustly maps multi-scale feature differences to perceptual scores. The network architecture departs from traditional flat multi-layer perceptron (MLP) by adopting a grouped encoding strategy integrated with residual blocks and channel-wise attention mechanisms. This hierarchical design allows the model to preserve the distinct physical semantics of luma, chroma, and geometry while adaptively focusing on the most salient distortion features across High, Medium, and Low scales. Experimental results on multiple benchmarks demonstrate that MS-ISSM outperforms state-of-the-art metrics in both reliability and generalization. The source code is available at: https://github.com/ZhangChen2022/MS-ISSM.
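The idea of comparing implicit function coefficients instead of matching points can be sketched with a toy example. This is not the authors' MS-ISSM implementation: the Gaussian kernel, the shared `centers`, the ridge term, and the Euclidean `coefficient_distance` are all assumptions chosen for illustration. Two neighborhoods sampled at different positions are each fitted with RBF weights at the same centers, and the weight vectors are then compared directly.

```python
import math


def gaussian_rbf(p, c, sigma=1.0):
    """Gaussian kernel between a point p and a center c."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, c))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_rbf_coefficients(points, values, centers, ridge=1e-6):
    """Ridge-regularized least-squares fit of RBF weights at shared centers."""
    phi = [[gaussian_rbf(p, c) for c in centers] for p in points]
    k = len(centers)
    A = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(row[i] * v for row, v in zip(phi, values)) for i in range(k)]
    return solve(A, b)


def coefficient_distance(w_ref, w_dist):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_ref, w_dist)))


def field(p, centers, weights):
    """Synthetic per-point feature (e.g. a luma value) generated from RBF weights."""
    return sum(w * gaussian_rbf(p, c) for w, c in zip(weights, centers))


centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 1.0, 0.0)]
# Different sample positions and a different point count: no point-to-point match exists.
other_pts = [(0.2, 0.0, 0.0), (0.8, 0.1, 0.0), (0.5, -0.5, 0.0),
             (1.2, 0.3, 0.0), (-0.3, 0.4, 0.0)]

w_ref = fit_rbf_coefficients(ref_pts, [field(p, centers, [1.0, 2.0]) for p in ref_pts], centers)
w_same = fit_rbf_coefficients(other_pts, [field(p, centers, [1.0, 2.0]) for p in other_pts], centers)
w_changed = fit_rbf_coefficients(other_pts, [field(p, centers, [1.0, 1.0]) for p in other_pts], centers)

d_same = coefficient_distance(w_ref, w_same)
d_changed = coefficient_distance(w_ref, w_changed)
```

In this toy setting `d_same` stays near zero even though the two sample sets share no points, while `d_changed` grows when the underlying feature field is altered. That is the sense in which a coefficient comparison sidesteps correspondence errors.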
