Papers

CVPR 2022

Investigating Trade-offs in Real-World Video Super-Resolution
K. C. K. Chan, S. Zhou, X. Xu, C. C. Loy
in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2022 (CVPR)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We examine the contributions of temporal propagation in real-world VSR and find that long-term information is also beneficial to this task but does not come for free, owing to the diverse and complicated degradations in the wild. As an exploratory study, we reveal several challenges in real-world VSR. We find that the domain gap in degradations and the increased computational costs lead to various challenges and trade-offs.

BasicVSR++: Effective Video Super-Resolution via Enhanced Propagation and Alignment
K. C. K. Chan, S. Zhou, X. Xu, C. C. Loy
in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2022 (CVPR)
[PDF] [arXiv] [Supplementary Material] [Project Page]

BasicVSR++ won three championships in the NTIRE 2021 Video Restoration and Enhancement Challenge. BasicVSR++ consists of two effective modifications for improving propagation and alignment. The proposed second-order grid propagation and flow-guided deformable alignment allow BasicVSR++ to significantly outperform the existing state of the art with comparable runtime.
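The idea behind second-order propagation is that each time step aggregates features propagated from the two most recent steps rather than only the immediately preceding one. A minimal sketch of one forward branch is given below; the toy sizes, the fixed linear `fuse` map (standing in for the learned residual blocks), and all names are illustrative assumptions, and the sketch omits the grid of alternating forward/backward branches and the flow-guided deformable alignment used in the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

T, C = 5, 8                      # frames, feature channels (toy sizes)
frames = rng.standard_normal((T, C))

# Stand-in for a learned fusion block; a fixed linear map over the
# concatenation of the three inputs.
W = rng.standard_normal((3 * C, C)) * 0.1

def fuse(current, prev1, prev2):
    """Fuse the current frame's feature with features propagated from
    one and two steps back (the second-order connection)."""
    return np.concatenate([current, prev1, prev2]) @ W

# One forward propagation branch: step t sees features from t-1 and t-2.
feats = []
for t in range(T):
    prev1 = feats[t - 1] if t >= 1 else np.zeros(C)
    prev2 = feats[t - 2] if t >= 2 else np.zeros(C)
    feats.append(fuse(frames[t], prev1, prev2))

feats = np.stack(feats)          # (T, C) propagated features
```

Because each step draws on two earlier states, information from distant frames reaches the current one through more (and more robust) paths than in first-order recurrence.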

Unsupervised Image-to-Image Translation with Generative Prior
S. Yang, L. Jiang, Z. Liu, C. C. Loy
in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2022 (CVPR)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We explore the use of a GAN generative prior to build a versatile unsupervised image-to-image translation framework. In particular, we present a two-stage framework that characterizes content correspondences at a high semantic level for challenging multi-modal translations between distant domains. Such content correspondences can be discovered with only domain supervision.

Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
S. Yang, L. Jiang, Z. Liu, C. C. Loy
in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2022 (CVPR)
[PDF] [arXiv] [Supplementary Material] [Project Page] [YouTube]

We extend StyleGAN to accept style conditions from new domains while preserving its style control in the original domain. This results in an interesting application of high-resolution exemplar-based portrait style transfer with only modest data requirements. DualStyleGAN, which adds an extra style path to StyleGAN, can effectively model and modulate the intrinsic and extrinsic styles for flexible and diverse artistic portrait generation.

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
Y. Xu, Y. Yin, L. Jiang, Q. Wu, C. Zheng, C. C. Loy, B. Dai, W. Wu
in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2022 (CVPR)
[arXiv] [Supplementary Material] [Project Page]

We present TransEditor, a dual-space GAN architecture with a Cross-Space Interaction mechanism based on the Transformer. The approach employs a new dual-space image editing and inversion strategy for highly controllable facial editing.

Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory
S-Y. Li, W. Yu, T. Gu, C. Lin, Q. Wang, C. Qian, C. C. Loy, Z. Liu
in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2022 (CVPR, Oral)
[PDF] [arXiv] [Supplementary Material] [Project Page] [YouTube]

We address the spatial and temporal challenges of 3D dance generation with a novel framework named Bailando. It comprises a choreographic memory, which meets the spatial constraint by encoding and quantizing dancing-style poses, and an actor-critic GPT, which achieves temporal coherency with music by translating and aligning diverse motion tempos and music beats.

Video K-Net: A Simple, Strong, and Unified Baseline For End-to-End Dense Video Segmentation
X. Li, W. Zhang, J. Pang, K. Chen, G. Cheng, Y. Tong, C. C. Loy
in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2022 (CVPR, Oral)
[PDF] [arXiv] [Supplementary Material] [Project Page]

Video K-Net is a simple, strong, and unified framework for fully end-to-end video panoptic segmentation. It achieves state-of-the-art results on popular benchmarks including Cityscapes-VPS and KITTI-STEP.

Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation
Y. Hou, X. Zhu, Y. Ma, C. C. Loy, Y. Li
in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2022 (CVPR)
[PDF] [Supplementary Material] [Project Page]

We propose a novel point-to-voxel knowledge distillation approach (PVD) tailored for LiDAR semantic segmentation. PVD comprises point-to-voxel output distillation and affinity distillation. A supervoxel partition and a difficulty-aware sampling strategy are further proposed to improve the learning efficiency of affinity distillation.

Conditional Prompt Learning for Vision-Language Models
K. Zhou, J. Yang, C. C. Loy, Z. Liu
in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2022 (CVPR)
[PDF] [arXiv] [Project Page]

Conditional Context Optimization (CoCoOp) extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp’s static prompts, the dynamic prompts adapt to each instance and are thus less sensitive to class shift.
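The instance-conditioning mechanism can be sketched as follows: a small Meta-Net maps each image feature to a single token that is added to every learnable context vector, so the prompt shifts per image. The dimensions, the two-layer bottleneck, and all names here are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 512        # embedding dimension (illustrative)
N_CTX = 4        # number of learnable context tokens (illustrative)

# Learnable context vectors shared across classes, as in CoOp.
ctx = rng.standard_normal((N_CTX, DIM)) * 0.02

# Meta-Net: a small bottleneck MLP; weights are random stand-ins for
# parameters that would be learned jointly with the context vectors.
W1 = rng.standard_normal((DIM, DIM // 16)) * 0.02
W2 = rng.standard_normal((DIM // 16, DIM)) * 0.02

def meta_net(image_feature):
    """Map an image feature to one input-conditional token pi(x)."""
    hidden = np.maximum(image_feature @ W1, 0.0)   # Linear + ReLU
    return hidden @ W2                              # Linear

def conditional_prompt(image_feature):
    """Shift every context token by the per-image token:
    v_m(x) = v_m + pi(x), so the prompt adapts to each instance."""
    return ctx + meta_net(image_feature)            # broadcast over tokens

# Two different images yield two different prompts.
prompt_a = conditional_prompt(rng.standard_normal(DIM))
prompt_b = conditional_prompt(rng.standard_normal(DIM))
```

Because the shift depends on the input rather than on the class label, the learned prompts transfer better to classes unseen during training.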
