Weighted Mutual Learning with Diversity-Driven Model Compression.

Miao Zhang, Li Wang, David Campos, Wei Huang, Chenjuan Guo, Bin Yang

Research output: Contribution to conference › Paper without publisher/journal › Research › peer-review

Abstract

Online distillation has attracted attention from the community because it simplifies the traditional two-stage knowledge distillation process into a single stage. Online distillation collaboratively trains a group of peer models, all treated as students, so that every student gains extra knowledge from the others. However, memory consumption and diversity among students are two key challenges to the scalability and quality of online distillation. To address these two challenges, this paper presents a framework called Weighted Mutual Learning with Diversity-Driven Model Compression (WML) for online distillation. First, based on a hierarchical structure in which students share different parts of a network, we leverage structured network pruning to generate diversified students with different model sizes, which also helps reduce memory requirements. Second, rather than taking a simple average over students, this paper, for the first time, leverages a bi-level formulation to estimate the relative importance of students in closed form, further boosting the effectiveness of the mutual distillation. Extensive experiments demonstrate the generality of the proposed framework, which outperforms existing online distillation methods on a variety of deep neural networks. More interestingly, as a byproduct, WML produces a series of students with different model sizes in a single run, which also achieve competitive results compared with existing channel pruning methods.
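
The abstract describes two ingredients: peers of different sizes carved from a shared network by structured pruning, and per-student importance weights applied to the mutual distillation loss. The paper's exact loss and its closed-form bi-level weight estimate are not reproduced here, so the following is a minimal PyTorch sketch of a *weighted* mutual-learning objective under common knowledge-distillation conventions; the function name, the temperature, and the softmax normalization of the weights are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def weighted_mutual_learning_loss(logits, targets, peer_weights, temperature=3.0):
    """Hypothetical weighted mutual-learning objective for a group of peers.

    logits       : list of [batch, num_classes] tensors, one per student.
    targets      : [batch] tensor of ground-truth class labels.
    peer_weights : [num_students] tensor of relative importance scores
                   (in the paper these come from a closed-form bi-level
                   estimate; here they are simply softmax-normalized).
    """
    num_students = len(logits)
    w = torch.softmax(peer_weights, dim=0)  # positive weights summing to 1

    total = logits[0].new_zeros(())
    for i in range(num_students):
        # Supervised term: each student fits the ground-truth labels.
        loss_i = F.cross_entropy(logits[i], targets)
        for j in range(num_students):
            if j == i:
                continue
            # Distillation term: student i matches peer j's softened
            # prediction, scaled by peer j's importance weight. Peers are
            # detached so each acts as a fixed target for the others.
            target_j = F.softmax(logits[j].detach() / temperature, dim=1)
            log_p_i = F.log_softmax(logits[i] / temperature, dim=1)
            loss_i = loss_i + w[j] * (temperature ** 2) * F.kl_div(
                log_p_i, target_j, reduction="batchmean"
            )
        total = total + loss_i
    return total / num_students
```

In the paper's bi-level view, the inner level trains the students under fixed weights while the outer level updates the weights on held-out data in closed form; a simple stand-in, if one were reimplementing this from the abstract alone, would be to make `peer_weights` a learnable parameter updated by gradient descent on a validation batch.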
Original language: English
Publication date: 2022
Publication status: Published - 2022

Bibliographical note

DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
