TY - CONF
T1 - Weighted Mutual Learning with Diversity-Driven Model Compression
AU - Zhang, Miao
AU - Wang, Li
AU - Campos, David
AU - Huang, Wei
AU - Guo, Chenjuan
AU - Yang, Bin
PY - 2022
Y1 - 2022
AB - Online distillation has attracted attention from the community as it simplifies the traditional two-stage knowledge distillation process into a single stage. Online distillation collaboratively trains a group of peer models, which are treated as students, and all students gain extra knowledge from each other. However, memory consumption and diversity among students are two key challenges to the scalability and quality of online distillation. To address these two challenges, this paper presents a framework called Weighted Mutual Learning with Diversity-Driven Model Compression (WML) for online distillation. First, based on a hierarchical structure in which students share different parts, we leverage structured network pruning to generate diversified students with different model sizes, which also helps reduce memory requirements. Second, rather than simply averaging over students, this paper, for the first time, leverages a bi-level formulation to estimate the relative importance of students in closed form, further boosting the effectiveness of distillation among them. Extensive experiments show the generalization of the proposed framework, which outperforms existing online distillation methods on a variety of deep neural networks. More interestingly, as a byproduct, WML produces a series of students with different model sizes in a single run, which also achieves competitive results compared with existing channel pruning methods.
M3 - Paper without publisher/journal
ER -