Chengyu Dong

I am a PhD student in Computer Science at UCSD.

I am interested in the principles (and, where possible, theories) of learning efficiency, especially data efficiency, as well as principle-guided efficient learning algorithms for real-world applications. I believe learning efficiency defines intelligence.

Fun facts about me:

  • My previous research focus was Physics and Astronomy, in particular Solar System Dynamics, that is, the motion of objects beyond Earth but within our solar system (Yes, you guessed it: the three-body problem!)
  • Football (or soccer?) is my life. I play and watch games every week, rain or shine. DM me if you want to watch or play together! Football forever ⚽!
  • I love drama. I read (and hope to write, in the near future) drama scripts, and I have acted in stage plays. My favourite playwright is Harold Pinter.

Education
  • University of California, San Diego
    Computer Science and Engineering
    Ph.D. Student
    Sep. 2021 - present
Experience
  • Nvidia
    Research Intern
    Jun. 2024 - present
  • Google DeepMind
    Research Intern
    Jun. 2023 - Nov. 2023
  • Microsoft Research
    Research Intern
    Jun. 2022 - Jun. 2023
Selected Publications (view all)
SoTeacher: Toward Student-Oriented Teacher Network Training For Knowledge Distillation

Chengyu Dong, Liyuan Liu, Jingbo Shang

International Conference on Learning Representations (ICLR) 2024

How do we train an ideal teacher for knowledge distillation? We call attention to the discrepancy between current teacher training practice and an ideal teacher training objective dedicated to student learning, and we study the theoretical and practical feasibility of student-oriented teacher training.

Fast-ELECTRA for Efficient Pre-training

Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu

International Conference on Learning Representations (ICLR) 2024

Bridging Discrete and Backpropagation: Straight-Through and Beyond

Liyuan Liu, Chengyu Dong, Xiaodong Liu, Bin Yu, Jianfeng Gao

Neural Information Processing Systems (NeurIPS) 2023 Oral

Debiasing Made State-of-the-art: Revisiting the Simple Seed-based Weak Supervision for Text Classification

Chengyu Dong, Zihan Wang, Jingbo Shang

Empirical Methods in Natural Language Processing (EMNLP) 2023

We show that simply masking the seed words can achieve state-of-the-art performance on seed-based weakly supervised text classification.

Understand and Modularize Generator Optimization in ELECTRA-style Pretraining

Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu

International Conference on Machine Learning (ICML) 2023

We find that improper control of generator optimization in ELECTRA-style pretraining is the main cause of the generator "overfitting" phenomenon, and we propose a simple method to fix it.

Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting

Chengyu Dong, Liyuan Liu, Jingbo Shang

Neural Information Processing Systems (NeurIPS) 2022 Oral

We show that robust overfitting in adversarially robust deep learning is likely the result of the implicit label noise in adversarial training. Robust overfitting is thus the early stage of an epoch-wise double descent and is not a new phenomenon.

Towards Adaptive Residual Network Training: A Neural-ODE Perspective

Chengyu Dong, Liyuan Liu, Zichao Li, Jingbo Shang

International Conference on Machine Learning (ICML) 2020
