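Federated Learning
Chapter 1: Introduction to Federated Learning
Federated learning trains machine learning models across decentralized devices without sharing raw data. The training data is never gathered in one central location; each participant keeps its own data and does not share it with any other entity. This makes the approach valuable for privacy-sensitive applications.
Machine learning applications depend on high-quality training data, but privacy constraints mean the data cannot always be transferred to a central database for curation and management. Federated learning starts from this constraint and trains the model at the different locations where the data resides.
Future work in federated learning is expected to focus on ==improving efficiency, strengthening model performance across different environments, and addressing security challenges==.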
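First, communication efficiency (the top priority): training a model across thousands of devices requires frequent updates between the clients and the central server, which can be slow and resource-intensive. Mitigations: techniques such as model compression (for example, quantizing weights to fewer bits), selective parameter updates (sending only the changes that exceed a threshold), and asynchronous training protocols reduce bandwidth usage. For example, sparse updates (transmitting only a subset of the model parameters) can cut communication cost by 50% or more.
Second, handling data heterogeneity is essential. Devices in a federated network usually have non-identical data distributions (non-IID data). Meta-learning approaches (for example, training a base model that can be quickly fine-tuned on each device) are one solution. Another is multi-task learning frameworks that account for variation in data structure.
Third, security and robustness are expected to improve. Federated systems are vulnerable to attacks such as model poisoning (malicious clients altering the global model) and inference attacks (extracting private data from model updates).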
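The two bandwidth-saving ideas mentioned above, threshold-based selective updates and weight quantization, can be illustrated with a minimal sketch. The function names (`sparse_update`, `quantize`, `dequantize`, `apply_update`), the threshold of 0.05, and the 8-bit uniform quantizer are assumptions chosen for illustration; they are not taken from any particular federated learning library.

```python
from typing import Dict, List, Tuple


def sparse_update(old: List[float], new: List[float], threshold: float) -> Dict[int, float]:
    """Keep only the parameter changes whose magnitude exceeds `threshold`."""
    return {
        i: n - o
        for i, (o, n) in enumerate(zip(old, new))
        if abs(n - o) > threshold
    }


def apply_update(params: List[float], update: Dict[int, float]) -> List[float]:
    """Server side: add the sparse changes to a copy of the current parameters."""
    out = list(params)
    for i, delta in update.items():
        out[i] += delta
    return out


def quantize(values: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Uniformly map floats to integer codes in [0, 2**bits - 1].

    Returns the codes plus the offset and scale needed to decode them.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Recover approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


# A client compares its locally trained weights with the last global weights
# and transmits only the entries that changed by more than the threshold.
old_weights = [0.10, -0.20, 0.30, 0.40]
new_weights = [0.11, -0.50, 0.30, 0.90]
update = sparse_update(old_weights, new_weights, threshold=0.05)
merged = apply_update(old_weights, update)

# The same weights sent as 8-bit codes instead of full-precision floats.
codes, offset, scale = quantize(new_weights, bits=8)
restored = dequantize(codes, offset, scale)
```

Here only two of the four entries are transmitted, and the small change at index 0 is dropped. That is the trade-off of thresholding: updates below the threshold are lost unless the client accumulates them for a later round.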
Problem-Parameter-Free Federated Learning
Federated learning (FL) has garnered significant attention from academia and industry in recent years due to its advantages in data privacy, scalability, and communication efficiency. However, current FL algorithms face a critical limitation: their performance heavily depends on meticulously tuned hyperparameters, particularly the learning rate or stepsize. This manual tuning process is challenging in federated settings due to data heterogeneity and limited accessibility of local datasets. Consequently, the reliance on problem-specific parameters hinders the widespread adoption of FL and potentially compromises its performance in dynamic or diverse environments. To address this issue, we introduce PAdaMFed, a novel algorithm for nonconvex FL that carefully combines adaptive stepsize and momentum techniques. PAdaMFed offers two key advantages: 1) it operates autonomously without relying on problem-specific parameters; and 2) it manages data heterogeneity and partial participation without requiring heterogeneity bounds. In addition, PAdaMFed provides several strong theoretical guarantees: 1) it achieves state-of-the-art convergence rates with a sample complexity of \(\mathcal{O}(\epsilon^{-4})\) and communication complexity of \(\mathcal{O}(\epsilon^{-3})\) to obtain an accuracy of \(\|\nabla f(\boldsymbol{\theta})\| \leq \epsilon\), even using constant learning rates; 2) these complexities can be improved to the best-known \(\mathcal{O}(\epsilon^{-3})\) for sampling and \(\mathcal{O}(\epsilon^{-2})\) for communication when incorporating variance reduction; 3) it exhibits linear speedup with respect to the number of local update steps and participating clients at each global round. These attributes make PAdaMFed highly scalable and adaptable for various real-world FL applications. Extensive empirical evidence on both image classification and sentiment analysis tasks validates the efficacy of our approaches.
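The abstract does not give PAdaMFed's update rule, so the sketch below is not that algorithm. It only illustrates the general idea of pairing momentum with a step that does not depend on problem constants, using a swapped-in technique: a normalized momentum step on the server. Everything in it is an assumption made for illustration: the quadratic client objectives, the names `local_gradient` and `normalized_momentum_round`, and the values of `beta`, `step`, and the number of rounds.

```python
import math
import random
from typing import List, Tuple


def local_gradient(theta: List[float], target: List[float]) -> List[float]:
    """Gradient of the toy client loss 0.5 * ||theta - target||^2."""
    return [t - c for t, c in zip(theta, target)]


def normalized_momentum_round(
    theta: List[float],
    momentum: List[float],
    client_targets: List[List[float]],
    num_sampled: int,
    beta: float,
    step: float,
    rng: random.Random,
) -> Tuple[List[float], List[float]]:
    """One global round with partial participation and a normalized momentum step."""
    # Partial participation: only a random subset of clients reports this round.
    sampled = rng.sample(client_targets, num_sampled)
    avg_grad = [0.0] * len(theta)
    for target in sampled:
        grad = local_gradient(theta, target)
        avg_grad = [a + g / num_sampled for a, g in zip(avg_grad, grad)]
    # Momentum smooths the noise introduced by client sampling.
    momentum = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, avg_grad)]
    # Normalizing fixes the step length at `step`, whatever the gradient scale is.
    norm = math.sqrt(sum(m * m for m in momentum))
    if norm > 0.0:
        theta = [t - step * m / norm for t, m in zip(theta, momentum)]
    return theta, momentum


# Heterogeneous clients: each has a different optimum; the global optimum is their mean (0, 0).
client_targets = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
rng = random.Random(0)
theta = [5.0, 5.0]
momentum = [0.0, 0.0]
initial_distance = math.sqrt(sum(t * t for t in theta))
for _ in range(500):
    theta, momentum = normalized_momentum_round(
        theta, momentum, client_targets, num_sampled=2, beta=0.9, step=0.05, rng=rng
    )
final_distance = math.sqrt(sum(t * t for t in theta))
```

Because every step has the same length, the iterate approaches the global optimum and then hovers near it rather than converging exactly. Schemes with convergence guarantees typically tie the step length to the number of rounds or other quantities that are known in advance.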