
Bibliography

Published online by Cambridge University Press:  14 January 2022

Song Guo, The Hong Kong Polytechnic University
Zhihao Qu, The Hong Kong Polytechnic University

Type: Chapter
Information: Edge Learning for Distributed Big Data Analytics: Theory, Algorithms, and System Design, pp. 190–214
Publisher: Cambridge University Press
Print publication year: 2022

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

(2017). Baidu ring-allreduce: Bringing HPC techniques to deep learning. https://github.com/baidu-research/baidu-allreduce.Google Scholar
(2020). Keras: The Python deep learning API. https://keras.io/.Google Scholar
(2020). MPI Forum: Message Passing Interface (MPI) Forum home page. https://www.mpi-forum.org/.Google Scholar
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P. A., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., and Zheng, X. (2016). Tensorflow: A system for large-scale machine learning. In Proc. OSDI.Google Scholar
Ablin, P., Moreau, T., Massias, M., and Gramfort, A. (2019). Learning step sizes for unfolded sparse coding. In Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
Addanki, R., Venkatakrishnan, S. B., Gupta, S., Mao, H., and Alizadeh, M. (2019). Placeto: Learning generalizable device placement algorithms for distributed machine learning. CoRR.Google Scholar
Agarwal, N., Bullins, B., Chen, X., Hazan, E., Singh, K., Zhang, C., and Zhang, Y. (2019). Efficient full-matrix adaptive regularization. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 139–147.Google Scholar
Agarwal, N. and Singh, K. (2017). The price of differential privacy for online learning. In Proceedings of the 34th International Conference on Machine Learning - Volume 70.Google Scholar
Aji, A. F. and Heafield, K. (2017). Sparse communication for distributed gradient descent. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pages 440–445.Google Scholar
Al-Fares, M., Loukissas, A., and Vahdat, A. (2008). A scalable, commodity data center network architecture. In Proc. SIGCOMM.CrossRefGoogle Scholar
Alistarh, D., Allen-Zhu, Z., and Li, J. (2018a). Byzantine stochastic gradient descent. In Advances in Neural Information Processing Systems 31.Google Scholar
Alistarh, D., Grubic, D., Li, J., Tomioka, R., and Vojnovic, M. (2017a). QSGD: communication-efficient SGD via gradient quantization and encoding. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R., editors, Proceedings of Annual Conference on Neural Information Processing Systems, NeurIPS.Google Scholar
Alistarh, D., Grubic, D., Li, J., Tomioka, R., and Vojnovic, M. (2017b). QSGD: Communication-efficient SGD via gradient quantization and encoding. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pages 1709–1720.Google Scholar
Alistarh, D., Grubic, D., Li, J., Tomioka, R., and Vojnovic, M. (2017c). QSGD: communication-efficient SGD via gradient quantization and encoding. In Proceedings of Conference on Neural Information Processing Systems (NeurIPS).Google Scholar
Alistarh, D., Hoefler, T., Johansson, M., Konstantinov, N., Khirirat, S., and Renggli, C. (2018b). The convergence of sparsified gradient methods. In Proceedings of Annual Conference on Neural Information Processing Systems, NeurIPS.Google Scholar
Alistarh, D., Hoefler, T., Johansson, M., Konstantinov, N., Khirirat, S., and Renggli, C. (2018c). The convergence of sparsified gradient methods. In Proceedings of Neural Information Processing Systems (NeurIPS).Google Scholar
Amiri, M. M. and Gunduz, D. (2019). Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air.CrossRefGoogle Scholar
Amiri, M. M., Gunduz, D., Kulkarni, S. R., and Poor, H. V. (2020a). Convergence of update aware device scheduling for federated learning at the wireless edge.Google Scholar
Amiri, M. M., Gunduz, D., Kulkarni, S. R., and Poor, H. V. (2020b). Update aware device scheduling for federated learning at the wireless edge. arXiv preprint arXiv:2001.10402.Google Scholar
Aydöre, S., Thirion, B., and Varoquaux, G. (2019). Feature grouping as a stochastic regularizer for high-dimensional structured data. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 592–603.Google Scholar
Banner, R., Hubara, I., Hoffer, E., and Soudry, D. (2018). Scalable methods for 8-bit training of neural networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 5145–5153.Google Scholar
Basu, D. D. (2019). Qsparse-local-SGD: Communication Efficient Distributed SGD with Quantization, Sparsification, and Local Computations. PhD thesis, University of California, Los Angeles, USA.CrossRefGoogle Scholar
Bernacchia, A., Lengyel, M., and Hennequin, G. (2018). Exact natural gradient in deep linear networks and application to the nonlinear case. In Advances in Neural Information Processing Systems (NeurIPS), pages 5941–5950.Google Scholar
Bhagoji, A. N., Chakraborty, S., Mittal, P., and Calo, S. B. (2018). Analyzing federated learning through an adversarial lens. CoRR, abs/1811.12470.Google Scholar
Ying, B., Yuan, K., and Sayed, A. H. (2017). Variance-reduced stochastic learning under random reshuffling. In Advances in Neural Information Processing Systems, pages 1624–1634.Google Scholar
Blanchard, P., El Mhamdi, E. M., Guerraoui, R., and Stainer, J. (2017). Machine learning with adversaries: Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems 30.Google Scholar
Bonawitz, K., Eichner, H., Grieskamp, W., Huba, D., Ingerman, A., Ivanov, V., Kiddon, C., Konecný, J., Mazzocchi, S., McMahan, H. B., Overveldt, T. V., Petrou, D., Ramage, D., and Roselander, J. (2019). Towards federated learning at scale: System design. ArXiv, abs/1902.01046.Google Scholar
Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., Ramage, D., Segal, A., and Seth, K. (2017). Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.Google Scholar
Brendel, W., Rauber, J., and Bethge, M. (2017). Decision-based adversarial attacks: Reliable attacks against black-box machine learning models.Google Scholar
Brutzkus, A., Elisha, O., and Gilad-Bachrach, R. (2018). Low latency privacy preserving inference. CoRR, abs/1812.10659.Google Scholar
Buckman, J., Roy, A., Raffel, C., and Goodfellow, I. (2018). Thermometer encoding: One hot way to resist adversarial examples. In International Conference on Learning Representations.Google Scholar
Canel, C., Kim, T., Zhou, G., Li, C., Lim, H., Andersen, D. G., Kaminsky, M., and Dulloor, S. R. (2019). Scaling video analytics on constrained edge nodes.Google Scholar
Charles, Z., Papailiopoulos, D., and Ellenberg, J. (2017). Approximate gradient coding via sparse random graphs. arXiv.Google Scholar
Chen, C., Choi, J., Brand, D., Agrawal, A., Zhang, W., and Gopalakrishnan, K. (2018a). Adacomp : Adaptive residual gradient compression for data-parallel distributed training. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, AAAI.CrossRefGoogle Scholar
Chen, D., Leon, A. S., Engle, S. P., Fuentes, C., and Chen, Q. (2017a). Offline training for improving online performance of a genetic algorithm based optimization model for hourly multi-reservoir operation. Environ. Model. Softw., 96:46–57.Google Scholar
Chen, H., Wu, H. C., Chan, S. C., and Lam, W. H. (2019a). A stochastic quasi-newton method for large-scale nonconvex optimization with applications. IEEE transactions on neural networks and learning systems.Google Scholar
Chen, J., Monga, R., Bengio, S., and Józefowicz, R. (2016). Revisiting distributed synchronous SGD. arXiv, abs/1604.00981.Google Scholar
Chen, M., Yang, Z., Saad, W., Yin, C., Poor, H. V., and Cui, S. (2019b). A joint learning and communications framework for federated learning over wireless networks.Google Scholar
Chen, T., Giannakis, G., Sun, T., and Yin, W. (2018b). LAG: Lazily aggregated gradient for communication-efficient distributed learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pages 5050–5060.Google Scholar
Chen, X., Liu, S., Xu, K., Li, X., Lin, X., Hong, M., and Cox, D. (2019c). ZO-AdaMM: Zeroth-order adaptive momentum method for black-box optimization. In Advances in Neural Information Processing Systems (NeurIPS), pages 1–12.Google Scholar
Chen, Y., Su, L., and Xu, J. (2017). Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proceedings of the ACM on Measurement and Analysis of Computing Systems (SIGMETRICS), 1(2):1–25.Google Scholar
Chen, Y.-K., Wu, A.-Y., Bayoumi, M. A., and Koushanfar, F. (2013). Editorial low-power, intelligent, and secure solutions for realization of internet of things. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 3(1):1–4.Google Scholar
Chin, T.-W., Ding, R., and Marculescu, D. (2019). Adascale: Towards real-time video object detection using adaptive scaling. arXiv preprint arXiv:1902.02910.Google Scholar
Chou, Y.-M., Chan, Y.-M., Lee, J.-H., Chiu, C.-Y., and Chen, C.-S. (2018). Unifying and merging well-trained deep neural networks for inference stage. arXiv preprint arXiv:1805.04980.Google Scholar
Cipar, J., Ho, Q., Kim, J. K., Lee, S., Ganger, G. R., Gibson, G., Keeton, K., and Xing, E. (2013). Solving the straggler problem with bounded staleness. In Presented as part of the 14th Workshop on Hot Topics in Operating Systems.Google Scholar
Cortes, C. and Vapnik, V. (2004). Support-vector networks. Machine Learning, 20:273–297.Google Scholar
Daga, H., Nicholson, P. K., Gavrilovska, A., and Lugones, D. (2019). Cartel: A system for collaborative transfer learning at the edge. Proceedings of the ACM Symposium on Cloud Computing.CrossRefGoogle Scholar
Datta, S., Bhaduri, K., Giannella, C., Wolff, R., and Kargupta, H. (2006). Distributed data mining in peer-to-peer networks. IEEE Internet Computing, 10(4):18–26.CrossRefGoogle Scholar
Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Ranzato, M., Senior, A., Tucker, P., Yang, K., et al. (2012). Large scale distributed deep networks. In Advances in neural information processing systems, pages 1223–1231.Google Scholar
Defazio, A., Bach, F. R., and Lacoste-Julien, S. (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Proceedings of Annual Conference on Neural Information Processing Systems.Google Scholar
Dekel, O., Gilad-Bachrach, R., Shamir, O., and Xiao, L. (2012). Optimal distributed online prediction using mini-batches. J. Mach. Learn. Res., 13:165–202.Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE.Google Scholar
Dennis, D. K., Pabbaraju, C., Simhadri, H. V., and Jain, P. (2018). Multiple instance learning for efficient sequential data classification on resource-constrained devices. In Advances in Neural Information Processing Systems (NeurIPS), pages 10953–10964.Google Scholar
Denton, E. L., Zaremba, W., Bruna, J., LeCun, Y., and Fergus, R. (2014). Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in neural information processing systems, pages 1269–1277.Google Scholar
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.Google Scholar
Dhillon, G. S., Azizzadenesheli, K., Lipton, Z. C., Bernstein, J., Kossaifi, J., Khanna, A., and Anandkumar, A. (2018). Stochastic activation pruning for robust adversarial defense. CoRR, abs/1803.01442.Google Scholar
Dieuleveut, A. and Patel, K. K. (2019). Communication trade-offs for local-sgd with large step size. In Proceedings of Advances in Neural Information Processing Systems, NeurIPS.Google Scholar
Dong, X., Liu, L., Li, G., Li, J., Zhao, P., Wang, X., and Feng, X. (2019). Exploiting the input sparsity to accelerate deep neural networks: poster. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, pages 401–402.Google Scholar
Dozat, T. (2016). Incorporating Nesterov momentum into Adam. In Proceedings of the ICLR Workshop.Google Scholar
Du, S. S. and Hu, W. (2019). Width provably matters in optimization for deep linear neural networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 2956–2970.Google Scholar
Du, Y. and Huang, K. (2018). Fast analog transmission for high-mobility wireless data acquisition in edge learning. Arxiv Online Available: https://arxiv.org/abs/1807.11250.Google Scholar
Egan, K. J., Pinto-Bruno, Á. C., Bighelli, I., Berg-Weger, M., van Straten, A., Albanese, E., and Pot, A. (2018). Online training and support programs designed to improve mental health and reduce burden among caregivers of people with dementia: A systematic review. Journal of the American Medical Directors Association, 19(3):200–206.e1.Google Scholar
Elgabli, A., Park, J., Issaid, C. B., and Bennis, M. (2020). Harnessing wireless channels for scalable and privacy-preserving federated learning. ArXiv, abs/2007.01790.Google Scholar
Epasto, A., Esfandiari, H., and Mirrokni, V. (2019). On-device algorithms for public-private data with absolute privacy. In The World Wide Web Conference.Google Scholar
Fang, B., Zeng, X., and Zhang, M. (2018). Nestdnn: Resource-aware multi-tenant on-device deep learning for continuous mobile vision. Proceedings of the 24th Annual International Conference on Mobile Computing and Networking.Google Scholar
Faraji, I., Mirsadeghi, S. H., and Afsahi, A. (2016). Topology-aware GPU selection on multi-gpu nodes. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPS Workshops.Google Scholar
Gao, Y., Chen, L., and Li, B. (2018a). Spotlight: Optimizing device placement for training deep neural networks. In International Conference on Machine Learning.Google Scholar
Gao, Y., Chen, L., and Li, B. (2018b). Spotlight: Optimizing device placement for training deep neural networks. In Proceedings of the 35th International Conference on Machine Learning, ICML.Google Scholar
Gaunt, A., Johnson, M., Riechert, M., Tarlow, D., Tomioka, R., Vytiniotis, D., and Webster, S. (2017). Ampnet: Asynchronous model-parallel training for dynamic neural networks. arXiv, abs/1705.09786.Google Scholar
Gazagnadou, N., Gower, R. M., and Salmon, J. (2019). Optimal mini-batch and step sizes for SAGA. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 3734–3742.Google Scholar
Ge, J., Wang, Z., Wang, M., and Liu, H. (2018). Minimax-optimal privacy-preserving sparse pca in distributed systems. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics.Google Scholar
Geng, Y., Yang, Y., and Cao, G. (2018). Energy-efficient computation offloading for multicore-based mobile devices. IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, pages 46–54.Google Scholar
Ghadimi, S. and Lan, G. (2013). Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368.Google Scholar
Ghadimi, S. and Lan, G. (2016). Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming, 156:59–99.Google Scholar
Giacomelli, I., Jha, S., Joye, M., Page, C. D., and Yoon, K. (2018). Privacy-preserving ridge regression with only linearly-homomorphic encryption. In Applied Cryptography and Network Security.Google Scholar
Gibbons, R. (1992). Primer in Game Theory. Harvester Wheatsheaf.Google Scholar
Gope, D., Dasika, G., and Mattina, M. (2019). Ternary hybrid neural-tree networks for highly constrained iot applications.Google Scholar
Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K. (2017). Accurate, large minibatch sgd: Training imagenet in 1 hour.Google Scholar
Gu, J., Chowdhury, M., Shin, K. G., Zhu, Y., Jeon, M., Qian, J., Liu, H., and Guo, C. (2019a). Tiresias: A GPU cluster manager for distributed deep learning. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), pages 485–500.Google Scholar
Gu, L., Zeng, D., Guo, S., Barnawi, A., and Xiang, Y. (2017). Cost efficient resource management in fog computing supported medical cyber-physical system. IEEE Transactions on Emerging Topics in Computing, 5(1):108–119.Google Scholar
Gu, R., Yang, S., and Wu, F. (2019b). Distributed machine learning on mobile devices: A survey. arXiv preprint arXiv:1909.08329.Google Scholar
Gunasekar, S., Lee, J. D., Soudry, D., and Srebro, N. (2018). Characterizing implicit bias in terms of optimization geometry. In Proceedings of the 35th International Conference on Machine Learning, ICML.Google Scholar
Gündüz, D., de Kerret, P., Sidiropoulos, N. D., Gesbert, D., Murthy, C. R., and van der Schaar, M. (2019). Machine learning in the air. IEEE J. Sel. Areas Commun., 37(10):2184–2199.CrossRefGoogle Scholar
Guo, C., Lu, G., Li, D., Wu, H., Zhang, X., Shi, Y., Tian, C., Zhang, Y., and Lu, S. (2009). Bcube: A high performance, server-centric network architecture for modular data centers. In Proc. SIGCOMM.Google Scholar
Guo, P., Hu, B., Li, R., and Hu, W. (2018a). Foggycache: Cross-device approximate computation reuse. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, pages 19–34. ACM.Google Scholar
Guo, Y., Yao, A., and Chen, Y. (2016). Dynamic network surgery for efficient dnns. In Annual Conference on Neural Information Processing Systems, NeurIPS, December 5-10, 2016, Barcelona, Spain, pages 1379–1387.Google Scholar
Guo, Y., Zhang, C., Zhang, C., and Chen, Y. (2018b). Sparse dnns with improved adversarial robustness. In Advances in Neural Information Processing Systems 31.Google Scholar
Gupta, H., Srikant, R., and Ying, L. (2019). Finite-time performance bounds and adaptive learning rate selection for two time-scale reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
Haddadpour, F., Kamani, M. M., Mahdavi, M., and Cadambe, V. R. (2019a). Local SGD with periodic averaging: Tighter analysis and adaptive synchronization. In Proceedings of Advances in Neural Information Processing Systems, NeurIPS.Google Scholar
Haddadpour, F., Kamani, M. M., Mahdavi, M., and Cadambe, V. R. (2019b). Trading redundancy for communication: Speeding up distributed SGD for non-convex optimization. In Proceedings of the 36th International Conference on Machine Learning, ICML.Google Scholar
Hadjis, S., Zhang, C., Mitliagkas, I., Iter, D., and Ré, C. (2016). Omnivore: An optimizer for multi-device deep learning on CPUs and GPUs. arXiv.Google Scholar
Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., and Xu, C. (2019). Ghostnet: More features from cheap operations. arXiv preprint arXiv:1911.11907.Google Scholar
Han, P., Wang, S., and Leung, K. K. (2020). Adaptive gradient sparsification for efficient federated learning: An online learning approach. CoRR, abs/2001.04756.Google Scholar
Han, S., Mao, H., and Dally, W. J. (2015a). Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149.Google Scholar
Han, S., Mao, H., and Dally, W. J. (2016). Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. arXiv.Google Scholar
Han, S., Pool, J., Tran, J., and Dally, W. (2015b). Learning both weights and connections for efficient neural network. In Advances in neural information processing systems, pages 1135–1143.Google Scholar
Han, S., Pool, J., Tran, J., and Dally, W. J. (2015c). Learning both weights and connections for efficient neural network. In Annual Conference on Neural Information Processing Systems, NeurIPS, December 7-12, 2015, Montreal, Quebec, Canada, pages 1135–1143.Google Scholar
Harlap, A., Narayanan, D., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., and Gibbons, P. B. (2018). Pipedream: Fast and efficient pipeline parallel DNN training. arXiv, abs/1806.03377.Google Scholar
He, F., Liu, T., and Tao, D. (2019). Control batch size and learning rate to generalize well: Theoretical and empirical evidence. In Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
He, J., Chen, Y., Fu, T. Z. J., Long, X., Winslett, M., You, L., and Zhang, Z. (2018a). Haas: Cloud-based real-time data analytics with heterogeneity-aware scheduling. In 38th IEEE International Conference on Distributed Computing Systems, ICDCS.Google Scholar
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proc. CVPR, pages 770–778.Google Scholar
He, W., Li, B., and Song, D. (2018b). Decision boundary analysis of adversarial examples. In International Conference on Learning Representations.Google Scholar
Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., and Scholkopf, B. (1998). Support vector machines. IEEE Intelligent Systems and their applications, 13(4):18–28.Google Scholar
Heikkilä, M., Lagerspetz, E., Kaski, S., Shimizu, K., Tarkoma, S., and Honkela, A. (2017). Differentially private bayesian learning on distributed data. In Advances in Neural Information Processing Systems 30.Google Scholar
Hesamifard, E., Takabi, H., and Ghasemi, M. (2017). Cryptodl: Deep neural networks over encrypted data. CoRR, abs/1711.05189.Google Scholar
Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.Google Scholar
Hitaj, B., Ateniese, G., and Perez-Cruz, F. (2017). Deep models under the gan: Information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.Google Scholar
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8):1735–1780.Google Scholar
Holmes, C., Mawhirter, D., He, Y., Yan, F., and Wu, B. (2019). Grnn: Low-latency and scalable rnn inference on gpus. Proceedings of the 14th EuroSys Conference 2019.Google Scholar
Hosmer, D. W. and Lemeshow, S. (1989). Applied logistic regression.Google Scholar
Hsieh, K., Harlap, A., Vijaykumar, N., Konomis, D., Ganger, G. R., Gibbons, P. B., and Mutlu, O. (2017a). Gaia: Geo-distributed machine learning approaching LAN speeds. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, NSDI.Google Scholar
Hsieh, K., Harlap, A., Vijaykumar, N., Konomis, D., Ganger, G. R., Gibbons, P. B., and Mutlu, O. (2017b). Gaia: Geo-distributed machine learning approaching LAN speeds. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pages 629–647.Google Scholar
Hsieh, K., Harlap, A., Vijaykumar, N., Konomis, D., Ganger, G. R., Gibbons, P. B., and Mutlu, O. (2017c). Gaia: Geo-distributed machine learning approaching lan speeds. In Proc. NSDI.Google Scholar
Hu, J., Shen, L., and Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7132–7141.Google Scholar
Huang, C., Zhai, S., Talbott, W., Bautista, M. A., Sun, S. Y., Guestrin, C., and Susskind, J. (2019a). Addressing the loss-metric mismatch with adaptive loss alignment. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 5145–5154.Google Scholar
Huang, H., Wang, C., and Dong, B. (2019b). Nostalgic ADAM: Weighting more of the past gradients when designing the adaptive learning rate. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2556–2562.Google Scholar
Huang, J., Qian, F., Guo, Y., Zhou, Y., Xu, Q., Mao, Z. M., Sen, S., and Spatscheck, O. (2013). An in-depth study of LTE: effect of network protocol and application behavior on performance. ACM SIGCOMM Computer Communication Review, 43(4):363–374.Google Scholar
Huang, L., Yin, Y., Fu, Z., Zhang, S., Deng, H., and Liu, D. (2018). Loadaboost: Loss-based adaboost federated machine learning on medical data. arXiv preprint arXiv:1811.12629.Google Scholar
Huang, T., Ye, B., Qu, Z., Tang, B., Xie, L., and Lu, S. (2020). Physical-layer arithmetic for federated learning in uplink mu-mimo enabled wireless networks. In Proceedings of IEEE Conference on Computer Communications, INFOCOM.Google Scholar
Huang, Y., Cheng, Y., Chen, D., Lee, H., Ngiam, J., Le, Q. V., and Chen, Z. (2019c). Gpipe: Efficient training of giant neural networks using pipeline parallelism. In Proc. NeurIPS.Google Scholar
Hui, L., Li, X., Gong, C., Fang, M., Zhou, J. T., and Yang, J. (2019). Inter-class angular loss for convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:3894–3901.Google Scholar
Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., and Keutzer, K. (2016). Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360.Google Scholar
Jackson, M. O. (2014). Mechanism theory. Available at SSRN 2542983.Google Scholar
Jaggi, M., Smith, V., Takác, M., Terhorst, J., Krishnan, S., Hofmann, T., and Jordan, M. I. (2014). Communication-efficient distributed dual coordinate ascent. In Proc. NIPS.Google Scholar
Jain, P., Thakkar, O., and Thakurta, A. (2017). Differentially private matrix completion, revisited. CoRR, abs/1712.09765.Google Scholar
Jayaraman, B., Wang, L., Evans, D., and Gu, Q. (2018). Distributed learning without distress: Privacy-preserving empirical risk minimization. In Advances in Neural Information Processing Systems 31.Google Scholar
Jeon, Y.-S., Amiri, M. M., Li, J., and Poor, H. V. (2020). Gradient estimation for federated learning over massive mimo communication systems.Google Scholar
Jeong, E., Oh, S., Kim, H., Park, J., Bennis, M., and Kim, S.-L. (2018a). Communication-efficient on-device machine learning: Federated distillation and augmentation under non-iid private data. arXiv preprint arXiv:1811.11479.Google Scholar
Jeong, H.-J., Lee, H.-J., Shin, C. H., and Moon, S.-M. (2018b). Ionn: Incremental offloading of neural network computations from mobile devices to edge servers. In Proceedings of the ACM Symposium on Cloud Computing, pages 401–411.Google Scholar
Jia, Q., Guo, L., Jin, Z., and Fang, Y. (2018). Preserving model privacy for machine learning in distributed systems. IEEE Transactions on Parallel and Distributed Systems, 29(8):1808–1822.Google Scholar
Jiang, J., Cui, B., Zhang, C., and Yu, L. (2017a). Heterogeneity-aware distributed parameter servers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 463–478.Google Scholar
Jiang, J., Cui, B., Zhang, C., and Yu, L. (2017b). Heterogeneity-aware distributed parameter servers. In Proc. SIGMOD.Google Scholar
Jiang, R. and Zhou, S. (2020). Cluster-based cooperative digital over-the-air aggregation for wireless federated edge learning. ArXiv, abs/2008.00994.Google Scholar
Jin, S., Di, S., Liang, X., Tian, J., Tao, D., and Cappello, F. (2019). Deepsz: A novel framework to compress deep neural networks by using error-bounded lossy compression. In Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, pages 159–170.Google Scholar
Johnson, R. and Zhang, T. (2013). Accelerating stochastic gradient descent using predictive variance reduction. In Proceedings of 27th Annual Conference on Neural Information Processing Systems, NeurIPS.Google Scholar
Jouppi, N. P., Young, C., Patil, N., Patterson, D. A., Agrawal, G., Bajwa, R., Bates, S., Bhatia, S., Boden, N., Borchers, A., Boyle, R., Cantin, P., Chao, C., Clark, C., Coriell, J., Daley, M., Dau, M., Dean, J., Gelb, B., Ghaemmaghami, T. V., Gottipati, R., Gulland, W., Hagmann, R., Ho, C. R., Hogberg, D., Hu, J., Hundt, R., Hurt, D., Ibarz, J., Jaffey, A., Jaworski, A., Kaplan, A., Khaitan, H., Killebrew, D., Koch, A., Kumar, N., Lacy, S., Laudon, J., Law, J., Le, D., Leary, C., Liu, Z., Lucke, K., Lundin, A., MacKean, G., Maggiore, A., Mahony, M., Miller, K., Nagarajan, R., Narayanaswami, R., Ni, R., Nix, K., Norrie, T., Omernick, M., Penukonda, N., Phelps, A., Ross, J., Ross, M., Salek, A., Samadiani, E., Severn, C., Sizikov, G., Snelham, M., Souter, J., Steinberg, D., Swing, A., Tan, M., Thorson, G., Tian, B., Toma, H., Tuttle, E., Vasudevan, V., Walter, R., Wang, W., Wilcox, E., and Yoon, D. H. (2017). In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA, Toronto, ON, Canada, June 24-28, 2017, pages 1–12.Google Scholar
Kalchbrenner, N., Danihelka, I., and Graves, A. (2015). Grid long short-term memory. arXiv.Google Scholar
Karimireddy, S. P., Rebjock, Q., Stich, S. U., and Jaggi, M. (2019). Error feedback fixes signsgd and other gradient compression schemes. arXiv preprint arXiv:1901.09847.Google Scholar
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. (2017). Lightgbm: A highly efficient gradient boosting decision tree. In Proc. NeurIPS.Google Scholar
Kim, Y., Kim, J., Chae, D., Kim, D., and Kim, J. (2019). μlayer: Low latency on-device inference using cooperative single-layer acceleration and processor-friendly quantization. In Proceedings of the Fourteenth EuroSys Conference 2019, pages 1–15.Google Scholar
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.Google Scholar
Koda, Y., Yamamoto, K., Nishio, T., and Morikura, M. (2020). Differentially private aircomp federated learning with power adaptation harnessing receiver noise. ArXiv, abs/2004.06337.Google Scholar
Koloskova, A., Lin, T., Stich, S. U., and Jaggi, M. (2019a). Decentralized deep learning with arbitrary communication compression. arXiv preprint arXiv:1907.09356.Google Scholar
Koloskova, A., Stich, S. U., and Jaggi, M. (2019b). Decentralized stochastic optimization and gossip algorithms with compressed communication. arXiv preprint arXiv:1902.00340.Google Scholar
Konecný, J., McMahan, H. B., and Ramage, D. (2015). Federated optimization: Distributed optimization beyond the datacenter. ArXiv, abs/1511.03575.Google Scholar
Koutnik, J., Greff, K., Gomez, F., and Schmidhuber, J. (2014). A clockwork rnn. arXiv.Google Scholar
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105.Google Scholar
Kumar, A., Fu, J., Tucker, G., and Levine, S. (2019). Stabilizing off-policy q-learning via bootstrapping error reduction. In NeurIPS.Google Scholar
Kusupati, A., Singh, M., Bhatia, K., Kumar, A., Jain, P., and Varma, M. (2018). Fastgrnn: A fast, accurate, stable and tiny kilobyte sized gated recurrent neural network. In Advances in Neural Information Processing Systems, pages 9017–9028.Google Scholar
Lathauwer, L. D., Moor, B. D., and Vandewalle, J. (2000). A multilinear singular value decomposition. SIAM J. Matrix Analysis Applications, 21:1253–1278.Google Scholar
LeCun, Y. (1998). The mnist database of handwritten digits. http://yann.lecun.com/exdb/mnist/.Google Scholar
Lee, K., Lam, M., Pedarsani, R., Papailiopoulos, D., and Ramchandran, K. (2018). Speeding up distributed machine learning using codes. IEEE Transactions on Information Theory, 64(3):1514–1529.Google Scholar
Lee, S., Kim, J. K., Zheng, X., Ho, Q., Gibson, G. A., and Xing, E. P. (2014). On model parallelization and scheduling strategies for distributed machine learning. In Proc. NeurIPS.Google Scholar
Lei, L., Tan, Y., Liu, S., Zheng, K., et al. (2019). Deep reinforcement learning for autonomous internet of things: Model, applications and challenges. arXiv preprint arXiv:1907.09059.Google Scholar
Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., Long, J., Shekita, E. J., and Su, B. (2014a). Scaling distributed machine learning with the parameter server. In Proceedings of 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI.Google Scholar
Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., Long, J., Shekita, E. J., and Su, B. Y. (2014b). Scaling distributed machine learning with the parameter server. in Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2014, pages 583–598.Google Scholar
Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., Long, J., Shekita, E. J., and Su, B.-Y. (2014c). Scaling distributed machine learning with the parameter server. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI, pages 583–598.Google Scholar
Li, M., Zhang, T., Chen, Y., and Smola, A. J. (2014d). Efficient mini-batch training for stochastic optimization. In Proc. SIGKDD.Google Scholar
Li, P. and Guo, S. (2015). Incentive mechanisms for device-to-device communications. IEEE Network, 29(4):75–79.Google Scholar
Li, P., Wu, X., Shen, W., Tong, W., and Guo, S. (2019a). Collaboration of heterogeneous unmanned vehicles for smart cities. IEEE Network, 33(4):133–137.Google Scholar
Li, S., Kalan, S. M. M., Yu, Q., Soltanolkotabi, M., and Avestimehr, A. S. (2018a). Polynomially coded regression: Optimal straggler mitigation via data encoding. arXiv.Google Scholar
Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., and Smith, V. (2018b). Federated optimization in heterogeneous networks. arXiv preprint arXiv:1812.06127.Google Scholar
Li, T., Sanjabi, M., Beirami, A., and Smith, V. (2019b). Fair resource allocation in federated learning.Google Scholar
Li, X. and Orabona, F. (2018). On the convergence of stochastic gradient descent with adaptive stepsizes. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), volume 89 of Proceedings of Machine Learning Research.Google Scholar
Li, X., Wang, W., Hu, X., and Yang, J. (2019c). Selective kernel networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 510–519.Google Scholar
Li, Y., Ma, T., and Zhang, H. (2018c). Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations. In Bubeck, S., Perchet, V., and Rigollet, P., editors, Proceedings of Conference On Learning Theory, COLT.Google Scholar
Li, Y., Wei, C., and Ma, T. (2019d). Towards explaining the regularization effect of initial large learning rate in training neural networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 1–12.Google Scholar
Li, Z., Brendel, W., Walker, E. Y., Cobos, E., Muhammad, T., Reimer, J., Bethge, M., Sinz, F. H., Pitkow, X., and Tolias, A. S. (2019e). Learning from brains how to regularize machines. In Advances in Neural Information Processing Systems (NeurIPS), pages 1–11.Google Scholar
Li, Z., Xu, C., and Leng, B. (2018d). Rethinking loss design for large-scale 3D shape retrieval. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 840–846.Google Scholar
Li, Z., Xu, C., and Leng, B. (2019f). Angular triplet-center loss for multi-view 3D shape retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, AAAI.Google Scholar
Lian, X., Zhang, C., Zhang, H., Hsieh, C.-J., Zhang, W., and Liu, J. (2017). Can decentralized algorithms outperform centralized algorithms? a case study for decentralized parallel stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 5330–5340.Google Scholar
Lian, X., Zhang, W., Zhang, C., and Liu, J. (2018). Asynchronous decentralized parallel stochastic gradient descent. In Dy, J. G. and Krause, A., editors, Proceedings of the 35th International Conference on Machine Learning, ICML.Google Scholar
Ligett, K., Neel, S., Roth, A., Waggoner, B., and Wu, S. Z. (2017). Accuracy first: Selecting a differential privacy level for accuracy constrained erm. In Advances in Neural Information Processing Systems 30.Google Scholar
Lim, W. Y. B., Luong, N. C., Hoang, D. T., Jiao, Y., Liang, Y.-C., Yang, Q., Niyato, D., and Miao, C. (2019). Federated learning in mobile edge networks: A comprehensive survey. arXiv preprint arXiv:1909.11875.Google Scholar
Lin, T., Stich, S. U., Patel, K. K., and Jaggi, M. (2020). Don’t use large mini-batches, use local SGD. In Proceedings of 8th International Conference on Learning Representations, ICLR.Google Scholar
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer.Google Scholar
Lin, Y., Han, S., Mao, H., Wang, Y., and Dally, B. (2018). Deep gradient compression: Reducing the communication bandwidth for distributed training. In Proceedings of 6th International Conference on Learning Representations, ICLR.Google Scholar
Liu, F. and Shroff, N. B. (2019). Data poisoning attacks on stochastic bandits. In Proceedings of the 36th International Conference on Machine Learning.Google Scholar
Liu, W., Zang, X., Li, Y., and Vucetic, B. (2020). Over-the-air computation systems: Optimization, analysis and scaling laws. IEEE Transactions on Wireless Communications, 19(8):5488–5502.Google Scholar
Liu, Y., Shang, F., and Jiao, L. (2019). Accelerated incremental gradient descent using momentum acceleration with scaling factor. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 3045–3051.Google Scholar
Liu, Y., Xu, C., Zhan, Y., Liu, Z., Guan, J., and Zhang, H. (2017). Incentive mechanism for computation offloading using edge computing: A stackelberg game approach. Computer Networks, 129:399–409.Google Scholar
Loshchilov, I. and Hutter, F. (2017). Fixing weight decay regularization in Adam.Google Scholar
Louizos, C., Reisser, M., Blankevoort, T., Gavves, E., and Welling, M. (2019). Relaxed quantization for discretized neural networks. In 7th International Conference on Learning Representations, ICLR, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.Google Scholar
Lu, Y. and Sa, C. D. (2020). Moniqua: Modulo quantized communication in decentralized SGD.Google Scholar
Luo, L., Xiong, Y., Liu, Y., and Sun, X. (2019). Adaptive gradient methods with dynamic bound of learning rate. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, pages 1–19.Google Scholar
Luping, W., Wei, W., and Bo, L. (2019). Cmfl: Mitigating communication overhead for federated learning. In Proceedings of IEEE 39th International Conference on Distributed Computing Systems, ICDCS.Google Scholar
Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., and Li, B. (2018). Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In 2018 IEEE Symposium on Security and Privacy (SP).Google Scholar
Ma, X., Sun, H., and Hu, R. Q. (2020). Scheduling policy and power allocation for federated learning in NOMA based MEC. arXiv preprint arXiv:2006.13044.Google Scholar
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks.Google Scholar
Mahloujifar, S., Mahmoody, M., and Mohammed, A. (2019). Data poisoning attacks in multi-party learning. In Proceedings of the 36th International Conference on Machine Learning.Google Scholar
Maity, R. K., Rawa, A. S., and Mazumdar, A. (2019). Robust gradient descent via moment encoding and ldpc codes. In Proceedings of IEEE International Symposium on Information Theory (ISIT).Google Scholar
Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, I., Leiser, N., and Czajkowski, G. (2010). Pregel: A system for large-scale graph processing. In Proc. SIGMOD.Google Scholar
Manessi, F., Rozza, A., Bianco, S., Napoletano, P., and Schettini, R. (2018). Automated pruning for deep neural network compression. In 24th International Conference on Pattern Recognition, ICPR.Google Scholar
Martinez, I., Francis, S., and Hafid, A. S. (2019). Record and reward federated learning contributions with blockchain. In Proc of IEEE CyberC, pages 50–57.Google Scholar
Mathur, A., Lane, N. D., Bhattacharya, S., Boran, A., Forlivesi, C., and Kawsar, F. (2017). Deepeye: Resource efficient local execution of multiple deep vision models using wearable commodity hardware. Proceedings of the 15th Annual International Conference on Mobile Systems, MobiSys, Applications, and Services.Google Scholar
McMahan, B., Moore, E., Ramage, D., Hampson, S., and Arcas, B. A. (2017a). Communication-efficient learning of deep networks from decentralized data. In Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS).Google Scholar
McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. (2017b). Communication-efficient learning of deep networks from decentralized data. In Proceedings of Artificial Intelligence and Statistics, AISTATS.Google Scholar
McMahan, H. B., Moore, E., Ramage, D., Hampson, S., et al. (2017c). Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS).Google Scholar
McMahan, H. B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. (2017d). Communication-efficient learning of deep networks from decentralized data. In Proc. AISTATS.Google Scholar
McMahan, H. B., Ramage, D., Talwar, K., and Zhang, L. (2017e). Learning differentially private language models without losing accuracy. CoRR, abs/1710.06963.Google Scholar
Melis, L., Song, C., De Cristofaro, E., and Shmatikov, V. (2019). Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE Symposium on Security and Privacy (SP).CrossRefGoogle Scholar
Meng, Q., Chen, W., Wang, Y., Ma, Z.-M., and Liu, T.-Y. (2019). Convergence analysis of distributed stochastic gradient descent with shuffling. ArXiv, abs/1709.10432.Google Scholar
Mirhoseini, A., Goldie, A., Pham, H., Steiner, B., Le, Q. V., and Dean, J. (2018). A hierarchical model for device placement.Google Scholar
Mirza, M. and Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.Google Scholar
Mohammadi, M., Al-Fuqaha, A., Sorour, S., and Guizani, M. (2018). Deep learning for IoT big data and streaming analytics: A survey. IEEE Communications Surveys & Tutorials, 20(4):2923–2960.Google Scholar
Mohassel, P. and Rindal, P. (2018). Aby3: A mixed protocol framework for machine learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security.Google Scholar
Mohri, M., Sivek, G., and Suresh, A. T. (2019). Agnostic federated learning. arXiv preprint arXiv:1902.00146.Google Scholar
Mokhtari, A. and Ribeiro, A. (2016). DSA: decentralized double stochastic averaging gradient algorithm. J. Mach. Learn. Res., 17:61:1–61:35.Google Scholar
Murshed, M. G. S., Murphy, C., Hou, D., Khan, N., Ananthanarayanan, G., and Hussain, F. (2019). Machine learning at the network edge: A survey. pages 1–28.Google Scholar
Nakamoto, S. (2009). Bitcoin: A peer-to-peer electronic cash system. [Online] Available: https://bitcoin.org/bitcoin.pdf.Google Scholar
Nakandala, S., Kumar, A., and Papakonstantinou, Y. (2019). Incremental and approximate inference for faster occlusion-based deep cnn explanations. In Proceedings of the 2019 International Conference on Management of Data, pages 1589–1606.Google Scholar
Nar, K. and Shankar Sastry, S. (2018). Step size matters in deep learning. In Advances in Neural Information Processing Systems (NeurIPS), pages 3436–3444.Google Scholar
Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., Gibbons, P. B., and Zaharia, M. (2019). Pipedream: generalized pipeline parallelism for DNN training. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP, Huntsville, ON, Canada, October 27-30, 2019, pages 1–15.Google Scholar
Nasr, M., Shokri, R., and Houmansadr, A. (2018). Machine learning with membership privacy using adversarial regularization. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security.Google Scholar
Nasr, M., Shokri, R., and Houmansadr, A. (2019). Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy (SP).Google Scholar
Neel, S. and Roth, A. (2018). Mitigating bias in adaptive data gathering via differential privacy. CoRR, abs/1806.02329.Google Scholar
Nesterov, Y. (2012a). Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optimization, 22:341–362.Google Scholar
Nesterov, Y. E. (2012b). Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optimization, 22(2):341–362.Google Scholar
Neter, J., Wasserman, W. J., and Kutner, M. H. (1974). Applied linear statistical models : regression, analysis of variance, and experimental designs.Google Scholar
Nguyen, L. M., van Dijk, M., Phan, D. T., Nguyen, P. H., Weng, T.-w., and Kalagnanam, J. R. (2019). Finite-Sum Smooth Optimization with SARAH. pages 1–26.Google Scholar
Niknam, S., Dhillon, H. S., and Reed, J. H. (2019). Federated learning for wireless communications: Motivation, opportunities and challenges. arXiv preprint arXiv: 1908.06847.Google Scholar
Nishio, T. and Yonetani, R. (2019a). Client selection for federated learning with heterogeneous resources in mobile edge. ICC 2019 – 2019 IEEE International Conference on Communications (ICC).Google Scholar
Nishio, T. and Yonetani, R. (2019b). Client selection for federated learning with heterogeneous resources in mobile edge. In ICC 2019-2019 IEEE International Conference on Communications (ICC), pages 1–7. IEEE.Google Scholar
Ozfatura, E., Gündüz, D., and Ulukus, S. (2019). Speeding up distributed gradient descent by utilizing non-persistent stragglers. In Proceedings of IEEE International Symposium on Information Theory, ISIT.Google Scholar
Panageas, I., Piliouras, G., and Wang, X. (2019). First-order methods almost always avoid saddle points: the case of vanishing step-sizes. In Advances in Neural Information Processing Systems (NeurIPS), pages 1–10.Google Scholar
Pandey, S. R., Tran, N. H., Bennis, M., Tun, Y. K., Manzoor, A., and Hong, C. S. (2020). A crowdsourcing framework for on-device federated learning. IEEE Transactions on Wireless Communications, 19(5):3241–3256.Google Scholar
Pang, T., Du, C., Dong, Y., and Zhu, J. (2018). Towards robust detection of adversarial examples. In Advances in Neural Information Processing Systems 31.Google Scholar
Papernot, N., Abadi, M., Erlingsson, Ú., Goodfellow, I., and Talwar, K. (2016). Semi-supervised knowledge transfer for deep learning from private training data.Google Scholar
Parashar, A., Rhu, M., Mukkara, A., Puglielli, A., Venkatesan, R., Khailany, B., Emer, J. S., Keckler, S. W., and Dally, W. J. (2017). SCNN: an accelerator for compressed-sparse convolutional neural networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA.Google Scholar
Park, H., Zhai, S., Lu, L., and Lin, F. X. (2019). Streambox-tz: Secure stream analytics at the edge with trustzone. In 2019 USENIX Annual Technical Conference (USENIX ATC 19).Google Scholar
Park, J. H., Yun, G., Chang, M. Y., Nguyen, N. T., Lee, S., Choi, J., Noh, S. H., and Choi, Y.-r. (2020). Hetpipe: Enabling large DNN training on (whimpy) heterogeneous GPU clusters through integration of pipelined model parallelism and data parallelism. In 2020 USENIX Annual Technical Conference (USENIX ATC 20), pages 307–321.Google Scholar
Park, N., Mohammadi, M., Gorde, K., Jajodia, S., Park, H., and Kim, Y. (2018). Data synthesis based on generative adversarial networks. Proc. VLDB Endow., 11(10):1071–1083.Google Scholar
Patel, K. K. and Dieuleveut, A. (2019). Communication trade-offs for synchronized distributed SGD with large step size. In Advances in Neural Information Processing Systems (NeurIPS), pages 1–12.Google Scholar
Mohassel, P. and Zhang, Y. (2017). Secureml: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP).Google Scholar
Peng, Y., Bao, Y., Chen, Y., Wu, C., and Guo, C. (2018). Optimus: an efficient dynamic resource scheduler for deep learning clusters. In Proceedings of the Thirteenth EuroSys Conference, EuroSys.Google Scholar
Peteiro-Barral, D. and Guijarro-Berdiñas, B. (2013). A survey of methods for distributed machine learning. Progress in Artificial Intelligence, 2(1):1–11.Google Scholar
Pilla, L. (2020). Optimal task assignment to heterogeneous federated learning devices. ArXiv, abs/2010.00239.Google Scholar
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., and Gulin, A. (2018). Catboost: unbiased boosting with categorical features. In Advances in neural information processing systems, pages 6638–6648.Google Scholar
Qiao, A., Aragam, B., Zhang, B., and Xing, E. P. (2019). Fault tolerance in iterative-convergent machine learning. In Proceedings of the 36th International Conference on Machine Learning ICML, pages 5220–5230.Google Scholar
Raviv, N., Tandon, R., Dimakis, A., and Tamo, I. (2018). Gradient coding from cyclic MDS codes and expander graphs. In Proceedings of the 35th International Conference on Machine Learning, ICML, pages 4302–4310.Google Scholar
Reddi, S. J., Kale, S., and Kumar, S. (2018). On the convergence of Adam and beyond. In Proceedings of the 6th International Conference on Learning Representations, ICLR, pages 1–23.Google Scholar
Ren, J., Yu, G., and Ding, G. (2019a). Accelerating dnn training in wireless federated edge learning system.Google Scholar
Ren, S., Zhang, Z., Liu, S., Zhou, M., and Ma, S. (2019b). Unsupervised neural machine translation with SMT as posterior regularization. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:241–248.Google Scholar
Hall, R., Fienberg, S. E., and Nardi, Y. (2011). Secure multiple linear regression based on homomorphic encryption. Journal of Official Statistics, 27(4):669–691.Google Scholar
Robbins, H. E. (2007). A stochastic approximation method. Annals of Mathematical Statistics, 22:400–407.Google Scholar
Sakr, C., Wang, N., Chen, C., Choi, J., Agrawal, A., Shanbhag, N. R., and Gopalakrishnan, K. (2019). Accumulation bit-width scaling for ultra-low precision training of deep networks. In 7th International Conference on Learning Representations, ICLR, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.Google Scholar
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4510–4520.Google Scholar
Sarikaya, Y. and Ercetin, O. (2019). Motivating workers in federated learning: A stackelberg game perspective. IEEE Networking Letters, 2(1):23–27.Google Scholar
Sattler, F., Müller, K.-R., and Samek, W. (2019a). Clustered federated learning: Model-agnostic distributed multi-task optimization under privacy constraints. arXiv preprint arXiv:1910.01991.Google Scholar
Sattler, F., Wiedemann, S., Müller, K.-R., and Samek, W. (2019b). Robust and communication-efficient federated learning from non-iid data. IEEE transactions on neural networks and learning systems.Google Scholar
Schein, A., Wu, Z. S., Schofield, A., Zhou, M., and Wallach, H. (2018). Locally private bayesian inference for count models.Google Scholar
Schmidt, M. W., Roux, N. L., and Bach, F. R. (2017). Minimizing finite sums with the stochastic average gradient. Mathematical Programming, 162:83–112.Google Scholar
Seide, F., Fu, H., Droppo, J., Li, G., and Yu, D. (2014a). 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns. In Proceedings of 15th Annual Conference of the International Speech Communication Association, INTERSPEECH.Google Scholar
Seide, F., Fu, H., Droppo, J., Li, G., and Yu, D. (2014b). 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns. In Fifteenth Annual Conference of the International Speech Communication Association.Google Scholar
Sergeev, A. and Balso, M. D. (2018). Horovod: fast and easy distributed deep learning in tensorflow. ArXiv, abs/1802.05799.Google Scholar
Sery, T., Shlezinger, N., Cohen, K., and Eldar, Y. C. (2020). Over-the-air federated learning from heterogeneous data. ArXiv, abs/2009.12787.Google Scholar
Sharif-Nassab, A., Salehkaleybar, S., and Golestani, S. J. (2019). Order optimal one-shot distributed learning. In Proceedings of Advances in Neural Information Processing Systems, NeurIPS.Google Scholar
Shen, Y. and Sanghavi, S. (2018). Iteratively learning from the best. CoRR, abs/1810.11874.Google Scholar
Shi, S., Zhao, K., Wang, Q., Tang, Z., and Chu, X. (2019a). A convergence analysis of distributed SGD with communication-efficient gradient sparsification. In Kraus, S., editor, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 3411–3417. ijcai.org.Google Scholar
Shi, S., Zhao, K., Wang, Q., Tang, Z., and Chu, X. (2019b). A convergence analysis of distributed SGD with communication-efficient gradient sparsification. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI).Google Scholar
Shi, W., Cao, J., Zhang, Q., Li, Y., and Xu, L. (2016). Edge computing: Vision and challenges. IEEE Internet of Things Journal, 3:637–646.Google Scholar
Shi, W., Ling, Q., Wu, G., and Yin, W. (2015). EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim., 25(2):944–966.Google Scholar
Shi, W., Ling, Q., Yuan, K., Wu, G., and Yin, W. (2014). On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans. Signal Process., 62(7):1750–1761.Google Scholar
Shi, W., Zhou, S., and Niu, Z. (2019c). Device scheduling with fast convergence for wireless federated learning.Google Scholar
Shoham, N., Avidor, T., Keren, A., Israel, N., Benditkis, D., Mor-Yosef, L., and Zeitak, I. (2019). Overcoming forgetting in federated learning on non-iid data. arXiv preprint arXiv:1910.07796.Google Scholar
Shokri, R., Stronati, M., Song, C., and Shmatikov, V. (2017). Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP).Google Scholar
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.Google Scholar
Singh, N., Data, D., George, J., and Diggavi, S. (2020). Squarm-sgd: Communication-efficient momentum sgd for decentralized optimization. arXiv preprint arXiv: 2005.07041.Google Scholar
Smith, A., Thakurta, A., and Upadhyay, J. (2017). Is interaction necessary for distributed private learning? In 2017 IEEE Symposium on Security and Privacy (SP).Google Scholar
Song, C., Ristenpart, T., and Shmatikov, V. (2017). Machine learning models that remember too much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.Google Scholar
Song, S., Lichtenberg, S. P., and Xiao, J. (2015). Sun rgb-d: A rgb-d scene understanding benchmark suite. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 567–576.Google Scholar
Song, T., Tong, Y., and Wei, S. (2019). Profit allocation for federated learning. In Proc. of IEEE Big Data, pages 2577–2586.Google Scholar
Staib, M., Reddi, S., Kale, S., Kumar, S., and Sra, S. (2019). Escaping saddle points with adaptive gradient methods. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 10420–10454.Google Scholar
Steinhardt, J., Koh, P. W. W., and Liang, P. S. (2017). Certified defenses for data poisoning attacks. In Advances in Neural Information Processing Systems 30.Google Scholar
Stich, S. U. (2019). Local SGD converges fast and communicates little. In Proceedings of 7th International Conference on Learning Representations, ICLR.Google Scholar
Stich, S. U., Cordonnier, J., and Jaggi, M. (2018a). Sparsified SGD with memory. In Proceedings of Annual Conference on Neural Information Processing Systems, NeurIPS.Google Scholar
Stich, S. U., Cordonnier, J.-B., and Jaggi, M. (2018b). Sparsified sgd with memory. In Advances in Neural Information Processing Systems, pages 4447–4458.Google Scholar
Streeter, M. (2019). Learning optimal linear regularizers. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 10489–10498.
Sun, J., Chen, T., Giannakis, G., and Yang, Z. (2019). Communication-efficient distributed learning via lazily aggregated quantized gradients. In Proceedings of Advances in Neural Information Processing Systems, NeurIPS.Google Scholar
Sun, S., Chen, W., Bian, J., Liu, X., and Liu, T. (2018). Slim-dp: A multi-agent system for communication-efficient distributed deep learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2018, Stockholm, Sweden, July 10-15, 2018, pages 721–729.Google Scholar
Sun, Y., Zhou, S., and Gündüz, D. (2020). Energy-aware analog aggregation for federated learning with redundant data. In ICC 2020 – 2020 IEEE International Conference on Communications (ICC), pages 1–7.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9.Google Scholar
Tan, T., Yin, S., Liu, K., and Wan, M. (2019). On the convergence speed of AMSGRAD and beyond. In Proceedings of the International Conference on Tools with Artificial Intelligence, ICTAI, pages 464–470.
Tandon, R., Lei, Q., Dimakis, A. G., and Karampatziakis, N. (2017). Gradient coding: Avoiding stragglers in distributed learning. In Proceedings of the 34th International Conference on Machine Learning, ICML.Google Scholar
Tang, H., Gan, S., Zhang, C., Zhang, T., and Liu, J. (2018a). Communication compression for decentralized training. In Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 7663–7673.
Tang, H., Gan, S., Zhang, C., Zhang, T., and Liu, J. (2018b). Communication compression for decentralized training. In Advances in Neural Information Processing Systems, pages 7652–7662.Google Scholar
Tang, H., Lian, X., Qiu, S., Yuan, L., Zhang, C., Zhang, T., and Liu, J. (2019a). Deepsqueeze: Decentralization meets error-compensated compression. arXiv preprint.
Tang, H., Yu, C., Lian, X., Zhang, T., and Liu, J. (2019b). Doublesqueeze: Parallel stochastic gradient descent with double-pass error-compensated compression. In Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of the 36th International Conference on Machine Learning, ICML.
Tang, H., Yu, C., Lian, X., Zhang, T., and Liu, J. (2019c). Doublesqueeze: Parallel stochastic gradient descent with double-pass error-compensated compression. In International Conference on Machine Learning, pages 6155–6165. PMLR.Google Scholar
Tao, G., Ma, S., Liu, Y., and Zhang, X. (2018). Attacks meet interpretability: Attribute-steered detection of adversarial samples. In Advances in Neural Information Processing Systems 31.Google Scholar
Tian, L. and Gu, Q. (2017). Communication-efficient distributed sparse linear discriminant analysis. In Singh, A. and Zhu, X. J., editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, volume 54 of Proceedings of Machine Learning Research, pages 1178–1187. PMLR.Google Scholar
Tong, L., Yu, S., Alfeld, S., and Vorobeychik, Y. (2018). Adversarial regression with multiple learners. CoRR, abs/1806.02256.Google Scholar
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., and McDaniel, P. (2017). Ensemble adversarial training: Attacks and defenses.
Tran, N. H., Bao, W., Zomaya, A., Nguyen, M. N. H., and Hong, C. S. (2019). Federated learning over wireless networks: Optimization model design and analysis. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, pages 1387–1395.Google Scholar
Tuncer, O., Leung, V. J., and Coskun, A. K. (2015). Pacmap: Topology mapping of unstructured communication patterns onto non-contiguous allocations. In Proceedings of the 29th ACM on International Conference on Supercomputing, ICS.Google Scholar
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.Google Scholar
Venkataramani, S., Ranjan, A., Banerjee, S., Das, D., Avancha, S., Jagannathan, A., Durg, A., Nagaraj, D., Kaul, B., Dubey, P., and Raghunathan, A. (2017). Scaledeep: A scalable compute architecture for learning and evaluating deep networks. In Proc. ISCA.Google Scholar
Vepakomma, P., Swedish, T., Raskar, R., Gupta, O., and Dubey, A. (2018). No peek: A survey of private distributed deep learning.Google Scholar
Verbeke, J., Nadgir, N., Ruetsch, G., and Sharapov, I. (2002). Framework for peer-to-peer distributed computing in a heterogeneous, decentralized environment. In Parashar, M., editor, Grid Computing — GRID 2002, pages 1–12, Berlin, Heidelberg. Springer Berlin Heidelberg.
Viswanathan, R., Ananthanarayanan, G., and Akella, A. (2016). Clarinet: Wan-aware optimization for analytics queries. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 435–450.Google Scholar
Vogels, T., Karimireddy, S. P., and Jaggi, M. (2019). Powersgd: Practical low-rank gradient compression for distributed optimization. In Proceedings of Annual Conference on Neural Information Processing Systems, NeurIPS.Google Scholar
Vu, T. T., Ngo, D. T., Tran, N. H., Ngo, H. Q., Dao, M. N., and Middleton, R. H. (2020). Cell-free massive MIMO for wireless federated learning. IEEE Transactions on Wireless Communications, Early Access.Google Scholar
Wadu, M. M., Samarakoon, S., and Bennis, M. (2020). Federated learning under channel uncertainty: Joint client scheduling and resource allocation. In 2020 IEEE Wireless Communications and Networking Conference (WCNC), pages 1–6.Google Scholar
Wang, D., Chen, C., and Xu, J. (2019a). Differentially private empirical risk minimization with non-convex loss functions. In Proceedings of the 36th International Conference on Machine Learning.Google Scholar
Wang, D., Gaboardi, M., and Xu, J. (2018a). Empirical risk minimization in non-interactive local differential privacy revisited. In Advances in Neural Information Processing Systems 31.Google Scholar
Wang, D., Ye, M., and Xu, J. (2017a). Differentially private empirical risk minimization revisited: Faster and more general. In Advances in Neural Information Processing Systems 30.Google Scholar
Wang, H., Guo, S., Tang, B., Li, R., and Li, C. (2019b). Heterogeneity-aware gradient coding for straggler tolerance. In 39th IEEE International Conference on Distributed Computing Systems, ICDCS.Google Scholar
Wang, H., Qu, Z., Guo, S., Gao, X., Li, R., and Ye, B. (2020). Intermittent pulling with local compensation for communication-efficient distributed learning. IEEE Transactions on Emerging Topics in Computing. DOI: 10.1109/TETC.2020.3043300.
Wang, H., Zhou, R., and Shen, Y.-D. (2019c). Bounding Uncertainty for Active Batch Selection. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:5240–5247.
Wang, J., Tantia, V., Ballas, N., and Rabbat, M. (2019d). SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum.Google Scholar
Wang, J., Tantia, V., Ballas, N., and Rabbat, M. (2019e). Slowmo: Improving communication-efficient distributed sgd with slow momentum. arXiv preprint arXiv:1910.00643.Google Scholar
Wang, K., Li, H., Maharjan, S., Zhang, Y., and Guo, S. (2018b). Green energy scheduling for demand side management in the smart grid. IEEE Transactions on Green Communications & Networking, pages 596–611.Google Scholar
Wang, K., Xu, C., and Guo, S. (2017b). Big data analytics for price forecasting in smart grids. In Global Communications Conference.Google Scholar
Wang, K., Xu, C., Zhang, Y., Guo, S., and Zomaya, A. Y. (2019f). Robust big data analytics for electricity price forecasting in the smart grid. IEEE Transactions on Big Data, 5(1):34–45.
Wang, L., Yang, Y., Min, R., and Chakradhar, S. (2017c). Accelerating deep neural network training with inconsistent stochastic gradient descent. Neural Networks, 93:219–229.
Wang, M., Fang, E. X., and Liu, H. (2017d). Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions. Mathematical Programming, 161:419–449.
Wang, M., Liu, J., and Fang, E. X. (2017e). Accelerating stochastic composition optimization. J. Mach. Learn. Res., 18:105:1–105:23.Google Scholar
Wang, S., Chen, M., Yin, C., Saad, W., Hong, C. S., Cui, S., and Poor, H. V. (2020b). Federated learning for task and resource allocation in wireless high altitude balloon networks.Google Scholar
Wang, S., Li, D., Cheng, Y., Geng, J., Wang, Y., Wang, S., Xia, S.-T., and Wu, J. (2018c). Bml: A high-performance, low-cost gradient synchronization algorithm for dml training. In Advances in Neural Information Processing Systems, pages 4238–4248.Google Scholar
Wang, S., Li, D., Cheng, Y., Geng, J., Wang, Y., Wang, S., Xia, S.-T., and Wu, J. (2018d). Bml: A high-performance, low-cost gradient synchronization algorithm for dml training. In Proc. NeurIPS.Google Scholar
Wang, S., Pi, A., and Zhou, X. (2019g). Scalable Distributed DL Training: Batching Communication and Computation. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:5289–5296.
Wang, S., Pi, A., and Zhou, X. (2019h). Scalable distributed dl training: Batching communication and computation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 5289–5296.
Wang, S., Sun, J., and Xu, Z. (2019i). HyperAdam: A Learnable Task-Adaptive Adam for Network Training. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:5297–5304.
Wang, Z. (2019). SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms. In Advances in Neural Information Processing Systems (NeurIPS 2019).
Wangni, J., Wang, J., Liu, J., and Zhang, T. (2018a). Gradient sparsification for communication-efficient distributed optimization. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
Wangni, J., Wang, J., Liu, J., and Zhang, T. (2018b). Gradient sparsification for communication-efficient distributed optimization. In Advances in Neural Information Processing Systems, pages 1299–1309.Google Scholar
Ward, R., Wu, X., and Bottou, L. (2019). Adagrad stepsizes: Sharp convergence over nonconvex landscapes. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 11574–11583.
Wen, W., Xu, C., Yan, F., Wu, C., Wang, Y., Chen, Y., and Li, H. (2017a). Terngrad: Ternary gradients to reduce communication in distributed deep learning. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R., editors, Proceedings of Annual Conference on Neural Information Processing Systems, NeurIPS.Google Scholar
Wen, W., Xu, C., Yan, F., Wu, C., Wang, Y., Chen, Y., and Li, H. (2017b). Terngrad: Ternary gradients to reduce communication in distributed deep learning. In Advances in neural information processing systems, pages 1509–1519.Google Scholar
Williams, R. J. and Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural computation, 1(2):270–280.
Woodworth, B. E., Patel, K. K., Stich, S. U., Dai, Z., Bullins, B., McMahan, H. B., Shamir, O., and Srebro, N. (2020). Is local SGD better than minibatch sgd? CoRR.Google Scholar
Wu, F., He, S., Yang, Y., Wang, H., Qu, Z., and Guo, S. (2020a). On the convergence of quantized parallel restarted sgd for serverless learning. CoRR.Google Scholar
Wu, J., Huang, W., Huang, J., and Zhang, T. (2018a). Error compensated quantized sgd and its applications to large-scale distributed optimization. arXiv preprint arXiv:1806.08054.Google Scholar
Wu, L., Li, S., Hsieh, C.-J., and Sharpnack, J. (2019). Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers. In Advances in Neural Information Processing Systems (NeurIPS), pages 1–11.
Wu, T., Yuan, K., Ling, Q., Yin, W., and Sayed, A. H. (2018b). Decentralized consensus optimization with asynchrony and delays. IEEE Trans. Signal Inf. Process. over Networks, 4(2):293–307.
Wu, X., Ward, R., and Bottou, L. (2018c). WNGrad: Learn the Learning Rate in Gradient Descent. pages 1–16.Google Scholar
Xia, W., Quek, T. Q. S., Guo, K., Wen, W., Yang, H. H., and Zhu, H. (2020). Multi-armed bandit based client scheduling for federated learning. IEEE Transactions on Wireless Communications, pages 1–1.Google Scholar
Xiao, L. and Zhang, T. (2014). A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optimization, 24:2057–2075.
Xiao, W., Bhardwaj, R., Ramjee, R., Sivathanu, M., Kwatra, N., Han, Z., Patel, P., Peng, X., Zhao, H., Zhang, Q., Yang, F., and Zhou, L. (2018). Gandiva: Introspective cluster scheduling for deep learning. In Proc. OSDI.Google Scholar
Xie, C., Koyejo, O., and Gupta, I. (2018). Zeno: Byzantine-suspicious stochastic gradient descent. CoRR, abs/1805.10032.Google Scholar
Xie, P., Kim, J. K., Zhou, Y., Ho, Q., Kumar, A., Yu, Y., and Xing, E. (2016). Lighter-communication distributed machine learning via sufficient factor broadcasting. In Proc. UAI.Google Scholar
Xie, P., Kim, J. K., Zhou, Y., Ho, Q., Kumar, A., Yu, Y., and Xing, E. P. (2014). Distributed machine learning via sufficient factor broadcasting. arXiv, abs/1511.08486.Google Scholar
Xie, S., Girshick, R. B., Dollár, P., Tu, Z., and He, K. (2017). Aggregated residual transformations for deep neural networks. In Proc. CVPR, pages 5987–5995.Google Scholar
Xing, E. P., Ho, Q., Xie, P., and Dai, W. (2015). Strategies and principles of distributed machine learning on big data. CoRR.Google Scholar
Xing, E. P., Ho, Q., Xie, P., and Wei, D. (2016). Strategies and principles of distributed machine learning on big data. Engineering, 2(2):179–195.
Xing, H., Simeone, O., and Bi, S. (2020). Decentralized federated learning via sgd over wireless d2d networks. In 2020 IEEE 21st International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pages 1–5.Google Scholar
Xu, J. and Wang, H. (2020). Client selection and bandwidth allocation in wireless federated learning networks: A long-term perspective.Google Scholar
Xu, S., Zhang, H., Neubig, G., Dai, W., Kim, J. K., Deng, Z., Ho, Q., Yang, G., and Xing, E. P. (2018). Cavs: An efficient runtime system for dynamic neural networks. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 937–950.Google Scholar
Xu, Y., Dong, X., Li, Y., and Su, H. (2019). A main/subsidiary network framework for simplifying binary neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pages 7154–7162.
Xu, Z., Zhang, Y., Fu, C., Liu, L., and Guo, S. (2020). Back shape measurement and three-dimensional reconstruction of spinal shape using one kinect sensor. In 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI).Google Scholar
Yan, Z., Guo, Y., and Zhang, C. (2018). Deep defense: Training dnns with improved adversarial robustness. In Advances in Neural Information Processing Systems 31.Google Scholar
Yang, D., Xue, G., Fang, X., and Tang, J. (2016). Incentive mechanisms for crowdsensing: Crowdsourcing with smartphones. IEEE/ACM Transactions on Networking, 24(3):1732–1744.
Yang, H. H., Arafa, A., Quek, T. Q. S., and Vincent Poor, H. (2020). Age-based scheduling policy for federated learning in mobile edge networks. In ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8743–8747.Google Scholar
Yang, H. H., Liu, Z., Quek, T. Q. S., and Poor, H. V. (2019a). Scheduling policies for federated learning in wireless networks.Google Scholar
Yang, K., Jiang, T., Shi, Y., and Ding, Z. (2020). Federated learning via over-the-air computation. IEEE Transactions on Wireless Communications, 19(3):2022–2035.
Yang, K., Shi, Y., Zhou, Y., Yang, Z., Fu, L., and Chen, W. (2020). Federated machine learning for intelligent iot via reconfigurable intelligent surface. IEEE Network, 34(5):16–22.
Yang, Q., Liu, Y., Chen, T., and Tong, Y. (2019b). Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2):1–19.
Yang, Y., Zhang, G., Katabi, D., and Xu, Z. (2019c). Me-net: Towards effective adversarial robustness with matrix estimation. CoRR, abs/1905.11971.Google Scholar
Yang, Z., Chen, M., Saad, W., Hong, C. S., and Shikh-Bahaei, M. (2019d). Energy efficient federated learning over wireless communication networks.Google Scholar
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., and Le, Q. V. (2019e). Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems, pages 5754–5764.Google Scholar
Yeganeh, Y., Farshad, A., Navab, N., and Albarqouni, S. (2020). Inverse distance aggregation for federated learning with non-iid data. arXiv preprint arXiv:2008.07665.Google Scholar
Yeh, T. T., Sabne, A., Sakdhnagool, P., Eigenmann, R., and Rogers, T. G. (2019). Pagoda: A gpu runtime system for narrow tasks. ACM Transactions on Parallel Computing (TOPC), 6(4):1–23.
Yin, D., Chen, Y., Ramchandran, K., and Bartlett, P. L. (2018a). Byzantine-robust distributed learning: Towards optimal statistical rates. CoRR, abs/1803.01498.Google Scholar
Yin, D., Chen, Y., Ramchandran, K., and Bartlett, P. L. (2018b). Defending against saddle point attack in byzantine-robust distributed learning. CoRR, abs/1806.05358.Google Scholar
Yin, D., Pananjady, A., Lam, M., Papailiopoulos, D. S., Ramchandran, K., and Bartlett, P. L. (2018c). Gradient diversity: a key ingredient for scalable distributed learning. In Proceedings of International Conference on Artificial Intelligence and Statistics, AISTATS.Google Scholar
Ying, B., Yuan, K., Vlaski, S., and Sayed, A. H. (2019). Stochastic Learning Under Random Reshuffling With Constant Step-Sizes. IEEE Transactions on Signal Processing, 67:474–489.
Yong, H., Huang, J., Hua, X., and Zhang, L. (2020). Gradient Centralization: A New Optimization Technique for Deep Neural Networks.Google Scholar
You, K., Long, M., Wang, J., and Jordan, M. I. (2019). How Does Learning Rate Decay Help Modern Neural Networks?Google Scholar
You, Y., Li, J., Reddi, S. J., Hseu, J., Kumar, S., Bhojanapalli, S., Song, X., Demmel, J., Keutzer, K., and Hsieh, C. (2020a). Large batch optimization for deep learning: Training BERT in 76 minutes. In Proceedings of 8th International Conference on Learning Representations, ICLR.Google Scholar
You, Y., Wang, Y., Zhang, H., Zhang, Z., Demmel, J., and Hsieh, C. (2020b). The limit of the batch size. CoRR.Google Scholar
You, Y., Zhang, Z., Hsieh, C.-J., Demmel, J., and Keutzer, K. (2018). Imagenet training in minutes. In Proceedings of the 47th International Conference on Parallel Processing, ICPP.Google Scholar
Yu, L., Liu, L., Pu, C., Gursoy, M. E., and Truex, S. (2019). Differentially private model publishing for deep learning. In 2019 IEEE Symposium on Security and Privacy (SP).Google Scholar
Yu, P. and Chowdhury, M. (2019). Salus: Fine-grained gpu sharing primitives for deep learning applications. CoRR.Google Scholar
Yu, Q., Li, S., Raviv, N., Kalan, S. M. M., Soltanolkotabi, M., and Avestimehr, A. S. (2019). Lagrange coded computing: Optimal design for resiliency, security, and privacy. In Proceedings of The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS.Google Scholar
Yuan, J. and Yu, S. (2014). Privacy preserving back-propagation neural network learning made practical with cloud computing. IEEE Transactions on Parallel and Distributed Systems, 25(1):212–221.
Yuan, K., Ling, Q., and Yin, W. (2016). On the convergence of decentralized gradient descent. SIAM J. Optim., 26(3):1835–1854.
Yuan, X., Feng, Z., Norton, M., and Li, X. (2019). Generalized Batch Normalization: Towards Accelerating Deep Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:1682–1689.
Yu, Y., Wu, J., and Huang, L. (2019). Double quantization for communication-efficient distributed optimization. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pages 4440–4451.
Yurochkin, M., Agarwal, M., Ghosh, S., Greenewald, K. H., Hoang, T. N., and Khazaeni, Y. (2019). Bayesian nonparametric federated learning of neural networks. In Proceedings of the 36th International Conference on Machine Learning, ICML.Google Scholar
Zaheer, M., Reddi, S. J., Sachan, D., Kale, S., and Kumar, S. (2018). Adaptive methods for nonconvex optimization. In Advances in Neural Information Processing Systems (NeurIPS), pages 9793–9803.
Zeng, D., Gu, L., Lian, L., Guo, S., Yao, H., and Hu, J. (2016). On cost-efficient sensor placement for contaminant detection in water distribution systems. IEEE Transactions on Industrial Informatics, 12(6):2177–2185.
Zeng, Q., Du, Y., Leung, K. K., and Huang, K. (2019). Energy-efficient radio resource allocation for federated edge learning.Google Scholar
Zeng, R., Zhang, S., Wang, J., and Chu, X. (2020a). Fmore: An incentive scheme of multidimensional auction for federated learning in mec. arXiv preprint arXiv:2002.09699.Google Scholar
Zeng, T., Semiari, O., Mozaffari, M., Chen, M., Saad, W., and Bennis, M. (2020b). Federated learning in the sky: Joint power allocation and scheduling with uav swarms.Google Scholar
Zhan, Y. and Zhang, J. (2020). An incentive mechanism design for efficient edge learning by deep reinforcement learning approach. In Proc. of IEEE INFOCOM, pages 2489– 2498.Google Scholar
Zhang, C., Öztireli, C., Mandt, S., and Salvi, G. (2019a). Active Mini-Batch Sampling Using Repulsive Point Processes. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:5741–5748.
Zhang, G., Li, L., Nado, Z., Martens, J., Sachdeva, S., Dahl, G. E., Shallue, C. J., and Grosse, R. (2019b). Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model. In Advances in Neural Information Processing Systems (NeurIPS), pages 1–12.
Zhang, H., Li, J., Kara, K., Alistarh, D., Liu, J., and Zhang, C. (2017a). Zipml: Training linear models with end-to-end low precision, and a little bit of deep learning. In Precup, D. and Teh, Y. W., editors, Proceedings of the 34th International Conference on Machine Learning, ICML.Google Scholar
Zhang, H., Zheng, Z., Xu, S., Dai, W., Ho, Q., Liang, X., Hu, Z., Wei, J., Xie, P., and Xing, E. P. (2017b). Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters. In Proc. ATC.Google Scholar
Zhang, J., Hong, Z., Qiu, X., Zhan, Y., Guo, S., and Chen, W. (2020). Skychain: A deep reinforcement learning-empowered dynamic blockchain sharding system. In International Conference on Parallel Processing (ICPP).
Zhang, M., Rajbhandari, S., Wang, W., and He, Y. (2018). Deepcpu: Serving rnn-based deep learning models 10x faster. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 951–965.Google Scholar
Zhang, N. and Tao, M. (2020). Gradient statistics aware power control for over-the-air federated learning.Google Scholar
Zhang, N. and Tao, M. (2020). Gradient statistics aware power control for over-the-air federated learning in fading channels. In 2020 IEEE International Conference on Communications Workshops (ICC Workshops), pages 1–6.Google Scholar
Zhang, Q., Yang, L. T., and Chen, Z. (2016). Privacy preserving deep computation model on cloud for big data feature learning. IEEE Transactions on Computers, 65(5):1351–1362.
Zhang, X., Zhao, R., Yan, J., Gao, M., Qiao, Y., Wang, X., and Li, H. (2019c). P2SGRAD: Refined gradients for optimizing deep face models. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 9898–9906.
Zhang, Y., Qu, H., Chen, C., and Metaxas, D. (2019d). Taming the noisy gradient: Train deep neural networks with small batch sizes. In Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI, pages 4348–4354.
Zhao, J. (2018). Distributed deep learning under differential privacy with the teacher-student paradigm. In Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence.Google Scholar
Zhao, S., Xie, Y., Gao, H., and Li, W. (2019a). Global momentum compression for sparse communication in distributed SGD. CoRR, abs/1905.12948.Google Scholar
Zhao, T., Zhang, Y., and Olukotun, K. (2019b). Serving recurrent neural networks efficiently with a spatial accelerator.Google Scholar
Zheng, K., Mou, W., and Wang, L. (2017a). Collect at once, use effectively: Making non-interactive locally private learning possible. In Proceedings of the 34th International Conference on Machine Learning - Volume 70.Google Scholar
Zheng, S., Huang, Z., and Kwok, J. (2019). Communication-efficient distributed block-wise momentum sgd with error-feedback. In Advances in Neural Information Processing Systems, pages 11450–11460.Google Scholar
Zheng, W., Popa, R. A., Gonzalez, J. E., and Stoica, I. (2019). Helen: Maliciously secure coopetitive learning for linear models. In 2019 IEEE Symposium on Security and Privacy (SP).Google Scholar
Zheng, Z. and Hong, P. (2018). Robust detection of adversarial attacks by modeling the intrinsic properties of deep neural networks. In Advances in Neural Information Processing Systems 31.Google Scholar
Zheng, Z., Xie, S., Dai, H., Chen, X., and Wang, H. (2017b). An overview of blockchain technology: Architecture, consensus, and future trends. In 2017 IEEE International Congress on Big Data (BigData Congress).Google Scholar
Zhou, F. and Cong, G. (2018). On the convergence properties of a k-step averaging stochastic gradient descent algorithm for nonconvex optimization. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI.Google Scholar
Zhou, Z., Chen, X., Li, E., Zeng, L., Luo, K., and Zhang, J. (2019). Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proceedings of the IEEE, 107(8):1738–1762.
Zhu, G., Du, Y., Gündüz, D., and Huang, K. (2020). One-bit over-the-air aggregation for communication-efficient federated edge learning: Design and convergence analysis. CoRR, abs/2001.05713.Google Scholar
Zhu, G., Liu, D., Du, Y., You, C., Zhang, J., and Huang, K. (2018). Towards an intelligent edge: Wireless communication meets machine learning. CoRR, abs/1809.00343.Google Scholar
Zinkevich, M., Weimer, M., Li, L., and Smola, A. J. (2010). Parallelized stochastic gradient descent. In Advances in neural information processing systems, pages 2595–2603.
Zou, F., Shen, L., Jie, Z., Zhang, W., and Liu, W. (2019). A sufficient condition for convergences of adam and rmsprop. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 11119–11127.
