Syllabus
Course Title: Neural Network Acceleration (神經網路加速技術)
Course No. 53026
Semester: Fall, 2020
Time and Classroom: Mon. 13:30-17:00 (E404)
Credits: 3
Instructor: Professor Tsung-Chu Huang (黃宗柱)
Email: tch@cc.ncue.edu.tw
Tel. (04)723-2105 ext. 8384
Website: http://testlab.ncue.edu.tw/tch
Cloud: https://drive.google.com/drive/folders/1EZG40mE2-NDZH9r5QzjHjP5nreGNSJvw?usp=sharing
Schedule and Contents
Week 1 Introduction to Neural Networks; Lab1: Python Programming in Jupyter Online and Colab
Week 2 Review of Linear Algebra; Lab2: Introduction to Anaconda/TensorFlow/Keras (Xilinx Workshop)
Week 3 No class (instructor attending ICCE-TW 2020)
Week 4 Types of Neural Networks; Lab3: Introduction to CNN (a Keras sketch follows the schedule)
Week 5 Issues of NNs: Overfitting, Dynamic Range Inflation; Lab4: Introduction to RNN
Week 6 Data Flow Graph/Design; Lab5: Introduction to the PYNQ Platform and Workshop Sessions 1-2
Week 7 HW Acceleration: GPU, NPU, VPU, CPU-based Acceleration; Lab6: Xilinx Workshop Sessions 3-4
Week 8 Approximate Computing: Analogue, LUT, Range LUT, Binary Truncation, AC for Batch Learning; Lab7: RCAM-LUT (a truncation sketch follows the schedule)
Week 9 Midterm: Report and presentation on NN Acceleration
Week 10 Residue Number System; Lab8: MAC Array (an RNS MAC sketch follows the schedule)
Week 11 Matrix-Operation-based Pruning Techniques; Lab9: Pruning (a pruning sketch follows the schedule)
Week 12 SW Acceleration: Efficient Algorithms, Pruning, Random Dropping; Lab10: Backpropagation
Week 13 Network Optimization; Lab11: HLS
Week 14 Model Optimization; Lab12: HLS + HLx
Week 15 Model Compression; Lab13: Tensor Array
Week 16 DMA, CUDA, Memory Allocation; Lab14: Approximate Computing
Week 17 Hybrid I/O and Codesign; Lab15: Configuration Automation
Week 18 Analogue Neuromorphic Network: RRAM, MRAM, Charge-Capacitor, OTA-based
Week 19 Demo of Final Project: Applying Your Technique on the PYNQ Accelerator
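
The following is a minimal sketch of the kind of model Lab3 builds, assuming the TensorFlow 2.x/Keras stack installed in Lab2; the dataset and layer sizes are illustrative, not the official lab handout.

import tensorflow as tf

# Load MNIST and scale pixels to [0, 1]; add a channel axis for Conv2D.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# One convolution block followed by a softmax classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128,
          validation_data=(x_test, y_test))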
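Week 8's binary truncation can be previewed in pure Python: an approximate multiplier drops low-order operand bits so that a narrower, cheaper hardware multiplier suffices, at the cost of a bounded relative error that inference often tolerates. The function names and the 4-bit depth below are illustrative assumptions, not taken from Lab7.

def truncate(x: int, drop_bits: int) -> int:
    """Zero the lowest drop_bits bits of a non-negative integer."""
    return (x >> drop_bits) << drop_bits

def approx_mul(a: int, b: int, drop_bits: int = 4) -> int:
    """Multiply after truncating both operands, trading a bounded
    relative error for a smaller multiplier."""
    return truncate(a, drop_bits) * truncate(b, drop_bits)

exact = 1234 * 5678
approx = approx_mul(1234, 5678)
print(exact, approx, abs(exact - approx) / exact)  # small relative error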
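For Week 10, here is a small sketch of a multiply-accumulate performed in a residue number system: each channel works modulo a small coprime base, so no wide carries propagate between channels, and the Chinese Remainder Theorem recovers the final sum. The moduli are an illustrative choice, not from the course notes.

from math import prod

MODULI = (7, 11, 13, 15)        # pairwise coprime; dynamic range M = 15015
M = prod(MODULI)

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_mac(acc, a, b):
    """acc + a*b, computed independently in each small-modulus channel."""
    return tuple((r + p * q) % m
                 for r, p, q, m in zip(acc, to_rns(a), to_rns(b), MODULI))

def from_rns(residues):
    """Chinese Remainder Theorem reconstruction (Python 3.8+ for pow)."""
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # modular inverse of Mi mod m
    return x % M

acc = to_rns(0)
for a, b in [(34, 56), (12, 78), (9, 40)]:
    acc = rns_mac(acc, a, b)
print(from_rns(acc), 34 * 56 + 12 * 78 + 9 * 40)  # both print 3200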
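For Week 11 and Lab9, a minimal NumPy sketch of magnitude-based pruning, one common matrix-operation-based technique: the smallest weights are zeroed and a binary mask records them so later fine-tuning (as in Lab10's backpropagation) can keep them at zero. The threshold rule is an illustrative assumption, not the lab's prescribed method.

import numpy as np

def prune_by_magnitude(w: np.ndarray, sparsity: float):
    """Return (pruned weights, binary mask) with a `sparsity`
    fraction of the smallest-magnitude weights zeroed."""
    k = int(sparsity * w.size)
    threshold = np.sort(np.abs(w), axis=None)[k]   # k-th smallest |w|
    mask = (np.abs(w) >= threshold).astype(w.dtype)
    return w * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
w_pruned, mask = prune_by_magnitude(w, sparsity=0.9)
print(f"kept {mask.mean():.1%} of the weights")    # about 10% survive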
Textbook and References
Textbook: (eBook purchased by NCUE and available on campus)
Videos:
3Blue1Brown. Introduction to Artificial Neural Network. https://www.youtube.com/watch?v=aircAruvnKk.
PYNQ: An Essential Upgrade for AI Rookies (A Booster for Beginners). https://www.youtube.com/watch?v=LilOaVBYXJo.
Morvan Zhou (莫煩). Popular Science: Artificial Neural Networks vs. Biological Neural Networks. https://www.youtube.com/watch?v=lAaCeiqE6CE&list=PLXO45tsB95cKI5AIlf5TxxFPzb-0zeVZ8.
CCU MOOCs. "Deep Learning" (深度學習). https://www.youtube.com/watch?v=wEL5nWBoThw&list=PLQn99bzkJv9wxnnU3nfuFm1MtWDbtps0P.
Seeed. PYNQ Z2 FPGA Development board | Hardware Overview. https://www.youtube.com/watch?v=z3dZb5Y-vvo.
FPGAdeveloper. How to setup the PYNQ-Z1. https://www.youtube.com/watch?v=lfw6T9vNM_o.
FPGAdeveloper. Xilinx Pynq Workshop Series. https://www.youtube.com/watch?v=LoLCtSzj9BU&list=PLIs1b8ziMXQZeg06pmqSquK70ydQLA4y5.
FPGAdeveloper. How to make a custom PYNQ overlay. https://www.youtube.com/watch?v=Dupyek4NUoI.
FPGAdeveloper. How to accelerate a Python function with PYNQ. https://www.youtube.com/watch?v=LoLCtSzj9BU.
FPGAdeveloper. Using AXI DMA in Vivado. https://www.youtube.com/watch?v=Yklu68WopBo.
FPGAdeveloper. Creating a custom AXI-Streaming IP in Vivado. https://www.youtube.com/watch?v=R8MSpEU7UKE.
FPGAdeveloper. PYNQ Computer Vision Demo: 2D Filter & Dilate. https://www.youtube.com/watch?v=tzQlyEj71Us.
Hardware.ai. PYNQ Z2 First Boot and ML acceleration examples. https://www.youtube.com/watch?v=FA3jiIkoN-Q.
HR G. Top 5 fastest supercomputers in the world. https://www.youtube.com/watch?v=PoEVBYdUJjk.
CNBC. What Is A Supercomputer? https://www.youtube.com/watch?v=utsi6h7IFPs.
Links:
SD ImageWriter: https://sourceforge.net/projects/win32diskimager
PYNQ Image: https://pynq.readthedocs.io/en/v2.2.1/getting_started/pynq_image.html
Vivado AXI GPIO basic settings: https://www.youtube.com/watch?v=3SSKzK84AnE (in Chinese)
References:
[1] C. Fu, S. Zhu, H. Chen, F. Koushanfar, H. Su and J. Zhao, "SimBNN: A Similarity-Aware Binarized Neural Network Acceleration Framework," 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), San Diego, CA, USA, 2019, pp. 319-319, doi: 10.1109/FCCM.2019.00060.
[2] K. Jearanaitanakij and O. Pinngern, "An Information Gain Technique for Acceleration of Convergence of Artificial Neural Networks," 2005 5th International Conference on Information Communications & Signal Processing, Bangkok, 2005, pp. 349-352, doi: 10.1109/ICICS.2005.1689065.
[3] F. Artoni, D. Martelli, V. Monaco and S. Micera, "Principal component analysis can decrease neural networks performance for incipient falls detection: A preliminary study with hands and feet accelerations," 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, 2016, pp. 6194-6197, doi: 10.1109/EMBC.2016.7592143.
[4] Seung-Joon Lee and Dong-Jo Park, "New accelerated learning algorithm motivated from novel shape of error surfaces for multilayer feedforward neural networks," Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan), Nagoya, Japan, 1993, pp. 553-556 vol.1, doi: 10.1109/IJCNN.1993.713975.
[5] Bin-Chul Ihm and Dong-Jo Park, "Acceleration of learning speed in neural networks by reducing weight oscillations," IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339), Washington, DC, USA, 1999, pp. 1729-1732 vol.3, doi: 10.1109/IJCNN.1999.832637.
[6] L. Liu, J. Luo, X. Deng and S. Li, "FPGA-Based Acceleration of Deep Neural Networks Using High Level Method," 2015 10th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), Krakow, 2015, pp. 824-827, doi: 10.1109/3PGCIC.2015.103; C. Xue, S. Cao, R. Jiang and H. Yang, "A Reconfigurable Pipelined Architecture for Convolutional Neural Network Acceleration," 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, 2018, pp. 1-5, doi: 10.1109/ISCAS.2018.8351425.
[7] B. L. Deng, G. Li, S. Han, L. Shi and Y. Xie, "Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey," in Proceedings of the IEEE, vol. 108, no. 4, pp. 485-532, April 2020, doi: 10.1109/JPROC.2020.2976475.
[8] K. Ochiai, N. Toda and S. Usui, "New accelerated learning algorithm to reduce the oscillation of weights in multilayered neural networks," [Proceedings 1992] IJCNN International Joint Conference on Neural Networks, Baltimore, MD, USA, 1992, pp. 914-919 vol.1, doi: 10.1109/IJCNN.1992.287070.
[9] Lark Sang Kim, "Initializing weights to a hidden layer of a multilayer neural network by linear programming," Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan), Nagoya, Japan, 1993, pp. 1701-1704 vol.2, doi: 10.1109/IJCNN.1993.716981.
[10] A. G. Thome and M. F. Tenorio, "Dynamic adaptation of the error surface for the acceleration of the training of neural networks," Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94), Orlando, FL, USA, 1994, pp. 447-452 vol.1, doi: 10.1109/ICNN.1994.374204.
[11] R. Kamimura, "Activated hidden connections to accelerate the learning in recurrent neural networks," [Proceedings 1992] IJCNN International Joint Conference on Neural Networks, Baltimore, MD, USA, 1992, pp. 693-700 vol.1, doi: 10.1109/IJCNN.1992.287106.
[12] K. Guo, S. Han, S. Yao, Y. Wang, Y. Xie and H. Yang, "Software-Hardware Codesign for Efficient Neural Network Acceleration," in IEEE Micro, vol. 37, no. 2, pp. 18-25, Mar.-Apr. 2017, doi: 10.1109/MM.2017.39.
[13] Pu Sun and K. Marko, "Estimation of the training efficiency of recurrent neural networks," Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, Como, Italy, 2000, pp. 583-588 vol.4, doi: 10.1109/IJCNN.2000.860834.
[14] Z. Chiliang, H. Tao, G. Yingda and Y. Zuochang, "Accelerating Convolutional Neural Networks with Dynamic Channel Pruning," 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 2019, pp. 563-563, doi: 10.1109/DCC.2019.00075.
[15] M. J. Perez-Ilzarbe, "Preconditioning method to accelerate neural networks gradient training algorithms," IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339), Washington, DC, USA, 1999, pp. 1384-1388 vol.2, doi: 10.1109/IJCNN.1999.831165.
[16] L. D. Medus, T. Iakymchuk, J. V. Frances-Villora, M. Bataller-Mompeán and A. Rosado-Muñoz, "A Novel Systolic Parallel Hardware Architecture for the FPGA Acceleration of Feedforward Neural Networks," in IEEE Access, vol. 7, pp. 76084-76103, 2019, doi: 10.1109/ACCESS.2019.2920885.
[17] Z. Tang and G. J. Koehler, "A convergent neural network learning algorithm," [Proceedings 1992] IJCNN International Joint Conference on Neural Networks, Baltimore, MD, USA, 1992, pp. 127-132 vol.2, doi: 10.1109/IJCNN.1992.226973.
[18] Y. Zhang, W. Ding and C. Liu, "Summary of Convolutional Neural Network Compression Technology," 2019 IEEE International Conference on Unmanned Systems (ICUS), Beijing, China, 2019, pp. 480-483, doi: 10.1109/ICUS48101.2019.8995969.
[19] T. Oohashi and T. Ejima, "An artificial neural network simulator on the loosely coupled parallel processors," IJCNN-91-Seattle International Joint Conference on Neural Networks, Seattle, WA, USA, 1991, pp. 922 vol.2-, doi: 10.1109/IJCNN.1991.155575.
[20] L. Li, Y. Tong, H. Zhang and D. Wan, "Memory Saving Method for Enhanced Convolution of Deep Neural Network," 2018 11th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 2018, pp. 185-188, doi: 10.1109/ISCID.2018.00049.
[21] P. Colangelo, R. Huang, E. Luebbers, M. Margala and K. Nealis, "Fine-Grained Acceleration of Binary Neural Networks Using Intel® Xeon® Processor with Integrated FPGA," 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Napa, CA, 2017, pp. 135-135, doi: 10.1109/FCCM.2017.46.
[22] S. Zickenheiner, M. Wendt, B. Klauer and K. Waldschmidt, "Pipelining and parallel training of neural networks on distributed-memory multiprocessors," Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94), Orlando, FL, USA, 1994, pp. 2052-2057 vol.4, doi: 10.1109/ICNN.1994.374529.
[23] L. Li, Z. Li, Y. Li, B. Kathariya and S. Bhattacharyya, "Incremental Deep Neural Network Pruning Based on Hessian Approximation," 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 2019, pp. 590-590, doi: 10.1109/DCC.2019.00102.
[24] G. Castellano, A. M. Fanelli and M. Pelillo, "An empirical comparison of node pruning methods for layered feedforward neural networks," Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan), Nagoya, Japan, 1993, pp. 321-326 vol.1, doi: 10.1109/IJCNN.1993.713922.
[25] L. Ma and K. Khorasani, "Input-side training in constructive neural networks based on error scaling and pruning," Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, Como, Italy, 2000, pp. 455-460 vol.6, doi: 10.1109/IJCNN.2000.859437.
[26] L. Fujin, H. Meijie, R. Hongge and Z. Wenbin, "A neural network algorithm for fast pruning based on remarkable analysis," The 26th Chinese Control and Decision Conference (2014 CCDC), Changsha, 2014, pp. 184-188, doi: 10.1109/CCDC.2014.6852141.
[27] P. V. S. Ponnapalli, K. C. Ho and M. Thomson, "A formal selection and pruning algorithm for feedforward artificial neural network optimization," in IEEE Transactions on Neural Networks, vol. 10, no. 4, pp. 964-968, July 1999, doi: 10.1109/72.774273.
[28] E. Watanabe and H. Shimizu, "Algorithm for pruning hidden units in multilayered neural network for binary pattern classification problem," Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan), Nagoya, Japan, 1993, pp. 327-330 vol.1, doi: 10.1109/IJCNN.1993.713923.
[29] A. Ismail and A. P. Engelbrecht, "Pruning product unit neural networks," Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No.02CH37290), Honolulu, HI, USA, 2002, pp. 257-262 vol.1, doi: 10.1109/IJCNN.2002.1005479.
[30] Hyeyoung Park and Hyunjin Lee, "Reconsideration to pruning and regularization for complexity optimization in neural networks," Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02., Singapore, 2002, pp. 1649-1653 vol.4, doi: 10.1109/ICONIP.2002.1198955.
[31] Han Honggui and Qiao Junfei, "A novel pruning algorithm for self-organizing neural network," 2009 International Joint Conference on Neural Networks, Atlanta, GA, 2009, pp. 1245-1250, doi: 10.1109/IJCNN.2009.5178581.
[32] R. Setiono and A. Gaweda, "Neural network pruning for function approximation," Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, Como, Italy, 2000, pp. 443-448 vol.6, doi: 10.1109/IJCNN.2000.859435.
[33] Hajoon Lee and Cheol Hoon Park, "A pruning algorithm of neural networks using impact factor regularization," Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02., Singapore, 2002, pp. 2605-2609 vol.5, doi: 10.1109/ICONIP.2002.1201967.
[34] Weishui Wan, K. Hirasawa, Jinglu Hu and Chunzhi Jin, "A new method to prune the neural network," Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, Como, Italy, 2000, pp. 449-454 vol.6, doi: 10.1109/IJCNN.2000.859436.
[35] T. Nagata, A. Kawata, K. Yamada and R. Nakano, "Neural network pruning using MV regularizer," Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No.02CH37290), Honolulu, HI, USA, 2002, pp. 1051-1055 vol.1, doi: 10.1109/IJCNN.2002.1005621.
[36] B. E. Segee and M. J. Carter, "Fault tolerance of pruned multilayer networks," IJCNN-91-Seattle International Joint Conference on Neural Networks, Seattle, WA, USA, 1991, pp. 447-452 vol.2, doi: 10.1109/IJCNN.1991.155374.
[37] A. Sankar and R. J. Mammone, "Optimal pruning of neural tree networks for improved generalization," IJCNN-91-Seattle International Joint Conference on Neural Networks, Seattle, WA, USA, 1991, pp. 219-224 vol.2, doi: 10.1109/IJCNN.1991.155341.
[38] B. Kijsirikul and K. Chongkasemwongse, "Decision tree pruning using backpropagation neural networks," IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222), Washington, DC, USA, 2001, pp. 1876-1880 vol.3, doi: 10.1109/IJCNN.2001.938449.
[39] R. Setiono and Wee Kheng Leow, "Generating rules from trained network using fast pruning," IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339), Washington, DC, USA, 1999, pp. 4095-4098 vol.6, doi: 10.1109/IJCNN.1999.830817.
[40] Xiahua Yang, "A convenient method to prune multilayer neural networks via transform domain backpropagation algorithm," [Proceedings 1992] IJCNN International Joint Conference on Neural Networks, Baltimore, MD, USA, 1992, pp. 817-822 vol.3, doi: 10.1109/IJCNN.1992.227051.
[41] R. Esfandiarpoor, M. Hajabdollahi and N. Karimi, "Simplified Neural Network Based on Auxiliary Layer and Adaptive Pruning Rate," Electrical Engineering (ICEE), Iranian Conference on, Mashhad, 2018, pp. 1-5, doi: 10.1109/ICEE.2018.8472540.
[42] J. T. Lo, "Statistical method of pruning neural networks," IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339), Washington, DC, USA, 1999, pp. 1678-1680 vol.3, doi: 10.1109/IJCNN.1999.832626.
[43] Y. Xiong, L. Wang and D. Li, "Training Feedforward Neural Networks by Pruning Algorithm Based on Grey Incidence Analysis," 2008 Second International Symposium on Intelligent Information Technology Application, Shanghai, 2008, pp. 535-539, doi: 10.1109/IITA.2008.125.
[44] S. Sen, S. Venkataramani and A. Raghunathan, "Approximate computing for spiking neural networks," Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, Lausanne, 2017, pp. 193-198, doi: 10.23919/DATE.2017.7926981.
[45] T. Ayhan and M. Altun, "Approximate Fully Connected Neural Network Generation," 2018 15th International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), Prague, 2018, pp. 93-96, doi: 10.1109/SMACD.2018.8434843.
[46] T. Moreau, A. Sampson and L. Ceze, "Approximate Computing: Making Mobile Systems More Efficient," in IEEE Pervasive Computing, vol. 14, no. 2, pp. 9-13, Apr.-June 2015, doi: 10.1109/MPRV.2015.25.
[47] Z. Peng et al., "AXNet: ApproXimate computing using an end-to-end trainable neural network," 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Diego, CA, 2018, pp. 1-8.
[48] L. Song, Y. Wang, Y. Han, H. Li, Y. Cheng and X. Li, "STT-RAM Buffer Design for Precision-Tunable General-Purpose Neural Network Accelerator," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 4, pp. 1285-1296, April 2017, doi: 10.1109/TVLSI.2016.2644279.
[49] M. Franceschi, A. Nannarelli and M. Valle, "Tunable Floating-Point for Artificial Neural Networks," 2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Bordeaux, 2018, pp. 289-292, doi: 10.1109/ICECS.2018.8617900.
[50] T. Cheng, J. Yu and M. Hashimoto, "Minimizing Power for Neural Network Training with Logarithm-Approximate Floating-Point Multiplier," 2019 29th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), Rhodes, Greece, 2019, pp. 91-96, doi: 10.1109/PATMOS.2019.8862162.
[51] H. Song et al., "Invocation-driven Neural Approximate Computing with a Multiclass-Classifier and Multiple Approximators," 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Diego, CA, 2018, pp. 1-8, doi: 10.1145/3240765.3240819.
[52] X. He, G. Yan, W. Lu, X. Zhang and K. Liu, "A Quantitative Exploration of Collaborative Pruning and Approximation Computing Towards Energy Efficient Neural Networks," in IEEE Design & Test, vol. 37, no. 1, pp. 36-45, Feb. 2020, doi: 10.1109/MDAT.2019.2943575.
[53] X. He, W. Lu, G. Yan and X. Zhang, "Joint Design of Training and Hardware Towards Efficient and Accuracy-Scalable Neural Network Inference," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 8, no. 4, pp. 810-821, Dec. 2018, doi: 10.1109/JETCAS.2018.2845396.
[54] B. Li, Y. Qin, B. Yuan and D. J. Lilja, "Neural Network Classifiers Using Stochastic Computing with a Hardware-Oriented Approximate Activation Function," 2017 IEEE International Conference on Computer Design (ICCD), Boston, MA, 2017, pp. 97-104, doi: 10.1109/ICCD.2017.23.
[55] M. Obara et al., "Acceleration for query-by-example using posteriorgram of deep neural network," 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, 2017, pp. 1565-1569, doi: 10.1109/APSIPA.2017.8282291.
[56] M. G. Augasta and T. Kathirvalavakumar, "Pruning algorithms of neural networks — a comparative study," Central European Journal of Computer Science, vol. 3, pp. 105-115, 2013.
[57] D. Palzer and B. Hutchinson, "The Tensor Deep Stacking Network Toolkit," 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, 2015, pp. 1-5, doi: 10.1109/IJCNN.2015.7280297.
[58] Q. Zhang, M. Zhang, T. Chen, Z. Sun, Y. Ma and B. Yu, "Recent advances in convolutional neural network acceleration," Neurocomputing, 2019.
[59] E. Wang et al., "Deep neural network approximation for custom hardware: where we've been, where we're going," arXiv, 2019.
[60] V. Sze, Y.-H. Chen, T.-J. Yang and J. S. Emer, "Efficient processing of deep neural networks: A tutorial and survey," Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, Dec. 2017.
[61] S. V. Kamarthi and S. Pittner, "Accelerating neural network training using weight extrapolations," Neural Networks, vol. 12, no. 9, pp. 1285-1299, Nov. 1999.
[62] H. Xie, S. Zhang, H. Ding, Y. Song, B. Shao, C. Hu, L. Cai and M. Li, "Accelerating neural network inference by overflow aware quantization," arXiv:2005.13297, May 2020.
[63] S. Sotirov, "A method of accelerating neural network learning," Neural Processing Letters, vol. 22, pp. 163-169, 2005.
[64] A. Sinha et al., "Introspection: Accelerating neural network training by learning weight evolution," ICLR 2017.
[65] L. Song, X. Qian, H. Li and Y. Chen, "PipeLayer: A pipelined ReRAM-based accelerator for deep learning," Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), Austin, TX, USA, Feb. 2017.
[66] H. Esmaeilzadeh, A. Sampson, L. Ceze and D. Burger, "Neural Acceleration for General-Purpose Approximate Programs," in IEEE Micro, vol. 33, no. 3, pp. 16-27, May-June 2013.
[67] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen et al., "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, USA, 2014, pp. 269-284.
[68] Y.-H. Chen, T. Krishna, J. S. Emer and V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127-138, 2017.
[69] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa et al., "In-datacenter performance analysis of a tensor processing unit," Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), Toronto, ON, Canada, Jun. 2017, pp. 1-12.
[70] A. Shawahna, S. M. Sait and A. El-Maleh, "FPGA-Based Accelerators of Deep Learning Networks for Learning and Classification: A Review," in IEEE Access, vol. 7, pp. 7823-7859, 2019, doi: 10.1109/ACCESS.2018.2890150.
[71] K. Raju and N. N. Chiplunkar, "A survey on techniques for cooperative CPU-GPU computing," Sustainable Computing: Informatics and Systems, vol. 19, pp. 72-85, 2018.
[72] S. Mittal and J. Vetter, "A survey of methods for analyzing and improving GPU energy efficiency," ACM Computing Surveys, vol. 47, no. 2, Apr. 2014, doi: 10.1145/2636342.