GPU Kernel Programming

by: burt rosenberg
at: university of miami
date: dec 2017
Project 9

This project introduces you to GPU programming on a Kepler-architecture K80. It also 
gives you practice in using C and Unix tools to access and exploit advanced
high-performance computing infrastructure.

1- Log in to sickles. You must first log in to rosecrans (as in Project 7) and 
then ssh to sickles. 
2- Note that your home directory and all your files are available there. The
CSC Science Cluster shares a common home directory structure.
3- Run [repo]/class/cuda-examples/* and understand what is happening. These are examples of CUDA
programs running on the special-purpose Tesla K80 GPU that is housed inside sickles.
4- Copy [repo]/class/proj9/* to [repo]/[username]/proj9. Commit with comment (-m) "Initial commit".
5- Modify dot-prod.cu, cum-sum-large.cu and mat-mult.cu.
6- Test and commit by due date.

Notes on project:

Please edit the .bash_profile file in your home directory to add the lines:

PATH=$PATH:$HOME/.local/bin:$HOME/bin
PATH=/usr/local/cuda/bin${PATH:+:${PATH}}

The dot product problem can be solved by combining code from the vector-add and cum-sum programs, suitably modified. At this stage in our explorations, let's assume a single thread block, so that we can use thread synchronization primitives.
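
To check your thinking before writing the kernel, here is a minimal sketch in plain C (not CUDA) that emulates one thread block on the CPU. It assumes the cum-sum example reduces by repeated halving, which may not match the class code. The names dot_prod_emulated, BLOCK_SIZE and scratch (standing in for a shared memory array) are hypothetical. Each loop over t plays the role of all threads executing one step, and each comment marked "barrier" is a point where a kernel would need __syncthreads().

```c
#include <stddef.h>

#define BLOCK_SIZE 1024  /* assumed thread-block size (hypothetical constant) */

/* CPU emulation of a single-block dot product.
 * The index t stands in for threadIdx.x. */
float dot_prod_emulated(const float *a, const float *b, size_t n)
{
    float scratch[BLOCK_SIZE];  /* stands in for a __shared__ array */

    /* Phase 1 (vector-add style): thread t accumulates the products at
     * indices t, t + BLOCK_SIZE, t + 2 * BLOCK_SIZE, ... */
    for (size_t t = 0; t < BLOCK_SIZE; t++) {
        float partial = 0.0f;
        for (size_t i = t; i < n; i += BLOCK_SIZE)
            partial += a[i] * b[i];
        scratch[t] = partial;
    }
    /* barrier: all partial sums must be written before any are read */

    /* Phase 2: tree reduction, halving the number of active threads each step */
    for (size_t stride = BLOCK_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t t = 0; t < stride; t++)
            scratch[t] += scratch[t + stride];
        /* barrier: finish this step before starting the next */
    }
    return scratch[0];
}
```

The return value can be compared with what your kernel copies back to the host. With float data the two may differ in the last bits if your kernel adds the terms in a different order.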

The matrix multiplication program should use shared memory. You can assume the matrix has no more than 1024 elements, the maximum number of threads in a block. Use as many blocks as there are columns in the right matrix, so that the result of the Kronecker product of the right matrix column with the left matrix is left in shared memory. In the second phase, accumulate each row to obtain the result.
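
The two phases can be sketched on the CPU in plain C (again an emulation, not a kernel). This reads the first phase as: in block j, thread (i, k) stores A[i][k] * B[k][j] in shared memory. That is one interpretation of the description above. The function name, the row-major layout and the scratch array are assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_ELEMS 1024  /* assumed cap: one thread per element of the left matrix */

/* C = A * B, with A of size rows x inner and B of size inner x cols,
 * all stored row-major. Returns 0 on success, -1 if A is too large. */
int mat_mult_emulated(const float *A, const float *B, float *C,
                      size_t rows, size_t inner, size_t cols)
{
    float scratch[MAX_ELEMS];  /* stands in for one block's __shared__ array */

    if (rows * inner > MAX_ELEMS)
        return -1;

    for (size_t j = 0; j < cols; j++) {  /* j stands in for blockIdx.x */
        /* Phase 1: thread (i, k) multiplies A[i][k] by entry k of column j of B */
        for (size_t i = 0; i < rows; i++)
            for (size_t k = 0; k < inner; k++)
                scratch[i * inner + k] = A[i * inner + k] * B[k * cols + j];
        /* barrier: every product must be stored before the rows are summed */

        /* Phase 2: sum each row of products to get column j of the result */
        for (size_t i = 0; i < rows; i++) {
            float sum = 0.0f;
            for (size_t k = 0; k < inner; k++)
                sum += scratch[i * inner + k];
            C[i * cols + j] = sum;
        }
    }
    return 0;
}
```

In this sketch one thread sums each row serially in phase 2. A kernel could instead reuse a tree reduction like the one in the dot product, at the cost of more barriers.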

The large cumulative sum (cum-sum-large) will exercise synchronization across blocks. In class we showed that the host by default places kernel launches on the NULL stream, and that these launches are asynchronous for the host but execute one after another on the device.
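
Since launches on the NULL stream run in order on the device, the boundary between two kernels can serve as a barrier across all blocks. Here is a minimal CPU sketch in plain C of one way to use that, assuming cum-sum means an inclusive prefix sum over int data. The three functions stand in for three kernels launched back to back. All names, and the deliberately tiny BLOCK_SIZE, are hypothetical.

```c
#include <stddef.h>

#define BLOCK_SIZE 4    /* tiny on purpose, so a short input spans several blocks */
#define MAX_BLOCKS 256  /* capacity of the block-totals array in this sketch */

/* "Kernel" 1: each block scans its own slice and records its total. */
static void scan_blocks(int *data, size_t n, int *block_sums, size_t num_blocks)
{
    for (size_t b = 0; b < num_blocks; b++) {  /* b stands in for blockIdx.x */
        int running = 0;
        for (size_t i = b * BLOCK_SIZE; i < n && i < (b + 1) * BLOCK_SIZE; i++) {
            running += data[i];
            data[i] = running;
        }
        block_sums[b] = running;
    }
}

/* "Kernel" 2: exclusive scan of the totals gives the offset for each block. */
static void scan_block_sums(int *block_sums, size_t num_blocks)
{
    int running = 0;
    for (size_t b = 0; b < num_blocks; b++) {
        int total = block_sums[b];
        block_sums[b] = running;
        running += total;
    }
}

/* "Kernel" 3: every element adds the offset of the block it belongs to. */
static void add_offsets(int *data, size_t n, const int *block_sums)
{
    for (size_t i = 0; i < n; i++)
        data[i] += block_sums[i / BLOCK_SIZE];
}

/* Host side: three launches in a row. On the device each would finish
 * before the next one starts. Returns 0 on success, -1 if n is too large. */
int cum_sum_large_emulated(int *data, size_t n)
{
    int block_sums[MAX_BLOCKS];
    size_t num_blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;

    if (num_blocks > MAX_BLOCKS)
        return -1;
    scan_blocks(data, n, block_sums, num_blocks);
    scan_block_sums(block_sums, num_blocks);
    add_offsets(data, n, block_sums);
    return 0;
}
```

The second step needs every block total from the first, and the third needs every offset from the second, which is why each step is a separate launch rather than a __syncthreads() inside one kernel.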

The program binom-ec prices a European call option using the Cox-Ross-Rubinstein (CRR) binomial pricing model. (This fourth program was included because the answer to dot-prod was released in the sample code for vector-xform, so I want to keep you all busy.)
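
For reference, here is a minimal serial sketch in plain C of pricing a European call on a binomial lattice. It is not the binom-ec code, and every name in it is hypothetical. The usual CRR calibration sets up = exp(sigma * sqrt(dt)), down = 1 / up and growth = exp(r * dt). This sketch takes those three per-step factors as inputs, so the caller would compute them with exp and sqrt from math.h.

```c
#include <stddef.h>

#define MAX_STEPS 1024  /* lattice depth limit for this sketch */

/* Price of a European call on a binomial lattice.
 *   s0      spot price today
 *   strike  strike price
 *   up      per-step up factor
 *   down    per-step down factor
 *   growth  per-step gross risk-free growth, with down < growth < up
 *   steps   number of time steps (1 .. MAX_STEPS)
 * Returns -1.0 on invalid input. */
double binom_european_call(double s0, double strike, double up, double down,
                           double growth, size_t steps)
{
    double value[MAX_STEPS + 1];

    if (steps == 0 || steps > MAX_STEPS || !(down < growth && growth < up))
        return -1.0;

    double p = (growth - down) / (up - down);  /* risk-neutral up probability */

    /* Terminal payoffs: node j has j up moves and steps - j down moves. */
    double price = s0;
    for (size_t i = 0; i < steps; i++)
        price *= down;  /* lowest terminal price, s0 * down^steps */
    for (size_t j = 0; j <= steps; j++) {
        value[j] = price > strike ? price - strike : 0.0;
        price = price * up / down;  /* move to the next higher node */
    }

    /* Backward induction: discounted risk-neutral expectation at each node. */
    for (size_t level = steps; level > 0; level--)
        for (size_t j = 0; j < level; j++)
            value[j] = (p * value[j + 1] + (1.0 - p) * value[j]) / growth;

    return value[0];
}
```

As a hand check, one step with s0 = 100, strike = 100, up = 1.1, down = 0.9 and growth = 1.05 gives p = 0.75 and a price of 0.75 * 10 / 1.05, about 7.14. The nodes within one level of the backward induction are independent of each other, which is where a GPU version could assign one thread per node.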

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 5 dec 2017
update: 8 dec 2017