Project 10: Option Pricing on a GPU

by: burt rosenberg
at: university of miami
date: dec 2017

Write programs to price options on the GPU, using Cox-Ross and the Monte-Carlo on stock price paths. The programs price European calls, either binary or not, with an option up-and-out knock out price. Both programs get their inputs on the command line by a combination of command line options and command line arguments. The Cox-Ross and the Monte-Carlo take the same command line options:

    -R the risk free rate (e.g 5% is 0.05)
    -K the strike price
    -E the expiry of the option
    -S the spot price (current price)
    -N the up and out knockout price (optional)
    -B a flag that when present makes this a binary option
    -G the sigma (e.g. 20% is 0.20)
    -v a verbose flag to be used as you wish for extra output

crr_euro_call arguments

    crr_euro_call [options] block-size block-number
The time step is determined by claim vector size. The claim vector size is block-size * block-number . The actual number of blocks is 2 * block_number - 1, because blocks overlap by half. So the total number of threads is block-size * (2 * block_number - 1).

The algorithm is to create the claim vector xi, then iterate xi gets p * xi + (1-p) * xi+1 for n rounds, where in the first round i goes from 0 to n-2, in the second round, i goes from 0 to n-3, etc, until after n-1 rounds only x0 remains.

Implementing on a GPU, the single block solution gives a thread for each i, and the updates synchronize at two code locations: just before the write of the new value to xi and again just after this write.

To extend to multiple blocks, let B be a block size and N be a multiple of B/2, i.e. N = j * B / 2 for some positive integer j. In each phase we reduce N by B/2, that is, from N to N' where N' = (j-1) * B/2, until j is 2 and then we continue with the single block solution.

Give j is greater than 2, create 2j-1 blocks each of B-1 consecutive indices such as 0 through (B-1), B/2 through (3B/2-1), and so on. These blocks overlap. Iterate as above for each block, stopping after B/2 iterations. Now copy to global memory the results in the first half of each block, respecting the original index placement. This completes a phase. Synchronize by using a for-loop on the host to create a sequence of queued kernels, one for each phase.

mcp_euro_call arguments

    mcp_euro_call [options] steps iterations threads
The time to expiry is divided into steps steps and the work is given in parallel to threads threads. Each thread repeats for iterations iterations, for a total of threads times iterations simulations.

Notes on project:

The templates in [repo]/class/proj10 will help start the project. First copy these to [repo]/[your username]/proj10, svn add and svn commit with message "initial commit". The template files do the command line processing and copy the parameters to device constant memory. The template contains a sample call to a Cuda kernel that takes the command line arguments and a float vector for outputs and (for testing purposes) prints out some values.

Start easy by implementing just some of the functionality. For instance, at first ignore the up and out feature. You can even submit without it for partial credit. Also, you can implement the crr_euro_call first for only one block and submit that for partial credit. After, you can attempt the multi-block solution.

The program [repo]/class/cuda-examples/curand/curand.cu shows how to mathematical functions and the curand pseudorandom number generator.

Difficulty of project!

Do not hesitate to submit just the one block solution for CRR, a simple option and binary option, as well as a simple and binary option for MCP. Partial credit is awarded.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 14 dec 2017
update: 14 dec 2017