Timeline

Date        Attempt  Event
09/06/2021  1        £5-10K workstation requested from AHoS (Research). Declined: no money.
03/12/2021  2        £13K workstation requested from HoS.
15/12/2021           AHoS (Research) suggests a cloud-based solution / budget reduction.
11/01/2022  3        £9.8K workstation bid resubmitted to HoS. Budget approved.
17/01/2022           Spec/advice meeting with Psy Tech Office. TIS contacted via email.
21/01/2022           Request logged on TIS self-service system.
09/02/2022           TIS requests quotes from Getech / Lenovo.
11/02/2022           No quote forthcoming. Purchase approved by TIS.
11/02/2022           Ordered @ £8.5K from scan.co.uk by Psy Tech Office. Despatch due 14/03/2022.
15/03/2022           Parts shortage; price increased to £9.4K. Despatch now due w/c 21/03/2022.
31/03/2022           Delivered and installed.

Specifications

The spec requested from TIS is this machine: a 64-thread CPU, 128GB RAM, and 2 x RTX 3090, for a total of ~20K CUDA cores and 48GB of GPU memory. The retail cost of the components is around £7,500 (see below). Scan want £8,500, which is quite a mark-up, but when we previously looked at Lenovo they quoted £13K for a less capable system. An even less capable Mac Pro system costs around £15K.

What follows is a justification of the primary spec. The goal was the best system within budget that would have a 3-5 year usable life, without straying too far from the price-performance sweet spot.

Choice of GPU

Card                         CUDA cores  Memory (GB)  Price (GBP)
GTX 1060 (isaac)             1280        3            180 (Dec 2017)
Quadro P2000 (willslab-ply)  1024        5            330
RTX 3080                     8960        12           900
RTX 3090                     10496       24           1900

Recent benchmarking of 2-GPU systems shows that ResNet152 with a batch size of 64 cannot be run at 32-bit precision at all, presumably because it exceeds the cards' memory (see the calculations below). At 16-bit precision, the RTX 3090 is about twice as fast as the RTX 3080. Training even five-year-old models like ResNet152 in reasonable time needs at least 18GB of GPU memory (see below). For this workstation to have a 3-5 year useful life, two graphics cards with 24GB each does not seem like overkill.
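
For what it's worth, 16-bit training across both cards is straightforward to set up. Below is a minimal TensorFlow sketch, assuming TF 2.x with both RTX 3090s visible to it; the model and optimizer are placeholders for illustration, not our actual training setup.

    import tensorflow as tf

    # Assumption: TensorFlow 2.x with both RTX 3090s visible.
    tf.keras.mixed_precision.set_global_policy("mixed_float16")  # 16-bit compute, float32 master weights

    strategy = tf.distribute.MirroredStrategy()  # data-parallel training across both GPUs
    with strategy.scope():
        model = tf.keras.applications.ResNet152(weights=None)
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")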

GPU memory calculations

Training ResNet50 with a batch size of 32 needs 7.5GB of memory. The heuristic they used to work this out is, in bytes:

(N_weights + N_nodes) * 4 * batch_size * mask_elements

The 4 comes from 32-bit precision (4 bytes per number). mask_elements is the number of elements in the convolution mask (typically 3x3 = 9); this factor reflects a limitation of GPUs. We want them to do convolutions, but they are inefficient at these, so the convolutions are converted into matrix-matrix multiplications, which are faster but use more memory.
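
Put into code, the heuristic is a one-liner. This is a minimal Python sketch; the function name and default arguments are ours, for illustration:

    def estimated_memory_gb(n_weights, n_nodes, batch_size,
                            mask_elements=9, bytes_per_number=4):
        """Heuristic GPU memory estimate, in GB.

        bytes_per_number is 4 for 32-bit precision (2 for 16-bit);
        mask_elements is 9 for a 3x3 convolution mask.
        """
        total_bytes = (n_weights + n_nodes) * bytes_per_number * batch_size * mask_elements
        return total_bytes / 1e9

Halving bytes_per_number to 2 gives the corresponding 16-bit estimate, which is one reason 16-bit training fits where 32-bit does not.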

Using this same heuristic, and noting from model.summary() in tensorflow that ResNet152 has 2.3x as many parameters as ResNet50, we get an estimate of 2.3 * 7.5 = 17.25GB to train ResNet152 at a batch size of 32.
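
The 2.3x ratio is easy to check directly. The sketch below assumes a TF 2.x install; the exact count varies slightly with version and classifier head:

    import tensorflow as tf

    r50 = tf.keras.applications.ResNet50(weights=None)
    r152 = tf.keras.applications.ResNet152(weights=None)

    print(r152.count_params() / r50.count_params())  # ~2.3-2.4
    print(2.3 * 7.5)  # 17.25 GB estimate for ResNet152 at batch size 32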

Choice of CPU

CPU                       Threads  GHz  Price (GBP)
Ryzen 5 1600X (isaac)     12       3.6  180 (Dec 2017)
i7-8700 (willslab-ply)    12       3.2  200 (Dec 2017)
Ryzen 9 5900X             24       3.7  490
Ryzen Threadripper 3970X  64       3.7  1900

Our CPU loads are mainly Parameter Space Partitioning. Even today, we're running 96-CPU simulations on HPC systems that take days to complete. The current workstations are woefully inadequate: the same jobs would take weeks, and they don't have enough memory. The last two options are from typical Scan Deep Learning workstations. While there is a cost premium at the top end (2.7x the threads for 3.9x the cost), even the best option available within budget is far from overkill.
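
For context, these jobs parallelize across many independent simulation runs, which is why thread count matters more here than single-core speed. A hypothetical Python sketch, where run_simulation and parameter_regions stand in for the real PSP code:

    from multiprocessing import Pool

    def run_simulation(region):
        # Placeholder: one independent simulation for one region of parameter space.
        ...

    if __name__ == "__main__":
        parameter_regions = []  # placeholder: one entry per region to explore
        # One worker per hardware thread on the Threadripper 3970X.
        with Pool(processes=64) as pool:
            results = pool.map(run_simulation, parameter_regions)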

Cost of components

Component           Price (GBP)
RTX 3090 x 2        3800
Threadripper 3970X  1900
128GB RAM           600
2TB SSD             320
4TB HDD             115
Case                150
1200W PSU           200
Motherboard         400
Total               7485