|09/06/2021||1||£5-10K workstation requested from AHoS (research). Declined - no money.|
|03/12/2021||2||£13K workstation requested from HoS.|
|15/12/2021||AHoS (Research) suggests cloud-based solution / budget reduction|
|11/01/2022||3||£9.8K workstation bid resubmitted to HoS. Budget approved|
|17/01/2022||Spec/advice meeting with Psy Tech Office. TIS contacted via email|
|21/01/2022||Request is logged on TIS self-service system|
|09/02/2022||TIS request quote from getech / lenovo|
|11/02/2022||No quote forthcoming. Purchase approved by TIS.|
|11/02/2022||Ordered @ £8.5K from scan.co.uk by Psy Tech Office. Despatch due 14/03/2022.|
|15/03/2022||Parts shortage, price increase to £9.4K. Despatch now due w/c 21/03/2022.|
The spec requested from TIS is this machine: a 64-thread CPU, 128GB RAM, and 2 x RTX3090, for a total of ~20K CUDA cores and 48GB of GPU memory. The retail cost of the components is around £7,500 (see below). Scan wants £8,500, which is quite a mark-up, but when we previously looked at Lenovo, they wanted £13K for a lower-spec system. An even lower-spec Mac Pro system costs around £15K.
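The size of the mark-up can be checked with a few lines of arithmetic. This is a minimal sketch using only figures from this page (around £7,500 of components, the £8,500 order price, and the £9,400 revised price in the timeline); `markup_percent` is a hypothetical helper introduced for illustration.

```python
def markup_percent(quoted_price, component_cost):
    """Return the mark-up of a quoted price over component cost, in percent."""
    return (quoted_price / component_cost - 1) * 100


COMPONENT_COST = 7500  # approximate retail cost of components (GBP), from this page

for label, price in [("original order", 8500), ("after price increase", 9400)]:
    pct = markup_percent(price, COMPONENT_COST)
    print(f"{label}: £{price} is a {pct:.0f}% mark-up on components")
```

On these figures the mark-up is roughly 13% at the original order price and roughly 25% after the parts-shortage increase.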
What follows is a justification of the primary spec. The goal was the best system within budget, with a usable life of 3-5 years, without straying too far from the price-performance sweet spot.
Choice of GPU
|Card||CUDA cores||Memory (GB)||Price (GBP)|
|GTX 1060 (isaac)||1280||3||180 (Dec 2017)|
|Quadro P2000 (willslab-ply)||1024||5||330|
Recent benchmarking of 2-GPU systems running ResNet152 at a batch size of 64 indicates that this cannot even be done at 32-bit precision. At 16-bit precision, the RTX3090 is about twice as fast as the RTX3080. Training even 5-year-old models such as ResNet152 in reasonable time needs at least 18GB of GPU memory (see below). For this workstation to have a useful life of 3-5 years, two graphics cards with 24GB each do not seem overkill.
GPU memory calculations
Training ResNet50 with a batch size of 32 needs 7.5GB of memory. The heuristic they used to work this out is (in bytes):
( N_weights + N_nodes ) * 4 * batch_size * mask_elements
4 comes from 32-bit precision (so 4 bytes per number).
mask elements is the number of elements in the convolution mask (typically 3x3 =9); this comes from the limitations of GPUs - we want them to do convolutions but they are inefficient at these, so they’re converted into matrix-matrix multiplications, which are faster but use more memory.
Using the same heuristic, and noting from model.summary() in TensorFlow that ResNet152 has 2.3x as many parameters as ResNet50, we get an estimate of 17.25GB to train ResNet152 at a batch size of 32.
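The heuristic and the scaling step above can be written out as a short sketch. The function names `training_memory_bytes` and `scale_estimate_gb` are hypothetical, introduced here for illustration. The scaling step assumes, as the text implicitly does, that memory grows in proportion to parameter count. No weight or node counts are filled in, because this page does not give them.

```python
def training_memory_bytes(n_weights, n_nodes, batch_size,
                          bytes_per_number=4, mask_elements=9):
    """Heuristic from the text:
    (N_weights + N_nodes) * bytes_per_number * batch_size * mask_elements.

    bytes_per_number is 4 for 32-bit precision (2 for 16-bit);
    mask_elements is 9 for a typical 3x3 convolution mask.
    """
    return (n_weights + n_nodes) * bytes_per_number * batch_size * mask_elements


def scale_estimate_gb(baseline_gb, parameter_ratio):
    """Scale a baseline estimate by a ratio of parameter counts.

    Assumption: memory use is proportional to parameter count.
    """
    return baseline_gb * parameter_ratio


# Figures quoted in the text: 7.5GB for ResNet50 at batch size 32,
# and ResNet152 having 2.3x as many parameters as ResNet50.
resnet152_gb = scale_estimate_gb(7.5, 2.3)
print(f"ResNet152, batch size 32: ~{resnet152_gb:.2f}GB")
```

Under the same heuristic, passing `bytes_per_number=2` halves the estimate, which is why 16-bit precision matters so much when GPU memory is the limiting factor.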
Choice of CPU
|CPU||Threads||Clock (GHz)||Price (GBP)|
|Ryzen 5 1600X (isaac)||12||2.6||180 (Dec 2017)|
|i7-8700 (willslab-ply)||12||3.2||200 (Dec 2017)|
|Ryzen 9 5900X||24||3.7||490|
|Ryzen Threadripper 3970X||64||3.7||1900|
Our CPU loads are mainly Parameter Space Partitioning. Even today, we’re running 96-CPU simulations on HPC systems that take days to run. The current workstations are woefully inadequate (the same jobs would take weeks, and memory is insufficient). The last two options are from typical Scan Deep Learning workstations. While there is a cost premium here (2.7x the threads for 3.9x the cost), even the best available within budget is far from overkill.
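The cost premium quoted above follows directly from the last two rows of the CPU table. A minimal sketch of that arithmetic, using only the thread counts and prices listed there (the variable names are illustrative):

```python
# (threads, price in GBP) taken from the last two rows of the CPU table
cpus = {
    "Ryzen 9 5900X": (24, 490),
    "Ryzen Threadripper 3970X": (64, 1900),
}

base_threads, base_price = cpus["Ryzen 9 5900X"]
top_threads, top_price = cpus["Ryzen Threadripper 3970X"]

thread_ratio = top_threads / base_threads  # about 2.7
price_ratio = top_price / base_price       # about 3.9

print(f"{thread_ratio:.1f}x the threads for {price_ratio:.1f}x the cost")
for name, (threads, price) in cpus.items():
    print(f"{name}: £{price / threads:.2f} per thread")
```

The per-thread price shows the premium more directly: each Threadripper thread costs noticeably more than a 5900X thread, which is the trade-off accepted in exchange for running larger simulations locally.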
Cost of components
|Component||Price (GBP)|
|RTX3090 x 2||3800|
|128 GB RAM||600|