Days As I Please (06-11 [multi-year diary])

2022-06-11 (Sa)

_ PyTorch on ROCm

I tried building everything step by step, but I couldn't get PyTorch itself to build, so I gave up.

Let's just use the container image.

docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $HOME/dockerx:/dockerx rocm/pytorch
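To keep track of what each flag in that `docker run` line does, here is a sketch that rebuilds the same command as an argument list with one comment per flag. The comments are my own understanding of Docker and ROCm conventions, not something stated anywhere in this post; the helper name `rocm_pytorch_docker_cmd` is hypothetical, and the script only prints the command rather than running it.

```python
import os
import shlex
from typing import List


def rocm_pytorch_docker_cmd(home: str) -> List[str]:
    """Return the `docker run` invocation used above as an argument list."""
    return [
        "docker", "run",
        "-it",                                   # interactive session with a TTY
        "--network=host",                        # share the host's network stack
        "--device=/dev/kfd",                     # ROCm compute interface (Kernel Fusion Driver)
        "--device=/dev/dri",                     # GPU DRM/render device nodes
        "--group-add=video",                     # group that typically owns the GPU device nodes
        "--ipc=host",                            # host IPC namespace, i.e. a large /dev/shm
        "--cap-add=SYS_PTRACE",                  # ptrace capability, mainly for debuggers
        "--security-opt", "seccomp=unconfined",  # drop Docker's default seccomp filter
        "-v", f"{home}/dockerx:/dockerx",        # bind-mount ~/dockerx into the container
        "rocm/pytorch",                          # the image that was used
    ]


if __name__ == "__main__":
    cmd = rocm_pytorch_docker_cmd(os.path.expanduser("~"))
    # Print the shell-quoted command; pass `cmd` to subprocess.run() to actually start it.
    print(shlex.join(cmd))
```

Keeping the flags in a list also sidesteps quoting trouble if the home directory path ever contains spaces.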

Then:

git clone https://github.com/pytorch/examples.git
cd examples/mnist
pip3 install -r requirements.txt
HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 main.py

root@luna:/var/lib/jenkins/examples/mnist# HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 main.py
/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py:138: UserWarning: An output with one or more elements was resized since it had shape [50176], which does not match the required output shape [64, 1, 28, 28].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at  /var/lib/jenkins/pytorch/aten/src/ATen/native/Resize.cpp:17.)
  return torch.stack(batch, 0, out=out)
Train Epoch: 1 [0/60000 (0%)]	Loss: 2.306316
Train Epoch: 1 [640/60000 (1%)]	Loss: 1.604445
Train Epoch: 1 [1280/60000 (2%)]	Loss: 0.955038
Train Epoch: 1 [1920/60000 (3%)]	Loss: 0.632662
Train Epoch: 1 [2560/60000 (4%)]	Loss: 0.476444
Train Epoch: 1 [3200/60000 (5%)]	Loss: 0.513411
Train Epoch: 1 [3840/60000 (6%)]	Loss: 0.262893

Oh, it's working, it's working! Amazing.

Or so I thought, until this:

/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py:138: UserWarning: An output with one or more elements was resized since it had shape [25088], which does not match the required output shape [32, 1, 28, 28].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at  /var/lib/jenkins/pytorch/aten/src/ATen/native/Resize.cpp:17.)
  return torch.stack(batch, 0, out=out)
/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py:138: UserWarning: An output with one or more elements was resized since it had shape [784000], which does not match the required output shape [1000, 1, 28, 28].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at  /var/lib/jenkins/pytorch/aten/src/ATen/native/Resize.cpp:17.)
  return torch.stack(batch, 0, out=out)
Traceback (most recent call last):
  File "main.py", line 137, in <module>
    main()
  File "main.py", line 129, in main
    test(model, device, test_loader)
  File "main.py", line 61, in test
    output = model(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "main.py", line 24, in forward
    x = self.conv2(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 444, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: HIP out of memory. Tried to allocate 142.00 MiB (GPU 0; 512.00 MiB total capacity; 103.83 MiB already allocated; 258.00 MiB free; 126.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF
root@luna:/var/lib/jenkins/examples/mnist#

Looks like the GPU ran out of memory. "GPU 0" is presumably the GPU's device ID. With only 512.00 MiB of total capacity, maybe this just can't work 😔
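The shapes in those UserWarnings already hint at the culprit: dividing each flat size by the 784 pixels of one 28×28 grayscale image gives the batch size. A small arithmetic check, assuming the standard 60,000-image MNIST training set; it shows the test loader, where the OOM happened, feeding batches of 1000 images.

```python
# Flat element counts from the collate.py UserWarnings above.
warned_sizes = {
    "train batch": 50176,
    "last train batch": 25088,
    "test batch": 784000,
}

PIXELS_PER_IMAGE = 1 * 28 * 28  # one grayscale 28x28 MNIST image = 784 values

# Each warning's flat size divided by one image's size gives the batch size.
batch_sizes = {name: size // PIXELS_PER_IMAGE for name, size in warned_sizes.items()}

for name, batch in batch_sizes.items():
    print(f"{name}: {batch} images")

# 60000 training images in batches of 64 leave a final partial batch of 32.
assert 60000 % 64 == batch_sizes["last train batch"]
```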

Oh! Adding --test-batch-size=100 made it work!! Computing the test-set accuracy takes a while, though.
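Why does 100 fit when 1000 doesn't? A back-of-the-envelope estimate, assuming the example's network starts with two unpadded 3×3 convolutions (1→32, then 32→64 channels) on float32 28×28 inputs, which is my understanding of the pytorch/examples MNIST script rather than something checked here. Under that assumption, conv2's output alone is about 140.6 MiB at batch 1000, which lines up with the "Tried to allocate 142.00 MiB" in the error if the allocator rounds large requests up to a multiple of 2 MiB (also an assumption). At batch 100 it shrinks to about 14 MiB.

```python
MIB = 1024 ** 2
BYTES_PER_FLOAT32 = 4


def conv_out_side(side: int, kernel: int = 3, stride: int = 1) -> int:
    """Output height/width of an unpadded (valid) convolution."""
    return (side - kernel) // stride + 1


def conv2_output_mib(batch: int) -> float:
    """MiB needed for conv2's float32 output under the assumed architecture."""
    side = conv_out_side(conv_out_side(28))  # 28 -> 26 after conv1 -> 24 after conv2
    channels = 64                            # assumed number of conv2 output channels
    return batch * channels * side * side * BYTES_PER_FLOAT32 / MIB


for batch in (1000, 100):
    print(f"batch {batch}: conv2 output ~ {conv2_output_mib(batch):.1f} MiB")
```

This counts only one activation tensor; conv1's output, weights, and allocator overhead come on top, so the real peak is higher.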

All right, this looks doable.

time HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 main.py --test-batch-size=100
time HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 main.py --test-batch-size=100 --no-cuda

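↑ I timed the runs with these two commands: the first on the GPU, the second with --no-cuda so it runs on the CPU only.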

The former (GPU):

Test set: Average loss: 0.0263, Accuracy: 9925/10000 (99%)


real	16m32.497s
user	21m0.749s
sys	0m21.085s

The latter (CPU only):

Test set: Average loss: 0.0252, Accuracy: 9918/10000 (99%)


real	13m43.941s
user	100m27.014s
sys	3m51.267s

Ouch. The GPU actually lost, lol.
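Putting numbers on it from the `time` output above, here is a small sketch that converts bash's `XmY.YYYs` format to seconds. My reading of the user/real ratio (not measured directly) is that the CPU run kept roughly seven cores busy on average, while the GPU run mostly used about one.

```python
def to_seconds(t: str) -> float:
    """Convert bash `time` output such as '16m32.497s' to seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


runs = {
    "GPU": {"real": "16m32.497s", "user": "21m0.749s"},
    "CPU (--no-cuda)": {"real": "13m43.941s", "user": "100m27.014s"},
}

for name, t in runs.items():
    real, user = to_seconds(t["real"]), to_seconds(t["user"])
    # user/real > 1 means several cores were busy in parallel on average.
    print(f"{name}: real {real:.1f} s, user/real {user / real:.2f}")

gpu_real = to_seconds(runs["GPU"]["real"])
cpu_real = to_seconds(runs["CPU (--no-cuda)"]["real"])
print(f"GPU run took {gpu_real / cpu_real:.2f}x the CPU run's wall-clock time")
```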

The time is probably going into computing accuracy on the test set... During training, the GPU is clearly faster.
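To check that guess, one could time the training and test phases separately instead of the whole script. A minimal sketch with a hypothetical `time_phases` helper and dummy stand-in functions; in main.py the phases would be the training call and `test(model, device, test_loader)`. On the GPU, kernels run asynchronously, so a real measurement should synchronize (e.g. with `torch.cuda.synchronize()`) before reading the clock.

```python
import time
from typing import Callable, Dict


def time_phases(phases: Dict[str, Callable[[], None]]) -> Dict[str, float]:
    """Run each phase once, in order, and return its wall-clock time in seconds."""
    durations = {}
    for name, fn in phases.items():
        start = time.perf_counter()
        fn()
        durations[name] = time.perf_counter() - start
    return durations


if __name__ == "__main__":
    # Hypothetical stand-ins for the real training and test calls.
    def fake_train() -> None:
        time.sleep(0.2)

    def fake_test() -> None:
        time.sleep(0.1)

    for name, secs in time_phases({"train": fake_train, "test": fake_test}).items():
        print(f"{name}: {secs:.2f} s")
```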

↓ Output of rocm-smi.

======================= ROCm System Management Interface =======================
================================= Concise Info =================================
ERROR: GPU[0] 		: sclk clock is unsupported
================================================================================
ERROR: 2 GPU[0]:RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment.	
GPU  Temp   AvgPwr  SCLK  MCLK    Fan  Perf  PwrCap       VRAM%  GPU%  
0    63.0c  0.002W  None  800Mhz  0%   auto  Unsupported   67%   89%   
================================================================================
============================= End of ROCm SMI Log ==============================

Yep, it's definitely being put to work (VRAM 67%, GPU 89%).

luna:~ % sudo journalctl -b |grep ENVY
 6月 11 20:08:42 luna kernel: DMI: HP HP ENVY x360 Convertible 13-ay0xxx/876E, BIOS F.20 07/30/2021
luna:~ %

root@luna:/var/lib/jenkins# rocminfo | grep gfx
  Name:                    gfx90c                             
      Name:                    amdgcn-amd-amdhsa--gfx90c:xnack-   
root@luna:/var/lib/jenkins#
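As far as I understand it, HSA_OVERRIDE_GFX_VERSION=9.0.0 makes the ROCm runtime treat this gfx90c iGPU as gfx900, presumably because the container's prebuilt kernels don't target gfx90c; that is my interpretation, not something verified here. To pull the target out of rocminfo output programmatically, here is a small parsing sketch. The helper names are hypothetical, and the ISA-name layout (`<triple>--<processor>:<feature>±`) follows LLVM's AMDGPU target-ID convention as I understand it.

```python
import re
from typing import Dict, List

# Lines copied from the rocminfo output above (trailing spaces included).
ROCMINFO_SNIPPET = (
    "  Name:                    gfx90c                             \n"
    "      Name:                    amdgcn-amd-amdhsa--gfx90c:xnack-   \n"
)


def gfx_targets(rocminfo_text: str) -> List[str]:
    """Return the distinct gfx processor names that appear in rocminfo output."""
    return sorted(set(re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_text)))


def parse_isa_name(isa: str) -> Dict[str, object]:
    """Split an ISA name such as 'amdgcn-amd-amdhsa--gfx90c:xnack-' into parts."""
    triple, _, target_id = isa.strip().partition("--")
    processor, *features = target_id.split(":")
    # Each target feature ends in '+' (enabled) or '-' (disabled).
    flags = {feat[:-1]: feat.endswith("+") for feat in features}
    return {"triple": triple, "processor": processor, "features": flags}


if __name__ == "__main__":
    print(gfx_targets(ROCMINFO_SNIPPET))
    print(parse_isa_name("amdgcn-amd-amdhsa--gfx90c:xnack-"))
```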

It's really great that a trial environment like this can be shipped as a container image. A belated realization, I know.

I uninstalled everything I had installed by hand (all of it, I think).