DeepSpeed reports the following error when running a large model:
```python
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f4892b5a020>
Traceback (most recent call last):
  File "/home/conda/envs/dsp/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f7692a2e020>
Traceback (most recent call last):
  File "/home/conda/envs/dsp/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
```
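Troubleshooting steps (the `AttributeError` above is usually a secondary symptom: `ds_opt_adam` is never set when the CPU Adam op fails to load):
1. Run the following on the command line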
python -c 'import deepspeed; deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()'
to reproduce the error and check whether it fails. If the torch and CUDA versions do not match, it raises:
deepspeed.ops.op_builder.CUDAMismatchException: xxxx
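2. Solutions (two options)
    a. Set this environment variable in front of the command that runs your code: DS_SKIP_CUDA_CHECK=1
    b. Edit the DeepSpeed source where the error is raised so that it no longer checks whether the torch and CUDA versions match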
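
The diagnosis and option (a) can be sketched in Python. This is a minimal sketch, not DeepSpeed's own logic: the helper names `parse_nvcc_release`, `cuda_versions_match`, `env_with_cuda_check_skipped`, and `rerun_reproduction` are hypothetical, the comparison is a strict major.minor equality that may be stricter than the check DeepSpeed performs, and it assumes `nvcc --version` prints a `release X.Y` field.

```python
import os
import re
import subprocess
import sys


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_versions_match(torch_cuda, nvcc_release):
    """Strictly compare torch's CUDA string (e.g. "11.8") with nvcc's (major, minor).

    Assumption: exact equality; DeepSpeed's real check may tolerate more.
    """
    if torch_cuda is None or nvcc_release is None:
        return False
    major, minor = torch_cuda.split(".")[:2]
    return (int(major), int(minor)) == nvcc_release


def env_with_cuda_check_skipped(base_env=None):
    """Copy an environment mapping and set DS_SKIP_CUDA_CHECK=1 in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["DS_SKIP_CUDA_CHECK"] = "1"
    return env


def rerun_reproduction():
    """Re-run the reproduction command from step 1 with the check skipped."""
    cmd = [
        sys.executable,
        "-c",
        "import deepspeed; deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()",
    ]
    # A non-zero return code means the op still failed to load.
    return subprocess.run(cmd, env=env_with_cuda_check_skipped()).returncode
```

To compare versions, pass `torch.version.cuda` and the parsed output of `nvcc --version` to `cuda_versions_match`. If they differ, calling `rerun_reproduction()` shows whether skipping the check is enough to load the op. Skipping the check only hides the mismatch, so installing a torch build that matches the local CUDA toolkit is likely the more robust fix.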