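For work, I needed to install Ubuntu on a workstation that came with Windows 10 preinstalled. The workstation has two hard drives, so I planned to free one of them for Ubuntu. That way the two systems don't interfere with each other, and neither uses the other's disk space. The workstation also has two NVIDIA GTX 1080 Ti cards, which leads to a few issues to watch out for when installing Ubuntu, described below.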
Deep learning environment setup on Ubuntu 18.04
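I borrowed a USB drive already flashed with Ubuntu 16, but unfortunately the installation ran into all kinds of problems: it couldn't reach the installer screen, couldn't find the hard drive, and so on.
So I downloaded the 18.04 Desktop LTS image instead, using the following mirror in China from the official mirror list (Shanghai Jiao Tong University); the download speed was decent. My work computer had no Chinese input method, so I wrote the following notes in English.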
--------------------------------------------------------------------------------------------------------------------------
Installing Ubuntu 18.04 LTS Desktop with UEFI
Step 1: Download the 18.04 LTS desktop image from http://ftp.sjtu.edu.cn/ubuntu-cd/18.04/
Step 2: Download the UltraISO trial version and burn the image to a fresh USB drive.
Step 3: Turn off Secure Boot in the BIOS and Fast Startup in the Windows Control Panel.
Step 4: Reboot and press F12 to open the one-time boot menu.
[Trick for NVIDIA graphics cards] Step 5: Select the second option, "Install Ubuntu", press e, set acpi=off on the kernel command line (the line starting with linux), and press F10 to boot into the installer.
[Trick] If the screen gets stuck at:
/dev/sda1 contains a file system with errors, check forced.
(initramfs)_
Run fsck /dev/sda1, enter y whenever it asks to apply a fix, then type reboot if the machine doesn't restart automatically.
Assuming everything works, you will see Ubuntu 18.04 in the boot menu at startup. If you don't, your UEFI boot entry isn't working and you need to repair it with EasyUEFI. Don't download EasyBCD, because it is not free for commercial use.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Setting up Ubuntu 18.04 for deep learning
Important note: before setting up the environment, decide exactly which system, graphics driver, CUDA, and cuDNN versions you will install, because each one depends on the others. Don't start installing until you know which system, CUDA, and cuDNN you need; without a plan you may have to go back and forth, and that is painful!
Reference: https://medium.com/@zhanwenchen/install-cuda-and-cudnn-for-tensorflow-gpu-on-ubuntu-79306e4ac04e
Prerequisite:
Run nvidia-smi to see whether an NVIDIA driver is already installed (for example from Software Center) and look at the driver version. If you already get output, this could be good or bad: if you want to install CUDA 10 but your driver is older than 410 (CUDA 10.0 needs at least 410.48 on Linux), unfortunately you have to remove the driver, download a new one, and reinstall!
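A small Python script can run this check for you. The sketch below is my own, not from the referenced post: it asks nvidia-smi for the driver version with --query-gpu=driver_version --format=csv,noheader and compares it with the minimum driver for a CUDA release. The minimums in MIN_DRIVER are as I recall them from the compatibility table in NVIDIA's CUDA release notes; treat them as assumptions and double-check the current notes.

```python
import subprocess

# Minimum driver per CUDA release: (Linux, Windows).
# Assumption: values recalled from the compatibility table in NVIDIA's CUDA
# release notes; verify against the current notes before relying on them.
MIN_DRIVER = {
    "9.0": ("384.81", "385.54"),   # CUDA 9.0.76
    "9.2": ("396.37", "398.26"),   # CUDA 9.2.148
    "10.0": ("410.48", "411.31"),  # CUDA 10.0.130
    "10.1": ("418.39", "418.96"),  # CUDA 10.1.105
}


def parse_version(text):
    """Turn a driver string such as '418.39' into a comparable tuple (418, 39)."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_driver_version():
    """Ask nvidia-smi for the driver version (first GPU; all GPUs share one driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    return out.splitlines()[0].strip()


def driver_supports(driver, cuda, platform="linux"):
    """True if `driver` meets the minimum listed for `cuda` on `platform`."""
    linux_min, windows_min = MIN_DRIVER[cuda]
    minimum = windows_min if platform == "windows" else linux_min
    return parse_version(driver) >= parse_version(minimum)
```

On a machine with a driver installed, driver_supports(installed_driver_version(), "10.0") tells you whether you can keep the current driver for CUDA 10.0 or must reinstall.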
Step 1. Figure out which graphics card you have (for me, a GeForce GTX 1080 Ti) and get the matching driver from the NVIDIA pages below. This is essential for everything that follows! Don't just copy a random blog's command to install a random driver: apt-get is much easier, but the version it installs may be wrong for your graphics card, your system, or your CUDA version!
Legacy drivers: https://www.nvidia.com/Download/Find.aspx
Latest driver: https://www.nvidia.com/Download/index.aspx?lang=en-us (100+ MB)

***** If you have installed the wrong driver, here is how to reinstall it *****
Step A. Remove the NVIDIA driver with the following commands (the quotes stop the shell from expanding the wildcard):
$ sudo apt-get purge 'nvidia*'
$ sudo apt-get autoremove
Step B. Reboot into recovery mode without starting X. X also uses the NVIDIA driver, so if you try to install a driver while it is running, the installer reports that NVIDIA modules are loaded and refuses to install.
In the recovery menu, choose the root shell, and install your downloaded driver from there.
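Before running the driver installer, you can confirm that no NVIDIA kernel module is still loaded. A minimal sketch (function names are my own) that reads /proc/modules, the Linux file listing loaded modules one per line with the module name first:

```python
def loaded_nvidia_modules(proc_modules_text):
    """Return the names of loaded kernel modules whose name starts with 'nvidia'.

    Each line of /proc/modules starts with the module name, e.g.
    'nvidia_drm 40960 2 - Live 0x0000000000000000'.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


def nvidia_modules_now(path="/proc/modules"):
    """Read /proc/modules on the running Linux system and filter it."""
    with open(path) as f:
        return loaded_nvidia_modules(f.read())
```

If nvidia_modules_now() returns a non-empty list, the installer will most likely complain that NVIDIA modules are loaded.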

CUDA Toolkit release notes (with the CUDA/driver compatibility table): https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
*******************************************************************
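Step 2. Download and install CUDA. For my Ubuntu 18.04 I used CUDA 10.1. (Note: as of 2019/4/7 the latest TensorFlow release still only supports CUDA 10.0, so TensorFlow users should install CUDA 10.0, and the driver should not be the newest one but the version shown in the figure above. For PyTorch users, CUDA 10.1 worked in my tests.)
https://developer.nvidia.com/cuda-downloads
Download the .run file, cd into the download folder, and run:
chmod +x cuda_10.1_linux.run
sudo ./cuda_10.1_linux.run --override
The referenced post targets CUDA 9.0, so it runs ./cuda_9.0.176_384.81_linux.run --override followed by sudo apt-get install nvidia-384 nvidia-modprobe. Skip the nvidia-384 package on the CUDA 10.x path: CUDA 10.1 needs driver 418.39 or newer, so keep the driver you installed in Step 1.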
Step 3. Install cuDNN
Download cuDNN from the download page: https://developer.nvidia.com/rdp/cudnn-download
In my case, I needed cuDNN 7.5, which is built for CUDA 10.1.
The installation guide is at the following link. Skip the last step; copying the files into the corresponding folders is enough.
https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html
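To confirm which cuDNN version ended up in the CUDA folder, you can read the version macros from the header. A minimal sketch, assuming cuDNN 7.x, where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h (cuDNN 8 moved them to cudnn_version.h), and assuming the default /usr/local/cuda/include location:

```python
import re


def cudnn_version(header_text):
    """Extract (major, minor, patch) from the CUDNN_* #define lines of cudnn.h."""
    version = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % key, header_text)
        if match is None:
            raise ValueError("%s not found; is this the right header?" % key)
        version.append(int(match.group(1)))
    return tuple(version)


def installed_cudnn_version(header="/usr/local/cuda/include/cudnn.h"):
    """Read the header from the (assumed) default CUDA include folder."""
    with open(header) as f:
        return cudnn_version(f.read())
```

For the setup in this post, installed_cudnn_version() should report major 7, minor 5.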
[Trick] When testing cuDNN, you may get this error:
CUDA driver version is insufficient for CUDA runtime version
Congratulations: this means the versions of your system, graphics driver, CUDA, and cuDNN are inconsistent somewhere.
For encouragement, here is what I'd say: go back to the top of this post and install the environment again. That is what I did, and it's why I recorded the process in this post.
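That error means the CUDA version supported by the installed driver is lower than the CUDA runtime version your program uses. A sketch for seeing both numbers, assuming ctypes can locate libcudart (if not, pass a full path such as /usr/local/cuda/lib64/libcudart.so; on Windows it is a cudart64_*.dll). It calls the runtime API functions cudaDriverGetVersion and cudaRuntimeGetVersion, which report versions as 1000 * major + 10 * minor:

```python
import ctypes
import ctypes.util


def decode_cuda_version(value):
    """CUDA reports versions as 1000 * major + 10 * minor, e.g. 10010 -> (10, 1)."""
    return value // 1000, (value % 1000) // 10


def driver_too_old(driver_version, runtime_version):
    """The 'insufficient' error appears when the driver's CUDA version is lower."""
    return driver_version < runtime_version


def query_versions(lib_path=None):
    """Return (driver, runtime) versions via the CUDA runtime library."""
    path = lib_path or ctypes.util.find_library("cudart")
    if path is None:
        raise OSError("libcudart not found; pass its full path")
    cudart = ctypes.CDLL(path)
    driver, runtime = ctypes.c_int(0), ctypes.c_int(0)
    # Both functions return a cudaError_t; 0 means success.
    if cudart.cudaDriverGetVersion(ctypes.byref(driver)) != 0:
        raise RuntimeError("cudaDriverGetVersion failed")
    if cudart.cudaRuntimeGetVersion(ctypes.byref(runtime)) != 0:
        raise RuntimeError("cudaRuntimeGetVersion failed")
    return driver.value, runtime.value
```

For example, a driver reporting 10000 with a runtime reporting 10010 means the driver only supports CUDA 10.0 while the runtime is 10.1, which is exactly the mismatch behind this error.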
Finally, install pip for Python 3:
sudo apt install python3-pip
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Optional install:
Sogou Pinyin
Reference: http://ubuntuhandbook.org/index.php/2016/07/2-best-chinese-pinyin-im-ubuntu-16-04/
Issue: I couldn't open the Sogou web page, so I couldn't download the .deb file. Stuck here; I need to fetch it from my PC.
Step 1: In a terminal, run:
$ sudo apt remove 'fcitx*' && sudo apt autoremove
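Deep learning environment setup on Windows 10
As with the deep learning environment (TensorFlow) on Ubuntu, each TensorFlow version requires specific versions of the CUDA driver, CUDA, and cuDNN. (I have to complain here that TensorFlow's version management is a mess, with incompatibilities in both directions. Many people get all the way to the last step, run TensorFlow test code, and find that TensorFlow won't run: either CUDA.xx.dll can't be found, or a specific module can't be imported.) On Windows 10 I tested two installation methods: installing with Docker, and installing everything step by step myself.
Installing step by step from the bottom up
Installing CUDA
If you want to install TensorFlow step by step yourself, the recommended choice is a mainstream CUDA release such as CUDA 9.0 or CUDA 10.0, though even that doesn't guarantee there will be no problems. To record solutions to the various problems, I deliberately installed CUDA 9.2; the installation is described below. I will share the required files on Baidu Netdisk for easy download (to be updated). If you want to download them yourself, the Linux section above gives download links for the driver, CUDA, and cuDNN, but you'll need to find the right versions yourself.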
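1. CUDA driver (398.82-desktop-win10-64bit-international-whql.exe)
To install CUDA 9.2 you need the matching driver; see TABLE 1 above. My CUDA 9.2 download is build 148 (9.2.148), and on Windows I installed driver version 398.82 for it. You can check whether the driver installed correctly by running nvidia-smi in a command prompt. If the command isn't found, locate nvidia-smi.exe in the folder where the driver was installed and add that folder to the system PATH variable.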
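The PATH fix can be sketched in Python as well. This assumes drivers of this era (such as 398.82) put nvidia-smi.exe in C:\Program Files\NVIDIA Corporation\NVSMI; check your own install folder. The helper name with_directory_on_path is hypothetical, and it only builds the new PATH string:

```python
import os
import shutil

# Assumption: drivers of this era (e.g. 398.82) install nvidia-smi.exe here.
NVSMI_DIR = r"C:\Program Files\NVIDIA Corporation\NVSMI"


def with_directory_on_path(path_value, directory, sep=os.pathsep):
    """Return PATH with `directory` appended unless it is already listed."""
    def norm(p):
        return os.path.normcase(p.rstrip("\\/"))

    entries = [entry for entry in path_value.split(sep) if entry]
    if any(norm(entry) == norm(directory) for entry in entries):
        return path_value
    return sep.join(entries + [directory])


def nvidia_smi_found():
    """True if nvidia-smi can already be resolved through PATH."""
    return shutil.which("nvidia-smi") is not None
```

Setting os.environ["PATH"] = with_directory_on_path(os.environ["PATH"], NVSMI_DIR) only affects the current process; to make the change permanent, edit the system PATH under System Properties > Environment Variables.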

If your graphics cards are listed, the installation succeeded. For example, with my two 1080 Ti cards the output is as follows:
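To check the GPU count from a script instead of by eye, nvidia-smi -L prints one line per GPU in the form "GPU 0: <name> (UUID: GPU-...)". A minimal parser for that output (the function names are my own):

```python
import re
import subprocess

# `nvidia-smi -L` prints one line per GPU, e.g.
# "GPU 0: GeForce GTX 1080 Ti (UUID: GPU-...)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: [^)]+\)$")


def parse_gpu_list(output):
    """Return [(index, name), ...] for every GPU line in `nvidia-smi -L` output."""
    gpus = []
    for line in output.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
    return gpus


def list_gpus():
    """Run nvidia-smi -L (must be on PATH) and parse its output."""
    out = subprocess.run(["nvidia-smi", "-L"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return parse_gpu_list(out)
```

On a machine like mine, len(list_gpus()) should be 2.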

2. CUDA (cuda_9.2.148_win10_network.exe)
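Install Visual Studio 2015 before installing CUDA; that version is the safer bet. If you want to use Visual Studio 2017, don't select the Visual Studio-related sub-options in the custom install when installing CUDA 9.2, or the CUDA installation will fail. (Several sources I found online say this has been broken for ages, so I'm actually not sure it's safe to select them with 2015 either.) To check that CUDA installed correctly, run nvcc -V in CMD; it shows the installed CUDA version.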
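The last line of nvcc -V output looks like "Cuda compilation tools, release 9.2, V9.2.148". A small sketch that pulls the release number out of that line, assuming nvcc is on PATH:

```python
import re
import subprocess


def parse_nvcc_release(output):
    """Pull (major, minor) out of the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'release X.Y' in nvcc output")
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run nvcc -V (nvcc must be on PATH) and parse the toolkit release."""
    out = subprocess.run(["nvcc", "-V"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return parse_nvcc_release(out)
```

Comparing installed_cuda_release() with the CUDA version in your TensorFlow wheel's name catches a mismatch before you hit a missing .dll.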

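3. cuDNN (cudnn-9.2-windows10-x64-v7.5.1.10)
Copy the files in each folder into the corresponding CUDA folder, and add those folders' paths to PATH as well; that completes the cuDNN installation (see the figure above).
Installing tensorflow-gpu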
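The copy step can be sketched as a short script. It assumes the cuDNN zip is extracted to a folder named cuda (containing bin, include and lib/x64) and that CUDA sits in its default folder, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2; writing there needs administrator rights. install_cudnn is a hypothetical helper name:

```python
import os
import shutil


def install_cudnn(cudnn_root, cuda_root):
    """Copy every file under cudnn_root into the same relative folder under cuda_root.

    cudnn_root is the extracted 'cuda' folder from the cuDNN zip (bin, include,
    lib/x64); cuda_root is the CUDA install folder. Returns the copied paths.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(cudnn_root):
        relative = os.path.relpath(dirpath, cudnn_root)
        target_dir = os.path.normpath(os.path.join(cuda_root, relative))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            destination = os.path.join(target_dir, name)
            shutil.copy2(os.path.join(dirpath, name), destination)
            copied.append(destination)
    return copied
```

Usage with a hypothetical download location: install_cudnn(r"C:\Users\me\Downloads\cuda", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2"). Adding the folders to PATH is still a separate step.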
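I tried pip-installing the official tensorflow-gpu releases from 1.10 onward, one after another, and tested each in CMD with python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))". I got all kinds of errors, such as cuda9.0.dll not found and can't import module. The main reason is that my CUDA isn't a mainstream release such as 9.0 or 10.0, so the versions are incompatible. In this situation you may need to build from source, but the steps are tedious. So here is a site where someone has published prebuilt TensorFlow wheels for many versions; pick one according to your CUDA version, Python version, and the TensorFlow version you need.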
https://github.com/fo40225/tensorflow-windows-wheel
Since I have CUDA 9.2 and Python 3.6.6 and was happy with any GPU version of TensorFlow, I chose
tensorflow_gpu-1.9.0-cp36-cp36m-win_amd64.whl, downloaded it, installed the wheel with pip, and the TensorFlow test passed.
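Choosing a wheel mostly means matching the tags in its filename. Wheel names follow PEP 427: name-version(-build)-python tag-ABI tag-platform tag.whl, so cp36 means CPython 3.6. A minimal sketch that splits the name and checks the Python tag against the running interpreter (it does not check the ABI or platform tags):

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel filename into its PEP 427 fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:  # optional build tag present
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel filename: " + filename)
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def matches_running_python(filename):
    """True if the wheel's CPython tag (e.g. cp36) matches this interpreter."""
    wanted = "cp%d%d" % sys.version_info[:2]
    return wanted in parse_wheel_name(filename)["python"].split(".")
```

With Python 3.6.6, matches_running_python("tensorflow_gpu-1.9.0-cp36-cp36m-win_amd64.whl") returns True.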
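Everything in one container
According to the documentation, tensorflow-gpu can't actually use Docker on Windows: starting a Docker container with NVIDIA GPU access requires nvidia-docker, and nvidia-docker currently only works on Linux. In my case, however, I downloaded the Windows tensorflow-gpu Docker image with Jupyter Notebook, and the test succeeded. I'm not sure what's going on, or whether it runs less efficiently, because I haven't trained a real model with it yet. I'll stop here for now and update once testing is complete.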