Day 4 of the hands-on plan: I scraped 100 photos.
The final result looks like this:

Paste_Image.png
My code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# The shebang on the first line tells the system to look up the Python interpreter through the PATH environment variable
from bs4 import BeautifulSoup
import requests
import time
import urllib.request
url = 'http://weheartit.com/inspirations/taylorswift?page='  # getting this URL wrong cost me a lot of time
proxies = {"http": "http://121.58.227.252:8080"}  # not passed to any request below; none of the proxies I tried worked
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0'}
def download(url):
    wb_data = requests.get(url, headers=headers)
    if wb_data.status_code != 200:
        return
    filename = url.split('/')[4]  # split breaks the string into a list of substrings
    target = r'E:\PycharmProjects\homework4\imgs\{}.jpg'.format(filename)  # raw string so the backslashes are kept literally
    with open(target, 'wb') as fs:
        fs.write(wb_data.content)
    print('%s -> %s' % (url, target))  # print the source URL and the saved path; the % placeholders work like in C
'''
def dl_image(url):
    urllib.request.urlretrieve(url, path + url.split('/')[2] + url.split('.')[-1])
    print('Done')
'''
def get_img(url, data=None):
    wb_data = requests.get(url, headers=headers)  # send the request headers
    soup = BeautifulSoup(wb_data.text, 'lxml')
    imgs = soup.select('#main-container > div > div > div > div > div > a > img')  # CSS selector copied from the browser
    if data is None:
        for img in imgs:
            data = img.get('src')
            print(data)
            download(data)
def get_more_pages(start, end):
    for one in range(start, end):
        get_img(url + str(one))
        time.sleep(2)  # pause between pages

get_more_pages(1, 10)  # range(1, 10) covers pages 1 through 9
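To make the file-naming step easier to follow, here is a minimal sketch of what `split('/')[4]` and the `with open(...) as` write do, run against a made-up image URL and a temporary folder instead of the real site. The helper names `filename_from_url` and `save_bytes`, the sample URL, and the fake bytes are my own assumptions for illustration; the real image URLs may be laid out differently.

```python
import os
import tempfile


def filename_from_url(img_url):
    # 'http://example.com/images/12345/superthumb.jpg'.split('/') gives
    # ['http:', '', 'example.com', 'images', '12345', 'superthumb.jpg'],
    # so index 4 is the numeric folder name, used here as the file name.
    return img_url.split('/')[4]


def save_bytes(content, folder, name):
    # Build the target path and write the bytes; the with block
    # closes the file even if the write raises an error.
    target = os.path.join(folder, '{}.jpg'.format(name))
    with open(target, 'wb') as fs:
        fs.write(content)
    return target


# Hypothetical URL and content, only for demonstration.
sample_url = 'http://example.com/images/12345/superthumb.jpg'
name = filename_from_url(sample_url)
saved = save_bytes(b'fake image bytes', tempfile.mkdtemp(), name)
print('%s -> %s' % (sample_url, saved))
```

If the URL has fewer than five `/`-separated parts, `split('/')[4]` raises an `IndexError`, so checking the length first would be safer in a real run.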
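Summary
- URL handling: most of the errors I hit came from picking the wrong URL
- Proxies: I spent ages on them, none of them worked and they kept raising errors; a VPN solved it
- `with ... as` for reading and writing files
- `split` for splitting strings
- Asynchronous loading: check the page's requests under XHR in the browser's developer tools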
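On the proxy point, one thing worth checking is the shape of the proxies dictionary. As far as I understand the requests documentation, the keys are lowercase URL schemes and each value is a full proxy URL including its own scheme. Here is a small sketch of building such a mapping; `build_proxies` is a hypothetical helper of mine, and whether this particular proxy address is still alive is a separate question.

```python
def build_proxies(host_port):
    # Prefix the address with a scheme and use lowercase scheme names
    # as keys, which is the form the requests docs show.
    proxy_url = 'http://' + host_port
    return {'http': proxy_url, 'https': proxy_url}


proxies = build_proxies('121.58.227.252:8080')
print(proxies)
# The mapping only takes effect when passed explicitly, for example:
# requests.get(url, headers=headers, proxies=proxies, timeout=10)
```

Defining the dictionary alone does nothing; it has to be handed to each `requests.get` call (or set on a session) before any traffic goes through the proxy.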