The Concept of Multitasking

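What is "multitasking"? Put simply, it means the operating system can run several tasks at the same time. For example, you might be watching a movie, chatting on QQ, and rushing through homework in Word all at once. That is multitasking: at least three tasks are running at the same time.
How does a single-core CPU run multiple tasks? How does a multi-core CPU do it?
On a single-core CPU, the operating system switches between tasks very quickly, giving each one a short time slice, so the tasks only appear to run simultaneously. True parallel execution is possible only on a multi-core CPU. Even there, the number of tasks usually far exceeds the number of cores, so the operating system still schedules many tasks onto each core in turn.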

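Concurrency: multiple tasks take turns executing on a single core. The tasks are started at the same time, but they do not actually execute at the same time.
Parallelism: the number of tasks is less than or equal to the number of cores, so each task gets its own core and the tasks really do execute at the same time.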
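The difference can be observed with a small sketch. Here three hypothetical tasks are started together, and `time.sleep` stands in for work that mostly waits (such as a network request). Because the waits overlap, the total time is close to one delay rather than the sum of all three. The task names and the 0.2 second delay are assumptions chosen for illustration.

```python
import threading
import time

def task(name, delay, finished):
    # time.sleep stands in for waiting work such as a network request
    time.sleep(delay)
    finished.append(name)

finished = []
threads = [
    threading.Thread(target=task, args=(f"task-{i}", 0.2, finished))
    for i in range(3)
]

start = time.perf_counter()
for t in threads:
    t.start()          # all three tasks are started before any is waited on
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(sorted(finished))
print(f"elapsed: {elapsed:.2f}s")  # close to 0.2s, not 0.6s
```

Whether the tasks also run in parallel depends on the number of cores and on the kind of work; this sketch only shows that their waiting overlaps.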

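threading.Thread parameters:

target: the function the thread runs
name: the name of the thread
args: the arguments to pass to the target function, given as a tuple

Also note the daemon parameter:

If a child thread's daemon attribute is False, the main thread checks whether that child has finished when the main thread reaches its end. If the child is still running, the main thread waits for it to complete before exiting.

If a child thread's daemon attribute is True, the main thread does not check on it and exits directly. All child threads whose daemon value is True end together with the main thread, whether or not they have finished.

The daemon attribute is False by default for threads created from the main thread (a new thread inherits the value from the thread that creates it). To change it, set it before calling start() to launch the thread.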
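As a minimal sketch of these parameters, the following creates one ordinary thread and one daemon thread. The function names `greet` and `background`, the thread names, and the argument values are hypothetical and used only for illustration.

```python
import threading
import time

results = []

def greet(name, times):
    # Runs in the worker thread and records which thread executed it
    for _ in range(times):
        results.append((threading.current_thread().name, name))

# target, name and args as described above; args must be a tuple
worker = threading.Thread(target=greet, name="worker-1", args=("Alice", 2))
worker.start()
worker.join()  # wait for the worker to finish

def background():
    # Would keep the program alive for 5 seconds if it were not a daemon
    time.sleep(5)

# daemon must be set before start(); here it is passed to the constructor
helper = threading.Thread(target=background, name="helper", daemon=True)
helper.start()

print(results)
print(helper.daemon)  # True
```

When this script reaches its end, the interpreter exits without waiting for `helper`, because it is a daemon thread.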

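Mutex Locks

When multiple threads modify the same shared data at almost the same time, access to that data needs to be synchronized.
Thread synchronization ensures that multiple threads can safely access a contended resource. The simplest synchronization mechanism is a mutex lock.
A mutex gives a resource one of two states: locked or unlocked.
When a thread wants to modify shared data, it first locks it. The resource is now in the "locked" state and other threads cannot modify it. Only after that thread releases the resource, returning it to the "unlocked" state, can another thread lock it. A mutex guarantees that only one thread writes at a time, which keeps the data correct when multiple threads are involved.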

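Advantage of locks: a critical section of code is guaranteed to be executed from start to finish by only one thread at a time.
Drawbacks of locks:
1. Locks prevent threads from running concurrently. Code guarded by a lock effectively runs in single-threaded mode, which can reduce efficiency considerably.
2. Because there can be several locks, threads that each hold a different lock and try to acquire the lock held by the other can end up in a deadlock.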
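A minimal sketch of a mutex in use: four threads increment a shared counter, and a `threading.Lock` guards the update so that only one thread modifies the counter at a time. The counter, the function name `add_many`, and the iteration counts are assumptions for illustration.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # "with lock" acquires the lock and always releases it,
        # even if the guarded code raises an exception
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Keeping the locked region as small as possible (here, a single statement) limits the single-threaded bottleneck described in the first drawback.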

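Multithreaded Crawlers

Queue (the queue object)

The queue module is part of Python's standard library. In Python 2 it is imported with import Queue; in Python 3 the module is named queue, so use import queue.
Queues are the most common way to exchange data between threads.

Thoughts on Multithreading in Python

Locking is an important step whenever resources are shared. Python's built-in list, dict and similar types do not make compound operations thread-safe, whereas Queue is thread-safe. When a queue fits the use case, it is the recommended choice.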
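As a sketch of exchanging data between threads through a queue, the following passes three numbers from a producer thread to a consumer thread without any explicit lock. The function names and the use of `None` as an end-of-data marker are assumptions for illustration, not something the original text prescribes.

```python
import queue
import threading

def producer(q, items):
    for item in items:
        q.put(item)
    q.put(None)  # sentinel: tells the consumer there is nothing more to come

def consumer(q, out):
    while True:
        item = q.get()  # blocks until an item is available
        if item is None:
            break
        out.append(item * 2)

q = queue.Queue()
doubled = []
p = threading.Thread(target=producer, args=(q, [1, 2, 3]))
c = threading.Thread(target=consumer, args=(q, doubled))
p.start()
c.start()
p.join()
c.join()

print(doubled)  # [2, 4, 6]
```

Only the consumer thread touches `doubled`, so the list itself needs no lock; all cross-thread communication goes through the queue.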
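1. Initialization: the Queue class (FIFO, first in, first out)

queue.Queue(maxsize)
maxsize: the maximum number of items the queue can hold (zero or a negative value means no limit)

2. Common methods:

Queue.qsize() returns the approximate number of items in the queue
Queue.empty() returns True if the queue is empty, otherwise False
Queue.full() returns True if the queue is full, otherwise False
Queue.full() corresponds to maxsize: the queue is full once it holds maxsize items
Queue.get([block[, timeout]]) removes and returns an item from the queue; timeout is the maximum time to wait, in seconds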

3. Create a queue object

import queue
myqueue = queue.Queue(maxsize=10)

4. Put a value into the queue

myqueue.put(10)

5. Take a value out of the queue

myqueue.get()
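The steps above can be tried together in one short script. This is a small sketch using Python 3's `queue` module; the variable names and the `maxsize` of 2 are chosen only for illustration.

```python
import queue

q = queue.Queue(maxsize=2)   # room for at most two items
q.put("a")
q.put("b")

size_when_full = q.qsize()   # 2
is_full = q.full()           # True, because qsize() has reached maxsize
first = q.get()              # "a": first in, first out
second = q.get()             # "b"
is_empty = q.empty()         # True

try:
    # Wait at most 0.1 seconds for an item, then give up
    q.get(timeout=0.1)
    timed_out = False
except queue.Empty:
    timed_out = True

print(size_when_full, is_full, first, second, is_empty, timed_out)
# 2 True a b True True
```

Without a timeout, `get()` on an empty queue blocks until another thread puts an item in, which is why the timeout (or `get_nowait()`) matters in worker loops.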


Example

import requests
import threading
from lxml import etree
import queue

class crawlThread(threading.Thread):
    def __init__(self, threadName, page_queue, data_queue):
        super(crawlThread, self).__init__()
        self.threadName = threadName
        self.page_queue = page_queue
        self.data_queue = data_queue
        self.headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:60.0) Gecko/20100101 Firefox/60.0',}

    def run(self):
        # Take page numbers from page_queue until none are left
        while True:
            try:
                # get_nowait() takes the next value (first in, first out) without blocking.
                # Checking empty() and then calling get() could block forever if another
                # thread takes the last page between the two calls.
                page = self.page_queue.get_nowait()
            except queue.Empty:
                break
            print(page)
            full_url = 'http://blog.jobbole.com/all-posts/page/'+str(page)+'/'
            response = requests.get(full_url, headers=self.headers)
            response.encoding = 'utf-8'
            if response.status_code == 200:
                # Store the downloaded page source in data_queue
                self.data_queue.put(response.text)

# # The crawl task as a plain function (alternative to the crawlThread class)
# def crawl_data(page_queue, data_queue):
#     header = {
#         'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:60.0) Gecko/20100101 Firefox/60.0',
#     }
#     # Take page numbers from page_queue
#     while not page_queue.empty():
#         # get() takes a value from the queue, first in, first out
#         page = page_queue.get()
#         print(page)
#         full_url = 'http://blog.jobbole.com/all-posts/page/'+str(page)+'/'
#         response = requests.get(full_url, headers=header)
#         response.encoding = 'utf-8'
#         if response.status_code == 200:
#             # Store the downloaded page source in data_queue
#             data_queue.put(response.text)

# def parse_data(data_queue):
#     # Take values while the queue is not empty; empty means no parse tasks remain
#     while not data_queue.empty():
#         html = etree.HTML(data_queue.get())
#         articles = html.xpath('//div[@class="post floated-thumb"]')
#         for item in articles:
#             title = item.xpath('.//a[@class="archive-title"]/text()')[0]
#             print(title)

class parseThread(threading.Thread):
    def __init__(self, threadName, data_queue, lock):
        super(parseThread, self).__init__()
        self.threadName = threadName
        self.data_queue = data_queue
        self.lock = lock

    def run(self):
        # Keep taking pages until the queue is empty, which means no parse tasks remain
        while True:
            try:
                text = self.data_queue.get_nowait()
            except queue.Empty:
                break
            html = etree.HTML(text)
            articles = html.xpath('//div[@class="post floated-thumb"]')
            for item in articles:
                title = item.xpath('.//a[@class="archive-title"]/text()')[0]
                print(title)
                # Hold the lock while writing; "with" releases it even if the write fails
                with self.lock:
                    with open('jobbole.txt', 'a', encoding='utf-8') as f:
                        f.write(title+'\n')

def spider():
    # Create the task queue; maxsize is the maximum number of items it can hold
    page_queue = queue.Queue(40)
    # http://blog.jobbole.com/all-posts/page/2/ (2 is the page number)
    for i in range(1, 30):
        page_queue.put(i)

    # Downloaded page source goes into this queue for the parse threads to process
    data_queue = queue.Queue()

    # Create the threads that carry out the download tasks
    lock = threading.Lock()
    crawlThreadName = ['crawl-1', 'crawl-2', 'crawl-3', 'crawl-4']
    thread_list = []
    for threadName in crawlThreadName:
        # thread = threading.Thread(target=crawl_data,name=threadName,args=(page_queue,data_queue))
        thread = crawlThread(threadName, page_queue, data_queue)
        thread.start()
        thread_list.append(thread)
        # Do not call thread.join() here: it would wait for each thread
        # to finish before the next one is started

    for thread in thread_list:
        thread.join()

    # Create the parse threads
    parseThreadName = ['parse-1', 'parse-2', 'parse-3', 'parse-4']
    parseThread_list = []
    for threadName in parseThreadName:
        # thread = threading.Thread(target=parse_data,name=threadName,args=(data_queue,))
        thread = parseThread(threadName, data_queue, lock)
        thread.start()
        parseThread_list.append(thread)
    for thread in parseThread_list:
        thread.join()
    # Print the name of the current thread
    print(threading.current_thread().name)

if __name__ == '__main__':
    spider()
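The queue-draining pattern used by the worker threads can be tried without any network access. The sketch below is a simplified stand-in, not the crawler itself: the `worker` function and the `"page-N"` placeholder strings are hypothetical and replace the real download. It takes pages with `get_nowait()` and stops on `queue.Empty`, because checking `empty()` and then calling a blocking `get()` are two separate steps, and another thread may take the last item in between.

```python
import queue
import threading

def worker(tasks, results):
    while True:
        try:
            # Take the next page number without blocking
            page = tasks.get_nowait()
        except queue.Empty:
            # Nothing left; another worker may have taken the last item
            break
        # Placeholder for the real download step
        results.put(f"page-{page}")

tasks = queue.Queue()
for page in range(1, 30):   # pages 1 to 29, as in the example above
    tasks.put(page)

results = queue.Queue()
threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.qsize())  # 29
```

Every page is handled exactly once, although the order in which results arrive depends on how the threads are scheduled.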
