After getting login to work with a requests session (http://m.itdecent.cn/p/be0e73b52776), I tried logging in from Scrapy by carrying the cookie along with each request.
Difficulty:
The cookie has to be passed along through the whole flow, including the request that downloads the captcha image.
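To make "passing the cookie" concrete, here is a minimal standard-library sketch of the round trip: the server sets a session cookie on the captcha response, and every later request has to send it back. The cookie name ASP.NET_SessionId and its value are assumptions for illustration only, and the helper name is hypothetical. In Scrapy the cookies middleware does this for you, as long as each request in the chain carries the same 'cookiejar' meta value.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Turn Set-Cookie header values into a single Cookie request header."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # parses the name/value pair and attributes such as path
    # Only name=value pairs are sent back to the server, not the attributes.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical Set-Cookie value, as the captcha response might return it.
set_cookie = ["ASP.NET_SessionId=abc123; path=/; HttpOnly"]
header = cookie_header_from_set_cookie(set_cookie)
print(header)
```

If the login POST went out without this header, the server would have no way to tie the typed captcha to the image it generated, which is why the cookie has to travel with every step.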
References:
http://m.itdecent.cn/p/72bca2dcac03
https://www.cnblogs.com/think-a-lot/p/9597952.html
Approach:
(1) Override start_requests
(2) Save the captcha image locally while passing the cookie along
(3) Extract the form parameters and log in with a POST request that carries them
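Step (3) depends on reading the hidden ASP.NET form fields out of the login page. As a rough sketch that is independent of Scrapy, the hidden inputs could be collected with the standard library as shown below. The sample HTML is made up and the real login page will differ; HiddenInputCollector and hidden_fields are hypothetical names. Reading every hidden field this way would also avoid hard-coding a value such as __VIEWSTATEGENERATOR.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    collector = HiddenInputCollector()
    collector.feed(html)
    return collector.fields


# Made-up sample page, only for illustration.
sample_html = """
<form method="post" action="./login.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTIz" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="C93BE1AE" />
  <input type="text" name="email" />
</form>
"""
fields = hidden_fields(sample_html)
print(fields)
```

The resulting dict can be merged with the email, password and captcha values before posting.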
import scrapy

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
}
login_url = 'https://so.gushiwen.org/user/login.aspx?from=http://so.gushiwen.org/user/collect.aspx'


class GushiwenSpider(scrapy.Spider):
    name = 'gushiwen'
    allowed_domains = ['gushiwen.org']
    start_urls = ['https://www.gushiwen.org/']

    # Override start_requests: fetch the captcha image first and open a cookie session.
    # The 'cookiejar' meta key tells Scrapy which cookie session this request belongs to.
    def start_requests(self):
        return [scrapy.Request(
            url='https://so.gushiwen.org/RandCode.ashx',
            headers=headers,
            callback=self.get_code,
            meta={'cookiejar': 1},
        )]

    # Save the captcha image, then request the login page with the same cookiejar.
    def get_code(self, response):
        with open('code4.png', 'wb') as f:
            f.write(response.body)
        return [scrapy.Request(
            url=login_url,
            callback=self.get_VIEWSTATE,
            meta={'cookiejar': response.meta['cookiejar']},
        )]

    # Read the __VIEWSTATE parameter from the login page and submit the form.
    def get_VIEWSTATE(self, response):
        VIEWSTATE = response.xpath('//*[@id="__VIEWSTATE"]/@value').extract_first()
        print(VIEWSTATE)
        # Open code4.png by hand and type in what it shows.
        code = input('Enter the captcha: ')
        formdata = {
            '__VIEWSTATE': VIEWSTATE,
            '__VIEWSTATEGENERATOR': 'C93BE1AE',
            'from': 'http://so.gushiwen.org/user/collect.aspx',
            'email': '##',  # redacted in the original; put the account email here
            'pwd': '##',    # redacted in the original; put the password here
            'code': code,
            'denglu': '登錄',  # value of the submit button, kept as in the original
        }
        # Log in with a POST request, still on the same cookiejar.
        return scrapy.FormRequest(
            url=login_url,
            formdata=formdata,
            callback=self.login,
            meta={'cookiejar': response.meta['cookiejar']},
        )

    # Check the login result: a successful login lands on the collection page.
    def login(self, response):
        if response.url == 'https://so.gushiwen.org/user/collect.aspx':
            print('Login succeeded')
        else:
            print('Login failed')
            print(response.text)
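One thing to watch in the final check: it compares the whole URL string, while the 'from' parameter uses http:// and the expected URL uses https://. Whether the site ever redirects to the http:// form is not verified here, but if it did, the exact comparison would report a failure. A more tolerant check might compare only host and path, roughly like this (same_page and COLLECT_URL are hypothetical names):

```python
from urllib.parse import urlsplit


def same_page(actual_url, expected_url):
    """Compare two URLs by host and path, ignoring scheme and query string."""
    a, e = urlsplit(actual_url), urlsplit(expected_url)
    return (a.hostname, a.path.rstrip("/")) == (e.hostname, e.path.rstrip("/"))


COLLECT_URL = "https://so.gushiwen.org/user/collect.aspx"

# Same page reached over http instead of https.
on_collect_page = same_page("http://so.gushiwen.org/user/collect.aspx", COLLECT_URL)
# Still on the login page, for example after a wrong captcha.
on_login_page = same_page(
    "https://so.gushiwen.org/user/login.aspx?from=http://so.gushiwen.org/user/collect.aspx",
    COLLECT_URL,
)
print(on_collect_page, on_login_page)
```

Inside the spider, the condition would then become same_page(response.url, COLLECT_URL).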
PS: I'm still a beginner, so this may not be very polished. If anything here is imprecise, please point it out. Thanks!