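For logging in with the requests library, search for "python模擬登陸新版知乎" ("Python simulated login for the new Zhihu") yourself; Selenium-based approaches exist too.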
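Preparation
- Find the URL the login data is POSTed to
- Get the cookie required for login
- Get the login parameters
1. Finding the URL for the login POST is easy: open the browser's developer tools, enter a wrong account and password on the login page, and the Request URL of the sign_in request is the target address.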
https://www.zhihu.com/api/v3/oauth/sign_in
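2. With the same wrong account and password, a captcha?lang=en request shows up alongside sign_in (a GET to it is what sets the capsion_ticket cookie needed for login). Get its URL the same way.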
https://www.zhihu.com/api/v3/oauth/captcha?lang=en
3. For the login parameters, go back to the sign_in request. Chrome's developer tools show the following:
Request URL:https://www.zhihu.com/api/v3/oauth/sign_in
Request Method:POST
Status Code:401
Remote Address:211.159.244.190:443
Referrer Policy:no-referrer-when-downgrade
access-control-allow-credentials:true
access-control-allow-headers:
access-control-allow-methods:GET,PATCH,PUT,POST,DELETE,OPTIONS
access-control-allow-origin:https://www.zhihu.com
content-encoding:gzip
content-length:115
content-type:application/json; charset=utf-8
date:Sun, 25 Mar 2018 16:00:11 GMT
server:nginx
status:401
vary:Accept-Encoding
www-authenticate:Bearer realm="zhihu"
x-backend-server:zhihu-web.zapi-account.477c51df---10.64.194.2:31015[10.64.194.2:31015]
x-req-id:65962D55AB7C78A
x-req-ssl:proto=TLSv1.2,sni=www.zhihu.com,cipher=ECDHE-RSA-AES256-GCM-SHA384
x-za-experiment:default:None,ge3:ge3_9,ge2:ge2_1,SE_I:c,nwebQAGrowth:experiment,is_office:false,nweb_growth_people:default,info:0,is_show_unicom_free_entry:unicom_free_entry_off,biu:0,app_store_rate_dialog:close,android_profile_panel:panel_b,live_store:ls_a2_b1_c1_f2,nweb_search:nweb_search_heifetz,new_live_feed_mediacard:new,hybrid_zhmore_video:yes,new_mobile_column_appheader:new_header,enable_tts_play:post,rt:y,growth_search:s2,qrcode_login:qrcode,qaweb_related_readings_content_control:close,rows:1,biua:0,android_pass_through_push:all,new_mobile_app_header:true,enable_vote_down_reason_menu:enable,u_re:0,android_db_recommend_action:open,zcm-lighting:zcm,android_db_feed_hash_tag_style:button,mobile_feed_guide:button,is_new_noti_panel:no,wechat_share_modal:wechat_share_modal_show,nweb_search_suggest:default,growth_banner:default
x-za-response-id:7cd5b62dda1da2b9142991f2f07bee0c
:authority:www.zhihu.com
:method:POST
:path:/api/v3/oauth/sign_in
:scheme:https
accept:application/json, text/plain, */*
accept-encoding:gzip, deflate, br
accept-language:zh-CN,zh;q=0.9,en;q=0.8
authorization:oauth c3cef7c66a1843f8b3a9e6a1e3160e20
content-length:1229
content-type:multipart/form-data; boundary=----WebKitFormBoundaryNEOiAmJV7kWT8DkJ
cookie:__DAYU_PP=vIaiqBY2aV7uRIjBNZIJ56ea6e9077b8; _xsrf=7675120f-2bfa-4b1f-a708-41ed0a4cbdf4; q_c1=fc333f7760fe4a38be92609bf166d219|1521907926000|1521907926000; _zap=b531b644-4f98-435b-a879-500803fefaf1; __utma=155987696.1937086906.1521908507.1521908507.1521908507.1; __utmc=155987696; __utmz=155987696.1521908507.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); d_c0="ALCujRVJVg2PTqqMZytO05gVzbHV-etkt7g=|1521908820"; capsion_ticket="2|1:0|10:1521993599|14:capsion_ticket|44:ZTJmNmQ0YzBlYjNmNDY0Y2IyMzBmMmVjZDgyZDhhNmI=|96a1703f8665b4746725006ac4aacba50ad3d8e0c71048011700b8de14e5ec66"
origin:https://www.zhihu.com
referer:https://www.zhihu.com/signup?next=%2F
user-agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36
x-udid:ALCujRVJVg2PTqqMZytO05gVzbHV-etkt7g=
x-xsrftoken:7675120f-2bfa-4b1f-a708-41ed0a4cbdf4
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ
Content-Disposition: form-data; name="client_id"
c3cef7c66a1843f8b3a9e6a1e3160e20
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ
Content-Disposition: form-data; name="grant_type"
password
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ
Content-Disposition: form-data; name="timestamp"
1521993604811
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ
Content-Disposition: form-data; name="source"
com.zhihu.web
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ
Content-Disposition: form-data; name="signature"
de35ff024d7152d39503fa099180afc2720db403
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ
Content-Disposition: form-data; name="username"
+8613250079979
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ
Content-Disposition: form-data; name="password"
admin123
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ
Content-Disposition: form-data; name="captcha"
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ
Content-Disposition: form-data; name="lang"
en
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ
Content-Disposition: form-data; name="ref_source"
homepage
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ
Content-Disposition: form-data; name="utm_source"
------WebKitFormBoundaryNEOiAmJV7kWT8DkJ--
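That is a long dump, but only a few parts matter:
1. The authorization value in the Request Headers
2. All the parameters under Request Payload (the form-data fields at the end of the dump above).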
Converted to key-value pairs, this is:
{
    'client_id': 'c3cef7c66a1843f8b3a9e6a1e3160e20',
    'grant_type': 'password',
    ......
}
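The elided entries are the remaining form-data fields from the request dump. A minimal sketch of assembling the full dictionary, assuming the field names and fixed values captured in that dump; `build_login_form` is a hypothetical helper name, the account values are placeholders, and the signature is passed in rather than computed here:

```python
def build_login_form(username, password, signature, timestamp, captcha=''):
    """Return the sign_in form fields as a plain dict of strings."""
    return {
        'client_id': 'c3cef7c66a1843f8b3a9e6a1e3160e20',
        'grant_type': 'password',
        'timestamp': str(timestamp),   # milliseconds since the epoch
        'source': 'com.zhihu.web',
        'signature': signature,        # HMAC-SHA1 hex digest, computed elsewhere
        'username': username,
        'password': password,
        'captcha': captcha,            # empty in the captured request
        'lang': 'en',
        'ref_source': 'homepage',
        'utm_source': '',              # empty in the captured request
    }


# Placeholder values, for illustration only.
form = build_login_form('+8613800000000', 'secret', 'deadbeef', 1521993604811)
print(sorted(form))
```

Keeping every value a string matches what a form-encoded POST expects.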
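Of these, signature is the hardest to track down. In Chrome, press Ctrl+Shift+F to search all sources for 'signature'; it turns up in this JS file:
https://static.zhihu.com/heifetz/main.app.327d25e7f280cfb582a1.js
Open the file in an IDE and search for signature; the generation logic is as follows: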
function r(e, t) {
    var n = Date.now(), r = new a.a("SHA-1", "TEXT");
    return r.setHMACKey("d1b964811afb40118a12068ff74a12f4", "TEXT"), r.update(e), r.update(i.a), r.update("com.zhihu.web"), r.update(String(n)), s({
        clientId: i.a, grantType: e, timestamp: n, source: "com.zhihu.web",
        signature: r.getHMAC("HEX")
    }, t)
}
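The result is an HMAC-SHA1 digest in hex. The HMAC key is the string 'd1b964811afb40118a12068ff74a12f4', and the message is built from grantType, clientId, "com.zhihu.web" and timestamp, in that order.
The corresponding Python code: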
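import hashlib
import hmac
import time

client_id = 'c3cef7c66a1843f8b3a9e6a1e3160e20'
grant_type = 'password'
timestamp = str(round(time.time() * 1000))  # milliseconds, like Date.now()
source = 'com.zhihu.web'

def get_signature():
    hm = hmac.new(b'd1b964811afb40118a12068ff74a12f4', None, hashlib.sha1)
    # Same update order as the JS: grantType, clientId, source, timestamp
    hm.update(grant_type.encode())
    hm.update(client_id.encode())
    hm.update(source.encode())
    hm.update(timestamp.encode())
    return hm.hexdigest()
Note that the order must be grant_type, then client_id, then source, then timestamp, matching the JS above. Any other order produces a different digest.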
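The order sensitivity can be checked locally without contacting Zhihu. The following is a minimal sketch, assuming the key and the update order shown in the JavaScript above (grantType, clientId, source, timestamp); `make_signature` and `HMAC_KEY` are hypothetical names introduced for illustration:

```python
import hashlib
import hmac

HMAC_KEY = b'd1b964811afb40118a12068ff74a12f4'  # key string from the JS above


def make_signature(grant_type, client_id, source, timestamp_ms):
    """HMAC-SHA1 over the four fields, fed in the same order as the JS."""
    hm = hmac.new(HMAC_KEY, digestmod=hashlib.sha1)
    for part in (grant_type, client_id, source, str(timestamp_ms)):
        hm.update(part.encode())
    return hm.hexdigest()


fields = ('password', 'c3cef7c66a1843f8b3a9e6a1e3160e20',
          'com.zhihu.web', 1521993604811)
sig = make_signature(*fields)

# Successive update() calls are equivalent to hashing the concatenation.
joined = ''.join(str(f) for f in fields).encode()
same = sig == hmac.new(HMAC_KEY, joined, hashlib.sha1).hexdigest()

# Swapping source and timestamp changes the message, hence the digest.
swapped = hmac.new(HMAC_KEY, digestmod=hashlib.sha1)
for part in (fields[0], fields[1], str(fields[3]), fields[2]):
    swapped.update(part.encode())
differs = swapped.hexdigest() != sig

print(len(sig), same, differs)
```

To confirm the order against a real request, the digest for a captured timestamp can be compared with the signature field sent in that same request.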
The other fields are fixed; generate timestamp as in the code above.
The Scrapy spider:
import hashlib
import hmac
import json
import time

import scrapy


class ZhihuLoginSpider(scrapy.Spider):
    name = 'zhihu_login'
    allowed_domains = ['www.zhihu.com']
    start_urls = ['http://www.zhihu.com/']
    # Fields sent with the login request
    client_id = 'c3cef7c66a1843f8b3a9e6a1e3160e20'
    grant_type = 'password'
    timestamp = str(round(time.time() * 1000))
    source = 'com.zhihu.web'
    captcha = ""
    lang = 'en'
    ref_source = "homepage"
    utm_source = ""
    # Note: the header must include authorization
    header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36",
        "authorization": f"oauth {client_id}"
    }

    # Compute the signature for the current field values
    def get_signature(self):
        hm = hmac.new(b'd1b964811afb40118a12068ff74a12f4', None, hashlib.sha1)
        hm.update(self.grant_type.encode())
        hm.update(self.client_id.encode())
        hm.update(self.source.encode())
        hm.update(self.timestamp.encode())
        return hm.hexdigest()

    def parse(self, response):
        pass

    # Override start_requests so the spider first GETs
    # https://www.zhihu.com/api/v3/oauth/captcha?lang=en to obtain the
    # capsion_ticket cookie; login fails without that cookie
    def start_requests(self):
        return [scrapy.Request(url='https://www.zhihu.com/api/v3/oauth/captcha?lang=en', callback=self.call_data,
                               headers=self.header)]

    # The actual simulated login: POST the form fields
    def call_data(self, response):
        # show_captcha says whether a captcha must be filled in:
        # true means it is required, false means it is not
        print(json.loads(response.text)["show_captcha"])
        post_data = {
            'client_id': self.client_id,
            'grant_type': self.grant_type,
            'timestamp': self.timestamp,
            'source': self.source,
            'captcha': self.captcha,
            'signature': self.get_signature(),
            'username': 'your username',
            'password': 'your password',
            'lang': self.lang,
            'ref_source': self.ref_source,
            'utm_source': self.utm_source
        }
        return scrapy.FormRequest(url='https://www.zhihu.com/api/v3/oauth/sign_in', formdata=post_data,
                                  headers=self.header, callback=self.login_callback)

    # After a successful login, request the Zhihu home page;
    # when logged in, it returns the corresponding data
    def login_callback(self, response):
        return scrapy.Request(url='http://www.zhihu.com', headers=self.header, callback=self.login_callback1)

    # The returned data arrives in this response
    def login_callback1(self, response):
        pass
There is no captcha handling here; readers can add it themselves.
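As a starting point, the show_captcha flag that call_data prints can drive a branch. A minimal sketch, assuming the captcha endpoint returns a JSON object with a boolean show_captcha field as the spider code suggests; `needs_captcha` is a hypothetical helper name, and fetching and solving the captcha image is still left out:

```python
import json


def needs_captcha(body_text):
    """Return True when the captcha response asks for a captcha."""
    data = json.loads(body_text)
    # Treating a missing field as "no captcha" is an assumption of this sketch.
    return bool(data.get('show_captcha', False))


print(needs_captcha('{"show_captcha": false}'))
print(needs_captcha('{"show_captcha": true}'))
```

Inside call_data, `needs_captcha(response.text)` could decide whether to POST the login form directly or to run a captcha step first.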