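A WeChat friend asked me to scrape the original source URLs of Tencent Video videos, so that the annoying ads can be skipped. Some plugins already do this, but writing the code yourself is more fun...
The steps here are not complete; I will just cover a few points to watch out for and some things I discovered.
When capturing traffic, start with the h5 and app clients; the web client plays more tricks...
Let's go straight to a few endpoints.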
1.https://bkvv.video.qq.com/getinfo?_qv_rmt={$u1}&_qv_rmt2={$u2}&defn=auto&platform={$plt}&otype=json&sdtfrom={$std}&_rnd={$ts}&appVer=0.0.1&dtype=3&vid={$vid}&newnettype=1
2.https://h5vv.video.qq.com/getinfo
3.https://h5vv.video.qq.com/getkey
4.https://vv.video.qq.com/getinfo
5.https://vv.video.qq.com/getkey
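Depending on which endpoint you call and which parameters you pass, you can get the URL of each segment or each episode of a video, or the video's M3u8 file. I use endpoints 2 and 3 to get the URL of a complete video directly.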
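Endpoint 1 in the list is written as a template with {$name} placeholders. As a rough sketch of how such a template could be filled in, the helper below substitutes each placeholder with a URL-quoted value. The function name fill_template and the choice to quote every value are my own assumptions, not something confirmed by the captured traffic.

```python
import re
from urllib.parse import quote

# Endpoint 1 exactly as listed, with its {$name} placeholders left in place.
TEMPLATE = (
    "https://bkvv.video.qq.com/getinfo?_qv_rmt={$u1}&_qv_rmt2={$u2}"
    "&defn=auto&platform={$plt}&otype=json&sdtfrom={$std}&_rnd={$ts}"
    "&appVer=0.0.1&dtype=3&vid={$vid}&newnettype=1"
)


def fill_template(template, values):
    """Replace every {$name} with the URL-quoted value for that name.

    Hypothetical helper: raises KeyError if a placeholder has no value.
    """
    def substitute(match):
        name = match.group(1)
        return quote(str(values[name]), safe="")

    return re.sub(r"\{\$(\w+)\}", substitute, template)
```

Quoting matters here because the generated u1 and u2 values can contain characters such as + and /, which have a special meaning inside a query string.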
The getinfo parameters for endpoint 2 are as follows:
import time
import requests

params = {
    'charge': 0,
    'vid': vid,  # taken from the page URL or HTML
    'defaultfmt': 'auto',
    'otype': 'json',
    'guid': '8fffd19befa1413953bb108f58e49b3b',  # replace it if it stops working; capture traffic to find a new one
    'platform': plt,
    'defnpayver': 1,
    'appVer': '3.0.83',
    'sdtfrom': std,
    'host': 'v.qq.com',
    'ehost': 'https%3A%2F%2Fv.qq.com%2Fx%2Fcover%2Fnuijxf6k13t6z9b%2Fl0023olk3g4.html',
    'defn': 'mp4',
    'fhdswitch': 0,
    'show1080p': 1,
    'isHLS': 0,
    'newplatform': 'v1010',
    'defsrc': 1,
    '_0': 'undefined',
    '_1': 'undefined',
    '_2': 'undefined',
    '_': int(round(time.time() * 1000)),
    'callback': jsonpCallback,  # prefix wrapped around the returned JSON
}
r = requests.get('https://h5vv.video.qq.com/getinfo', params=params).content
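The parameters above are mostly fixed and it is best not to leave any of them out; the vid is the one you have to obtain yourself. The information returned by this endpoint is the groundwork for getting the vkey.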
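The vid is said to come from the page URL or HTML. A minimal sketch of the URL route follows, assuming the vid is the last path segment of a page URL shaped like the ehost value above, minus the .html suffix. That layout is an assumption drawn from the sample URL, and vid_from_url is a name I made up for illustration.

```python
import posixpath
from urllib.parse import unquote, urlparse


def vid_from_url(page_url):
    """Return the last path segment of a page URL without its .html suffix.

    Assumption: for URLs like .../x/cover/<cover_id>/<vid>.html the final
    segment is the vid. Percent-encoded URLs are decoded first.
    """
    path = urlparse(unquote(page_url)).path
    stem, ext = posixpath.splitext(posixpath.basename(path))
    if ext != ".html" or not stem:
        raise ValueError("unexpected page URL layout: %r" % page_url)
    return stem
```

If the page URL does not follow this layout, the vid would have to be pulled out of the page HTML instead, which this sketch does not attempt.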
From the JSON returned by endpoint 2, extract:
- the video URL prefix
url_prefix = data['vl']['vi'][0]['ul']['ui'][0]['url']
- the MP4 file name, such as q0200qbrzbk.mp4 for a full video or q0200qbrzbk.p201.1.mp4 for a segment
fn_pre = data['vl']['vi'][0]['lnk']
filename = fn_pre + '.mp4'
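Because the request passes a callback parameter, the body comes back as JSON wrapped in that callback prefix, so it has to be unwrapped before the keys above can be read. The sketch below shows one way to do that. The helper names and the sample body are made up for illustration; the sample contains only the keys mentioned above, and a real response will hold far more.

```python
import json


def strip_jsonp(body):
    """Parse a JSONP body such as b'callbackName({...})' into a dict."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    start = body.index("(")   # first "(" follows the callback name
    end = body.rindex(")")    # last ")" closes the wrapper
    return json.loads(body[start + 1:end])


def prefix_and_filename(data):
    """Pull the URL prefix and the .mp4 file name out of a getinfo dict."""
    vi = data["vl"]["vi"][0]
    url_prefix = vi["ul"]["ui"][0]["url"]
    filename = vi["lnk"] + ".mp4"
    return url_prefix, filename


# Hypothetical sample body, not a captured response.
SAMPLE_BODY = (
    b'jsonpCallback({"vl": {"vi": [{"lnk": "q0200qbrzbk", '
    b'"ul": {"ui": [{"url": "http://example.invalid/"}]}}]}})'
)
```

Calling prefix_and_filename(strip_jsonp(r)) on the getinfo response would then give the two values needed for the next request.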
Next, request endpoint 3.
Parameters:
params = {
    'charge': 0,
    'vid': vid,  # the video's vid
    'format': 2,
    'otype': 'json',
    'guid': '8fffd19befa1413953bb108f58e49b3b',
    'platform': 10901,
    'defnpayver': 0,
    'appVer': '3.0.83',
    'vt': 0,
    'sdtfrom': 'v1010',
    '_rnd': rmt['t'],  # timestamp; important, without it the speed drops straight to 20k
    '_qv_rmt': rmt['u1'],  # rate-limit algorithm output; important, without it the speed drops straight to 20k
    '_qv_rmt2': rmt['u2'],  # same as above
    'ui_host': 2,
    'filename': filename,
    'callback': jsonpCallback,
    '_': int(round(time.time() * 1000)),  # 13-digit timestamp; in my tests playback stutters without it
}
r = requests.get('https://h5vv.video.qq.com/getkey', params=params).content
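Now for the core part. The rate limiting cost me a whole day, until I scraped enough data to piece the algorithm together from the js. Tencent introduced the two algorithm-generated qvrmt parameters only recently, so I expect this scraping method to keep working for a while.
OK, use the algorithm below to produce the three parameters and pass them to the endpoint above.
- qvrmt is generated with rmt = getQv(plt, vid, std, str(1)), where the first three arguments are platform, vid and sdtfrom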
# coding: utf-8
import time
import hashlib
Seed = "#$#@#*ad"
urlStr = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="

def hexToString(h):
    # Turn a hex string (optionally prefixed with 0x) into one character per byte.
    r = ''
    index = 2 if h[0:2] == '0x' else 0
    indes = []
    indes.append(index)
    while index < len(h) - 2:
        index += 2
        indes.append(index)
    for i in indes:
        b = int(h[i:i + 2], 16)
        r += chr(b)
    return r

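def getQv(plt, vid, std, sts, ts=None):
    # Take the timestamp at call time; a default of int(time.time()) in the
    # signature would be evaluated only once, when the function is defined.
    if ts is None:
        ts = int(time.time())
    ts = str(ts)
    p = {
        "plt": plt,
        "vid": vid,
        "std": std,
        "sts": sts,
        "ts": ts,
    }
    md = hashlib.md5()
    md.update((str(plt) + vid + ts + Seed + sts + std).encode('utf-8'))
    result = hexToString(md.hexdigest())
    u = urlenc(tempcalc(result, Seed), sts[0], ts)
    c = urlenc(tempcalc(result, '86FG@hdf'), sts[0], ts)
    u1 = U1(u, 0)  # characters of u at even positions
    u2 = U1(u, 1)  # characters of u at odd positions
    data = {
        'p': p,
        'u': u,
        'c': c,
        'u1': u1,
        'u2': u2,
        't': ts
    }
    return data
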
def urlenc(input, sts, ts):
    # Base64-style encoding over urlStr, with 'A' + sts + ts spliced in
    # once the read position reaches 15.
    chr1 = chr2 = chr3 = enc1 = enc2 = enc3 = enc4 = ''
    chr5 = ''
    chr6 = ''
    output = ''
    i = 0
    while i < len(input):
        chr1 = ord(input[i])
        i += 1
        m1 = i
        i += 1
        if len(input) > m1:
            chr2 = ord(input[m1])
        else:
            chr5 = 'NaN'  # second character of the group is missing
        m = i
        i += 1
        if m >= len(input):
            chr6 = 'NaN'  # third character of the group is missing
        else:
            chr3 = ord(input[m])
        if i == 15:
            output = output + 'A'
            output = output + sts
            output = output + ts
        enc1 = chr1 >> 2
        enc2 = ((chr1 & 3) << 4) | (chr2 >> 4)
        enc3 = ((chr2 & 15) << 2) | (chr3 >> 6)
        enc4 = chr3 & 63
        if chr5 == 'NaN':
            enc3 = enc4 = 64  # index 64 in urlStr is the '=' padding character
        elif chr6 == 'NaN':
            enc4 = 64
        output = output + urlStr[enc1] + urlStr[enc2] + urlStr[enc3] + urlStr[enc4]
    return output

def tempcalc(a, b):
    # XOR each character of a with the first four characters of b, cycling.
    r = ''
    for i in range(len(a)):
        chr1 = ord(a[i]) ^ ord(b[i % 4])
        r = r + chr(chr1)
    return r

def U1(a, b):
    # Take every second character of a, starting at position b.
    r = ''
    index = b
    indes = []
    indes.append(index)
    while index < len(a) - 2:
        index += 2
        indes.append(index)
    for i in indes:
        r += a[i]
    return r
Finally, take the all-important vkey from the returned JSON and assemble the video URL:
url = '{}{}?sdtfrom=v1010&guid=8fffd19befa1413953bb108f58e49b3b&vkey={}'.format(url_prefix,filename,data['key'])
For example:
http://222.73.132.155/om.tc.qq.com/ACQ0qWH8FNLvmirYrJFrmcK_-3iFtXdnx2wbFY1zuvv8/p0556o7jft0.p712.1.mp4?sdtfrom=v1010&guid=ffc2f4cc36a272e31b71035fcda35910&vkey=072F6FDB970690CADB8E3EA44A8839FDD911D535BB7D17B3C7AB942B26D59AF7F92F1DE22B1288C9052FDD9D2594195D257F707E7B4063FE62E47C56C90B1C87DB44D16E3502D9F87333807E7E3DD2C62519608B6646CFE2A07519D28FAD923BC2934A7DCDFB51D8D7400910FCBE5FF73A4F6A3056F96209
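The same assembly can be written with urlencode so the query values are escaped for you. This is a small sketch; build_video_url is a hypothetical helper name, and it assumes the guid passed in is the same one used in the getinfo and getkey requests.

```python
from urllib.parse import urlencode


def build_video_url(url_prefix, filename, vkey, guid, sdtfrom="v1010"):
    """Join prefix, file name and query string into the final video URL."""
    query = urlencode({"sdtfrom": sdtfrom, "guid": guid, "vkey": vkey})
    return "{}{}?{}".format(url_prefix, filename, query)
```

With the values from the earlier steps this would be called as build_video_url(url_prefix, filename, data['key'], guid).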
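Fetching the m3u8 file does not seem to be rate limited; feel free to look into that yourselves.
I will post the source code later. Deploying the scraper with flask on a cloud server works a treat. This write-up is not detailed; the main point is how to deal with the rate limiting.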