This article covers scraping and validating the free proxy IPs listed on the Xici site, so that other projects have a supply of working proxies.

Target URL: http://www.xicidaili.com/nn/

Inspecting the page elements shows that the IP address, port, and type of each proxy can all be found inside a single tr.

[Screenshot: the target site]
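
To make the row layout concrete, here is a minimal sketch of pulling the three fields out of one table row. It is not the author's code: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and SAMPLE_HTML, RowParser, and parse_rows are hypothetical names. The column order in the sample (IP in the second cell, port in the third, type in the sixth) is an assumption inferred from the indexes used in the article's code.

```python
from html.parser import HTMLParser

# Hypothetical sample mimicking the assumed layout of the listing table.
SAMPLE_HTML = """
<table id="ip_list">
  <tr><th>Country</th><th>IP</th><th>Port</th>
      <th>Location</th><th>Anonymity</th><th>Type</th></tr>
  <tr>
    <td></td>
    <td>10.0.0.1</td>
    <td>8080</td>
    <td>Example City</td>
    <td>High</td>
    <td>HTTP</td>
  </tr>
</table>
"""


class RowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None   # cells of the row being read, None outside a <tr>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._cells = []
        elif tag == 'td' and self._cells is not None:
            self._cells.append('')
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False
        elif tag == 'tr' and self._cells is not None:
            if self._cells:  # the header row has only <th>, so it is skipped
                self.rows.append(self._cells)
            self._cells = None

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data.strip()


def parse_rows(html):
    """Return (ip, port, protocol) tuples for every data row in the HTML."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    return [(c[1], c[2], c[5].lower()) for c in parser.rows if len(c) >= 6]


print(parse_rows(SAMPLE_HTML))
```

Addressing cells by their position among the td elements avoids counting the whitespace text nodes that sit between them, which is why the indexes here (1, 2, 5) differ from the ones used with tr.contents.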

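Scrape three pieces of information (IP, port, and protocol) and put them into a queue, where they wait to be checked before being added to the database.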

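# Requires: import requests; from bs4 import BeautifulSoup
def get_info(self, q):
    """Scrape (ip, port, protocol) tuples page by page and put them on queue q."""
    page = 1
    while True:
        url = 'http://www.xicidaili.com/nn/%d' % page
        req = requests.get(url, headers=self.header, timeout=10)
        soup = BeautifulSoup(req.text, 'lxml')
        table = soup.find('table', id='ip_list')
        if table is None:  # no listing table: blocked, or past the last page
            break
        for tr in table.find_all('tr')[1:]:  # skip the header row
            # contents also holds the whitespace nodes between cells,
            # which is why the cell indexes are odd numbers
            ip = tr.contents[3].text
            port = tr.contents[5].text
            protocol = tr.contents[11].text
            q.put((ip, port, protocol.lower()))
        page += 1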

Take an entry from the queue and use it as a proxy to request a website; if the request succeeds, store the entry in the database as a usable IP.

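def check(self, q, lock):
    """Test each queued proxy against a known site and store the ones that respond."""
    while True:
        data = q.get()  # (ip, port, protocol)
        try:
            # requests picks the proxy whose key matches the target URL's scheme,
            # so with an http:// target only 'http' entries are routed via the proxy.
            req = requests.get('http://www.baidu.com',
                               proxies={data[2]: '%s://%s:%s'
                                        % (data[2], data[0], data[1])},
                               timeout=2, headers=self.header,
                               cookies=self.cookie)
            if req.status_code == 200:
                print(data)
                tools.i_ip(data)  # the author's helper that saves the proxy to the database
            else:
                print('not200', data)
        except Exception as e:
            # A dead or slow proxy raises here; log it and move on to the next one.
            print(e, 'error', data)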
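
The article does not show how get_info and check are started, or what tools.i_ip does internally. The following is one possible sketch of how the pieces could fit together, under several assumptions: the two functions run in threads, a sentinel value ends the workers, and results go into an in-memory SQLite table. All names here (STOP, producer, worker, run_pipeline, is_alive, the proxies table) are hypothetical, and is_alive stands in for the real network check so the example runs offline. The lock guards the shared database connection, which may or may not be what the unused lock parameter of check is for.

```python
import queue
import sqlite3
import threading

STOP = None  # sentinel telling a worker to exit (hypothetical convention)


def producer(q, rows, n_workers):
    """Stand-in for get_info: enqueue rows, then one sentinel per worker."""
    for row in rows:
        q.put(row)
    for _ in range(n_workers):
        q.put(STOP)


def worker(q, lock, conn, is_alive):
    """Stand-in for check: test each proxy and store the ones that pass."""
    while True:
        item = q.get()
        if item is STOP:
            break
        if is_alive(item):
            with lock:  # serialise writes on the shared connection
                conn.execute(
                    'INSERT OR IGNORE INTO proxies (ip, port, protocol) '
                    'VALUES (?, ?, ?)', item)
                conn.commit()


def run_pipeline(rows, is_alive, n_workers=4):
    """Run one producer and n_workers checkers; return the stored proxies."""
    conn = sqlite3.connect(':memory:', check_same_thread=False)
    conn.execute('CREATE TABLE proxies (ip TEXT, port TEXT, protocol TEXT, '
                 'PRIMARY KEY (ip, port))')
    q = queue.Queue()
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, lock, conn, is_alive))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    producer(q, rows, n_workers)
    for t in threads:
        t.join()
    stored = conn.execute(
        'SELECT ip, port, protocol FROM proxies ORDER BY ip').fetchall()
    conn.close()
    return stored


if __name__ == '__main__':
    sample = [('10.0.0.1', '8080', 'http'), ('10.0.0.2', '3128', 'https')]
    # Pretend only proxies on port 8080 respond.
    print(run_pipeline(sample, lambda row: row[1] == '8080'))
```

Because the scraping loop in the article runs until the listing ends while the checker loops forever, some stop signal such as the sentinel above is needed if the program should exit cleanly once the queue is drained.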

Open-source code on GitHub: https://github.com/matianhe/crawler

Scraping proxy IPs with Python and storing them in a database.
