Python Web Crawlers (1) - Getting Started

Python Web Crawlers (2) - urllib Crawler Examples

Python Web Crawlers (3) - Advanced Crawling

Python Web Crawlers (4) - XPath

Python Web Crawlers (5) - Requests and Beautiful Soup

Python Web Crawlers (6) - The Scrapy Framework

Python Web Crawlers (7) - Deep Crawling with CrawlSpider

Python Web Crawlers (8) - A Simple Translation Program Using Youdao Dictionary

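1.Scrapy

Introduction to Scrapy
- A crawler framework written in pure Python
- Covers fetching pages, extracting structured data, and the application framework around both
- Uses the Twisted asynchronous networking framework underneath for network communication
- Extensible and high-performance; its concurrency comes from asynchronous I/O rather than threads, and distributed crawling needs an extension such as scrapy-redis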

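Scrapy architecture

Scrapy Engine:

Coordinates the Spider, ItemPipeline, Downloader, and Scheduler: it schedules their work and passes messages and data between them.

Scheduler:

Receives the requests passed on by the engine, queues them according to its rules, and hands them back to the engine when asked.

Downloader:

Downloads every Request the engine passes to it and returns the server's response data to the engine.

Spider:

Handles every Response and extracts Item data from it.
If the data contains follow-up requests, it hands them back to the engine.

ItemPipeline:

Processes (analyzes, filters, stores) the Item data produced by the Spiders.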

The Scrapy Engine (the core of Scrapy) controls the flow of data between all components. Spiders issue Requests, which the Scrapy Engine passes to the Scheduler. The Downloader receives the Requests from the Scheduler (via the engine) and downloads the data from the network. The Downloader's Responses are then passed to the Spiders for parsing. The Spiders extract Items as needed and hand them to the Item Pipeline for processing. Spiders and Item Pipelines are the parts users write for their own needs. There are also two kinds of middleware, Downloader Middlewares and Spider Middlewares, which let users extend Scrapy by plugging in custom code, for example for de-duplication.
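The data flow described above can be pictured with a toy, single-process model. This is only a sketch of the idea, not Scrapy's implementation: the names `PAGES`, `fake_download`, `toy_spider`, and `run_engine` are hypothetical, and the real engine is asynchronous and built on Twisted.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, links found on the page).
PAGES = {
    "page-1": ("job A", ["page-2"]),
    "page-2": ("job B", []),
}


def fake_download(url):
    """Downloader stand-in: return the stored response for a URL."""
    return PAGES[url]


def toy_spider(url, response):
    """Spider stand-in: yield one item, then any follow-up requests."""
    text, links = response
    yield {"type": "item", "url": url, "text": text}
    for link in links:
        yield {"type": "request", "url": link}


def run_engine(start_urls):
    """Engine stand-in: move requests and items between the components."""
    scheduler = deque(start_urls)   # Scheduler: FIFO queue of pending requests
    seen = set(start_urls)          # simple de-duplication of requests
    stored = []                     # Item Pipeline stand-in: collected items
    while scheduler:
        url = scheduler.popleft()
        response = fake_download(url)
        for result in toy_spider(url, response):
            if result["type"] == "item":
                stored.append(result)
            elif result["url"] not in seen:
                seen.add(result["url"])
                scheduler.append(result["url"])
    return stored


items = run_engine(["page-1"])
print([item["text"] for item in items])
```

The spider never calls the downloader directly; everything goes through the engine loop, which is the point of the architecture.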

Common commands

startproject: create a new project
genspider: generate a new spider from a template
crawl: run a spider
shell: start the interactive scraping console

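2.Installation and configuration

My system is Windows 7, so only the Windows installation is covered in detail here. First you need Python; I have versions 2.7.7 and 3.5 installed side by side.

Official documentation: http://doc.scrapy.org/en/latest/intro/install.html
Chinese documentation

A short digression: official documentation is not always as hard to read as it seems. When an English site feels difficult, that is often just a reflexive resistance to English pages. Read on slowly and look up what you don't understand in Youdao Dictionary; after a while, all-English sites turn out to be less difficult than you imagined. Back to the topic: here is a brief look at installing Scrapy on Ubuntu and macOS.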

Ubuntu installation

apt-get install python-dev python-pip libxml2-dev libxslt1-dev \
    zlib1g-dev libssl-dev
pip install scrapy

macOS installation

Official advice: do not use the Python that ships with the system
Installation: see the official documentation

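1.Windows installation

In a command window, enter:

pip install scrapy

When the installation finishes, enter scrapy

If Scrapy prints its version and the list of available commands, the installation succeeded.

You also need to install pywin32, which provides win32api. Download: https://sourceforge.net/projects/pywin32/files/

Click pywin32

Click the latest build

Find the version that matches your setup; I use Python 2.7

The download is an exe file; double-click it to install, then click Next.

In the second step the installer shows your Python installation directory. If it does not detect one, you have most likely downloaded the wrong pywin32 build; download the correct one. Click Next.

When the final screen appears, the installation is complete.

In Python, import win32com as a test; if no error is reported, the installation succeeded.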
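The same check can be scripted. A minimal sketch, assuming you only want to know whether a module can be found; the helper name `is_installed` is hypothetical, and `importlib.util.find_spec` is the standard-library call that does the lookup without importing the module.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located on this system."""
    return importlib.util.find_spec(module_name) is not None


for name in ("scrapy", "win32com"):
    status = "found" if is_installed(name) else "missing"
    print(name, status)
```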

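3.Common installation errors

If the installation fails because of the pip version, update pip.

In a command window, enter:

pip install -U pip

Once the update succeeds, retry the installation.

If the error is ReadTimeout, the download timed out; just run the installation again.
For other errors, see the "python+scrapy installation tutorial" site and go through it step by step to find which step fails.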

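4.Hands-on - creating a Scrapy project

Workflow:

Create a Scrapy project;

Define the Item to extract;

Write a spider that crawls the site and extracts Items;

Write an Item Pipeline to store the extracted Items (the data).

1.Scraping the Python search results from Zhaopin (智聯招聘)

Analysis:

(1) Analyze how Zhaopin's URLs are built;

(2) Inspect the page structure and work out the matching XPath expressions;

(3) Write the page to an HTML file.

Analysis process:

Use the browser's element inspector to find the real URL that the search request goes to.

Work out the XPath for each piece of data on the page: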

# All job postings on the current page
//div[@id="newlist_list_div"]//table

# Job title
//div[@id="newlist_list_div"]//table//td[1]//a

# Feedback rate
//div[@id="newlist_list_div"]//table//td[2]//span

# Company
//div[@id="newlist_list_div"]//table//td[3]//a/text()

# Monthly salary
//div[@id="newlist_list_div"]//table//td[4]/text()
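To see how these expressions pick out cells, here is a small sketch that runs similar paths over a made-up page. Two assumptions to keep in mind: the sample HTML and its values are invented for illustration, and the code uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (no `text()`, so the `.text` attribute is read instead). Scrapy's own selectors are more capable.

```python
import xml.etree.ElementTree as ET

# Invented sample that imitates the table-per-job layout described above.
SAMPLE = """
<html><body>
<div id="newlist_list_div">
  <table><tr>
    <td><a>Python Engineer</a></td>
    <td><span>62%</span></td>
    <td><a>Example Co</a></td>
    <td>10000-15000</td>
  </tr></table>
  <table><tr>
    <td><a>Crawler Developer</a></td>
    <td><span>80%</span></td>
    <td><a>Sample Ltd</a></td>
    <td>8000-12000</td>
  </tr></table>
</div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# One table per job posting, as in the first expression.
tables = root.findall('.//div[@id="newlist_list_div"]//table')

rows = []
for table in tables:
    # Paths are relative to the current table, hence the leading ".".
    rows.append({
        "name": table.find(".//td[1]//a").text,
        "percent": table.find(".//td[2]//span").text,
        "company": table.find(".//td[3]//a").text,
        "salary": table.find(".//td[4]").text,
    })

for row in rows:
    print(row)
```

The leading dot matters: inside the loop, a path starting with `//` would search the whole document again instead of the current table.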

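Create your first Scrapy project
- In a command window, enter

scrapy startproject firPro

This creates a firPro folder with the following structure:

|-- firPro/                         # project folder
    |-- scrapy.cfg                  # project deployment configuration
    |-- firPro/                     # project module; holds the actual crawler code
        |-- __init__.py             # module marker file
        |-- items.py                # models for the fields to be scraped
        |-- pipelines.py            # project pipeline definitions
        |-- settings.py             # global project settings, e.g. user agent, crawl delay
        |-- spiders/                # spider module (where you develop)
            |-- __init__.py         # module marker file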

1.Code in `items.py`

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class FirproItem(scrapy.Item):
   # define the fields for your item here like:
   # name = scrapy.Field()

   # Job title
   name = scrapy.Field()
   # Feedback rate
   percent = scrapy.Field()
   # Company
   company = scrapy.Field()
   # Monthly salary
   salary = scrapy.Field()
   # Work location
   position = scrapy.Field()

2.Create `fir_spider.py` in the spiders folder

# -*- coding: utf-8 -*-
import scrapy

# Custom spider class; it must inherit from scrapy.Spider
class Firspider(scrapy.Spider):
    # Spider name, used to start the crawl
    name = 'firspider'
    # Domains the spider is allowed to crawl (domain names, not URLs)
    allowed_domains = ['sou.zhaopin.com']
    # List or tuple of the URLs the crawl starts from
    start_urls = ('http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E4%B8%8A%E6%B5%B7&kw=python&sm=0&p=1&source=0',)

    # Callback that handles the downloaded response
    # response holds the data fetched by the crawler
    def parse(self, response):
        # response.body is bytes, so open the file in binary mode
        with open(u'智聯.html', 'wb') as f:
            f.write(response.body)

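3.Open a command window in the current folder

Enter the command to run the spider:

# The name used here is the spider name defined in fir_spider.py
scrapy crawl firspider

This fetches the HTML of the whole page; we can then use XPath to pick out the data we want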

4.Save the data we want

# -*- coding: utf-8 -*-
import scrapy
from firPro.items import FirproItem

# Custom spider class; it must inherit from scrapy.Spider
class Firspider(scrapy.Spider):
    # Spider name, used to start the crawl
    name = 'firspider'
    # Domains the spider is allowed to crawl (domain names, not URLs)
    allowed_domains = ['sou.zhaopin.com']
    # List or tuple of the URLs the crawl starts from
    start_urls = ('http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E4%B8%8A%E6%B5%B7&kw=python&sm=0&p=1&source=0',)

    # Callback that handles the downloaded response
    def parse(self, response):
        # Select the table that holds each job posting
        job_list = response.xpath('//div[@id="newlist_list_div"]//table')

        for job in job_list:
            # Create an Item to hold the extracted data
            item = FirproItem()

            # extract() converts the selected nodes into a list of strings
            item["name"] = job.xpath(".//td[1]//a/text()[1]").extract()
            item["percent"] = job.xpath(".//td[2]//span").extract()
            item["company"] = job.xpath(".//td[3]//a/text()").extract()
            item["salary"] = job.xpath(".//td[4]/text()").extract()
            item["position"] = job.xpath(".//td[5]/text()").extract()

            # Hand the item to the pipelines
            yield item

settings.py also needs request headers that imitate a browser

DEFAULT_REQUEST_HEADERS = {
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en',
  'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36',
}

# Uncomment ITEM_PIPELINES
ITEM_PIPELINES = {
   'firPro.pipelines.FirproPipeline': 300,
}

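About settings.py
- ROBOTSTXT_OBEY: whether to obey robots.txt; the generated template sets it to True, and this project chooses not to obey it;
- CONCURRENT_REQUESTS = 16: maximum number of concurrent requests (not threads), 16 by default;
- AUTOTHROTTLE_START_DELAY = 3: initial download delay, in seconds, used by the AutoThrottle extension;
- AUTOTHROTTLE_MAX_DELAY = 60: maximum download delay, in seconds, that AutoThrottle applies under high latency;
- BOT_NAME: generated automatically, the project name;
- SPIDER_MODULES: generated automatically;
- NEWSPIDER_MODULE: generated automatically;
- ITEM_PIPELINES: the pipelines that process items;
- IMAGES_STORE: root path for stored images;
- COOKIES_ENABLED: whether cookies are enabled; here they are disabled;
- DOWNLOAD_DELAY: delay between downloads; the default is 0, and the generated template suggests 3 seconds in a commented-out line.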
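Put together, a throttling-related fragment of settings.py might look like the sketch below. The values are illustrative only, taken from the list above rather than from a tuned project; `AUTOTHROTTLE_ENABLED` is not mentioned in the list but is the switch that makes the two AutoThrottle delays take effect.

```python
# settings.py fragment (illustrative values, not recommendations)
ROBOTSTXT_OBEY = True            # respect the site's robots.txt
CONCURRENT_REQUESTS = 16         # upper bound on simultaneous requests
DOWNLOAD_DELAY = 3               # seconds to wait between requests to one site
COOKIES_ENABLED = False          # do not send or store cookies

AUTOTHROTTLE_ENABLED = True      # let AutoThrottle adapt the delay
AUTOTHROTTLE_START_DELAY = 3     # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 60      # ceiling for the delay under high latency
```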

Appendix: a brief analysis of Python's yield
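Since the spider's parse method relies on yield, a tiny generator may help. This is a plain-Python sketch with a hypothetical function `parse_rows`; it shows only the language feature, not anything specific to Scrapy.

```python
def parse_rows(rows):
    """Yield one dict per row instead of building the whole list first."""
    for row in rows:
        print("parsing", row)
        yield {"name": row}


gen = parse_rows(["job A", "job B"])   # nothing has run yet
first = next(gen)                      # runs until the first yield
rest = list(gen)                       # drains the remaining items
print(first, rest)
```

Each item is produced only when the consumer asks for it, which is why a spider can hand items to the pipeline one at a time while it is still working through the page.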

So far this is only a simple crawler; next we save the data we want

items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class FirproItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()

    # Job title
    name = scrapy.Field()
    # Feedback rate
    percent = scrapy.Field()
    # Company
    company = scrapy.Field()
    # Monthly salary
    salary = scrapy.Field()
    # Work location
    position = scrapy.Field()

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

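import json

class FirproPipeline(object):
    def __init__(self):
        # Python 3: open in text mode with an explicit encoding
        self.file = open('zhilian.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # Write one JSON object per line so the file can be parsed again
        text = json.dumps(dict(item), ensure_ascii=False)
        self.file.write(text + '\n')
        print('-----------------')
        # Return the item so any later pipeline still receives it
        return item

    def close_spider(self, spider):
        self.file.close()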

fir_spider.py

# -*- coding: utf-8 -*-
import scrapy
from firPro.items import FirproItem
import re

# Custom spider class; it must inherit from scrapy.Spider
class Firspider(scrapy.Spider):

    # Regex used to strip whitespace from the extracted text
    reg = re.compile(r'\s+')
    # Spider name, used to start the crawl
    name = 'firspider'
    # Domains the spider is allowed to crawl (domain names, not URLs)
    allowed_domains = ['sou.zhaopin.com']
    # Base URL; the page number is appended to it
    url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E4%B8%8A%E6%B5%B7&kw=python&sm=0&source=0&sg=b8e8fb4080fa47afa69cd683dfbfccf9&p='
    p = 1
    start_urls = [url + str(p)]

    def parse(self, response):
        # Select the job tables, skipping the first two
        job_list = response.xpath('//div[@id="newlist_list_div"]//table')[2:]

        for job in job_list:
            # Create an Item to hold the extracted data
            item = FirproItem()
            name = job.xpath(".//tr[1]//td[1]//a")

            # string(.) joins all text nodes under the link
            item["name"] = self.reg.sub('', name.xpath("string(.)").extract()[0])

            item["percent"] = job.xpath(".//td[2]//span[1]/text()").extract()
            item["company"] = job.xpath(".//td[3]//a/text()").extract()
            item["salary"] = job.xpath(".//td[4]/text()").extract()
            item["position"] = job.xpath(".//td[5]/text()").extract()
            # Hand the item to the pipelines
            yield item

        # Request the next page only while the counter is within range
        if self.p <= 10:
            self.p += 1
            yield scrapy.Request(self.url + str(self.p), callback=self.parse)
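One detail of the whitespace regex is worth seeing in isolation. The sketch below uses an invented sample string, since text returned by `string(.)` typically carries line breaks and indentation from the page markup.

```python
import re

WHITESPACE = re.compile(r"\s+")

# Invented sample imitating text pulled out of nested markup.
raw_name = "\r\n      Python \r\n  Engineer  "

# Removing every whitespace run also removes the spaces between words.
clean = WHITESPACE.sub("", raw_name)

# Alternative: collapse runs to single spaces and trim the ends.
spaced = " ".join(raw_name.split())

print(repr(clean), repr(spaced))
```

For Chinese job titles the first form is usually fine; for English titles the second form keeps the words apart.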

settings.py also needs request headers that imitate a browser

DEFAULT_REQUEST_HEADERS = {
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en',
  'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36',
}

# Uncomment ITEM_PIPELINES
ITEM_PIPELINES = {
   'firPro.pipelines.FirproPipeline': 300,
}

The scraped zhilian.json data

2.Scraping the Python search results from ChinaHR (中華英才網)

items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class ZhycItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    # Fields to be stored in the item
    name = scrapy.Field()
    publish = scrapy.Field()
    company = scrapy.Field()
    require = scrapy.Field()
    salary = scrapy.Field()
    desc = scrapy.Field()

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
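import json

class ZhycPipeline(object):
    def __init__(self):
        # Python 3: open in text mode with an explicit encoding
        self.file = open("zhonghuayingcai.json", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Write one JSON object per line so the file can be parsed again
        text = json.dumps(dict(item), ensure_ascii=False)
        self.file.write(text + "\n")
        print("*****************************************")
        # Return the item so any later pipeline still receives it
        return item

    def close_spider(self, spider):
        self.file.close()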

zhycspider.py

# -*- coding: utf-8 -*-
import scrapy
import re
from zhyc.items import ZhycItem

class ZhycspiderSpider(scrapy.Spider):
    # Regex used to strip whitespace from the extracted text
    reg = re.compile(r"\s+")
    name = 'zhycspider'
    allowed_domains = ['www.chinahr.com']

    # Base URL; the page number is appended to it
    url = "http://www.chinahr.com/sou/?orderField=relate&keyword=python&city=36,400&page="
    page = 1
    start_urls = [url + str(page)]

    def parse(self, response):
        job_list_xpath = response.xpath('//div[@class="jobList"]')

        for jobitem in job_list_xpath:

            item = ZhycItem()

            # string(.) joins all text nodes under the selected element
            name = jobitem.xpath(".//li[1]//span[1]//a")
            item["name"] = self.reg.sub("", name.xpath("string(.)").extract()[0])

            item["publish"] = self.reg.sub("", jobitem.xpath(".//li[1]//span[2]/text()").extract()[0])

            item["company"] = self.reg.sub("", jobitem.xpath(".//li[1]//span[3]//a/text()").extract()[0])
            item["require"] = self.reg.sub("", jobitem.xpath(".//li[2]//span[1]//text()").extract()[0])
            item["salary"] = self.reg.sub("", jobitem.xpath(".//li[2]//span[2]//text()").extract()[0])
            desc = jobitem.xpath(".//li[2]//span[3]")
            item["desc"] = self.reg.sub("", desc.xpath("string(.)").extract()[0])

            yield item

        # Request the next page only while the counter is within range
        if self.page <= 10:
            self.page += 1
            yield scrapy.Request(self.url + str(self.page), callback=self.parse)

settings.py also needs request headers that imitate a browser

DEFAULT_REQUEST_HEADERS = {
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en',
  'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36',
}

# Uncomment ITEM_PIPELINES and point it at this project's pipeline
ITEM_PIPELINES = {
   'zhyc.pipelines.ZhycPipeline': 300,
}

The scraped data file zhonghuayingcai.json

{
  "salary": "8000-15000",
  "name": "python測試工程師",
  "company": "Fonrich",
  "publish": "今天",
  "require": "[上海市/閔行]應屆生/本科",
  "desc": "電子/半導體/集成電路|民營/私企|51－100人"
}
{
  "salary": "7000-10000",
  "name": "風險軟件工程師(Python方向)",
  "company": "中銀消費金融有限公司",
  "publish": "今天",
  "require": "[上海市/黃浦]2年/本科",
  "desc": "證券|民營/私企|101－300人"
}
{
  "salary": "8000-15000",
  "name": "Python爬蟲開發工程師",
  "company": "維賽特財經",
  "publish": "今天",
  "require": "[上海市/虹口]1年/大專",
  "desc": "計算機軟件|民營/私企|101－300人"
}
{
  "salary": "8000-16000",
  "name": "python爬蟲開發工程師",
  "company": "上海時來",
  "publish": "今天",
  "require": "[上海市/長寧]應屆生/大專",
  "desc": "數據服務|民營/私企|21－50人"
}
{
  "salary": "3000-6000",
  "name": "Python講師-上海",
  "company": "伊屋裝飾",
  "publish": "8-11",
  "require": "[上海市/黃浦]2年/大專",
  "desc": "移動互聯網|民營/私企|20人以下"
}
{
  "salary": "6000-8000",
  "name": "python開發工程師",
  "company": "華住酒店管理有限公司",
  "publish": "7-27",
  "require": "[上海市/閔行]應屆生/本科",
  "desc": "酒店|外商獨資|500人以上"
}
{
  "salary": "15000-25000",
  "name": "赴日Python工程師",
  "company": "SunWell",
  "publish": "昨天",
  "require": "[海外/海外/]4年/本科",
  "desc": "人才服務|民營/私企|101－300人"
}
.........
.........
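A file made of JSON objects written one after another is not a single valid JSON document, so `json.load` cannot read it in one call. Here is a minimal sketch of reading such a file back, whether the objects are written back to back or one per line. The helper name `iter_json_objects` and the sample string are hypothetical; `json.JSONDecoder.raw_decode` is the standard-library call that parses one value and reports where it ended.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value found in text, skipping whitespace between them."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # Skip any whitespace that separates two objects.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Invented sample: two objects back to back, then a trailing newline.
sample = '{"name": "a", "salary": "1-2"}{"name": "b", "salary": "3-4"}\n'
jobs = list(iter_json_objects(sample))
print([job["name"] for job in jobs])
```

To use it on a real output file, read the file as text with the utf-8 encoding and pass the contents to the helper.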

5.Advanced Scrapy - deep crawling

Scraping Python job postings from Zhaopin

items.py

# -*- coding: utf-8 -*-
import scrapy

class ZlItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    # Job title
    name = scrapy.Field()
    # Feedback rate
    percent = scrapy.Field()
    # Company name
    company = scrapy.Field()
    # Monthly salary
    salary = scrapy.Field()
    # Work location
    position = scrapy.Field()

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

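import json

class ZlPipeline(object):
    def __init__(self):
        # Python 3: open in text mode with an explicit encoding
        self.file = open("sdzp.json", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Write one JSON object per line so the file can be parsed again
        text = json.dumps(dict(item), ensure_ascii=False)
        self.file.write(text + "\n")
        # Return the item so any later pipeline still receives it
        return item

    def close_spider(self, spider):
        self.file.close()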

zlzp.py

# -*- coding: utf-8 -*-
from scrapy.spiders import CrawlSpider,Rule
from scrapy.linkextractors import LinkExtractor
from zl.items import ZlItem

class ZlzpSpider(CrawlSpider):

    name = 'sdzpspider'
    allowed_domains = ['zhaopin.com']
    start_urls = ['http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%e4%b8%8a%e6%b5%b7&kw=python&sm=0&source=0&sg=936e2219abfb4f07a17009a930d54a37&p=1']

    # Link extraction rule: the pagination links of the search results
    page_link = LinkExtractor(allow=r'&sg=936e2219abfb4f07a17009a930d54a37&p=\d+')

    # Crawl rules: parse every matched page and keep following its links
    rules = [
        Rule(page_link, callback='parse_content', follow=True)
    ]

    # Callback that handles each matched page
    def parse_content(self, response):
        # Select the region that holds the data we need
        job_list = response.xpath('//div[@id="newlist_list_content_table"]//table//tr[1]')

        for job in job_list:
            # Create an item to hold the extracted data
            item = ZlItem()
            # Fill a field only when its XPath matched something
            name = job.xpath(".//td[1]//a")
            if len(name) > 0:
                item['name'] = name.xpath('string(.)').extract()[0]

            percent = job.xpath('.//td[2]//span/text()')
            if len(percent) > 0:
                item['percent'] = percent.extract()[0]

            company = job.xpath(".//td[3]//a[1]/text()")
            if len(company) > 0:
                item["company"] = company.extract()[0]

            salary = job.xpath(".//td[4]/text()")
            if len(salary) > 0:
                item["salary"] = salary.extract()[0]
            position = job.xpath(".//td[5]/text()")
            if len(position) > 0:
                item["position"] = position.extract()[0]

            yield item
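To check which links a pattern like the one in `page_link` would accept, the regex can be tried on its own. This sketch mimics only the matching step with the standard `re` module, on the assumption that `allow` patterns are applied as regex searches against absolute URLs; the candidate URLs are made up for illustration.

```python
import re

# Same pattern as in the LinkExtractor, as a raw string.
PAGE_LINK = re.compile(r'&sg=936e2219abfb4f07a17009a930d54a37&p=\d+')

# Hypothetical URLs for illustration.
BASE = 'http://sou.zhaopin.com/jobs/searchresult.ashx'
candidates = [
    BASE + '?kw=python&sg=936e2219abfb4f07a17009a930d54a37&p=2',
    BASE + '?kw=python&sg=936e2219abfb4f07a17009a930d54a37&p=15',
    BASE + '?kw=java&p=2',
]

# search() matches anywhere in the URL, not only at the start.
followed = [url for url in candidates if PAGE_LINK.search(url)]

for url in followed:
    print(url)
```

Because the session value after `sg=` is hard-coded, the pattern stops matching as soon as the site issues a different value, which is worth remembering when the crawl unexpectedly finds no pages.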

Crawl results:

{}
{
  "salary": "15000-25000",
  "position": "上海",
  "company": "Aon Hewitt 怡安翰威特",
  "name": "Senior Web Developer (Python)"
}
{}
{}
{
  "salary": "20001-30000",
  "position": "上海",
  "company": "上海英方軟件股份有限公司",
  "name": "PHP/Python資深研發工程師"
}
{
  "salary": "10000-20000",
  "position": "上海",
  "company": "上海英方軟件股份有限公司",
  "name": "PHP/Python高級研發工程師："
}
{
  "salary": "15000-30000",
  "position": "上海-長寧區",
  "company": "攜程計算機技術(上海)有限公司",
  "name": "大數據產品開發"
}
{
  "salary": "面議",
  "position": "上海",
  "company": "Michelin China 米其林中國",
  "name": "DevOps Expert"
}
{
  "salary": "10001-15000",
  "position": "上海",
  "company": "中興通訊股份有限公司",
  "name": "高級軟件工程師J11015"
}
{
  "salary": "10000-20000",
  "position": "上海",
  "company": "上海微創軟件股份有限公司",
  "name": "高級系統運維工程師（赴迪卡儂）"
}
{
  "salary": "10000-15000",
  "position": "上海-浦東新區",
  "company": "北京尚學堂科技有限公司",
  "name": "Python講師（Web方向）"
}
{}
{
  "salary": "30000-50000",
  "position": "上海",
  "company": "上海復星高科技（集團）有限公司",
  "name": "系統架構負責人"
}
{
  "salary": "面議",
  "position": "上海-長寧區",
  "company": "美團點評",
  "name": "前端開發工程師"
}
{
  "salary": "12000-18000",
  "position": "上海",
  "company": "上海微創軟件股份有限公司",
  "name": "Web前端工程師"
}
{
  "salary": "10000-13000",
  "position": "上海",
  "company": "上海微創軟件股份有限公司",
  "name": "測試工程師（Test Engineer）（赴諾亞財富）"
}
{
  "salary": "10000-20000",
  "position": "上海-浦東新區",
  "company": "上海洞識信息科技有限公司",
  "name": "高級python研發人員"
}
{
  "salary": "6001-8000",
  "position": "上海-徐匯區",
  "company": "上海域鳴網絡科技有限公司",
  "name": "Python軟件開發"
}
{
  "salary": "15000-25000",
  "position": "上海-浦東新區",
  "company": "中移德電網絡科技有限公司",
  "percent": "62%",
  "name": "大數據架構師"
}
{
  "salary": "18000-22000",
  "position": "上海-浦東新區",
  "company": "北京中亦安圖科技股份有限公司",
  "name": "大數據開發工程師"
}
......
......
