
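The crawler extracts the following fields using three approaches (XPath, regular expressions, and BeautifulSoup):

Job title, salary, location, education requirement, experience, company name, company industry, and job description.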
Result preview

[Screenshots: the crawler running]
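Notes
- When selecting nodes with XPath or any other method, check whether the result is None. Calling .strip() or a similar method on None is guaranteed to raise an error, so it is worth extracting a shared helper for this check.
- URL joining: most detail-page links include a scheme, but occasionally the site returns one without https:// or similar, and requesting it as-is will fail. Build the URL with parse.urljoin('https://www.liepin.com', url) (from urllib import parse) instead.
- When scraping with bs4, prefer the select method (CSS selectors). It makes the code quicker to write and avoids problems when class or another attribute has multiple values.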
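The first two notes can be sketched with the standard library alone. This is a minimal illustration, not the author's code: the helper names `safe_strip`, `first_text` and `absolute_url` are hypothetical, and the sample paths used with them are made up.

```python
from urllib import parse

# Site root the article joins detail-page links onto
BASE_URL = 'https://www.liepin.com'


def safe_strip(value, default=''):
    """Strip a string, or return `default` when the selector found nothing (None)."""
    return default if value is None else value.strip()


def first_text(nodes, default=''):
    """Return the first item of an XPath-style result list, stripped.

    lxml's xpath() returns a list, so an empty list (or None) means "no match".
    """
    if not nodes:
        return default
    value = nodes[0]
    return default if value is None else str(value).strip()


def absolute_url(url):
    """Join a possibly scheme-less or relative link onto the site root.

    urljoin leaves absolute URLs unchanged, fills in the scheme for
    '//host/path' links and resolves '/path' links against BASE_URL.
    """
    return parse.urljoin(BASE_URL, url)


if __name__ == '__main__':
    print(repr(safe_strip(None)))
    print(first_text([' 15-25k ']))
    print(absolute_url('//www.liepin.com/a/2.shtml'))
```

Keeping the None check in one helper means every XPath, re or bs4 subclass can reuse it instead of repeating `if x is not None` before each `.strip()`.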

On to the code

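First, define a Spider class. Its job is to serve as the parent class of the three crawlers and hold the behavior they share (anyone who has learned Java will find this familiar). All of them request the job list, parse it, request each job's detail page, and parse that page. The parsing steps differ per approach and must be implemented by the subclasses, so this class is declared abstract.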

  import abc
  import time

  import requests

  # ExeclUtils is the author's own Excel helper module; it is not shown in this article.


  class Spider(abc.ABC):
      '''
      Abstract parent of the XPath, re and bs4 crawlers
      (abc.ABC makes the abstract methods enforceable in Python 3)
      '''

      def __init__(self):
          # column headers for the output spreadsheet
          self.row_title = ['Title', 'Salary', 'Location', 'Education', 'Experience', 'Company', 'Industry', 'Job description']
          sheet_name = 'Liepin'
          self.execl_f, self.sheet_info = ExeclUtils.create_execl(sheet_name, self.row_title)
          # fields of the job record currently being assembled
          self.job_data = []
          # row counter for written data (data rows start at 1)
          self.count = 0

      def crawler_data(self):
          '''
          crawl the first 5 pages of search results
          '''
          for i in range(5):
              url = 'https://www.liepin.com/zhaopin/?industryType=&jobKind=&sortFlag=15&degradeFlag=0&industries=&salary=&compscale=&key=Python&clean_condition=&headckid=4a4adb68b22970bd&d_pageSize=40&siTag=p_XzVCa5J0EfySMbVjghcw~fA9rXquZc5IkJpXC-Ycixw&d_headId=62ac45351cdd7a103ac7d50e1142b2a0&d_ckId=62ac45351cdd7a103ac7d50e1142b2a0&d_sfrom=search_fp&d_curPage=0&curPage={}'.format(i)
              self.request_job_list(url)
              # pause between pages so the site is not hammered
              time.sleep(2)

      def request_job_list(self, url):
          '''
          request one page of the job list and pass its HTML to parse_job_list
          '''
          try:
              headers = {
                  'Referer': 'https://www.liepin.com/',
                  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'
              }
              response = requests.get(url, headers=headers, timeout=10)
              # skip pages that did not come back with HTTP 200
              if response.status_code != 200:
                  return
              self.parse_job_list(response.text)
          except Exception as e:
              # log and continue so one bad page does not stop the crawl
              print('request_job_list error : {}'.format(e))

      @abc.abstractmethod
      def parse_job_list(self, text):
          '''
          parse the job list out of the response HTML (implemented by subclasses)
          '''
          pass

      def request_job_details(self, url):
          '''
          request a job's detail page and pass its HTML to parse_job_details
          '''
          try:
              headers = {
                  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'
              }
              response = requests.get(url, headers=headers, timeout=10)
              # skip pages that did not come back with HTTP 200
              if response.status_code != 200:
                  return
              self.parse_job_details(response.text)
          except Exception as e:
              print('request_job_details error : {}'.format(e))

      @abc.abstractmethod
      def parse_job_details(self, text):
          '''
          parse the job details out of the detail-page HTML (implemented by subclasses)
          '''
          pass

      # ....... (the rest of the class is omitted in the original)
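Why inherit from `abc.ABC` rather than setting `__metaclass__ = abc.ABCMeta` in the class body: that attribute is Python 2 syntax, and Python 3 treats it as an ordinary class attribute, so abstract methods are silently not enforced. The self-contained check below shows the difference; the class names are illustrative only.

```python
import abc


class OldStyleSpider:
    # Python 2 spelling: in Python 3 this is just a plain class attribute
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def parse_job_list(self, text):
        pass


class NewStyleSpider(abc.ABC):
    # Python 3 spelling: abc.ABC uses abc.ABCMeta as its metaclass
    @abc.abstractmethod
    def parse_job_list(self, text):
        pass


class XpathSpider(NewStyleSpider):
    # A concrete subclass only has to implement the abstract method
    def parse_job_list(self, text):
        return text.split()


def can_instantiate(cls):
    """Return True if cls() succeeds, False if it raises TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True


if __name__ == '__main__':
    for cls in (OldStyleSpider, NewStyleSpider, XpathSpider):
        print(cls.__name__, can_instantiate(cls))
```

With the old spelling the "abstract" base can be instantiated by mistake; with `abc.ABC`, Python refuses until a subclass implements every abstract method.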

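The remaining XPath, re and bs4 crawlers each only need to inherit from this class and implement its abstract methods.
Their code is too long to post here, so I'm linking to it instead.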
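Since the subclasses themselves are only linked, here is a hypothetical sketch of what a regex-based subclass could look like. It runs offline against a made-up HTML snippet: the tag and class names (`h3`, `text-warning`) are assumptions, not Liepin's verified markup, and `BaseSpider` is a trimmed stand-in for the Spider class with the HTTP and Excel parts removed.

```python
import abc
import re
from urllib import parse

BASE_URL = 'https://www.liepin.com'


class BaseSpider(abc.ABC):
    """Trimmed stand-in for the article's Spider: no HTTP, no Excel output."""

    def __init__(self):
        self.job_data = []

    def run(self, html):
        # The real crawler would get `html` from requests.get(url).text
        self.parse_job_list(html)

    @abc.abstractmethod
    def parse_job_list(self, text):
        """Each subclass (XPath, re, bs4) supplies its own parsing strategy."""


class ReSpider(BaseSpider):
    # Hypothetical markup pattern; adjust it to the page you actually receive
    JOB_PATTERN = re.compile(
        r'<h3[^>]*>\s*<a href="(?P<url>[^"]+)"[^>]*>(?P<title>.*?)</a>'
        r'.*?<span class="text-warning">(?P<salary>.*?)</span>',
        re.S,  # let '.' cross newlines between the title and the salary
    )

    def parse_job_list(self, text):
        for match in self.JOB_PATTERN.finditer(text):
            self.job_data.append({
                'title': match.group('title').strip(),
                'salary': match.group('salary').strip(),
                # relative or scheme-less links are joined onto the site root
                'url': parse.urljoin(BASE_URL, match.group('url')),
            })


# Made-up sample page used only to exercise the parser
SAMPLE_HTML = '''
<div class="job-info">
  <h3 title="Python Developer"><a href="https://www.liepin.com/job/1.shtml" target="_blank"> Python Developer </a></h3>
  <p><span class="text-warning">15-25k</span></p>
</div>
<div class="job-info">
  <h3 title="Data Engineer"><a href="/a/2.shtml" target="_blank">Data Engineer</a></h3>
  <p><span class="text-warning">20-30k</span></p>
</div>
'''

if __name__ == '__main__':
    spider = ReSpider()
    spider.run(SAMPLE_HTML)
    for job in spider.job_data:
        print(job)
```

An XPath or bs4 subclass would override the same `parse_job_list` method with a different parser; the request flow in the base class stays unchanged, which is the point of making it abstract.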
Thank you for reading.


Python 3.6: Elegantly Scraping Job Listings from Liepin
