Abnormal response while scraping sofifa.com

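I'm trying to scrape sofifa.com with Scrapy. With the code below I want to scrape the full name and rating of 60 players, but I get far more than 60 pages, and the crawl does not stop unless I stop it myself.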

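I noticed that many of the scraped players do not appear on the first page, and that the spider also tries to scrape team pages, which is what the /player check in the code is for.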

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
import time
from lxml import etree
# from scrapy_cloudflare_middleware.middlewares import CloudFlareMiddleware


class PlayersSpider(CrawlSpider):
    name = "players"
    allowed_domains = ["sofifa.com"]
    # start_urls = ["https://sofifa.com"]

    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36"

    def start_requests(self):
        yield scrapy.Request(url="https://sofifa.com", headers={"User-Agent": self.user_agent})

    rules = (
        Rule(LinkExtractor(restrict_xpaths=("//table//tbody//tr/td[2]/a")[:60]), callback="parse_item", follow=True),
        )

    # def set_user_agent(self, request, ay7aga):
    #     request.headers["User-Agent"] = self.user_agent
    #     return request

    def parse_item(self, response):
        time.sleep(1)
        # print(response.status)
        if "/player" in response.url:
            yield {
                "full_name": response.xpath('//div[@class="profile clearfix"]/h1/text()').get(),
                "overall_rating": response.xpath('//div[@class="grid"]//em[1]/text()').get(),
                # "potential": response.xpath('.//div[@class="grid"]//em[2]/text()').get(),
                # "value": response.xpath('.//div[@class="grid"]//em[3]/text()').get(),
                # "wage": response.xpath('.//div[@class="grid"]//em[4]/text()').get(),
            }
        else:
            pass
Answers

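You are using CrawlSpider, which is designed to crawl a website by following links. Your spider never finishes because it keeps following links to new URLs and fetching more content. If you only want to stay on a single page, use the simpler Spider class instead.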

Here is an updated spider that scrapes only the names from the front page. It uses scrapy.Spider rather than CrawlSpider:

import scrapy
import time

class PlayersSpider(scrapy.Spider):
    name = "players"
    allowed_domains = ["sofifa.com"]
    start_urls = ["https://sofifa.com"]
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36"

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url=url, headers={"User-Agent": self.user_agent})

    def parse(self, response):
        time.sleep(1)
        for player in response.xpath('//a[@data-tippy-content and not(img) and not(contains(@href, "sort="))]'):
            yield {
                "full_name": player.xpath("./@data-tippy-content").get(),
                # extract other data as needed
            }

from scrapy.crawler import CrawlerProcess

process = CrawlerProcess({
    "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
})

process.crawl(PlayersSpider)
process.start()
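
One caveat about the time.sleep(1) call in parse: Scrapy runs its callbacks on a single-threaded event loop, so sleeping inside a callback also stalls every other request in flight. A minimal sketch of an alternative, assuming a one-second pause and a cap of roughly 60 items are what is wanted, is to let Scrapy throttle itself through settings. DOWNLOAD_DELAY and CLOSESPIDER_ITEMCOUNT are standard Scrapy settings; the dictionary name and the values are illustrative assumptions, not part of the original answer.

```python
# Hypothetical settings dict: the keys are real Scrapy settings,
# the values are assumptions chosen for illustration.
POLITE_SETTINGS = {
    # Minimum wait, in seconds, between consecutive requests to the same site.
    "DOWNLOAD_DELAY": 1,
    # Ask Scrapy to close the spider once about 60 items have been scraped.
    # Requests already in flight may still finish, so treat this as a soft cap.
    "CLOSESPIDER_ITEMCOUNT": 60,
}

# Usage (assumption): set `custom_settings = POLITE_SETTINGS` on the spider
# class, or merge the dict into the settings passed to CrawlerProcess.
```

With settings like these the time.sleep(1) line can be dropped from the callback.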

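What you actually need to do is set follow to False in the Rule constructor, or remove the parameter altogether, since it defaults to False once a callback has been set.

According to the Scrapy docs:

"follow is a boolean which specifies if links should be followed from each response extracted with this rule. If callback is None follow defaults to True, otherwise it defaults to False."

So removing it ensures that only the responses for links found by the link extractor are sent to parse_item, and that no further links are followed on the subsequent pages.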
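
The default described in the docs can be paraphrased as a few lines of plain Python. This is only a sketch of the documented rule, not Scrapy's own source; effective_follow is a hypothetical helper name.

```python
def effective_follow(callback, follow=None):
    """Return whether links would be followed, per the documented default.

    Hypothetical helper: when follow is not given, it is True only if
    no callback is set; an explicit follow value always wins.
    """
    if follow is None:
        return callback is None
    return follow


# With a callback and no explicit follow, links are not followed.
print(effective_follow("parse_item"))
# The question's rule passed follow=True explicitly, so every player page was crawled further.
print(effective_follow("parse_item", follow=True))
```

The spider with the follow argument removed: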

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
import time


class PlayersSpider(CrawlSpider):
    name = "players"
    allowed_domains = ["sofifa.com"]
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36"

    def start_requests(self):
        yield scrapy.Request(url="https://sofifa.com", headers={"User-Agent": self.user_agent})

    rules = (
        # follow is omitted: with a callback set, it defaults to False.
        Rule(LinkExtractor(restrict_xpaths=("//table//tbody//tr/td[2]/a")[:60]), callback="parse_item"),
        )

    def parse_item(self, response):
        if "/player" in response.url:
            yield {
                "full_name": response.xpath('//div[@class="profile clearfix"]/h1/text()').get(),
                "overall_rating": response.xpath('//div[@class="grid"]//em[1]/text()').get(),
            }

After running scrapy crawl players -o players.json I get only 60 JSON items, and the run produces the following output.

Output

2024-02-01 18:51:25 [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: spiders)
2024-02-01 18:51:25 [scrapy.utils.log] INFO: Versions: lxml 5.1.0.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.11.7 (tags/v3.11.7:fa7a6f2, Dec  4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)], pyOpenSSL 24.0.0 (OpenSSL 3.2.1 30 Jan 2024), cryptography 42.0.2, Platform Windows-10-10.0.22621-SP0
2024-02-01 18:51:25 [scrapy.addons] INFO: Enabled addons:
[]
2024-02-01 18:51:25 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2024-02-01 18:51:25 [scrapy.extensions.telnet] INFO: Telnet Password: 2a289292fd307038
2024-02-01 18:51:25 [scrapy.middleware] INFO: Enabled extensions:
[ scrapy.extensions.corestats.CoreStats ,
  scrapy.extensions.telnet.TelnetConsole ,
  scrapy.extensions.feedexport.FeedExporter ,
  scrapy.extensions.logstats.LogStats ]
2024-02-01 18:51:25 [scrapy.crawler] INFO: Overridden settings:
{ BOT_NAME :  spiders ,
  NEWSPIDER_MODULE :  spiders.spiders ,
  REQUEST_FINGERPRINTER_IMPLEMENTATION :  2.7 ,
  SPIDER_MODULES : [ spiders.spiders ],
  USER_AGENT :  Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36  
                (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 }
2024-02-01 18:51:25 [scrapy.middleware] INFO: Enabled downloader middlewares:
[ scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware ,
  scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware ,
  scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware ,
  scrapy.downloadermiddlewares.useragent.UserAgentMiddleware ,
  scrapy.downloadermiddlewares.retry.RetryMiddleware ,
  scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware ,
  scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware ,
  scrapy.downloadermiddlewares.redirect.RedirectMiddleware ,
  scrapy.downloadermiddlewares.cookies.CookiesMiddleware ,
  scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware ,
  scrapy.downloadermiddlewares.stats.DownloaderStats ]
2024-02-01 18:51:25 [scrapy.middleware] INFO: Enabled spider middlewares:
[ scrapy.spidermiddlewares.httperror.HttpErrorMiddleware ,
  scrapy.spidermiddlewares.offsite.OffsiteMiddleware ,
  scrapy.spidermiddlewares.referer.RefererMiddleware ,
  scrapy.spidermiddlewares.urllength.UrlLengthMiddleware ,
  scrapy.spidermiddlewares.depth.DepthMiddleware ]
2024-02-01 18:51:25 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2024-02-01 18:51:25 [scrapy.core.engine] INFO: Spider opened
2024-02-01 18:51:25 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-02-01 18:51:25 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com> (referer: None)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/246191/julian-alvarez/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/247635/khvicha-kvaratskhelia/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/237086/min-jae-kim/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/245371/thiago-almada/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/246147/mason-greenwood/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/245152/santiago-gimenez/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/266253/ivan-fresneda-corraliza/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/268421/mathys-tel/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/246191/julian-alvarez/240024/>
{ full_name :  Julián Álvarez ,  overall_rating :  81 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/247635/khvicha-kvaratskhelia/240024/>
{ full_name :  Khvicha Kvaratskhelia ,  overall_rating :  86 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/237086/min-jae-kim/240024/>
{ full_name :  김민재 金敏在 ,  overall_rating :  84 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/245371/thiago-almada/240024/>
{ full_name :  Thiago Ezequiel Almada ,  overall_rating :  80 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/246147/mason-greenwood/240024/>
{ full_name :  Mason Greenwood ,  overall_rating :  77 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/245152/santiago-gimenez/240024/>
{ full_name :  Santiago Tomás Giménez ,  overall_rating :  80 }
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/270086/antonio-joao-tavares-silva/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/247679/victor-boniface/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/266253/ivan-fresneda-corraliza/240024/>
{ full_name :  Iván Fresneda Corraliza ,  overall_rating :  72 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/268421/mathys-tel/240024/>
{ full_name :  Mathys Tel ,  overall_rating :  74 }
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/236772/dominik-szoboszlai/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/264309/arda-guler/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/256630/florian-wirtz/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/256402/carlos-alcaraz/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/265600/roony-bardghji/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/269312/tommaso-baldanzi/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/270086/antonio-joao-tavares-silva/240024/>
{ full_name :  António João Pereira Albuquerque Tavares Silva ,  overall_rating :  78 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/247679/victor-boniface/240024/>
{ full_name :  Victor Okoh Boniface ,  overall_rating :  80 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/236772/dominik-szoboszlai/240024/>
{ full_name :  Dominik Szoboszlai ,  overall_rating :  82 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/264309/arda-guler/240024/>
{ full_name :  Arda Güler ,  overall_rating :  77 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/256630/florian-wirtz/240024/>
{ full_name :  Florian Richard Wirtz ,  overall_rating :  86 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/256402/carlos-alcaraz/240024/>
{ full_name :  Carlos Jonas Alcaraz ,  overall_rating :  73 }
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/265600/roony-bardghji/240024/>
{ full_name :  Roony Bardghji ,  overall_rating :  70 }
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/259608/evan-ferguson/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/269312/tommaso-baldanzi/240024/>
{ full_name :  Tommaso Baldanzi ,  overall_rating :  77 }
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/231747/kylian-mbappe/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/260815/arnau-martinez-lopez/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/243780/kang-in-lee/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/240833/youssoufa-moukoko/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/255565/kaoru-mitoma/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/259608/evan-ferguson/240024/>
{ full_name :  Evan Ferguson ,  overall_rating :  74 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/231747/kylian-mbappe/240024/>
{ full_name :  Kylian Mbappé Lottin ,  overall_rating :  91 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/260815/arnau-martinez-lopez/240024/>
{ full_name :  Arnau Martínez López ,  overall_rating :  80 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/224232/nicolo-barella/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/231677/marcus-rashford/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/269859/arthur-vermeeren/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/243780/kang-in-lee/240024/>
{ full_name :  이강인 Kang In Lee ,  overall_rating :  78 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/240833/youssoufa-moukoko/240024/>
{ full_name :  Youssoufa Moukoko ,  overall_rating :  77 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/259240/adam-wharton/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/255565/kaoru-mitoma/240024/>
{ full_name :  三笘 薫 ,  overall_rating :  81 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/232293/victor-osimhen/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/257504/bilal-el-khannouss/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/262863/antonio-nusa/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/224232/nicolo-barella/240024/>
{ full_name :  Nicolò Barella ,  overall_rating :  86 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/231677/marcus-rashford/240024/>
{ full_name :  Marcus Rashford ,  overall_rating :  83 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/263620/romeo-lavia/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/269859/arthur-vermeeren/240024/>
{ full_name :  Arthur Vermeeren ,  overall_rating :  76 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/259240/adam-wharton/240024/>
{ full_name :  Adam Wharton ,  overall_rating :  71 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/232293/victor-osimhen/240024/>
{ full_name :  Victor James Osimhen ,  overall_rating :  88 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/257504/bilal-el-khannouss/240024/>
{ full_name :  Bilal El Khannouss ,  overall_rating :  73 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/252008/israel-reyes/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/248266/sacha-boey/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/258729/gabriel-veiga-novas/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/239085/erling-haaland/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/262863/antonio-nusa/240024/>
{ full_name :  Antonio Eromonsele Nordby Nusa ,  overall_rating :  71 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/224949/javairo-dilrosun/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/263620/romeo-lavia/240024/>
{ full_name :  Romeo Lavia ,  overall_rating :  73 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/245637/georginio-rutter/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/223689/wout-weghorst/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/252008/israel-reyes/240024/>
{ full_name :  Israel Reyes Romero ,  overall_rating :  75 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/248266/sacha-boey/240024/>
{ full_name :  Sacha Boey ,  overall_rating :  80 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/234569/florentino-morris-luis/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/258729/gabriel-veiga-novas/240024/>
{ full_name :  Gabriel Veiga Novas ,  overall_rating :  78 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/239085/erling-haaland/240024/>
{ full_name :  Erling Braut Haaland ,  overall_rating :  91 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/224949/javairo-dilrosun/240024/>
{ full_name :  Javairô Dilrosun ,  overall_rating :  72 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/245637/georginio-rutter/240024/>
{ full_name :  Georginio Rutter ,  overall_rating :  74 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/250961/joshua-zirkzee/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/271575/simone-pafundi/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/237681/takefusa-kubo/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/229391/joao-maria-palhinha-goncalves/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/223689/wout-weghorst/240024/>
{ full_name :  Wout Weghorst ,  overall_rating :  77 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/272978/jorrel-hato/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/234569/florentino-morris-luis/240024/>
{ full_name :  Florentino Ibrain Morris Luís ,  overall_rating :  80 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/165153/karim-benzema/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/268438/alejandro-garnacho-ferreyra/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/250961/joshua-zirkzee/240024/>
{ full_name :  Joshua Orobosa Zirkzee ,  overall_rating :  75 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/271575/simone-pafundi/240024/>
{ full_name :  Simone Pafundi ,  overall_rating :  67 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/253072/darwin-nunez/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/237681/takefusa-kubo/240024/>
{ full_name :  久保 建英 ,  overall_rating :  81 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/229391/joao-maria-palhinha-goncalves/240024/>
{ full_name :  João Maria Lobo Alves Palhinha Gonçalves ,  overall_rating :  84 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/272978/jorrel-hato/240024/>
{ full_name :  Jorrel Hato ,  overall_rating :  73 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/272834/joao-pedro-goncalves-neves/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/165153/karim-benzema/240024/>
{ full_name :  Karim Benzema ,  overall_rating :  90 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/276589/vitor-hugo-roque-ferreira/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/271574/rico-lewis/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/264298/conor-bradley/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/270673/warren-zaire-emery/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/271916/bryan-zaragoza-martinez/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/268438/alejandro-garnacho-ferreyra/240024/>
{ full_name :  Alejandro Garnacho Ferreyra ,  overall_rating :  75 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/253072/darwin-nunez/240024/>
{ full_name :  Darwin Gabriel Núñez Ribeiro ,  overall_rating :  82 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/235790/kai-havertz/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/272834/joao-pedro-goncalves-neves/240024/>
{ full_name :  João Pedro Gonçalves Neves ,  overall_rating :  73 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/276589/vitor-hugo-roque-ferreira/240024/>
{ full_name :  Vitor Hugo Roque Ferreira ,  overall_rating :  76 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/271574/rico-lewis/240024/>
{ full_name :  Rico Lewis ,  overall_rating :  75 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/264298/conor-bradley/240024/>
{ full_name :  Conor Bradley ,  overall_rating :  69 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/270673/warren-zaire-emery/240024/>
{ full_name :  Warren Zaïre-Emery ,  overall_rating :  79 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/270964/jobe-bellingham/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/271916/bryan-zaragoza-martinez/240024/>
{ full_name :  Bryan Zaragoza Martínez ,  overall_rating :  73 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/212228/ivan-toney/240023/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/252371/jude-bellingham/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/235790/kai-havertz/240024/>
{ full_name :  Kai Lukas Havertz ,  overall_rating :  82 }
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/272926/lucas-bergvall/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/259399/rasmus-hojlund/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/269136/kobbie-mainoo/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/270964/jobe-bellingham/240024/>
{ full_name :  Jobe Bellingham ,  overall_rating :  66 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/212228/ivan-toney/240023/>
{ full_name :  Ivan Toney ,  overall_rating :  80 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/252371/jude-bellingham/240024/>
{ full_name :  Jude Victor William Bellingham ,  overall_rating :  87 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/272926/lucas-bergvall/240024/>
{ full_name :  Lucas Bergvall ,  overall_rating :  64 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/259399/rasmus-hojlund/240024/>
{ full_name :  Rasmus Winther Højlund ,  overall_rating :  77 }
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/269136/kobbie-mainoo/240024/>
{ full_name :  Kobbie Mainoo ,  overall_rating :  67 }
2024-02-01 18:51:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/263370/valentin-barco/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:28 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/263370/valentin-barco/240024/>
{ full_name :  Valentín Barco ,  overall_rating :  73 }
2024-02-01 18:51:28 [scrapy.core.engine] INFO: Closing spider (finished)
2024-02-01 18:51:28 [scrapy.extensions.feedexport] INFO: Stored json feed (60 items) in: players.json
2024-02-01 18:51:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{ downloader/request_bytes : 25570,
  downloader/request_count : 61,
  downloader/request_method_count/GET : 61,
  downloader/response_bytes : 916772,
  downloader/response_count : 61,
  downloader/response_status_count/200 : 61,
  elapsed_time_seconds : 2.138765,
  feedexport/success_count/FileFeedStorage : 1,
  finish_reason :  finished ,
  finish_time : datetime.datetime(2024, 2, 2, 2, 51, 28, 119043, tzinfo=datetime.timezone.utc),
  httpcompression/response_bytes : 3787828,
  httpcompression/response_count : 61,
  item_scraped_count : 60,
  log_count/DEBUG : 122,
  log_count/INFO : 11,
  request_depth_max : 1,
  response_received_count : 61,
  scheduler/dequeued : 61,
  scheduler/dequeued/memory : 61,
  scheduler/enqueued : 61,
  scheduler/enqueued/memory : 61,
  start_time : datetime.datetime(2024, 2, 2, 2, 51, 25, 980278, tzinfo=datetime.timezone.utc)}
2024-02-01 18:51:28 [scrapy.core.engine] INFO: Spider closed (finished)
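
A side note on the [:60] in the rule: it slices the XPath string itself, not the list of matched links, so it has no effect. The run above yields 60 items because only links extracted from the front page are requested once follow is off. If an explicit cap is wanted, one option is the process_links argument of Rule, which receives the list of links extracted from each response. The sketch below is an illustration under that assumption; limit_links and MAX_LINKS are hypothetical names.

```python
XPATH = "//table//tbody//tr/td[2]/a"

# Slicing the XPath *string* leaves it unchanged: it is shorter than
# 60 characters, so XPATH[:60] is the same expression.
assert XPATH[:60] == XPATH

MAX_LINKS = 60  # hypothetical cap on links taken from each response


def limit_links(links):
    """Keep only the first MAX_LINKS entries of the extracted link list."""
    return links[:MAX_LINKS]


# Usage inside the spider (assumption, not from the original answer):
#   Rule(LinkExtractor(restrict_xpaths=XPATH),
#        callback="parse_item",
#        process_links=limit_links)
```

Because the cap is applied per response, it bounds the number of player pages requested from the front page only when follow stays off.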



