When you need to do a web scraping job in Python, an excellent choice is the Scrapy framework. Not only does it take care of most of the networking (HTTP, SSL, proxies, etc.), but it also makes extracting data from the web easier by providing things such as nifty XPath selectors.

Scrapy is built on the Twisted networking engine. A limitation of its core component, the reactor, is that it cannot be restarted. This can cause trouble if we try to build a mechanism that runs Scrapy spiders independently from a Python script, rather than from the scrapy command-line tool. Say, for example, that we want to implement a Python function that receives some parameters, performs a search or scrapes some sites, and returns a list of scraped items. A naive solution that simply starts a crawl inside the function will not work: every call to the function would need the Twisted reactor to be restarted, and unfortunately that is not possible.
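The limitation can be illustrated without depending on Twisted itself. The following is a toy model only: the hypothetical ToyReactor class is not Twisted's API. It simply mimics a loop that refuses to start a second time within the same process. (In real Twisted, that second start surfaces as twisted.internet.error.ReactorNotRestartable.) The naive function works on its first call and fails on the second:

```python
class ToyReactor:
    """Toy stand-in for a non-restartable event loop (NOT Twisted's reactor)."""

    def __init__(self):
        self._has_run = False

    def run(self, job):
        # Like the Twisted reactor, this loop may only be started once per process.
        if self._has_run:
            raise RuntimeError("reactor is not restartable")
        self._has_run = True
        return job()


# A single loop per process, created at import time, much like Twisted's global reactor.
reactor = ToyReactor()


def naive_scrape(query):
    """Hypothetical naive scraper that runs its crawl in the calling process."""
    return reactor.run(lambda: ["item for " + query])


if __name__ == "__main__":
    print(naive_scrape("first"))  # prints ['item for first']
    try:
        naive_scrape("second")
    except RuntimeError as exc:
        print("second call failed:", exc)  # prints: second call failed: reactor is not restartable
```

The state that breaks the second call belongs to the process, which is why running each crawl in a fresh process sidesteps the problem.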

A workaround for this is to run Scrapy in its own process. After some searching, I could not find a solution that worked with the latest Scrapy. However, one of them used the multiprocessing module, and it came pretty close! Here is an updated version for Scrapy 0.13:

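from scrapy import project, signals
from scrapy.conf import settings
from scrapy.crawler import CrawlerProcess
from scrapy.xlib.pydispatch import dispatcher
from multiprocessing import Queue
import multiprocessing

class CrawlerWorker(multiprocessing.Process):

    def __init__(self, spider, result_queue):
        multiprocessing.Process.__init__(self)
        self.result_queue = result_queue

        # Create the crawler, register it as the project-wide crawler
        # (only if none is installed yet), then load its configuration.
        self.crawler = CrawlerProcess(settings)
        if not hasattr(project, 'crawler'):
            self.crawler.install()
        self.crawler.configure()

        # Collect every item the spider produces.
        self.items = []
        self.spider = spider
        dispatcher.connect(self._item_passed, signals.item_passed)

    def _item_passed(self, item):
        self.items.append(item)

    def run(self):
        # Runs in the child process: crawl, then send all items back to the parent.
        self.crawler.crawl(self.spider)
        self.crawler.start()
        self.crawler.stop()
        self.result_queue.put(self.items)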

One way to invoke this, say inside a function, would be:

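    result_queue = Queue()
    crawler = CrawlerWorker(MySpider(myArgs), result_queue)
    crawler.start()  # launch the spider in a child process
    for item in result_queue.get():  # blocks until the child sends its list of items
        yield item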

Here, MySpider is the class of the spider you want to run, and myArgs are the arguments you want to instantiate it with. Note that CrawlerWorker receives a spider instance, not the class itself.
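The same process-per-call idea can be sketched with the standard library alone, which may make the mechanics easier to see without the Scrapy-specific parts. This is a sketch under stated assumptions. The fake_crawl function is a hypothetical stand-in for a spider. The ScrapeWorker class is also hypothetical and plays the role of CrawlerWorker. Each call to scrape() starts a fresh child process, so any non-restartable state lives and dies with that child:

```python
import multiprocessing


def fake_crawl(query):
    """Hypothetical stand-in for running a spider; returns a list of 'items'."""
    return ["%s-result-%d" % (query, i) for i in range(3)]


class ScrapeWorker(multiprocessing.Process):
    """Hypothetical worker: runs fake_crawl in a child process, returns items via a queue."""

    def __init__(self, query, result_queue):
        super().__init__()
        self.query = query
        self.result_queue = result_queue

    def run(self):
        # Executed in the child process: any non-restartable state (such as
        # a reactor) is created and torn down together with this process.
        self.result_queue.put(fake_crawl(self.query))


def scrape(query):
    """Each call spawns a fresh process, so it can safely be called repeatedly."""
    result_queue = multiprocessing.Queue()
    worker = ScrapeWorker(query, result_queue)
    worker.start()
    items = result_queue.get()  # read before join() so the child is not left blocked
    worker.join()
    return items


if __name__ == "__main__":  # needed on Windows, where children re-import the main module
    print(scrape("first"))
    print(scrape("second"))
```

Reading from the queue before calling join() follows the multiprocessing documentation's advice: a process that has put items on a queue may not terminate until those items are consumed. The if __name__ == "__main__": guard matters on Windows, where child processes re-import the main module. That may be related to the Windows problems reported in the comments below.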


  1. tre
    tre on 07/24/2012 1:07 a.m.
    Hey Alan, thank you for sharing this! (and for fixing the comment system)
  2. payala
    payala on 08/19/2012 6:11 p.m.
    I have tried this under Windows, but I never managed to make it work. I think the problem has to do with the limitations the multiprocessing module has on Windows platforms. This might be related: http://docs.python.org/library/multiprocessing.html#windows http://stackoverflow.com/questions/765129/hows-python-multiprocessing-implemented-on-windows
  3. akersof
    akersof on 09/15/2012 1:50 a.m.
    Don't you set any environment variable? I'm new to Scrapy and still get an error with "crawler = CrawlerWorker(MySpider('url=http://www.example.com'), result_queue)". What should MySpider be? The class name? The project name? The name of the crawler (name="myspider" in the class)? Regards,
  4. Serg
    Serg on 11/03/2012 1:55 p.m.
    It works only with one process running. When I run this code for two or more processes concurrently:

        ...
        for spider in spiders:
            crawler = CrawlerWorker(spider(myArgs), result_queue)
            crawler.start()
        ...

    I get errors from Twisted:

        Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib64/python2.7/site-packages/twisted/python/log.py", line 84, in callWithLogger
            return callWithContext({"system": lp}, func, *args, **kw)
          File "/usr/lib64/python2.7/site-packages/twisted/python/log.py", line 69, in callWithContext
            return context.call({ILogContext: newCtx}, func, *args, **kw)
          File "/usr/lib64/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/usr/lib64/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
            return func(*args,**kw)
        --- <exception caught here> ---
          File "/usr/lib64/python2.7/site-packages/twisted/internet/posixbase.py", line 631, in _doReadOrWrite
            why = selectable.doWrite()
          File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1094, in doWrite
            raise RuntimeError, "doWrite called on a %s" % reflect.qual(self.__class__)
        exceptions.RuntimeError: doWrite called on a twisted.internet.tcp.Port
  5. Serg
    Serg on 11/08/2012 7:13 a.m.
    The Twisted errors in the example above went away after I set WEBSERVICE_ENABLED and TELNETCONSOLE_ENABLED to False. Now I can run any number of processes, each with its own spider, without errors.
  6. sam
    sam on 03/17/2013 1:27 p.m.
    Does this technique work with Scrapy 0.16?
  7. Alan Descoins
    Alan Descoins on 03/26/2013 10:12 p.m.
    I haven't used version 0.16, but I am almost sure the code will need some changes.
  8. Rajesh Lakshmanan
    Rajesh Lakshmanan on 10/03/2013 3:55 a.m.
    Hi Alan, I am learning Scrapy and Python; I am basically a Java developer. I am using the Eclipse PyDev IDE for this development, so I need to install Scrapy in Eclipse. Please help me figure out how to do it.