When you need to do a web scraping job in Python, the Scrapy framework is an excellent choice. Not only does it take care of most of the networking (HTTP, SSL, proxies, etc.), but it also makes extracting data from the web easier by providing things such as nifty XPath selectors.

Scrapy is built upon the Twisted networking engine. A limitation of its core component, the reactor, is that it cannot be restarted. This can cause trouble if we are trying to devise a mechanism to run Scrapy spiders independently from a Python script (and not from the Scrapy shell). Say, for example, we want to implement a Python function that receives some parameters, performs a search or web scraping job on some sites, and returns a list of scraped items. A naive solution that starts the crawler directly inside the function will not work, since each function call would need to restart the Twisted reactor, and this is unfortunately not possible.

A workaround is to run Scrapy in its own process. After searching, I could not find a solution that worked on the latest Scrapy. However, one of them used multiprocessing and came pretty close! Here is an updated version for Scrapy 0.13:

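from scrapy import project, signals
from scrapy.conf import settings
from scrapy.crawler import CrawlerProcess
from scrapy.xlib.pydispatch import dispatcher
from multiprocessing.queues import Queue
import multiprocessing

class CrawlerWorker(multiprocessing.Process):

    def __init__(self, spider, result_queue):
        multiprocessing.Process.__init__(self)
        self.result_queue = result_queue

        self.crawler = CrawlerProcess(settings)
        # Install the crawler globally only if none is installed yet
        if not hasattr(project, 'crawler'):
            self.crawler.install()
        self.crawler.configure()

        self.items = []
        self.spider = spider
        # Collect every item that makes it through the item pipeline
        dispatcher.connect(self._item_passed, signals.item_passed)

    def _item_passed(self, item):
        self.items.append(item)

    def run(self):
        # Runs in the child process: crawl, then hand the items back
        self.crawler.crawl(self.spider)
        self.crawler.start()
        self.crawler.stop()
        self.result_queue.put(self.items)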

One way to invoke this, say inside a function, would be:

        result_queue = Queue()
        crawler = CrawlerWorker(MySpider(myArgs), result_queue)
        crawler.start()
        for item in result_queue.get():
            yield item

where MySpider is of course the class of the Spider you want to run, and myArgs are the arguments you wish to invoke the spider with.
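
The same pattern can be tried without Scrapy installed. The following is a minimal sketch, not the code above: `Worker`, `fake_scrape` and `search` are hypothetical names, with `fake_scrape` standing in for a spider run. It assumes Python 3 on a POSIX system where the "fork" start method is available. Each call to `search` runs the job in a fresh child process, so any one-shot state the job needs (such as a Twisted reactor) starts clean every time.

```python
import multiprocessing

# Assumption: a POSIX system where the "fork" start method is available.
_ctx = multiprocessing.get_context("fork")


class Worker(_ctx.Process):
    """Runs a job in a child process and reports its results via a queue."""

    def __init__(self, job, args, result_queue):
        super().__init__()
        self.job = job
        self.args = args
        self.result_queue = result_queue

    def run(self):
        # Executed in the child process only.
        self.result_queue.put(list(self.job(*self.args)))


def fake_scrape(query, count):
    """Hypothetical stand-in for a spider run; returns made-up items."""
    return [{"query": query, "rank": i} for i in range(count)]


def search(query, count):
    """Run fake_scrape in a fresh process and return its items."""
    result_queue = _ctx.Queue()
    worker = Worker(fake_scrape, (query, count), result_queue)
    worker.start()
    # Read the result before joining, so the child is never left
    # waiting for its queued data to be consumed.
    items = result_queue.get(timeout=30)
    worker.join()
    return items
```

Calling `search` several times in a row works because nothing from one run survives into the next. Reading from the queue before calling `join()` matters: a child that has put data on a queue may not exit until that data has been consumed.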


  1. tre
    tre on 07/24/2012 1:07 a.m.
    Hey Alan, thank you for sharing this! (and for fixing the comment system)
  2. payala
    payala on 08/19/2012 6:11 p.m.
    I have tried this under Windows but I never managed to make it work. I think the problem has to do with the limitations imposed by the multiprocessing module on Windows platforms. I think this might be related: http://docs.python.org/library/multiprocessing.html#windows http://stackoverflow.com/questions/765129/hows-python-multiprocessing-implemented-on-windows
  3. akersof
    akersof on 09/15/2012 1:50 a.m.
    You do not set any environment variable? I am new to Scrapy and still get an error with "crawler = CrawlerWorker(MySpider('url=http://www.example.com'), result_queue)". What should MySpider be? The class name? The project name? The name of the crawler (name="myspider" in the class)? Regards,
  4. Serg
    Serg on 11/03/2012 1:55 p.m.
    It works only for one process running... When I run this code for two or more processes concurrently

    ...
    for spider in spiders:
        crawler = CrawlerWorker(spider(myArgs), result_queue)
        crawler.start()
    ...

    I have got errors with Twisted

    Unhandled Error
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/site-packages/twisted/python/log.py", line 84, in callWithLogger
        return callWithContext({"system": lp}, func, *args, **kw)
      File "/usr/lib64/python2.7/site-packages/twisted/python/log.py", line 69, in callWithContext
        return context.call({ILogContext: newCtx}, func, *args, **kw)
      File "/usr/lib64/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
        return self.currentContext().callWithContext(ctx, func, *args, **kw)
      File "/usr/lib64/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
        return func(*args,**kw)
    --- <exception caught here> ---
      File "/usr/lib64/python2.7/site-packages/twisted/internet/posixbase.py", line 631, in _doReadOrWrite
        why = selectable.doWrite()
      File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1094, in doWrite
        raise RuntimeError, "doWrite called on a %s" % reflect.qual(self.__class__)
    exceptions.RuntimeError: doWrite called on a twisted.internet.tcp.Port
  5. Serg
    Serg on 11/08/2012 7:13 a.m.
    The Twisted errors in the example above were eliminated by setting WEBSERVICE_ENABLED and TELNETCONSOLE_ENABLED to False. So I can run any number of processes, each with its own spider, without errors.
  6. sam
    sam on 03/17/2013 1:27 p.m.
    Does this technique work with Scrapy 0.16?
  7. Alan Descoins
    Alan Descoins on 03/26/2013 10:12 p.m.
    I haven't used version 0.16, but I am almost sure the code will need some changes.
  8. Rajesh Lakshmanan
    Rajesh Lakshmanan on 10/03/2013 3:55 a.m.
    Hi Alan, I am learning Scrapy and Python; I am basically a Java developer. I am using the Eclipse PyDev IDE for this development, so I need to install Scrapy in my Eclipse. Please help me out with how to achieve it.
  37. Try Labs
    Try Labs on 08/29/2014 5:12 a.m.
    Guys, get your comment system fixed. It's broken and spammers are spamming your site!!!!
