insophia / scrapy http://scrapy.org/

Scrapy has moved to GitHub: https://github.com/scrapy/scrapy. This legacy Mercurial repo was kept for reference purposes and is no longer being updated.

Clone this repository (size: 8.3 MB): HTTPS / SSH
hg clone https://bitbucket.org/insophia/scrapy
hg clone ssh://hg@bitbucket.org/insophia/scrapy

Searching for commits

Mercurial supports a functional language for selecting a set of revisions.

The language supports a number of predicates which are joined by infix operators. Parentheses can be used for grouping.

Identifiers such as branch names must be quoted with single or double quotes if they contain characters outside of [._a-zA-Z0-9\x80-\xff] or if they match one of the predefined predicates.
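
The quoting rule above can be sketched as a small check. This is only an illustration: the helper name needs_quoting is hypothetical, the predicate set is a partial copy of the names listed on this page, and encoding names as UTF-8 before matching is an assumption of the sketch.

```python
import re

# Byte class copied from the rule above. The name is encoded to bytes before
# matching; using UTF-8 for that is an assumption of this sketch.
_BARE = re.compile(rb"[._a-zA-Z0-9\x80-\xff]+")

# Illustrative subset of the predicate names listed on this page.
_PREDICATES = {"all", "author", "branch", "closed", "head", "merge", "tag", "user"}


def needs_quoting(name: str) -> bool:
    """Return True if `name` should be quoted inside a revset expression."""
    if name in _PREDICATES:
        # Clashes with a predefined predicate name.
        return True
    # Quote unless every byte falls inside the allowed class.
    return _BARE.fullmatch(name.encode("utf-8")) is None


for name in ["stable", "0.12.x", "feature-x", "release 1.0", "head"]:
    print(name, needs_quoting(name))
```

Under this rule a branch called feature-x would be written as branch('feature-x'), while stable could be written bare.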

Prefix operators

not x
Changesets not in x. Short form is ! x.

Infix operators

x::y

A DAG range, meaning all changesets that are descendants of x and ancestors of y, including x and y themselves. If the first endpoint is left out, this is equivalent to ancestors(y); if the second is left out, it is equivalent to descendants(x).

An alternative syntax is x..y.

x:y
All changesets with revision numbers between x and y, both inclusive. Either endpoint can be left out; they default to 0 and tip.
x and y
The intersection of changesets in x and y. Short form is x & y.
x or y
The union of changesets in x and y. There are two alternative short forms: x | y and x + y.
x - y
Changesets in x but not in y.
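
To make the difference between x::y and x:y concrete, here is a minimal sketch that models the operators above with plain Python sets over a made-up six-revision history. The history, the helper names (dag_range, rev_range) and the use of unordered sets are all assumptions of the sketch; it does not call Mercurial.

```python
# Toy history (hypothetical): revision number -> list of parent revisions.
#
#   0 -- 1 -- 2 ------ 4
#         \           /
#          3 --------+
#           \
#            5
PARENTS = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3], 5: [3]}
ALL = set(PARENTS)  # plays the role of all()


def ancestors(rev):
    """rev itself plus everything reachable by following parent links."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(PARENTS[r])
    return seen


def descendants(rev):
    """rev itself plus every revision that has rev as an ancestor."""
    return {r for r in ALL if rev in ancestors(r)}


def dag_range(x, y):
    """x::y, descendants of x that are also ancestors of y."""
    return descendants(x) & ancestors(y)


def rev_range(x=0, y=None):
    """x:y, revision numbers between x and y; y defaults to tip."""
    if y is None:
        y = max(ALL)
    return {r for r in ALL if x <= r <= y}


a = dag_range(1, 4)   # like "1::4"
b = rev_range(3)      # like "3:"
print(sorted(a))                 # [1, 2, 3, 4]
print(sorted(rev_range(2, 5)))   # [2, 3, 4, 5]
print(sorted(dag_range(2, 5)))   # [] because 5 does not descend from 2
print(sorted(a & b))             # x and y
print(sorted(a | b))             # x or y
print(sorted(a - b))             # x - y
print(sorted(ALL - a))           # not x
```

Against a real clone, the same expressions would be passed on the command line, for example hg log -r "1::4 and 3:".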

Predicates

all()
All changesets, the same as 0:tip.
ancestor(single, single)
Greatest common ancestor of the two changesets.
ancestors(set)
Changesets that are ancestors of a changeset in set.
author(string)
Alias for user(string).
bookmark([name])
The named bookmark or all bookmarks.
branch(set)
All changesets belonging to the branches of changesets in set.
children(set)
Child changesets of changesets in set.
closed()
Changeset is closed.
date(interval)
Changesets within the interval; see hg help dates.
descendants(set)
Changesets which are descendants of changesets in set.
file(pattern)
Changesets affecting files matched by pattern.
follow()
An alias for "::." (the ancestors of the working copy's first parent).
grep(regex)
Like keyword(string) but accepts a regex. Use grep(r'...') to ensure special escape characters are handled correctly.
head()
Changeset is a named branch head.
heads(set)
Members of set with no children in set.
id(string)
Revision non-ambiguously specified by the given hex string prefix.
keyword(string)
Search commit message, user name, and names of changed files for string.
limit(set, n)
First n members of set.
max(set)
Changeset with highest revision number in set.
merge()
Changeset is a merge changeset.
min(set)
Changeset with lowest revision number in set.
p1([set])
First parent of changesets in set, or the working directory.
p2([set])
Second parent of changesets in set, or the working directory.
parents([set])
The set of all parents for all changesets in set, or the working directory.
present(set)
An empty set, if any revision in set isn't found; otherwise, all revisions in set.
rev(number)
Revision with the given numeric identifier.
roots(set)
Changesets with no parent changeset in set.
tag([name])
The specified tag by name, or all tagged revisions if no name is given.
user(string)
User name is string.
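
A few of the structural predicates above can be modelled the same way. The sketch below is a toy reading of the one-line descriptions on this page, applied to a hypothetical history; the data and the Python helpers are assumptions for illustration, not Mercurial's implementation.

```python
# Toy history (hypothetical): revision number -> list of parent revisions.
PARENTS = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3], 5: [3]}


def parents(revs):
    """parents(set): all parents of all changesets in revs."""
    return {p for r in revs for p in PARENTS[r]}


def children(revs):
    """children(set): changesets with at least one parent in revs."""
    return {r for r, ps in PARENTS.items() if any(p in revs for p in ps)}


def heads(revs):
    """heads(set): members of revs with no children in revs."""
    return {r for r in revs if not (children({r}) & revs)}


def roots(revs):
    """roots(set): members of revs with no parent in revs."""
    return {r for r in revs if not (set(PARENTS[r]) & revs)}


def merges():
    """merge(): changesets with two parents."""
    return {r for r, ps in PARENTS.items() if len(ps) == 2}


def limit(revs, n):
    """limit(set, n): first n members, taken here in revision-number order."""
    return sorted(revs)[:n]


print(sorted(parents({4})))      # [2, 3]
print(sorted(children({3})))     # [4, 5]
print(sorted(heads({1, 2, 3})))  # [2, 3]
print(sorted(roots({2, 3, 4})))  # [2, 3]
print(sorted(merges()))          # [4]
print(limit({5, 3, 4}, 2))       # [3, 4]
print(max({1, 2, 3}), min({1, 2, 3}))  # max(set) and min(set)
```

Predicates compose with the operators, so an expression such as heads(1::4) and not merge() reads as ordinary set algebra over results like these.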

Commits 1–30 of 2,804

Author Revision Message
Daniel Graña dc6a85919ac6 Do not filter requests with dont_filter attribute set in OffsiteMiddleware
Pablo Hoffman 6a265601e6c9 scrapyd: updated schedule.json response format
Pablo Hoffman 708c9e73cfd6 added unittest for SpiderState extension
Pablo Hoffman 4a6c0d4d7e8d restored support for spider.DOWNLOAD_DELAY attribute, with deprecation warning
Pablo Hoffman 3a133543f077 replaced use of deprecated w3lib.url.urljoin_rfc by stdlib urlparse.urljoin
Pablo Hoffman ee94734bd4cb removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead)
Pablo Hoffman 570d3816d1b9 added initial documentation about suspend and resume crawls
Pablo Hoffman 006189957f0f added SpiderState extension
Pablo Hoffman 84e836094193 fixed subtle bug in disk-based priority queues caused by serialization errors, and added tests
Pablo Hoffman b9bac446db7b add setting to enable logging when unserializable requests are found
Pablo Hoffman 164971a7a918 PickleDiskQueue: use pickle protocol 2
Pablo Hoffman 5e1dca232821 minor fix to doc
Pablo Hoffman 254bdbcb31b1 no longer recommend using labmda's in the doc, as they're not friendly with scheduler persistence
Pablo Hoffman 8004d0165e3b remove redundant code
Pablo Hoffman 7867f745eb8d updated documentation and code to use -s instead of --set option
Pablo Hoffman b9042f057066 remove unneeded code to simplify
Pablo Hoffman f86d5113b9ec scrapy tool: added -s alias for --set option
Pablo Hoffman d79145ef9836 persistent scheduler: use pickle (instead of marshal) as the default serialization format, to support serializing more objects out of the box. also removed __slots__ from Request/Response objects to make them serializable by default.
Daniel Graña 6c88f13a4cff ignore *egg-info added by pip install -e
Pablo Hoffman ddc8b212f2aa adapted test-scrapyd.sh to be compatible with older versions of mktemp, and to not hang forever is spider doesn't run for some reason
Pablo Hoffman 23dc03dc20ed scrapyd: documented support for passing setting to spiders in schedule.json
Pablo Hoffman 335dd7db6f1d added scrapyd system test script to extras/test-scrapyd.sh
Pablo Hoffman 9121d0314f9b moved scrapy.utils.sqlite to scrapyd.sqlite
Pablo Hoffman a060c287017e removed (barely used) spider context extension, to drop dependencies with sqlite
Pablo Hoffman a80cfee6ef31 scrapyd: added support for passing custom settings to schedule.json
Pablo Hoffman baf586e4d234 removed class method from_settings from ISpiderManager interface
Pablo Hoffman d0e120bf6a20 fixed unittest broken by previous commit
Pablo Hoffman da33a7d18641 pass close reason to close() method of new DupeFilter
Pablo Hoffman 70e637518294 minor doc fix
Pablo Hoffman ff88057c4cf7 fixed priority handling on the new scheduler so that it's backwards compatible (ie. bigger priorities are higher). also fixed a few documentation bugs related to requests priority