A zope.testbrowser extension with user agent faking and proxy abilities
Introduction
This yet-another-mechanize implementation aims to give the developer these new features:
It can route its requests through proxies
It fakes the user agent by default
It ignores robots.txt by default
TODO
lxml integration (maybe borrow it from z3c.etestbrowser)
Tests and Handbook
First, we import the modules we need:
    >>> import BaseHTTPServer
    >>> from SimpleHTTPServer import SimpleHTTPRequestHandler
    >>> from collective.anonymousbrowser.browser import Browser, FF2_USERAGENT
    >>> from threading import Thread
Start a few basic servers that print the request headers back, so we can check the user agent and the later requests:
    >>> class ReqHandler(SimpleHTTPRequestHandler):
    ...     def do_GET(self):
    ...         # Send a proper status line and headers, then echo the
    ...         # received request headers back as the page body.
    ...         self.send_response(200)
    ...         self.send_header('Content-Type', 'text/html')
    ...         self.end_headers()
    ...         self.wfile.write('<html>%s</html>' % self.headers)
    >>> servers = [BaseHTTPServer.HTTPServer(('', port), ReqHandler)
    ...            for port in (45675, 45676, 45677, 45678, 45679)]
    >>> for server in servers:
    ...     t = Thread(target=server.serve_forever)
    ...     t.daemon = True
    ...     t.start()
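The handbook targets Python 2, where BaseHTTPServer and SimpleHTTPServer are separate modules. On Python 3 both live in http.server, so a header-echo server like the one above might be sketched as follows. The names EchoHeadersHandler and start_echo_server are hypothetical, and the client side explicitly disables proxies so that http_proxy-style environment variables cannot reroute the local request.

```python
# Python 3 sketch of a header-echo test server (names are hypothetical).
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Answer every GET with the received request headers wrapped in <html>."""

    def do_GET(self):
        payload = ("<html>%s</html>" % self.headers).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_echo_server(port=0):
    """Serve in a daemon thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), EchoHeadersHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_echo_server()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    # An empty ProxyHandler mapping means "talk to the server directly".
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": "demo-agent/1.0"})
    with opener.open(request, timeout=5) as response:
        print(response.read().decode("utf-8"))
    server.shutdown()
    server.server_close()
```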
User Agent
Oh, my god, we have a brand new user agent by default, FF2_USERAGENT, and the echo servers send it right back to us:
    >>> br = Browser()  # the config creation may print some output here
    >>> br.open('http://localhost:45678')
    >>> FF2_USERAGENT in br.contents
    True
    >>> br2 = Browser('http://localhost:45678')
    >>> FF2_USERAGENT in br2.contents
    True
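Faking the user agent boils down to replacing the default client identification header. A minimal Python 3 sketch with the standard library's urllib, not the package's own mechanism, could look like this; FAKE_USERAGENT is a hypothetical Firefox 2 style string and is not necessarily the value of the package's FF2_USERAGENT constant.

```python
import urllib.request

# Hypothetical Firefox 2 style string; the real FF2_USERAGENT may differ.
FAKE_USERAGENT = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.20) "
                  "Gecko/20081217 Firefox/2.0.0.20")


def build_faking_opener(user_agent=FAKE_USERAGENT):
    """Hypothetical helper: an opener that announces user_agent on every request."""
    opener = urllib.request.build_opener()
    # addheaders holds (name, value) pairs added to each request that does not
    # set them itself; replacing the list drops the default
    # ("User-agent", "Python-urllib/x.y") entry.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = build_faking_opener()
print(opener.addheaders)
```

Setting the header on the opener rather than on each Request keeps the fake identity as the default while still letting individual requests override it.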
Proxy mode
But we also want to be anonymous, so we will go through proxies. To define those proxies, just use a config.ini file like this:
    [collective.anonymousbrowser]
    proxies =
        host1:port
        host2:port
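With INI continuation lines, the proxies value is read as a single newline-separated string. A sketch of turning it into a list with Python 3's configparser follows; read_proxies is a hypothetical helper, and splitting on any whitespace is an assumption that also accepts proxies listed on one line.

```python
import configparser

SAMPLE_CONFIG = """\
[collective.anonymousbrowser]
proxies =
    host1:port
    host2:port
"""


def read_proxies(text, section="collective.anonymousbrowser"):
    """Hypothetical helper: return the proxies listed in section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # A missing section or option yields the fallback instead of an error.
    raw = parser.get(section, "proxies", fallback="")
    # raw looks like "\nhost1:port\nhost2:port"; split() drops the empty
    # first line and also handles space-separated values.
    return raw.split()


print(read_proxies(SAMPLE_CONFIG))  # ['host1:port', 'host2:port']
```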
When several proxies are defined, the browser cycles through them. It will not use the same host indefinitely, though: just set the proxy_max_use argument to limit how many times in a row one proxy is used:
    >>> from tempfile import mkstemp
    >>> __, config = mkstemp()
    >>> open(config, 'w').write("""[ccollective.anonymousbrowser]
    ... proxies =
    ...     127.0.0.1:45675
    ...     127.0.0.1:45676
    ...     127.0.0.1:45677
    ...     127.0.0.1:45678
    ...     127.0.0.1:45679
    ... """)
    >>> b = Browser(config=config)
    >>> b._config._sections
    {'ccollective.anonymousbrowser': {'__name__': 'ccollective.anonymousbrowser', 'proxies': '\n127.0.0.1:45675\n127.0.0.1:45676\n127.0.0.1:45677\n127.0.0.1:45678\n127.0.0.1:45679'}}
    >>> b.proxies
    ['127.0.0.1:45675', '127.0.0.1:45676', '127.0.0.1:45677', '127.0.0.1:45678', '127.0.0.1:45679']
    >>> b.proxified
    True
    >>> b.open('http://localhost:45678')
    >>> 'Host: localhost:45678' in b.contents
    True
    >>> b._lastproxy['count'] == 1 and b._lastproxy['proxy'] in [0, 1, 2, 3, 4]
    True
We can have a normal, unproxified browser too:
    >>> b1 = Browser(proxify=False)
    >>> b1.proxified
    False
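For comparison, the proxified and unproxified modes map naturally onto urllib's ProxyHandler in Python 3. The sketch below is not how the package wires its proxies; build_opener_for is a hypothetical helper that routes plain HTTP through one "host:port" proxy, or, when given None, bypasses proxies entirely, including any picked up from environment variables.

```python
import urllib.request


def build_opener_for(proxy=None):
    """Hypothetical helper: proxied opener for "host:port", direct one for None."""
    if proxy is None:
        # An empty mapping disables proxying, even if http_proxy is set.
        handler = urllib.request.ProxyHandler({})
    else:
        handler = urllib.request.ProxyHandler({"http": "http://" + proxy})
    return urllib.request.build_opener(handler)


proxied = build_opener_for("127.0.0.1:45675")
direct = build_opener_for()
```

Rotating proxies then amounts to building (or reconfiguring) the opener with whichever proxy the chooser picks for the next request.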
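Next, we verify that the pseudo-random proxy loop works. We use mocker to replace random.randint with scripted results: the third proxy (index 2) is drawn until it has been used three times in a row, and then a different proxy is drawn for each request. The _lastproxy attribute records the index of the last proxy used and how many times in a row it has been used: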
    >>> import mocker
    >>> import random
    >>> mocked = mocker.Mocker()
    >>> custom_random_int = mocked.replace('random.randint')
    >>> for value in (2, 2, 2, 3, 4, 2, 1):
    ...     __ = custom_random_int(0, 4)
    ...     mocked.result(value)
    ...     __ = custom_random_int(0, 1)
    ...     mocked.result(0)
    >>> mocked.replay()
    >>> b = Browser('http://localhost:45678', config=config)
    >>> b.open('http://localhost:45678')
    >>> b._lastproxy
    {'count': 1, 'proxy': 2}
    >>> b.open('http://localhost:45678')
    >>> b._lastproxy
    {'count': 2, 'proxy': 2}
    >>> b.open('http://localhost:45678')
    >>> b._lastproxy
    {'count': 3, 'proxy': 2}
    >>> b.open('http://localhost:45678')
    >>> b._lastproxy
    {'count': 1, 'proxy': 0}
    >>> b.open('http://localhost:45678')
    >>> b._lastproxy
    {'count': 1, 'proxy': 3}
    >>> b.open('http://localhost:45678')
    >>> b._lastproxy
    {'count': 1, 'proxy': 4}
    >>> b.open('http://localhost:45678')
    >>> b._lastproxy
    {'count': 1, 'proxy': 2}
    >>> b.open('http://localhost:45678')
    >>> b._lastproxy
    {'count': 1, 'proxy': 1}
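On Python 3, the standard library's unittest.mock can script random.randint the same way mocker does here: a side_effect list hands out one value per call, much like recording several mocked.result() expectations before mocked.replay(). In this sketch pick_proxy is a hypothetical stand-in for the code under test, not part of the package.

```python
import random
from unittest import mock


def pick_proxy(proxies):
    """Hypothetical code under test: draw one proxy at random.

    It looks up randint on the random module at call time, which is what
    lets mock.patch("random.randint") intercept it."""
    return proxies[random.randint(0, len(proxies) - 1)]


proxies = ["127.0.0.1:45675", "127.0.0.1:45676", "127.0.0.1:45677"]

# Each call to the patched randint returns the next scripted value.
with mock.patch("random.randint", side_effect=[2, 2, 0]) as fake_randint:
    picks = [pick_proxy(proxies) for _ in range(3)]

print(picks)  # ['127.0.0.1:45677', '127.0.0.1:45677', '127.0.0.1:45675']
fake_randint.assert_called_with(0, 2)
```

Unlike a plain attribute assignment, the patch is undone automatically when the with block exits, so later tests see the real random.randint again.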
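The loop is also protected against endless recursion. If random.randint always returns the same index, the chooser can never pick another host once that proxy has reached proxy_max_use. Rather than looping until it crashes, it eventually gives up, prints a warning and falls back to the first proxy: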
    >>> def randomint(a, b):
    ...     return 2
    >>> import random
    >>> random.randint = randomint
    >>> b2 = Browser('http://localhost:45678', config=config)
    >>> b2.proxy_max_use
    3
    >>> b2._lastproxy['count']
    1
    >>> b2.chooseProxy()
    '...
    >>> b2._lastproxy['count']
    2
    >>> b2.chooseProxy()
    '...
    >>> b2._lastproxy['count']
    3
    >>> b2.chooseProxy()
    '...
    >>> b2.chooseProxy()
    Ho, seems we got the max wills to choose, something has gone wrong
    '127.0.0.1:45675'
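The rotation rules above (random draws, at most proxy_max_use uses in a row, and a guard that gives up and falls back to the first proxy) can be expressed with a bounded loop instead of recursion. The following is a hypothetical sketch, not the package's chooseProxy(): ProxyRotator, max_attempts and the injectable randint parameter are assumptions, and its exact counting does not necessarily match the doctests.

```python
import random


class ProxyRotator:
    """Hypothetical sketch of use-count-limited pseudo-random proxy rotation."""

    def __init__(self, proxies, proxy_max_use=3, max_attempts=10, randint=None):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self.proxies = list(proxies)
        self.proxy_max_use = proxy_max_use
        self.max_attempts = max_attempts
        # Injectable draw function so tests can script the "random" indexes.
        self.randint = randint
        # Same shape as the doctests' _lastproxy: last index and uses in a row.
        self.last = {"proxy": None, "count": 0}

    def choose(self):
        draw = self.randint if self.randint is not None else random.randint
        # Bounded loop instead of recursion: at most max_attempts draws.
        for _ in range(self.max_attempts):
            index = draw(0, len(self.proxies) - 1)
            if index != self.last["proxy"]:
                self.last = {"proxy": index, "count": 1}
                return self.proxies[index]
            if self.last["count"] < self.proxy_max_use:
                self.last["count"] += 1
                return self.proxies[index]
            # Same proxy drawn again but already at its limit: draw again.
        # Every draw hit the exhausted proxy: warn and fall back to the first one.
        print("Gave up after %d draws, falling back to the first proxy" % self.max_attempts)
        count = self.last["count"] + 1 if self.last["proxy"] == 0 else 1
        self.last = {"proxy": 0, "count": count}
        return self.proxies[0]


rotator = ProxyRotator(["127.0.0.1:45675", "127.0.0.1:45676", "127.0.0.1:45677"])
for _ in range(5):
    print(rotator.choose(), rotator.last)
```

Making the draw function injectable keeps the sketch testable without monkeypatching the random module globally.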
HISTORY
0.1
Initial release
Source Distribution
Hashes for collective.anonymousbrowser-0.1dev-r73324.zip
Algorithm | Hash digest
---|---
SHA256 | 2aa4ef434103e3c73c238867830c7675c7e935fc3ed87424ae3a3270ee01f8bc
MD5 | 601d122eb43584833f6f208d4b868fb5
BLAKE2b-256 | ae6cbba4030a3d04107cc135d45fb687e555c9f0f19eac86b336e7b816779160