A zope.testbrowser extension with useragent faking and proxy abilities

Introduction

This Yet-Another-Mechanize implementation aims to give the developer these new features:

  • It can be proxified

  • It fakes user agent by default

  • It does not handle robots by default

TODO

  • lxml integration, maybe steal z3c.etestbrowser

Tests and Handbook

First, we import what the tests need:

>>> import BaseHTTPServer
>>> from SimpleHTTPServer import SimpleHTTPRequestHandler
>>> from collective.anonymousbrowser.browser import Browser, FF2_USERAGENT
>>> from threading import Thread

Run a basic request printer to check the user agent and subsequent requests:

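>>> class ReqHandler(SimpleHTTPRequestHandler):
...     def do_GET(self):
...         # Send the status line and headers first, then echo the
...         # request headers back in the response body.
...         self.send_response(200)
...         self.end_headers()
...         self.wfile.write('<html>%s</html>' % self.headers)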
>>> httpd = BaseHTTPServer.HTTPServer(('', 45678), ReqHandler)
>>> httpd1 = BaseHTTPServer.HTTPServer(('', 45679), ReqHandler)
>>> httpd2 = BaseHTTPServer.HTTPServer(('', 45677), ReqHandler)
>>> httpd3 = BaseHTTPServer.HTTPServer(('', 45676), ReqHandler)
>>> httpd4 = BaseHTTPServer.HTTPServer(('', 45675), ReqHandler)
>>> for item in (httpd, httpd1, httpd2, httpd3, httpd4):
...      t = Thread(target=item.serve_forever)
...      t.setDaemon(True)
...      t.start()

User Agent

By default, the browser sends a faked user agent:

>>> br = Browser()
...  we can have the output from the config creation there
>>> br.open('http://localhost:45678')
>>> FF2_USERAGENT in br.contents
True
>>> br2 = Browser('http://localhost:45678')
>>> FF2_USERAGENT in br2.contents
True
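
For readers who do not have the package at hand, the general idea of sending a browser-like User-Agent header can be sketched with the Python 3 standard library alone. The `FAKE_USERAGENT` string and the `build_request` helper are illustrative assumptions, not the package's `FF2_USERAGENT` value or its API.

```python
import urllib.request

# Hypothetical user agent string, for illustration only; the real
# FF2_USERAGENT value is defined by collective.anonymousbrowser.
FAKE_USERAGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:2.0) Gecko/20100101 Firefox/2.0"


def build_request(url, useragent=FAKE_USERAGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": useragent})


# Building the request does not touch the network.
req = build_request("http://localhost:45678")
```

Passing `req` to `urllib.request.urlopen` would then send the header to a server such as the request printer started earlier.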

Proxy mode

But we want to be anonymous, so we will set a proxy. To define the proxies, just use a config.ini file like:

[collective.anonymousbrowser]
proxies =
    host1:port
    host2:port

When the browser has several proxies defined, it cycles through them. It will not use the same host indefinitely; the proxy_max_use argument sets the limit:

>>> from StringIO import StringIO
>>> from tempfile import mkstemp
>>> __, config = mkstemp()
>>> open(config, 'w').write("""[collective.anonymousbrowser]
... proxies =
...     127.0.0.1:45675
...     127.0.0.1:45676
...     127.0.0.1:45677
...     127.0.0.1:45678
...     127.0.0.1:45679
...     """)
>>> b = Browser(config=config)
>>> b._config._sections
{'collective.anonymousbrowser': {'__name__': 'collective.anonymousbrowser', 'proxies': '\n127.0.0.1:45675\n127.0.0.1:45676\n127.0.0.1:45677\n127.0.0.1:45678\n127.0.0.1:45679'}}
>>> b.proxies
['127.0.0.1:45675', '127.0.0.1:45676', '127.0.0.1:45677', '127.0.0.1:45678', '127.0.0.1:45679']
>>> b.proxified
True
>>> b.open('http://localhost:45678')
>>> 'Host: localhost:45678' in b.contents
True
>>> b._lastproxy['count'] == 1 and b._lastproxy['proxy'] in [0,1,2,3,4]
True
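
The rotation rule described above (random choice, with a cap on consecutive uses of the same proxy) can be sketched as follows. This is a Python 3 sketch under stated assumptions, not the package's implementation: `ProxyChooser`, its `max_tries` guard, and the injectable `randint` argument are all hypothetical names introduced for illustration.

```python
import random


class ProxyChooser:
    """Pick proxies at random, capping consecutive uses at max_use."""

    def __init__(self, proxies, max_use=3, randint=random.randint, max_tries=100):
        self.proxies = list(proxies)
        self.max_use = max_use
        self.randint = randint      # injectable so tests can script it
        self.max_tries = max_tries  # guard against a stuck random source
        self.last = {"proxy": None, "count": 0}

    def choose(self):
        for _ in range(self.max_tries):
            index = self.randint(0, len(self.proxies) - 1)
            if index != self.last["proxy"]:
                # A different proxy: start counting its uses from 1.
                self.last = {"proxy": index, "count": 1}
                return self.proxies[index]
            if self.last["count"] < self.max_use:
                # Same proxy, still under the cap: reuse it.
                self.last["count"] += 1
                return self.proxies[index]
            # Same proxy drawn after max_use uses: draw again.
        raise RuntimeError("could not pick a different proxy")


# Scripted draws: index 2 four times, then index 0.
draws = iter([2, 2, 2, 2, 0])
chooser = ProxyChooser(["p0", "p1", "p2"], max_use=3,
                       randint=lambda a, b: next(draws))
picked = [chooser.choose() for _ in range(4)]
```

In this sketch a stuck random source ends in a `RuntimeError` after `max_tries` draws; that is a design choice of the example, and the package may react differently.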

We can have a normal, unproxified browser too:

>>> b1 = Browser(proxify=False)
>>> b1.proxified
False

Next, we verify that the pseudo-random loop is running. We will first choose the second proxy twice, then the third, and we will set up the mocker to change the proxy on each subsequent call:

>>> import mocker
>>> import random
>>> mocked = mocker.Mocker()
>>> custom_random_int = mocked.replace('random.randint')
>>> custom_random_int(0, 4)
<mocker.Mock ...
>>> mocked.result(2)
>>> custom_random_int(0,1)
<mocker.Mock ...
>>> mocked.result(0)
>>> custom_random_int(0, 4)
<mocker.Mock ...
>>> mocked.result(2)
>>> custom_random_int(0,1)
<mocker.Mock ...
>>> mocked.result(0)
>>> custom_random_int(0, 4)
<mocker.Mock ...
>>> mocked.result(2)
>>> custom_random_int(0,1)
<mocker.Mock ...
>>> mocked.result(0)
>>> custom_random_int(0, 4)
<mocker.Mock ...
>>> mocked.result(3)
>>> custom_random_int(0,1)
<mocker.Mock ...
>>> mocked.result(0)
>>> custom_random_int(0, 4)
<mocker.Mock ...
>>> mocked.result(4)
>>> custom_random_int(0,1)
<mocker.Mock ...
>>> mocked.result(0)
>>> custom_random_int(0, 4)
<mocker.Mock ...
>>> mocked.result(2)
>>> custom_random_int(0,1)
<mocker.Mock ...
>>> mocked.result(0)
>>> custom_random_int(0, 4)
<mocker.Mock ...
>>> mocked.result(1)
>>> custom_random_int(0,1)
<mocker.Mock ...
>>> mocked.result(0)
>>> mocked.replay()
>>> b = Browser('http://localhost:45678', config=config)
>>> b.open('http://localhost:45678')
>>> b._lastproxy
{'count': 1, 'proxy': 2}
>>> b.open('http://localhost:45678')
>>> b._lastproxy
{'count': 2, 'proxy': 2}
>>> b.open('http://localhost:45678')
>>> b._lastproxy
{'count': 3, 'proxy': 2}
>>> b.open('http://localhost:45678')
>>> b._lastproxy
{'count': 1, 'proxy': 0}
>>> b.open('http://localhost:45678')
>>> b._lastproxy
{'count': 1, 'proxy': 3}
>>> b.open('http://localhost:45678')
>>> b._lastproxy
{'count': 1, 'proxy': 4}
>>> b.open('http://localhost:45678')
>>> b._lastproxy
{'count': 1, 'proxy': 2}
>>> b.open('http://localhost:45678')
>>> b._lastproxy
{'count': 1, 'proxy': 1}
>>> mocked.restore()
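
The recording above scripts the values returned by `random.randint`. On Python 3, a similar effect can be had with the standard library's `unittest.mock`, as in this minimal sketch. The `pick_index` function is a hypothetical stand-in for the chooser step and is not part of the package.

```python
import random
from unittest import mock


def pick_index(n):
    """Hypothetical chooser step: draw an index in [0, n - 1]."""
    return random.randint(0, n - 1)


# Script the next three draws, much as the mocker recording does.
with mock.patch("random.randint", side_effect=[2, 2, 3]) as fake_randint:
    drawn = [pick_index(5) for _ in range(3)]

# Outside the with block, random.randint is the real function again.
```

Using `patch` as a context manager plays the role of `mocked.replay()` and `mocked.restore()`: the replacement is active only inside the block.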

If the proxies are dead, we remove them from the list:

>>> __, config = mkstemp()
>>> open(config, 'w').write("""[collective.anonymousbrowser]
... proxies =
...     127.0.0.1:35675
...     127.0.0.1:35676
...     127.0.0.1:35677
...     127.0.0.1:45678
...     """)
>>> mybrowser = Browser(config=config)
>>> mybrowser.proxies
['127.0.0.1:35675', '127.0.0.1:35676', '127.0.0.1:35677', '127.0.0.1:45678']
>>> mybrowser.open('http://localhost:45678')
>>> mybrowser.proxies
['127.0.0.1:45678']
>>> mybrowser.proxies = ['127.0.0.1:34785']
>>> mybrowser.open('http://localhost:45678')
Traceback (most recent call last):
...
Exception: There are no valid proxies left
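
The pruning behaviour shown above can be sketched in a few lines of Python 3. This is an illustration under assumptions, not the package's code: `tcp_alive` and `first_live_proxy` are hypothetical helpers, and a plain TCP connection attempt is only one possible way to decide that a proxy is dead.

```python
import socket


def tcp_alive(proxy, timeout=1.0):
    """Return True if a TCP connection to 'host:port' succeeds."""
    host, port = proxy.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False


def first_live_proxy(proxies, is_alive=tcp_alive):
    """Drop dead proxies from the list in place; return the first live one."""
    while proxies:
        if is_alive(proxies[0]):
            return proxies[0]
        proxies.pop(0)  # dead: forget it so later calls skip it
    raise Exception("There are no valid proxies left")


# A fake liveness check keeps the example independent of the network.
proxies = ["127.0.0.1:35675", "127.0.0.1:35676", "127.0.0.1:45678"]
live = first_live_proxy(proxies, is_alive=lambda p: p.endswith(":45678"))
```

Because the list is modified in place, the caller sees the shortened list afterwards, which mirrors how `mybrowser.proxies` shrinks in the session above.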

The loop is protected against endless recursion. If the random source always returns the same host, the chooser cannot choose anything else; it will loop until it crashes or until it handles the recursion:

>>> def randomint(a,b):
...     return 2
>>> import random; random.randint = randomint
>>> b2 = Browser(config=config)
>>> b2.proxy_max_use
3
>>> b2._lastproxy['count']
0
>>> b2.chooseProxy()
'...
>>> b2._lastproxy['count']
1
>>> b2.chooseProxy()
'...
>>> b2._lastproxy['count']
2
>>> b2.chooseProxy()
'...
>>> b2._lastproxy['count']
3
>>> b2.chooseProxy()
'...
>>> b2.chooseProxy()
Ho, seems we got the max wills to choose, something has gone wrong
'127.0.0.1:35675'

HISTORY

0.3

  • Adding error message

0.2

  • Adding proxy fallback facility

0.1

  • Initial release
