0
I Use This!
Activity Not Available

Project Summary

#!/usr/bin/perl -- # # CGIProxy 2.0.1 # # nph-proxy.cgi-- CGIProxy 2.0.1: a proxy in the form of a CGI script. # Retrieves the resource at any HTTP or FTP URL, updating embedded URLs # in HTML resources to point back through this script. By default, no # user info is sent to the server. Options include text-only proxying # to save bandwidth, cookie filtering, ad filtering, script removal, # user-defined encoding of the target URL, and more. Requires Perl 5. # # Copyright (C) 1996, 1998-2002 by James Marshall, james@jmarshall.com # All rights reserved. # # For the latest, see http://www.jmarshall.com/tools/cgiproxy/ # # # IMPORTANT NOTE ABOUT ANONYMOUS BROWSING: # CGIProxy was originally made for indirect browsing more than # anonymity, but since people are using it for anonymity, I've tried # to make it as anonymous as possible. Suggestions welcome. For best # anonymity, browse with JavaScript turned off. In fact, that's the # only reliable way, in spite of what certain anonymity vendors claim. # Anonymity is pretty good, but may not be bulletproof. For example, # if even a single JavaScript statement can be run, your anonymity can # be compromised. I've tried to remove JS from every place it can # exist, but please tell me if I missed any. Also, browser plugins or # other executable extensions may be able to reveal you to a server. # If you find any way your anonymity can be compromised even with scripts # turned off, please let me know. # # # CONFIGURATION: # # None required in most situations. On some servers, these might be # required (all in the "user configuration" section): # . If you're using another HTTP or SSL proxy, set $HTTP_PROXY, # $SSL_PROXY, and $NO_PROXY as needed. If those proxies use # authentication, set $PROXY_AUTH and $SSL_PROXY_AUTH accordingly. # . If this is running on an SSL server that doesn't use port 443, set # $RUNNING_ON_SSL_SERVER=1 (otherwise, the default of '' is fine). # # Options include: # . Set $TEXT_ONLY, $REMOVE_COOKIES, $REMOVE_SCRIPTS, $FILTER_ADS, # $HIDE_REFERER, and $INSERT_ENTRY_FORM as desired. Set # $REMOVE_SCRIPTS if anonymity is important. # . To let the user choose all of those settings (except $TEXT_ONLY), # set $ALLOW_USER_CONFIG=1. # . To change the encoding format of the URL, modify the # proxy_encode() and proxy_decode() routines. The default # routines are suitable for simple PATH_INFO compliance. # . To encode cookies, modify the cookie_encode() and cookie_decode() # routines. # . You can restrict which servers this proxy will access, with # @ALLOWED_SERVERS and @BANNED_SERVERS. # . Similarly, you can specify allowed and denied server lists for # both cookies and scripts. # . For security, you can ban access to private IP ranges, with # @BANNED_NETWORKS. # . If filtering ads, you can customize this with a few settings. # . To insert your own block of HTML into each page, set $INSERT_HTML # or $INSERT_FILE. # . As a last resort, if you really can't run this script as NPH, # you can try to run it as non-NPH by setting $NOT_RUNNING_AS_NPH=1. # BUT, read the notes and warnings above that line. Caveat surfor. # . For crude load-balancing among a set of proxies, set @PROXY_GROUP. # . Other config is possible; see the user configuration section. # . If heavy use of this proxy puts a load on your server, see the # "NOTES ON PERFORMANCE" section below. # # For more info, read the comments regarding any config options you set. # # This script MUST be installed as a non-parsed header (NPH) script. # In Apache and many other servers, this is done by simply starting the # filename with "nph-". It MAY be possible to fake it as a non-NPH # script, MOST of the time, by using the $NOT_RUNNING_AS_NPH feature. # This is not advised. See the comments by that option for warnings. # # # TO USE: # Start a browsing session by calling the script with no parameters. # You can bookmark pages you browse to through the proxy, or link to # the URLs that are generated. # # # NOTES ON PERFORMANCE: # Unfortunately, this has gotten slower through the versions, mostly # because of optional new features. Configured equally, version 1.3 # takes 25% longer to run than 1.0 or 1.1 (based on cough highly # abbreviated testing). Compiling takes about 50% longer. # Leaving $REMOVE_SCRIPTS=1 adds 25-50% to the running time. # Remember that we're talking about tenths of a second here. Most of # the delay experienced by the user is from waiting on two network # connections. These performance issues only matter if your server # CPU is getting overloaded. Also, these only matter when retrieving # HTML, because it's the HTML modification that takes all the time. # If you can, use mod_perl. Starting with version 1.3.1, this should # work under mod_perl, which requires Perl 5.004 or later. If you use # mod_perl, be careful to install this as an NPH script, i.e. set the # "PerlSendHeader Off" configuration directive. For more info, see the # mod_perl documentation. # If you use mod_perl and modify this script, see the note near the # "reset 'a-z'" line below, regarding UPPER_CASE and lower_case # variables. # # # TO DO: # What I want to hear about: # . Any HTML tags not being converted here. # . Any method of introducing JavaScript or other script, that's not # being filtered out here. # . Any script MIME types other than those already in @SCRIPT_MIME_TYPES. # . Any MIME types other than text/html that have links that need to # be converted. # # plug any other script holes (e.g. MSIE-proprietary, other MIME types?) # This could use cleaner URL-encoding all over ($base_url, etc.) # more error checking? # find a simple encryption technique for proxy_encode() # support more protocols, like mailto: or gopher: # For ad filtering, add option to disable images from servers other than # that of the containing HTML page? Is it worth it? # # # BUGS: # Anonymity may not not perfect. In particular, there may be some remaining # JavaScript holes. # URLs generated by JavaScript or similar mechanisms won't be re-proxy'ed # correctly. JavaScript in general may not work as expected. # Since ALL of your cookies are sent to this script (which then chooses # the relevant ones), some cookies could conceivably be dropped if # you accumulate a whole lot. I haven't seen this happen yet. # # # I first wrote this in 1996 as an experiment to allow indirect browsing. # The original seed was a program I wrote for Rich Morin's article # in the June 1996 issue of Unix Review, online at # http://www.cfcl.com/tin/P/199606.shtml. # # Confession: I didn't originally write this with the spec for HTTP # proxies in mind, and there are probably some violations of the protocol # (at least for proxies). This whole thing is one big violation of the # proxy model anyway, so I hereby rationalize that the spec can be widely # interpreted here. If there is demand, I can make it more conformant. # The HTTP client and server components should be fine; it's just the # special requirements for proxies that may not be followed. # #--------------------------------------------------------------------------

use strict ; use Socket ;

# First block below is config variables, second block is sort-of config # variables, third block is persistent constants, fourth block is would-be # persistent constants (not set until needed), and last block is variables. use vars qw(

$TEXT_ONLY
$REMOVE_COOKIES $REMOVE_SCRIPTS $FILTER_ADS $HIDE_REFERER
$INSERT_ENTRY_FORM $ALLOW_USER_CONFIG
@ALLOWED_SERVERS @BANNED_SERVERS @BANNED_NETWORKS
$NO_COOKIE_WITH_IMAGE @ALLOWED_COOKIE_SERVERS @BANNED_COOKIE_SERVERS
@ALLOWED_SCRIPT_SERVERS @BANNED_SCRIPT_SERVERS
@BANNED_IMAGE_URL_PATTERNS $RETURN_EMPTY_GIF
$INSERT_HTML $INSERT_FILE $ANONYMIZE_INSERTION $FORM_AFTER_INSERTION
$INSERTION_FRAME_HEIGHT
$RUNNING_ON_SSL_SERVER $NOT_RUNNING_AS_NPH
$HTTP_PROXY $SSL_PROXY $NO_PROXY $PROXY_AUTH $SSL_PROXY_AUTH
$MINIMIZE_CACHING $SESSION_COOKIES_ONLY
@PROXY_GROUP
$USER_AGENT $USE_PASSIVE_FTP_MODE $SHOW_FTP_WELCOME $USE_POST_ON_START
$REMOVE_TITLES $NO_BROWSE_THROUGH_SELF $NO_LINK_TO_START $MAX_REQUEST_SIZE
$QUIETLY_EXIT_PROXY_SESSION
$OVERRIDE_SECURITY

$PROXIFY_SCRIPTS $PROXIFY_COMMENTS
@SCRIPT_MIME_TYPES @OTHER_TYPES_TO_REGISTER @TYPES_TO_HANDLE
$NON_TEXT_EXTENSIONS
$PROXY_VERSION
@MONTH @WEEKDAY %UN_MONTH
@BANNED_NETWORK_ADDRS
$RUNNING_ON_IIS
@NO_PROXY
$NO_CACHE_HEADERS
@ALL_TYPES %MIME_TYPE_ID $SCRIPT_TYPE_REGEX $TYPES_TO_HANDLE_REGEX
$THIS_HOST $ENV_SERVER_PORT $ENV_SCRIPT_NAME $THIS_SCRIPT_URL
$HAS_BEGUN
$CUSTOM_INSERTION
$HTTP_VERSION $HTTP_1_X
$URL
$now
$packed_flags $encoded_URL $doing_insert_here $env_accept
$e_remove_cookies $e_remove_scripts $e_filter_ads $e_insert_entry_form
$e_hide_referer
$images_are_banned_here $scripts_are_banned_here $cookies_are_banned_here
$scheme $authority $path $host $port $username $password
$cookie_to_server %auth
$script_url $url_start $url_start_inframe $url_start_noframe
$is_in_frame $expected_type
$base_url $base_scheme $base_host $base_path $base_unframes
$default_style_type $default_script_type
$status $headers $body $is_html $response_sent
$debug ) ;
# Under mod_perl, persistent constants only need to be initialized once, so # use this one-time block to do so. unless ($HAS_BEGUN) {

#-------------------------------------------------------------------------- # user configuration #--------------------------------------------------------------------------

# If set, then proxy traffic will be restricted to text data only, to save # bandwidth (though it can still be circumvented with uuencode, etc.).

$TEXT_ONLY= 0 ; # set to 1 to allow only text data, 0 to allow all

# If set, then prevent all cookies from passing through the proxy. To allow # cookies from some servers, set this to 0 and see @ALLOWED_COOKIE_SERVERS # and @BANNED_COOKIE_SERVERS below. You can also prevent cookies with # images by setting $NO_COOKIE_WITH_IMAGE below. # Note that this only affects cookies from the target server. The proxy # script sends its own cookies for other reasons too, like to support # authentication. This flag does not stop these cookies from being sent.

$REMOVE_COOKIES= 0 ;

# If set, then remove as much scripting as possible. If anonymity is # important, this is strongly recommended! Better yet, turn off script # support in your browser. # On the HTTP level: # . prevent transmission of script MIME types (which only works if the server # marks them as such, so a malicious server could get around this, but # then the browser probably wouldn't execute the script). # . remove Link: headers that link to a resource of a script MIME type. # Within HTML resources: # . remove

... . # . remove intrinsic event attributes from tags, i.e. attributes whose names # begin with "on". # . remove ... where "type" attribute is a script MIME type. # . remove various HTML tags that appear to link to a script MIME type. # . remove script macros (aka Netscape-specific "JavaScript entities"), # i.e. any attributes containing the string "&{" . # . remove "JavaScript conditional comments". # . remove MSIE-specific "dynamic properties". # To allow scripts from some sites but not from others, set this to 0 and # see @ALLOWED_SCRIPT_SERVERS and @BANNED_SCRIPT_SERVERS below. # See @SCRIPT_MIME_TYPES below for a list of which MIME types are filtered out. # I do NOT know for certain that this removes all script content! It removes # all that I know of, but I don't have a definitive list of places scripts # can exist. If you do, please send it to me. EVEN RUNNING A SINGLE # JAVASCRIPT STATEMENT CAN COMPROMISE YOUR ANONYMITY! Just so you know. # Richard Smith has a good test site for anonymizing proxies, at # http://users.rcn.com/rms2000/anon/test.htm # Note that turning this on removes most popup ads! :)

$REMOVE_SCRIPTS= 1 ;

# If set, then filter out images that match one of @BANNED_IMAGE_URL_PATTERNS, # below. Also removes cookies attached to images, as if $NO_COOKIE_WITH_IMAGE # is set. # To remove most popup advertisements, also set $REMOVE_SCRIPTS=1 above.

$FILTER_ADS= 0 ;

# If set, then don't send a Referer: sic header with each request # (i.e. something that tells the server which page you're coming from # that linked to it). This is a minor privacy issue, but a few sites # won't send you pages or images if the Referer: is not what they're # expecting. If a page is loading without images or a link seems to be # refused, then try turning this off, and a correct Referer: header will # be sent. # This is only a problem in a VERY small percentage of sites, so few that # I'm kinda hesitant to put this in the entry form. Other arrangements # have their own problems, though.

$HIDE_REFERER= 1 ;

# If set, insert a compact version of the URL entry form at the top of each # page. This will also display the URL currently being viewed. # When viewing a page with frames, then a new top frame is created and the # insertion goes there. # If you want to customize the appearance of the form, modify the routine # mini_start_form() near the end of the script. # If you want to insert something other than this form, see $INSERT_HTML and # $INSERT_FILE below. # Users should realize that options changed via the form only take affect when # the form is submitted by entering a new URL or pressing the "Go" button. # Selecting an option, then following a link on the page, will not cause # the option to take effect. # Users should also realize that anything inserted into a page may throw # off any precise layout. The insertion will also be subject to # background colors and images, and any other page-wide settings.

$INSERT_ENTRY_FORM= 1 ;

# If set, then allow the user to control $REMOVE_COOKIES, $REMOVE_SCRIPTS, # $FILTER_ADS, $HIDE_REFERER, and $INSERT_ENTRY_FORM. Note that they # can't fine-tune any related options, such as the various @ALLOWED... and # @BANNED... lists.

$ALLOW_USER_CONFIG= 1 ;

# Create your own proxy_encode() and proxy_decode() to tranform the target # URL to and from the format that will be stored in PATH_INFO. The encoded # form should only contain characters that are legal in PATH_INFO. This # varies by server, but using only printable chars, no "?" or "#", and no # two adjacent slashes ("//") works on most servers. Don't let PATH_INFO # contain the strings "./", "/.", "../", or "/..", or else it may get # compressed like a pathname somewhere. Try not to make the resulting # string too long, either. # Of course, proxy_decode() must exactly undo whatever proxy_encode() does. # Make proxy_encode() as fast as possible-- it's a major bottleneck for the # whole program. # Because of the simplified absolute URL resolution in full_url(), there may # be ".." segments in the default encoding here, notably in the first path # segment. Normally, that's just an HTML mistake, but please tell me if # you see any privacy exploit with it. # Note that a few sites have embedded applications (like applets or Shockwave) # that expect to access URLs relative to the page's URL. This means they # may not work if the encoded target URL can't be treated like a base URL, # e.g. that it can't be appended with something like "../data/foo.data" # to get that expected data file. In such cases, the default encoding below # should let these sites work fine, as should any other encoding that can # support URLs relative to it.

sub proxy_encode {

my($URL)= @ ;
$URL=~ s#^(\w+.-+)://#$1/# ; # http://xxx -> http/xxx
# $URL=~ s/(.)/ sprintf('%02x',ord($1)) /ge ; # each char -> 2-hex # $URL=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13 return $URL ;
}

sub proxy_decode {

my($enc_URL)= @ ;
# $enc_URL=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13 # $enc_URL=~ s/(0-9A-Fa-f{2})/ sprintf("%c",hex($1)) /ge ; $enc_URL=~ s#^(\w+.-+)/#$1://# ; # http/xxx -> http://xxx
return $enc_URL ;
}

# Encode cookies before they're sent back to the user. # The return value must only contain characters that are legal in cookie # names and values, i.e. only printable characters, and no ";", ",", "=", # or white space. # cookie_encode() is called twice for each cookie: once to encode the cookie # name, and once to encode the cookie value. The two are then joined with # "=" and sent to the user. # cookie_decode() must exactly undo whatever cookie_encode() does. # Also, cookie_encode() must always encode a given input string into the # same output string. This is because browsers need the cookie name to # identify and manage a cookie, so the name must be consistent. # This is not a bottleneck like proxy_encode() is, so speed is not critical.

sub cookie_encode {

my($cookie)= @ ;
# $cookie=~ s/(.)/ sprintf('%02x',ord($1)) /ge ; # each char -> 2-hex # $cookie=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13 $cookie=~ s/(\W)/ '%' . sprintf('%02x',ord($1)) /ge ; # simple URL-encoding
return $cookie ;
}

sub cookie_decode {

my($enc_cookie)= @ ;
$enc_cookie=~ s/%(\da-fA-F{2})/ pack('C', hex($1)) /ge ; # URL-decode
# $enc_cookie=~ tr/a-zA-Z/n-za-mN-ZA-M/ ; # rot-13 # $enc_cookie=~ s/(0-9A-Fa-f{2})/ sprintf("%c",hex($1)) /ge ; return $enc_cookie ;
}

# Use @ALLOWED_SERVERS and @BANNED_SERVERS to restrict which servers a user # can visit through this proxy. Any URL at a host matching a pattern in # @BANNED_SERVERS will be forbidden. In addition, if @ALLOWED_SERVERS is # not empty, then access is allowed only to servers that match a pattern # in it. In other words, @BANNED_SERVERS means "ban these servers", and # @ALLOWED_SERVERS (if not empty) means "allow only these servers". If a # server matches both lists, it is banned. # These are each a list of Perl 5 regular expressions (aka patterns or # regexes), not literal host names. To turn a hostname into a pattern, # replace every "." with "\.", add "^" to the beginning, and add "$" to the # end. For example, "www.example.com" becomes "^www\.example\.com$". To # match every host ending in something, leave out the "^". For example, # "\.example\.com$" matches every host ending in ".example.com". For more # details about Perl regular expressions, see the Perl documentation. (They # may seem cryptic at first, but they're very powerful once you know how to # use them.) @ALLOWED_SERVERS= () ; @BANNED_SERVERS= () ;

# If @BANNED_NETWORKS is set, then forbid access to these hosts or networks. # This is done by IP address, not name, so it provides more certain security # than @BANNED_SERVERS above. # Specify each element as a decimal IP address-- all four integers for a host, # or one to three integers for a network. For example, '127.0.0.1' bans # access to the local host, and '192.168' bans access to all IP addresses # in the 192.168 network. Sorry, no banning yet for subnets other than # 8, 16, or 24 bits. # IF YOU'RE RUNNING THIS ON OR INSIDE A FIREWALL, THIS SETTING IS STRONGLY # RECOMMENDED!! In particular, you should ban access to other machines # inside the firewall that the firewall machine itself may have access to. # Otherwise, external users will be able to access any internal hosts that # the firewall can access. Even if that's what you intend, you should ban # access to any hosts that you don't explicitly want to expose to outside # users. # In addition to the recommended defaults below, add all IP addresses of your # server machine if you want to protect it like this. # After you set this, YOU SHOULD TEST to verify that the proxy can't access # the IP addresses you're banning! # This feature is simple now but will be more complete in future releases. # How would you like this to be extended? What would be useful to you? @BANNED_NETWORKS= ('127.0.0.1', '192.168', '10') ;

# Settings to fine-tune cookie filtering, if cookies are not banned altogether # (by user checkbox or $REMOVE_COOKIES above). # Use @ALLOWED_COOKIE_SERVERS and @BANNED_COOKIE_SERVERS to restrict which # servers can send cookies through this proxy. They work like # @ALLOWED_SERVERS and @BANNED_SERVERS above, both in how their precedence # works, and that they're lists of Perl 5 regular expressions. See the # comments there for details.

# If non-empty, only allow cookies from servers matching one of these patterns. # Comment this out to allow all cookies (subject to @BANNED_COOKIE_SERVERS). #@ALLOWED_COOKIE_SERVERS= ('\bslashdot\.org$') ;

Tags

No tags have been added

In a Nutshell, proxy6151...

 No code available to analyze

Open Hub computes statistics on FOSS projects by examining source code and commit history in source code management systems. This project has no code locations, and so Open Hub cannot perform this analysis

Is this project's source code hosted in a publicly available repository? Do you know the URL? If you do, click the button below and tell us so that Open Hub can generate statistics! It's fast and easy - try it and see!

Add a code location

Apache License 2.0
Permitted

Commercial Use

Modify

Distribute

Place Warranty

Sub-License

Private Use

Use Patent Claims

Forbidden

Hold Liable

Use Trademarks

Required

Include Copyright

State Changes

Include License

Include Notice

These details are provided for information only. No information here is legal advice and should not be used as such.

All Licenses

This Project has No vulnerabilities Reported Against it

Did You Know...

  • ...
    Black Duck offers a free trial so you can discover if there are open source vulnerabilities in your code
  • ...
    you can subscribe to e-mail newsletters to receive update from the Open Hub blog
  • ...
    65% of companies leverage OSS to speed application development in 2016
  • ...
    learn about Open Hub updates and features on the Open Hub blog

 No code available to analyze

Open Hub computes statistics on FOSS projects by examining source code and commit history in source code management systems. This project has no code locations, and so Open Hub cannot perform this analysis

Is this project's source code hosted in a publicly available repository? Do you know the URL? If you do, click the button below and tell us so that Open Hub can generate statistics! It's fast and easy - try it and see!

Add a code location

Community Rating

Be the first to rate this project
Click to add your rating
   Spinner
Review this Project!
Sample ohloh analysis