HTTPClient
Class URI

java.lang.Object
  |
  +--HTTPClient.URI

public class URI
extends Object

This class represents a generic URI, as defined in RFC-2396. This is similar to java.net.URL, with the following enhancements:

The elements are always stored in escaped form.

While RFC-2396 distinguishes between just two forms of URIs, those that follow the generic syntax and those that don't, this class knows about a third form, named semi-generic, which is used by quite a few popular schemes. Semi-generic syntax treats the path part as opaque, i.e. it has the form <scheme>://<authority>/<opaque>. Relative URIs of this type are only resolved as far as absolute paths; relative paths do not exist.

Ideally, java.net.URL should subclass URI.

Since:
V0.3-1
Version:
0.3-3 06/05/2001
Author:
Ronald Tschalär
See Also:
rfc-2396

Field Summary
protected static BitSet alphanumChar
           
protected static Hashtable defaultPorts
           
static boolean ENABLE_BACKWARDS_COMPATIBILITY
          If true, then the parser will resolve certain URIs in a backwards-compatible (but technically incorrect) manner.
static BitSet escpdFragChar
          list of characters which must not be escaped when escaping a fragment identifier
static BitSet escpdPathChar
          list of characters which must not be escaped when escaping a path
static BitSet escpdQueryChar
          list of characters which must not be escaped when escaping a query string
protected  String fragment
           
protected static int GENERIC
           
protected  String host
           
protected static BitSet hostChar
           
protected static BitSet markChar
           
protected  String opaque
           
protected static int OPAQUE
           
protected static BitSet opaqueChar
           
protected  String path
           
protected static BitSet pcharChar
           
protected  int port
           
protected  String query
           
protected static BitSet reg_nameChar
           
protected static BitSet reservedChar
           
static BitSet resvdHostChar
          list of characters which must not be unescaped when unescaping a host
static BitSet resvdPathChar
          list of characters which must not be unescaped when unescaping a path
static BitSet resvdQueryChar
          list of characters which must not be unescaped when unescaping a query string
static BitSet resvdSchemeChar
          list of characters which must not be unescaped when unescaping a scheme
static BitSet resvdUIChar
          list of characters which must not be unescaped when unescaping a userinfo
protected  String scheme
           
protected static BitSet schemeChar
           
protected static int SEMI_GENERIC
           
protected  int type
           
protected static BitSet unreservedChar
           
protected static BitSet uricChar
           
protected  URL url
           
protected  String userinfo
           
protected static BitSet userinfoChar
           
protected static Hashtable usesGenericSyntax
           
protected static Hashtable usesSemiGenericSyntax
           
 
Constructor Summary
URI(String uri)
          Constructs a URI from the given string representation.
URI(String scheme, String opaque)
          Constructs an opaque URI from the given parts.
URI(String scheme, String host, int port, String path)
          Constructs a URI from the given parts.
URI(String scheme, String host, String path)
          Constructs a URI from the given parts, using the default port for this scheme (if known).
URI(String scheme, String userinfo, String host, int port, String path, String query, String fragment)
          Constructs a URI from the given parts.
URI(URI base, String rel_uri)
          Constructs a URI from the given string representation, relative to the given base URI.
URI(URL url)
          Constructs a URI from the given URL.
 
Method Summary
static String canonicalizePath(String path)
          Remove all "/../" and "/./" from path, where possible.
static int defaultPort(String protocol)
          Return the default port used by a given protocol.
 boolean equals(Object other)
           
static char[] escape(char[] elem, BitSet allowed_char, boolean utf8)
          Escape any character not in the given character class.
static String escape(String elem, BitSet allowed_char, boolean utf8)
          Escape any character not in the given character class.
 String getFragment()
           
 String getHost()
           
 String getOpaque()
           
 String getPath()
           
 String getPathAndQuery()
           
 int getPort()
           
 String getQueryString()
           
 String getScheme()
           
 String getUserinfo()
           
 int hashCode()
          The hash code is calculated over scheme, host, path, and query.
 boolean isGenericURI()
          Does the scheme specific part of this URI use the generic-URI syntax?
 boolean isSemiGenericURI()
          Does the scheme specific part of this URI use the semi-generic-URI syntax?
static void main(String[] args)
          Run test set.
 String toExternalForm()
           
 String toString()
          Return the URI as a string.
 URL toURL()
          Will try to create a java.net.URL object from this URI.
static String unescape(String str, BitSet reserved)
          Unescape escaped characters (i.e. %xx) except reserved ones.
static boolean usesGenericSyntax(String scheme)
           
static boolean usesSemiGenericSyntax(String scheme)
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ENABLE_BACKWARDS_COMPATIBILITY

public static final boolean ENABLE_BACKWARDS_COMPATIBILITY
If true, then the parser will resolve certain URIs in a backwards-compatible (but technically incorrect) manner. Example:
 base   = http://a/b/c/d;p?q
 rel    = http:g
 result = http:g		(correct)
 result = http://a/b/c/g	(backwards compatible)
See rfc-2396, section 5.2, step 3, second paragraph.
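
The two results above can be reproduced with a small sketch. It uses java.net.URI from the JDK rather than this class, and the helper names (resolveStrict, resolveCompat) are hypothetical; the backwards-compatible branch is only one plausible reading of the behaviour described here, not the parser's actual code.

```java
import java.net.URI;

class BackwardsCompatSketch {
    // Strict RFC 2396 behaviour: a reference that carries a scheme is
    // absolute, so "http:g" is returned unchanged.
    static String resolveStrict(String base, String rel) {
        return URI.create(base).resolve(rel).toString();
    }

    // Backwards-compatible behaviour (assumed): if the reference repeats the
    // base scheme and has no authority, drop the scheme and resolve the
    // remainder as a relative reference.
    static String resolveCompat(String base, String rel) {
        URI b = URI.create(base);
        String prefix = b.getScheme() + ":";
        if (rel.regionMatches(true, 0, prefix, 0, prefix.length())
                && !rel.startsWith("//", prefix.length())) {
            rel = rel.substring(prefix.length());
        }
        return b.resolve(rel).toString();
    }

    public static void main(String[] args) {
        String base = "http://a/b/c/d;p?q";
        System.out.println(resolveStrict(base, "http:g"));
        System.out.println(resolveCompat(base, "http:g"));
    }
}
```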

defaultPorts

protected static final Hashtable defaultPorts

usesGenericSyntax

protected static final Hashtable usesGenericSyntax

usesSemiGenericSyntax

protected static final Hashtable usesSemiGenericSyntax

alphanumChar

protected static final BitSet alphanumChar

markChar

protected static final BitSet markChar

reservedChar

protected static final BitSet reservedChar

unreservedChar

protected static final BitSet unreservedChar

uricChar

protected static final BitSet uricChar

pcharChar

protected static final BitSet pcharChar

userinfoChar

protected static final BitSet userinfoChar

schemeChar

protected static final BitSet schemeChar

hostChar

protected static final BitSet hostChar

opaqueChar

protected static final BitSet opaqueChar

reg_nameChar

protected static final BitSet reg_nameChar

resvdSchemeChar

public static final BitSet resvdSchemeChar
list of characters which must not be unescaped when unescaping a scheme

resvdUIChar

public static final BitSet resvdUIChar
list of characters which must not be unescaped when unescaping a userinfo

resvdHostChar

public static final BitSet resvdHostChar
list of characters which must not be unescaped when unescaping a host

resvdPathChar

public static final BitSet resvdPathChar
list of characters which must not be unescaped when unescaping a path

resvdQueryChar

public static final BitSet resvdQueryChar
list of characters which must not be unescaped when unescaping a query string

escpdPathChar

public static final BitSet escpdPathChar
list of characters which must not be escaped when escaping a path

escpdQueryChar

public static final BitSet escpdQueryChar
list of characters which must not be escaped when escaping a query string

escpdFragChar

public static final BitSet escpdFragChar
list of characters which must not be escaped when escaping a fragment identifier

OPAQUE

protected static final int OPAQUE

SEMI_GENERIC

protected static final int SEMI_GENERIC

GENERIC

protected static final int GENERIC

type

protected int type

scheme

protected String scheme

opaque

protected String opaque

userinfo

protected String userinfo

host

protected String host

port

protected int port

path

protected String path

query

protected String query

fragment

protected String fragment

url

protected URL url

Constructor Detail

URI

public URI(String uri)
    throws ParseException
Constructs a URI from the given string representation. The string must be an absolute URI.
Parameters:
uri - a String containing an absolute URI
Throws:
ParseException - if no scheme can be found or a specified port cannot be parsed as a number

URI

public URI(URI base,
           String rel_uri)
    throws ParseException
Constructs a URI from the given string representation, relative to the given base URI.
Parameters:
base - the base URI, relative to which rel_uri is to be parsed
rel_uri - a String containing a relative or absolute URI
Throws:
ParseException - if base is null and rel_uri is not an absolute URI, or if base is not null and the scheme is not known to use the generic syntax, or if a given port cannot be parsed as a number

URI

public URI(URL url)
    throws ParseException
Constructs a URI from the given URL.
Parameters:
url - the URL
Throws:
ParseException - if url.toExternalForm() generates an invalid string representation

URI

public URI(String scheme,
           String host,
           String path)
    throws ParseException
Constructs a URI from the given parts, using the default port for this scheme (if known). The parts must be in unescaped form.
Parameters:
scheme - the scheme (sometimes known as protocol)
host - the host
path - the path part
Throws:
ParseException - if scheme is null

URI

public URI(String scheme,
           String host,
           int port,
           String path)
    throws ParseException
Constructs a URI from the given parts. The parts must be in unescaped form.
Parameters:
scheme - the scheme (sometimes known as protocol)
host - the host
port - the port
path - the path part
Throws:
ParseException - if scheme is null

URI

public URI(String scheme,
           String userinfo,
           String host,
           int port,
           String path,
           String query,
           String fragment)
    throws ParseException
Constructs a URI from the given parts. Any part except for the scheme may be null. The parts must be in unescaped form.
Parameters:
scheme - the scheme (sometimes known as protocol)
userinfo - the userinfo
host - the host
port - the port
path - the path part
query - the query string
fragment - the fragment identifier
Throws:
ParseException - if scheme is null

URI

public URI(String scheme,
           String opaque)
    throws ParseException
Constructs an opaque URI from the given parts.
Parameters:
scheme - the scheme (sometimes known as protocol)
opaque - the opaque part
Throws:
ParseException - if scheme is null

Method Detail

canonicalizePath

public static String canonicalizePath(String path)
Remove all "/../" and "/./" from path, where possible. Leading "/../"'s are not removed.
Parameters:
path - the path to canonicalize
Returns:
the canonicalized path
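
As an illustration of the rule described above, the following sketch removes "." and ".." segments while leaving leading "/../" sequences in place. It is an independent re-implementation, not the library's code; the class and method names are hypothetical, and the handling of a trailing "." or ".." (replaced by an empty segment) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class CanonicalizePathSketch {
    static String canonicalize(String path) {
        String[] segs = path.split("/", -1);   // keep empty leading/trailing segments
        List<String> out = new ArrayList<>();
        for (int i = 0; i < segs.length; i++) {
            String s = segs[i];
            boolean last = (i == segs.length - 1);
            if (s.equals(".")) {
                if (last) out.add("");         // assumed: "a/." becomes "a/"
            } else if (s.equals("..") && canPop(out)) {
                out.remove(out.size() - 1);    // ".." cancels the previous segment
                if (last) out.add("");         // assumed: "a/b/.." becomes "a/"
            } else {
                out.add(s);                    // ordinary segment, or ".." that cannot be removed
            }
        }
        return String.join("/", out);
    }

    // Only a real segment can be cancelled: not the empty root marker and
    // not an earlier ".." that was itself kept.
    private static boolean canPop(List<String> out) {
        if (out.isEmpty()) return false;
        String prev = out.get(out.size() - 1);
        return !prev.isEmpty() && !prev.equals("..");
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("/a/b/../c/./d"));
        System.out.println(canonicalize("/../a/./b"));
    }
}
```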

usesGenericSyntax

public static boolean usesGenericSyntax(String scheme)
Returns:
true if the scheme should be parsed according to the generic-URI syntax

usesSemiGenericSyntax

public static boolean usesSemiGenericSyntax(String scheme)
Returns:
true if the scheme should be parsed according to a semi-generic-URI syntax <scheme>://<hostport>/<opaque>

defaultPort

public static final int defaultPort(String protocol)
Return the default port used by a given protocol.
Parameters:
protocol - the protocol
Returns:
the port number, or 0 if unknown
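
The lookup contract (a known port, or 0 if unknown) might look like the sketch below. The table contents are an assumption for illustration: the three entries are the well-known ports for those schemes, but which schemes the real defaultPorts table holds is not stated here. The class name and the case-insensitive lookup are likewise hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class DefaultPortSketch {
    // Hypothetical table; the real defaultPorts Hashtable may differ.
    private static final Map<String, Integer> PORTS = new HashMap<>();
    static {
        PORTS.put("http", 80);
        PORTS.put("https", 443);
        PORTS.put("ftp", 21);
    }

    // Returns the default port for the protocol, or 0 if it is not in the table.
    static int defaultPort(String protocol) {
        Integer port = PORTS.get(protocol.trim().toLowerCase(Locale.ROOT));
        return port != null ? port : 0;
    }

    public static void main(String[] args) {
        System.out.println(defaultPort("http"));
        System.out.println(defaultPort("no-such-scheme"));
    }
}
```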

getScheme

public String getScheme()
Returns:
the scheme (often also referred to as protocol)

getOpaque

public String getOpaque()
Returns:
the opaque part, or null if this URI is generic

getHost

public String getHost()
Returns:
the host

getPort

public int getPort()
Returns:
the port, or -1 if it's the default port, or 0 if unknown

getUserinfo

public String getUserinfo()
Returns:
the user info

getPath

public String getPath()
Returns:
the path

getQueryString

public String getQueryString()
Returns:
the query string

getPathAndQuery

public String getPathAndQuery()
Returns:
the path and query

getFragment

public String getFragment()
Returns:
the fragment

isGenericURI

public boolean isGenericURI()
Does the scheme specific part of this URI use the generic-URI syntax?

In general, URIs are split into two categories: opaque-URI and generic-URI. The generic-URI syntax is the syntax most people are familiar with from URLs such as ftp- and http-URLs, and is roughly:

 generic-URI = scheme ":" [ "//" server ] [ "/" ] [ path_segments ] [ "?" query ]
 
(see RFC-2396 for exact syntax). Only URLs using the generic-URI syntax can be used to create and resolve relative URIs.

Whether a given scheme is parsed according to the generic-URI syntax or whether it is treated as opaque is determined by an internal table of URI schemes.

See Also:
rfc-2396
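
To make the generic-URI decomposition concrete, the sketch below splits a string with the regular expression given in RFC-2396, Appendix B. It only illustrates the syntax; the class name is hypothetical and this is not how this class parses URIs internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GenericUriSplitSketch {
    // Regular expression from RFC 2396, Appendix B.
    private static final Pattern PARTS = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns {scheme, authority, path, query, fragment}; absent parts are null.
    static String[] split(String uri) {
        Matcher m = PARTS.matcher(uri);
        if (!m.matches()) {
            throw new IllegalArgumentException("not splittable: " + uri);
        }
        return new String[] {
            m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)
        };
    }

    public static void main(String[] args) {
        for (String part : split("http://a/b/c/d;p?q#f")) {
            System.out.println(part);
        }
    }
}
```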

isSemiGenericURI

public boolean isSemiGenericURI()
Does the scheme specific part of this URI use the semi-generic-URI syntax?

Many schemes which don't follow the full generic syntax actually follow a reduced form where the path part is treated as opaque. This is used for example by ldap, smtp, pop, etc., and is roughly

 semi-generic-URI = scheme ":" [ "//" server ] [ "/" [ opaque_path ] ]
 
I.e. parsing is identical to the generic-syntax, except that the path part is not further parsed. URLs using the semi-generic-URI syntax can be used to create and resolve relative URIs with the restriction that all paths are treated as absolute.

Whether a given scheme is parsed according to the semi-generic-URI syntax is determined by an internal table of URI schemes.

See Also:
isGenericURI()
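
A rough sketch of the semi-generic split follows: the scheme and server are extracted as in the generic syntax, and everything after the first "/" is kept as one opaque string, so a "?" in it is not treated as a query separator. The class name and the example host are hypothetical, and whether the real parser still separates a fragment is not specified here, so the sketch does not attempt it.

```java
class SemiGenericSplitSketch {
    // Returns {scheme, server, opaque_path}; server is null if there is no "//".
    static String[] split(String uri) {
        int colon = uri.indexOf(':');
        if (colon < 1) {
            throw new IllegalArgumentException("no scheme: " + uri);
        }
        String scheme = uri.substring(0, colon);
        String rest = uri.substring(colon + 1);
        String server = null;
        if (rest.startsWith("//")) {
            int slash = rest.indexOf('/', 2);
            if (slash < 0) {              // "scheme://server" with no path at all
                server = rest.substring(2);
                rest = "";
            } else {
                server = rest.substring(2, slash);
                rest = rest.substring(slash);
            }
        }
        // The path is not parsed any further; only the leading "/" is dropped.
        String opaquePath = rest.startsWith("/") ? rest.substring(1) : rest;
        return new String[] { scheme, server, opaquePath };
    }

    public static void main(String[] args) {
        for (String part : split("ldap://ldap.example.com/o=Example,c=US?sn?sub")) {
            System.out.println(part);
        }
    }
}
```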

toURL

public URL toURL()
          throws MalformedURLException
Will try to create a java.net.URL object from this URI.
Returns:
the URL
Throws:
MalformedURLException - if no handler is available for the scheme

toExternalForm

public String toExternalForm()
Returns:
a string representation of this URI suitable for use in links, headers, etc.

toString

public String toString()
Return the URI as a string. This differs from toExternalForm() in that all elements are unescaped before assembly. The result is not suitable for passing to other applications or for use in header fields and such, and is usually not what you want.
Overrides:
toString in class Object
Returns:
the URI as a string
See Also:
toExternalForm()

equals

public boolean equals(Object other)
Overrides:
equals in class Object
Returns:
true if other is either a URI or URL and it matches the current URI

hashCode

public int hashCode()
The hash code is calculated over scheme, host, path, and query.
Overrides:
hashCode in class Object
Returns:
the hash code

escape

public static String escape(String elem,
                            BitSet allowed_char,
                            boolean utf8)
Escape any character not in the given character class. Characters greater than 255 are always escaped according to ??? .
Parameters:
elem - the string to escape
allowed_char - the BitSet of all allowed characters
utf8 - if true, will first UTF-8 encode unallowed characters
Returns:
the string with all characters not in allowed_char escaped

escape

public static char[] escape(char[] elem,
                            BitSet allowed_char,
                            boolean utf8)
Escape any character not in the given character class. Characters greater than 255 are always escaped according to ??? .
Parameters:
elem - the array of characters to escape
allowed_char - the BitSet of all allowed characters
utf8 - if true, will first UTF-8 encode unallowed characters
Returns:
the elem array with all characters not in allowed_char escaped
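
The general technique, percent-escaping every character that is not in an allowed BitSet, can be sketched as follows. This is a standalone illustration that always UTF-8 encodes (roughly the utf8 = true case); it is not the library's implementation. The class name and the unreserved() helper are hypothetical; the helper builds the RFC-2396 "unreserved" set (alphanumerics plus the mark characters).

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class EscapeSketch {
    // RFC 2396 unreserved characters: alphanum | "-_.!~*'()"
    static BitSet unreserved() {
        BitSet bs = new BitSet(128);
        for (char c = 'a'; c <= 'z'; c++) bs.set(c);
        for (char c = 'A'; c <= 'Z'; c++) bs.set(c);
        for (char c = '0'; c <= '9'; c++) bs.set(c);
        for (char c : "-_.!~*'()".toCharArray()) bs.set(c);
        return bs;
    }

    static String escape(String elem, BitSet allowed) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < elem.length(); ) {
            int cp = elem.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 128 && allowed.get(cp)) {
                sb.append((char) cp);          // allowed ASCII passes through
                continue;
            }
            // Everything else is UTF-8 encoded and each byte written as %XX.
            byte[] bytes = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
            for (byte b : bytes) {
                sb.append('%')
                  .append(Character.toUpperCase(Character.forDigit((b >> 4) & 0xF, 16)))
                  .append(Character.toUpperCase(Character.forDigit(b & 0xF, 16)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a b/\u00e9", unreserved()));
    }
}
```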

unescape

public static final String unescape(String str,
                                    BitSet reserved)
                             throws ParseException
Unescape escaped characters (i.e. %xx) except reserved ones.
Parameters:
str - the string to unescape
reserved - the characters which may not be unescaped, or null
Returns:
the unescaped string
Throws:
ParseException - if the two digits following a `%' are not a valid hex number
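
The role of the reserved set can be illustrated with the sketch below: each %xx is decoded unless the resulting character is in reserved, in which case the escape is left untouched. This is an independent sketch, not the library's code. It throws IllegalArgumentException where the real method throws ParseException, decodes each %xx to a single character without any UTF-8 decoding (an assumption), and uses a hypothetical class name.

```java
import java.util.BitSet;

class UnescapeSketch {
    static String unescape(String str, BitSet reserved) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c != '%') {
                sb.append(c);
                continue;
            }
            if (i + 2 >= str.length()) {
                throw new IllegalArgumentException("truncated escape at " + i);
            }
            int hi = Character.digit(str.charAt(i + 1), 16);
            int lo = Character.digit(str.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("bad hex digits at " + i);
            }
            int value = hi * 16 + lo;
            if (reserved != null && reserved.get(value)) {
                sb.append(str, i, i + 3);      // keep the escape as written
            } else {
                sb.append((char) value);       // decode to a single character
            }
            i += 2;                            // skip the two hex digits
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet reserved = new BitSet(128);
        reserved.set('/');
        System.out.println(unescape("a%20b%2Fc", reserved));
        System.out.println(unescape("a%20b%2Fc", null));
    }
}
```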

main

public static void main(String[] args)
                 throws Exception
Run test set.
Throws:
Exception - if any test fails