Emulating HTML Form Submissions using the HTTPClient

Many folks are attempting to use the HTTPClient to mimick a browser submitting form data. While the transformation from html to code is really quite straightforward, it assumes a minimal knowledge of html. So, the best place to start is get an understanding of html - the specs are available from w3.org:

The HTML-4.01 section on forms gives a good introduction.

In order to make this easier however, this doc attempts to explain more specifically how to create the necessary code using the HTTPClient in order duplicate what a browser would send when you submit a given form. What follows here is not meant to be exhaustive - if you see something in your form that is not explained here, then please refer to the html specs.

For the rest of this document, I'm going to assume you've got a page under the URL http://www.some.org/some/form.html which contains an html form whose submission you want to emulate in java code. Html tag and attribute names are case insensitive, so while I'll always use uppercase for these (i.e. FORM or NAME=), they may appear in a different case in the actual html you're aiming at emulating (but note that the values are not case insensitive, i.e. if you see a NAME=UsEr then you must send the string "UsEr" using that exact case).

Finding and Interpreting the FORM tag

The first thing to do is to find the beginning and the end of the form you're interested in. The beginning is marked with a <FORM ...>, and the end with a </FORM>. There may be multiple forms in the document, so make sure you pick the one you're actually interested in (this can be achieved either by counting the forms and picking the n-th one, or by looking at which input fields the form contains and making sure it contains the ones you're interested in).

The start tag has the general form

<FORM ACTION="some/url" METHOD="method" ENCTYPE="enc-type">
There may be additional attributes (but they're not of interest here), and the METHOD and/or ENCTYPE may be missing.

The ACTION attribute specifies the URL to which to send the form data; it is taken relative to the document's URL. If you are unsure how to contruct the resulting URL, you can use the URI class to do so:

	URI doc_url  = new URI("http://www.some.org/some/form.html");
	URI form_url = new URI(doc_url, "some/url");

The METHOD attribute describes the method to use; it can take the values GET or POST. If it is missing, GET is assumed. As the name implies, this is the method you must use when posting the form data.

The ENCTYPE attribute specifies the content-type (and therefore the encoding) for the data to be sent. If not present, the default is "application/x-www-form-urlencoded", which is what will be used if you create an array of NVPair's and pass them to the Get or Post method of HTTPConnection. The other popular value is "multipart/form-data" - if you see that, you can use the Codecs.mpFormDataEncode() to correctly encode the form data.

The INPUT tags

All the actual data to be entered and submitted is specified by INPUT tags. These have the form

<INPUT TYPE="type" NAME="name" VALUE="value">
Again, there may be other attributes, but in general they're not of interest here; the VALUE attribute is often missing. For each INPUT tag you need to decide whether to create a corresponding NVPair, and if so, what name and value to put into it. The name for the NVPair is always what's given in the NAME attribute; the value is either what you enter in that field or what is in the VALUE attribute.

The type attribute specifies the what kind of element is displayed, e.g. an input field, a button, or nothing at all; if missing, the type is assumed to be "text". This also determines whether you need to create a corresponding NVPair and what to put in its value:

TYPE=text
A simple text field is displayed. An NVPair must always be created for each such input tag, and the NVPair's value is whatever you'd enter in that field (which may be the empty string if nothing is entered).
TYPE=password
A simple password field is displayed - treat this exactly like a TYPE=text.
TYPE=checkbox
A checkbox is displayed. If the box is supposed to be checked, then create an NVPair with the value given in the VALUE attribute; otherwise skip this field. Note that you need to create an NVPair for each checkbox that is selected, even if some of them have the same NAME and/or VALUE attributes.
TYPE=radio
A radio button is displayed. There may be multiple of these with the same name, which creates a group (in which only one of the buttons can be selected at any time). Create an NVPair with value given in the VALUE attribute of that button which is checked; if no buttons are checked, don't create an NVPair for this tag.
TYPE=hidden
Nothing is displayed. Create an NVPair with the name and value as given in the NAME and VALUE attributes.
TYPE=submit
A submit button is displayed. If the tag contains a NAME attribute, then create an NVPair with the name and value as given in the NAME and VALUE attributes; if no NAME attribute is specified, skip this.
TYPE=reset
A reset button is displayed. Ignore this.

The SELECT and OPTION tags

These are used to provide a drop down selection box. An NVPair must be created with the name from the NAME attribute of the SELECT tag and the value of the VALUE attribute of the selected OPTION tag.

The TEXTAREA tag

This displays a multiline text entry area. An NVPair must created with the name from the NAME attribute of this tag, and the value must be whatever was entered in the textarea (which may be the empty string if nothing was entered).

A Complete Example

Following is an example form that uses all of the above described elements.

    <FORM ACTION="/cgi-bin/test" METHOD="POST">
    Text: <INPUT NAME="text1">
    <BR><INPUT type=checkbox NAME="ok" VALUE="yes">Ok?
    <BR><INPUT type=checkbox NAME="ok" VALUE="totally">Really Ok?
    <BR><INPUT type=radio NAME="when" VALUE="yesterday">yesterday
        <INPUT type=radio NAME="when" VALUE="today">today
        <INPUT type=radio NAME="when" VALUE="tomorrow">tomorrow
    <INPUT type=hidden NAME="secret" VALUE="data">
    <INPUT type=hidden NAME="secret" VALUE="more data">
    <INPUT type=hidden NAME="also-hidden" VALUE="data">
    <BR><SELECT NAME="aChoice">
	<OPTION VALUE="one">1
	<OPTION VALUE="two">2
	<OPTION VALUE="three">3
	</SELECT>
    <BR><TEXTAREA NAME="comments" ROWS=10 COLS=80>
	</TEXTAREA>
    <BR><INPUT type=submit VALUE="Doit">
    </FORM>

Assuming we enter the text "Hell hath no fury..." for the text, we check the second checkbox, choose today, select the third option, and give no comment, then the resulting code to duplicate this would look like the following:

    // set up the uri according to the action attribute
    URI form_uri = new URI(doc_uri, "/cgi-bin/test");
    HTTPConnection con = new HTTPConnection(form_uri);

    // create the NVPair's for the form data to be submitted
    NVPair[] form_data =
	new NVPair[] {
	    new NVPair("text1", "Hell hath no fury...");
	    new NVPair("ok", "totally");
	    new NVPair("when", "today");
	    new NVPair("secret", "data");
	    new NVPair("secret", "more data");
	    new NVPair("also-hidden", "data");
	    new NVPair("aChoice", "three");
	};
    
    // POST the form data, as indicated by the method attribute
    HTTPResponse rsp = con.Post(form_uri.getPathAndQuery(), form_data);

[HTTPClient]


Ronald Tschalär / 6. May 2001 / ronald@innovation.ch.