|
First, a disclaimer: the posted code snippets are all basic examples. You'll need to handle trivial IOExceptions and RuntimeExceptions such as NullPointerException, ArrayIndexOutOfBoundsException and the like yourself.

Preparing

We first need to know at least the URL and the charset. The parameters are optional and depend on the functional requirements.
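A minimal sketch of this preparation step. The class name, the example.com URL and the two parameter names are hypothetical placeholders, not part of any real service.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class RequestPreparation {
    // Hypothetical endpoint and charset, for illustration only.
    static final String URL = "http://example.com";
    static final String CHARSET = "UTF-8";

    /** Builds "param1=...&param2=..." with both values URL-encoded in the given charset. */
    static String buildQuery(String param1, String param2, String charset)
            throws UnsupportedEncodingException {
        return String.format("param1=%s&param2=%s",
                URLEncoder.encode(param1, charset),
                URLEncoder.encode(param2, charset));
    }
}
```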
The query parameters must be in name=value format and be concatenated by &. You would normally also URL-encode the query parameters with the specified charset using URLEncoder#encode().

The String#format() is just for convenience. I prefer it when I would need the String concatenation operator + more than twice.

Firing an HTTP GET request with (optionally) query parameters

It's a trivial task, since GET is the default request method.
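A GET request could be sketched as follows. The GetRequest class and its method names are assumptions made for this example; the query string is expected to be already URL-encoded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

final class GetRequest {
    /** Prepares a GET connection; nothing is sent until the response is requested. */
    static URLConnection prepare(String url, String query, String charset) throws IOException {
        URLConnection connection = new URL(url + "?" + query).openConnection();
        connection.setRequestProperty("Accept-Charset", charset);
        return connection;
    }

    /** Fires the request and returns the response body stream; the caller must close it. */
    static InputStream fire(String url, String query, String charset) throws IOException {
        return prepare(url, query, charset).getInputStream();
    }
}
```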
Any query string should be concatenated to the URL using ?. The Accept-Charset header may hint to the server what encoding the parameters are in. If you don't send any query string, then you can omit the Accept-Charset header. If you don't need to set any headers, then you can even use the URL#openStream() shortcut method.
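The shortcut might look like this when no headers are needed. OpenStreamShortcut is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

final class OpenStreamShortcut {
    /** Equivalent to openConnection().getInputStream(); the caller must close the stream. */
    static InputStream open(String url) throws IOException {
        return new URL(url).openStream();
    }
}
```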
Either way, if the other side is an HttpServlet, then its doGet() method will be called and the parameters will be available via HttpServletRequest#getParameter().

For testing purposes, you can print the response body to stdout.
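One way to read the whole response body into a string is a Scanner with the \A delimiter, sketched below. ResponseBodies is a hypothetical helper; it assumes the body is small enough to hold in memory.

```java
import java.io.InputStream;
import java.util.Scanner;

final class ResponseBodies {
    /** Reads the entire stream as one string and closes it. */
    static String readAll(InputStream response, String charset) {
        try (Scanner scanner = new Scanner(response, charset)) {
            // "\\A" matches only the start of input, so the first token is the whole body.
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        }
    }
}
```

Printing is then a matter of System.out.println(ResponseBodies.readAll(response, charset)).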
Firing an HTTP POST request with query parameters

Setting URLConnection#setDoOutput() to true implicitly sets the request method to POST. The standard HTTP POST, as web forms do it, is of type application/x-www-form-urlencoded, wherein the query string is written to the request body.
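A form-style POST could be sketched like this. PostRequest and its method names are assumptions for illustration; the query string is expected to be already URL-encoded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

final class PostRequest {
    /** Prepares a connection that will send a form-encoded POST. */
    static URLConnection prepare(String url, String charset) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        connection.setDoOutput(true); // Implicitly switches the method to POST.
        connection.setRequestProperty("Accept-Charset", charset);
        connection.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded;charset=" + charset);
        return connection;
    }

    /** Writes the query string to the request body and returns the response stream. */
    static InputStream fire(String url, String query, String charset) throws IOException {
        URLConnection connection = prepare(url, charset);
        try (OutputStream output = connection.getOutputStream()) {
            output.write(query.getBytes(charset));
        }
        return connection.getInputStream();
    }
}
```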
Note: whenever you'd like to submit an HTML form programmatically, don't forget to include the name=value pairs of any <input type="hidden"> elements in the query string, and of course also the name=value pair of the <input type="submit"> element which you'd like to "press" programmatically (because that's usually used on the server side to distinguish whether a button was pressed and, if so, which one).

You can also cast the obtained URLConnection to HttpURLConnection and use its HttpURLConnection#setRequestMethod() instead. But if you're trying to use the connection for output, you still need to set URLConnection#setDoOutput() to true.
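The explicit variant might look like the sketch below. ExplicitPost is a hypothetical name, and the cast only holds for http and https URLs.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class ExplicitPost {
    /** Sets the method explicitly; output still has to be enabled separately. */
    static HttpURLConnection prepare(String url) throws IOException {
        HttpURLConnection http = (HttpURLConnection) new URL(url).openConnection();
        http.setRequestMethod("POST");
        http.setDoOutput(true); // Still required in order to write a request body.
        return http;
    }
}
```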
Either way, if the other side is an HttpServlet, then its doPost() method will be called and the parameters will be available via HttpServletRequest#getParameter().

Actually firing the HTTP request

You can fire the HTTP request explicitly with URLConnection#connect(), but the request will automatically be fired on demand when you want to get any information about the HTTP response, such as the response body using URLConnection#getInputStream() and so on. The above examples do exactly that, so the connect() call is in fact superfluous.

Gathering HTTP response information
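As a rough sketch, the status code, the headers and the response charset can be obtained as follows. ResponseInfo and charsetOf are hypothetical names; the Content-Type parsing is deliberately simple and is an assumption about how the header is formatted.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ResponseInfo {
    /** Extracts the charset parameter from a Content-Type value, or null if absent. */
    static String charsetOf(String contentType) {
        if (contentType == null) return null;
        for (String param : contentType.replace(" ", "").split(";")) {
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return param.split("=", 2)[1].replace("\"", "");
            }
        }
        return null; // Likely binary content, or the server did not declare a charset.
    }

    /** Prints status, headers and charset; asking for any of them fires the request. */
    static void print(HttpURLConnection connection) throws IOException {
        System.out.println("Status: " + connection.getResponseCode());
        for (Map.Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
            System.out.println(header.getKey() + "=" + header.getValue());
        }
        System.out.println("Charset: " + charsetOf(connection.getContentType()));
    }
}
```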
Maintaining the session

The server-side session is usually backed by a cookie. Some web forms require that you're logged in and/or are tracked by a session. You can use the CookieHandler API to maintain cookies. You need to prepare a CookieManager with a CookiePolicy of ACCEPT_ALL before sending any HTTP requests.
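Installing such a cookie manager could look like this. SessionCookies is a hypothetical helper; note that the handler is JVM-wide.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;

final class SessionCookies {
    /** Installs a JVM-wide cookie manager that accepts all cookies; call once up front. */
    static CookieManager install() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }
}
```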
Note that this is known to not always work properly in all circumstances. If it fails for you, then it is best to manually gather and set the cookie headers. You basically need to grab all Set-Cookie headers from the response of the login or the first GET request and then pass them along with the subsequent requests.
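A sketch of the manual approach, assuming the first response has already been received. ManualCookies and its method names are hypothetical.

```java
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;

final class ManualCookies {
    /** Reduces raw Set-Cookie header values to bare name=value pairs. */
    static List<String> nameValuePairs(List<String> setCookieHeaders) {
        List<String> pairs = new ArrayList<>();
        if (setCookieHeaders != null) {
            for (String cookie : setCookieHeaders) {
                pairs.add(cookie.split(";", 2)[0]);
            }
        }
        return pairs;
    }

    /** Copies the cookies of a received response onto a not-yet-connected request. */
    static void forward(URLConnection response, URLConnection nextRequest) {
        // The header-name lookup in this map may be case-sensitive.
        List<String> cookies = response.getHeaderFields().get("Set-Cookie");
        for (String pair : nameValuePairs(cookies)) {
            nextRequest.addRequestProperty("Cookie", pair);
        }
    }
}
```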
The split(";", 2)[0] is there to get rid of cookie attributes which are irrelevant for the server side, like expires, path, etc. Alternatively, you could also use cookie.substring(0, cookie.indexOf(';')) instead of split(), but that fails with an exception when the cookie has no attributes and therefore no semicolon.

Streaming mode

The HttpURLConnection will by default buffer the entire request body before actually sending it, regardless of whether you've set a fixed content length yourself using connection.setRequestProperty("Content-Length", contentLength);. This may cause OutOfMemoryErrors whenever you concurrently send large POST requests (e.g. uploading files). To avoid this, set the HttpURLConnection#setFixedLengthStreamingMode().
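When the body length is known up front, fixed-length streaming might be set up as in this sketch. FixedLengthUpload is a hypothetical name.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class FixedLengthUpload {
    /** Prepares a connection that streams exactly contentLength bytes without buffering. */
    static HttpURLConnection prepare(String url, long contentLength) throws IOException {
        HttpURLConnection http = (HttpURLConnection) new URL(url).openConnection();
        http.setDoOutput(true);
        http.setFixedLengthStreamingMode(contentLength); // Must be called before connecting.
        return http;
    }
}
```

With this mode, the number of bytes written to the output stream has to match the declared length.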
But if the content length is really not known beforehand, then you can make use of chunked streaming mode by setting the HttpURLConnection#setChunkedStreamingMode() accordingly. This will set the HTTP Transfer-Encoding header to chunked, which will force the request body to be sent in chunks. A chunk length of 1024, for example, sends the body in chunks of 1KB.
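Chunked streaming with 1KB chunks could be sketched as follows. ChunkedUpload is a hypothetical name.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class ChunkedUpload {
    /** Prepares a connection that sends the request body in chunks of 1024 bytes. */
    static HttpURLConnection prepare(String url) throws IOException {
        HttpURLConnection http = (HttpURLConnection) new URL(url).openConnection();
        http.setDoOutput(true);
        http.setChunkedStreamingMode(1024); // Must be called before connecting.
        return http;
    }
}
```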
User-Agent

It can happen that a request returns an unexpected response, while it works fine with a real web browser. The server side is probably blocking requests based on the User-Agent request header. The URLConnection will by default set it to Java/1.6.0_19, where the last part is obviously the JRE version. You can override this by setting the User-Agent request property yourself.
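Overriding the header might look like this. UserAgentOverride is a hypothetical name and the User-Agent value is a made-up placeholder, not a real browser string.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

final class UserAgentOverride {
    // Placeholder only; substitute the User-Agent string of a recent browser.
    static final String USER_AGENT = "Mozilla/5.0 (placeholder)";

    /** Prepares a connection that identifies itself with USER_AGENT. */
    static URLConnection prepare(String url) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        connection.setRequestProperty("User-Agent", USER_AGENT);
        return connection;
    }
}
```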
Use the User-Agent string from a recent browser.

Error handling

If the HTTP response code is 4nn (Client Error) or 5nn (Server Error), then you may want to read the HttpURLConnection#getErrorStream() to see if the server has sent any useful error information.
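Choosing between the two streams could be sketched like this. ErrorBodies and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

final class ErrorBodies {
    /** True for 4nn (Client Error) and 5nn (Server Error) status codes. */
    static boolean isError(int status) {
        return status >= 400 && status <= 599;
    }

    /** Returns the error stream on 4nn/5nn (may be null if no body was sent), else the normal stream. */
    static InputStream bodyOf(HttpURLConnection http) throws IOException {
        return isError(http.getResponseCode()) ? http.getErrorStream() : http.getInputStream();
    }
}
```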
If the HTTP response code is -1, then something went wrong with connection and response handling. The HttpURLConnection implementation in older JREs is somewhat buggy with keeping connections alive. You may want to turn it off by setting the http.keepAlive system property to false. You can do this programmatically at the beginning of your application.
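Setting the property programmatically is a one-liner, wrapped here in a hypothetical KeepAlive helper. It should run before the first connection is opened.

```java
final class KeepAlive {
    /** Turns off HTTP keep-alive for connections opened afterwards. */
    static void disable() {
        System.setProperty("http.keepAlive", "false");
    }
}
```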
Uploading files

You'd normally use multipart/form-data encoding for mixed POST content (binary and character data). The encoding is described in more detail in RFC 2388 (since obsoleted by RFC 7578).
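A minimal sketch of writing such a body with one text field and one binary file. MultipartWriter, the field names and the fixed application/octet-stream type are assumptions for illustration. The connection itself needs setDoOutput(true) and a Content-Type of "multipart/form-data; boundary=" followed by the same boundary string.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

final class MultipartWriter {
    private static final String CRLF = "\r\n"; // Line separator required by multipart/form-data.

    /** Writes one text field and one file part; the caller closes the output stream. */
    static void write(OutputStream output, String boundary, String charset,
                      String fieldName, String fieldValue,
                      String fileFieldName, Path file) throws IOException {
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(output, charset), true);

        // Plain text field.
        writer.append("--" + boundary).append(CRLF);
        writer.append("Content-Disposition: form-data; name=\"" + fieldName + "\"").append(CRLF);
        writer.append("Content-Type: text/plain; charset=" + charset).append(CRLF);
        writer.append(CRLF).append(fieldValue).append(CRLF).flush();

        // Binary file part: headers go through the writer, raw bytes straight to the stream.
        writer.append("--" + boundary).append(CRLF);
        writer.append("Content-Disposition: form-data; name=\"" + fileFieldName
                + "\"; filename=\"" + file.getFileName() + "\"").append(CRLF);
        writer.append("Content-Type: application/octet-stream").append(CRLF);
        writer.append(CRLF).flush(); // Flush before switching to raw bytes.
        Files.copy(file, output);
        output.flush();
        writer.append(CRLF).flush(); // CRLF marks the end of the file part.

        // Closing boundary ends the multipart body.
        writer.append("--" + boundary + "--").append(CRLF).flush();
    }
}
```

A random or time-based string such as Long.toHexString(System.currentTimeMillis()) can serve as the boundary, as long as it does not occur in the content.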
If the other side is an HttpServlet, then its doPost() method will be called and the parts will be available via HttpServletRequest#getPart() (note: thus not getParameter() and so on!). The getPart() method is however relatively new; it was introduced in Servlet 3.0 (Glassfish 3, Tomcat 7, etc). Prior to Servlet 3.0, your best choice is using Apache Commons FileUpload to parse a multipart/form-data request. Also see this answer for examples of both the FileUpload and the Servlet 3.0 approaches.

Dealing with untrusted or misconfigured HTTPS sites

Sometimes you need to connect to an HTTPS URL, perhaps because you're writing a web scraper. In that case, you may face a javax.net.ssl.SSLException: Not trusted server certificate on some HTTPS sites that don't keep their SSL certificates up to date, or a java.security.cert.CertificateException: No subject alternative DNS name matching [hostname] found or javax.net.ssl.SSLProtocolException: handshake alert: unrecognized_name on some misconfigured HTTPS sites.

A one-time-run static initializer in your web scraper class can make HttpsURLConnection more lenient as to those HTTPS sites, so that it no longer throws those exceptions.
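One possible shape for that leniency is sketched below, assuming you accept the security trade-off: it disables certificate and hostname checks for every HttpsURLConnection in the JVM, so it is only suitable for scraping sites you already treat as untrusted. LenientHttps is a hypothetical name.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

final class LenientHttps {
    /** WARNING: turns off certificate and hostname verification JVM-wide. */
    static void install() throws GeneralSecurityException {
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
            @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
            @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        } };
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAll, null);
        HttpsURLConnection.setDefaultSSLSocketFactory(context.getSocketFactory());
        // Accept any hostname, regardless of what the certificate says.
        HttpsURLConnection.setDefaultHostnameVerifier((hostname, session) -> true);
    }
}
```

Calling LenientHttps.install() from a static initializer block of the scraper class runs it once, when that class is loaded.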
Last words

The Apache HttpComponents HttpClient is much more convenient in all of this :)

Parsing and extracting HTML

If all you want is parsing and extracting data from HTML, then better use an HTML parser like Jsoup. |
Sunday, November 12, 2017
Using java.net.URLConnection to fire and handle HTTP requests