Search Engine Friendly URLs for Java Web Applications

Static and Dynamic URLs

Any web application has static and dynamic resources. As the names imply, static resources are those that never change: for example, HTML pages with no dynamic data, static download files and so on.

Dynamic resources are those whose content can change from time to time: HTML pages that contain dynamic data (such as results of search queries), dynamic reports, download files whose content depends on the visitor’s preferences and so on.

Usually it is easy to tell static and dynamic URLs apart:

http://www.mysite.com/pictureOfMyDog.jsp - static URL.

http://www.mysite.com/picture.jsp?id=234&operation=show - dynamic URL.
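The rule of thumb behind these two examples is simply whether the URL carries a query string. Here is a minimal sketch of that check using the standard java.net.URI class. The class name UrlKind and the method isDynamic are hypothetical names made up for this illustration, and a real crawler surely applies more heuristics than this.

```java
import java.net.URI;

// Hypothetical helper for illustration only: treats a URL as "dynamic"
// when it has a query string (the part after '?').
class UrlKind {

    static boolean isDynamic(String url) {
        // getRawQuery() returns null when the URL has no query component.
        return URI.create(url).getRawQuery() != null;
    }

    public static void main(String[] args) {
        System.out.println(isDynamic("http://www.mysite.com/pictureOfMyDog.jsp"));
        System.out.println(isDynamic("http://www.mysite.com/picture.jsp?id=234&operation=show"));
    }
}
```

The first call reports a static URL and the second a dynamic one.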

URLs and Search Engines

Search engines like static URLs. When a search engine spider comes across a dynamic URL, it may or may not follow it. Depending on its internal algorithms, the search engine will choose how to proceed.

If your site is popular and the search engine knows about it, it can index your dynamic resources.

If the search engine later finds that an old dynamic URL no longer works or leads to a different page, it can stop indexing such URLs.

The more parameters a URL has, the lower the chances that the search engine will follow it. The spider may still follow such URLs to check what kind of content the pages contain and whether there is any useful content at all.

URLs and Internet Surfers

What if someone likes the picture of your dog located at this URL:

http://www.mysite.com/picture.jsp?id=234&operation=show.

The poor guy bookmarks the page, because he will never remember such an awful URL; that also means he cannot show the picture to his girlfriend at her place. Later, when they come to the Poor Guy’s place and he clicks the bookmark to show her the picture, they stare miserably at a 404 page.

It happens because you decided to do a major rewrite of your code and now the picture of your beautiful dog is located here:

http://www.mysite.com/picture.jsp?photoId=234&operation=show

They cannot find the picture and the night is spoiled, all because the webmaster never thought about static URLs.

URL Friendliness and Intranet Applications

The value of static URLs is clear for web applications exposed to the public Internet, but some may think they are useless for intranet applications: search engines do not index those applications, and their URLs usually do not change. Well, I thought that too. However, if you think about it a little more, you will find that static URLs can be of great use for such applications as well.

First of all, nice URLs are a plain aesthetic pleasure. If you do not care about that, think about the developers who can save time, and probably someone’s money, by typing shorter URLs and making fewer mistakes in them.

At my day job I often have to browse the application we are developing on other developers’ computers over the network. Every time, I have to type the URL of the login page together with the network address of the developer’s machine, and every time it is a pain: the URL is long and complicated. It is also a URL that everyone has to type quite often, as there is no way to keep all of them in IE’s ugly bookmarking facility. This is the first example I can think of, but I am sure that if you think about it a little you will find many reasons to have static URLs in your applications.

Java and Search Engine Friendly URLs

Users of the Apache HTTP server are happy to have such functionality almost out of the box. There is no standard solution for the J2EE platform. Fortunately, there is a project that lets you have the desired functionality. It is called Url Rewrite Filter.

“Based on the popular and very useful mod_rewrite for apache, UrlRewriteFilter is a Java Web Filter for any J2EE compliant web application server (such as Resin, Orion or Tomcat), which allows you to rewrite URLs before they get to your code. It is a very powerful tool just like Apache's mod_rewrite.
URL rewriting is very common with Apache Web Server (see mod_rewrite's rewriting guide) but has not been possible in most java web application servers”.

Visit their web site and download the filter code, documentation and manuals.

The documentation of this filter is great and the usage is pretty simple, but I will show you an example anyway:

I have a long ugly URL:

http://www.mysite.com/SoftwareList.do?operation=showList&chapterId=X

I want it to look nice and to be search engine friendly:

http://www.mysite.com/category-programs/audio-and-video-/X

We compose a rule:

XML:

    <rule>
        <from>^/category-programs/(.*)/([0-9]+).*$</from>
        <to>/SoftwareList.do?operation=showList&amp;chapterId=$2</to>
    </rule>

Then place that rule in urlrewrite.xml.
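To get a feel for what the rule's pattern captures, you can try the same regular expression with plain java.util.regex. This is only a sketch of the matching step, assuming the filter applies the pattern the way java.util.regex does; it is not the filter's own code. The class name RewriteRuleDemo and the chapter id 42 are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: mimics the <from>/<to> pair of the rule with java.util.regex.
class RewriteRuleDemo {

    // Same pattern as in the rule.
    private static final Pattern FROM =
            Pattern.compile("^/category-programs/(.*)/([0-9]+).*$");

    // Same target as in the rule; the XML entity &amp; is a plain '&' here.
    private static final String TO =
            "/SoftwareList.do?operation=showList&chapterId=$2";

    // Returns the rewritten path, or the path unchanged when the pattern does not match.
    static String rewrite(String path) {
        Matcher m = FROM.matcher(path);
        return m.matches() ? m.replaceFirst(TO) : path;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/category-programs/audio-and-video-/42"));
        System.out.println(rewrite("/index.jsp"));
    }
}
```

The first group swallows the descriptive part of the path (it is there only for humans and search engines), and the second group, the digits, ends up as chapterId.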

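Remember that you have to encode all relative URLs; otherwise the paths to CSS, JavaScript, images and all other resources will break, because the browser resolves them against the rewritten address rather than the real one. So use JSTL’s <c:url> tag for all your relative paths.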
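Why do relative paths break? The browser resolves them against the address it sees, which now has extra path segments. The sketch below reproduces that resolution with the standard java.net.URI class. The stylesheet path css/style.css, the id 42 and the class name RelativePathDemo are assumptions for the illustration, and the application is assumed to be deployed at the site root.

```java
import java.net.URI;

// Illustration only: shows how a browser-style resolution treats
// relative and root-relative links on a rewritten page.
class RelativePathDemo {

    // Resolves a link found on a page against that page's URL.
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.mysite.com/category-programs/audio-and-video-/42";

        // A relative link is resolved under the "pretty" directory, which does not exist.
        System.out.println(resolve(page, "css/style.css"));

        // A link that starts at the root resolves the same way from any page.
        System.out.println(resolve(page, "/css/style.css"));
    }
}
```

The first call yields http://www.mysite.com/category-programs/audio-and-video-/css/style.css, which is not where the stylesheet lives; the second yields http://www.mysite.com/css/style.css.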

Alternatively, you may add the type="redirect" attribute to the rule’s <to> element:

XML:

    <rule>
        <from>^/category-programs/(.*)/([0-9]+).*$</from>
        <to type="redirect">/SoftwareList.do?operation=showList&amp;chapterId=$2</to>
    </rule>

But then the user will see the ‘ugly’ URL in the address bar of the browser.

Conclusion

Url Rewrite Filter is a great way for J2EE developers to add static URL functionality to their applications. It is simple and easy to use, and the configuration file is reloaded automatically at an interval you define. That is it for now. I have also written a URLAbstractor class and a custom tag to make clean URLs out of any String.

URL Beautifier

I have written a small class to convert a string to a pretty URL:

JAVA:

    package com.leadercode.tag.url;

    import java.io.UnsupportedEncodingException;

    /**
     * URLAbstractor
     * @author Sergey Nechaev
     *
     */
    public class URLAbstractor {
        public static String encode(String url) throws UnsupportedEncodingException {
            StringBuffer out = new StringBuffer(url.length());
            for (int i = 0; i < url.length(); i++) {
                int c = (int) url.charAt(i);
                // Replace spaces and punctuation with a hyphen.
                switch (c) {
                    case ' ':
                    case '&':
                    case ',':
                    case '.':
                    case ':':
                        c = '-';
                        break;
                }

                // Collapse consecutive hyphens into a single one.
                if (c == '-' && i > 0 && out.charAt(out.length() - 1) == '-') {
                    continue;
                }

                out.append((char) c);
            }

            return out.toString();
        }
    }

SEF Tag

JAVA:

    package com.leadercode.tag.url;

    import javax.servlet.jsp.JspException;
    import javax.servlet.jsp.tagext.BodyTagSupport;

    /**
     * The URL abstractor tag
     * @author Sergey Nechaev
     *
     */
    public class SefLink extends BodyTagSupport {
        private String url;

        public String getUrl() {
            return url;
        }

        public void setUrl(String url) {
            this.url = url;
        }

        public int doStartTag() throws JspException {
            try {
                // Write the cleaned-up URL straight into the page output.
                pageContext.getOut().print(URLAbstractor.encode(url));
            } catch (Exception e) {
                // Report the failure instead of silently swallowing it.
                throw new JspException(e);
            }

            return SKIP_BODY;
        }
    }
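To show how the output of URLAbstractor could feed the rewrite rule from the example, here is a small self-contained sketch. So that it compiles on its own, it uses a one-line stand-in (slug) that is meant to behave like URLAbstractor.encode. The names FriendlyLinkDemo, slug and categoryLink, as well as the chapter id 42, are hypothetical and exist only for this illustration.

```java
// Illustration only: builds a friendly link in the shape the rewrite rule expects.
class FriendlyLinkDemo {

    // Stand-in for URLAbstractor.encode: every run of spaces, '&', ',', '.',
    // ':' and '-' becomes a single hyphen.
    static String slug(String text) {
        return text.replaceAll("[ &,.:-]+", "-");
    }

    // Hypothetical helper: /category-programs/<readable name>/<chapter id>
    static String categoryLink(String categoryName, int chapterId) {
        return "/category-programs/" + slug(categoryName) + "/" + chapterId;
    }

    public static void main(String[] args) {
        System.out.println(categoryLink("audio and video.", 42));
    }
}
```

For the category name "audio and video." this produces /category-programs/audio-and-video-/42, which is exactly the kind of path the rule's pattern matches.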
