Dumped in Coldfusion

Monday, December 31, 2007

Week numbers in SimpleDateFormat

It seems that if you use week numbers in date formats in Java, then you can get stuffed, as there is no way to find out the year that corresponds to the week number that you're looking for.

Today is the 31st December 2007, which is week 1 of 2008. Java correctly says that the week is week 1, and that the year is 2007. Unfortunately this is a little ambiguous, and NOT USEFUL for writing logs.


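import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class WeekNumbers {
    public static void main(String[] args) throws ParseException {
        DateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.ENGLISH);
        // "ww" is the week of the year, but "yyyy" is still the calendar year
        DateFormat output = new SimpleDateFormat("yyyy-ww", Locale.ENGLISH);

        final Date firstDay = format.parse("2007-01-01");
        final Date lastDay = format.parse("2007-12-31");

        System.out.println("First: " + output.format(firstDay) + " - " + format.format(firstDay));
        System.out.println("Last : " + output.format(lastDay) + " - " + format.format(lastDay));
    }
}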

Generates the output

First: 2007-01 - 2007-01-01
Last : 2007-01 - 2007-12-31


If you are using strftime in C etc., then you can use %G for the year, but if you're in Java, then you're a bit stuffed.


%G The ISO 8601 year with century as a decimal number. The 4-digit
year corresponding to the ISO week number (see %V). This has
the same format and value as %Y, except that if the ISO week
number belongs to the previous or next year, that year is used
instead. (TZ)


Good job I picked today to look at log file naming.

It seems that it isn't possible to make a Tomcat valve rotate to a new log weekly. I guess we'll have to go with daily or monthly.

Update: Sun have bugs open for this: 4267450, opened in 1999, and 4808661, a request for 1.6, among many others.
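One possible workaround, sketched here rather than taken from any of those bug reports, is to work out the week-based year by hand from Calendar. The class and method names are made up for illustration, and the calendar is forced to ISO 8601 week rules (weeks start on Monday, week 1 has at least four days), which is an assumption that may not match what your locale would otherwise do.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class WeekYearSketch {

    /** Returns "year-week", where the year is the one the week belongs to. */
    static String weekLabel(int year, int month, int day) {
        Calendar cal = new GregorianCalendar(Locale.ENGLISH);
        cal.clear();
        // Assumption: ISO 8601 week rules, as used by strftime's %G and %V.
        cal.setFirstDayOfWeek(Calendar.MONDAY);
        cal.setMinimalDaysInFirstWeek(4);
        cal.set(year, month, day);

        int week = cal.get(Calendar.WEEK_OF_YEAR);
        int weekYear = cal.get(Calendar.YEAR);
        if (cal.get(Calendar.MONTH) == Calendar.DECEMBER && week == 1) {
            weekYear++; // late December that already counts as week 1 of next year
        } else if (cal.get(Calendar.MONTH) == Calendar.JANUARY && week >= 52) {
            weekYear--; // early January that still counts as the last week of last year
        }
        return String.format(Locale.ROOT, "%04d-%02d", weekYear, week);
    }

    public static void main(String[] args) {
        System.out.println(weekLabel(2007, Calendar.JANUARY, 1));   // 2007-01
        System.out.println(weekLabel(2007, Calendar.DECEMBER, 31)); // 2008-01
    }
}
```

This does not help with a Tomcat valve, which only takes a format string, but it is enough for naming your own log files.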

Friday, November 09, 2007

Autoboxing making people lazy

When writing in Java 5, primitive types get automatically converted to objects (autoboxed) where they are needed. This means that you can write code like:

map.put(1L, 4L);

where before you needed

map.put(Long.valueOf(1), Long.valueOf(4));

This is nice, and makes code prettier and easier to read.

Unfortunately it makes programmers lazy. They stop paying attention to whether they are using Long or long in methods, and may pass a Long to a method that takes a long. Like this:

public void suspicious(Long l) {
    foo(l); // <-- Here
}

private void foo(long l) {
    System.err.println("Long was: " + l);
}

One problem is that the JVM will be creating and discarding lots of boxing objects all over the place if your code is particularly inconsistent. However, there's a more subtle problem. If l is null, this code will now throw a NullPointerException at the line marked Here, when the Long is unboxed, and that can be difficult to track down.
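A minimal sketch of the failure, using a made-up class around the same foo method. The orDefault helper is only one hypothetical way of dealing with it; what null should mean is up to your code.

```java
public class UnboxingSketch {

    private static long foo(long l) {
        return l;
    }

    /** Mirrors suspicious(Long): the Long is unboxed at the call to foo. */
    static boolean unboxingThrows(Long l) {
        try {
            foo(l); // the compiler inserts l.longValue() here
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** One defensive option: decide explicitly what null should mean. */
    static long orDefault(Long l, long fallback) {
        return (l == null) ? fallback : l.longValue();
    }

    public static void main(String[] args) {
        System.out.println(unboxingThrows(Long.valueOf(4))); // false
        System.out.println(unboxingThrows(null));            // true
        System.out.println(orDefault(null, -1L));            // -1
    }
}
```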

Convert integers to strings in java

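If you have an integer, and want a string, concatenating it with "" is not as good as String.valueOf. It generates loads more code: with the Java 5 compiler the concatenation creates a StringBuilder, appends to it twice and then converts it to a String, where String.valueOf is a single method call.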


int i = 5;

String s1 = "" + i; // BAD
String s2 = String.valueOf(i); // GOOD


I realise that it's a bit more typing, but it is clearer what you want, and it is more efficient.

This also works for all the other primitive types: long, boolean etc.
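For example, in a small sketch (the class name is made up), each of these picks a different String.valueOf overload:

```java
public class ValueOfSketch {

    /** One String.valueOf call per primitive type. */
    static String[] samples() {
        return new String[] {
            String.valueOf(5),    // int
            String.valueOf(5L),   // long
            String.valueOf(true), // boolean
            String.valueOf('x'),  // char
            String.valueOf(2.5)   // double
        };
    }

    public static void main(String[] args) {
        for (String s : samples()) {
            System.out.println(s);
        }
    }
}
```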

Monday, April 23, 2007

Logging in and AJAX

This is an interesting problem. You have parts of a site that users can see without being logged in. Some of the actions on these pages are AJAX requests that do require the user to be logged in.

Does anyone know any cool ways of handling the login, or sites that do this? The normal "redirect to login page that redirects back" option doesn't seem to work for this sort of thing.

Wednesday, April 18, 2007

List lengths

If you have a list in JSP and want to know how long it is then you have to use a function from the JSTL. It isn't 'built-in'.
${fn:length(items)}

which means you need

<%@ taglib prefix='fn' uri='http://java.sun.com/jsp/jstl/functions' %>

elsewhere.

How annoying (and hard to find).

Friday, April 13, 2007

JSP Horror!

I no longer work with Coldfusion on a daily basis, but I have discovered why people use it. JSPs are even worse! It seems that the only way to replace newlines in a string is to assign a newline to a variable, and then use that.


<c:set var="nl" value="
" />
<input type="hidden" name="address" value='${fn:replace(user.address, nl, "&#10;")}'/>


This just seems so flaky.

Is there not a better way? (And I'm not sure that writing a custom tag is a better way).
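For comparison, if a custom EL function were acceptable, the Java side could be as small as the sketch below. The class and method names are hypothetical, and the TLD entry needed to expose the method to a JSP is not shown.

```java
public class NewlineEncoder {

    /** Replaces line breaks with the character reference &#10; for use in an attribute value. */
    public static String encodeNewlines(String text) {
        if (text == null) {
            return "";
        }
        // Treat Windows line endings as a single newline first.
        return text.replace("\r\n", "\n").replace("\n", "&#10;");
    }

    public static void main(String[] args) {
        System.out.println(encodeNewlines("1 High Street\nSmalltown")); // 1 High Street&#10;Smalltown
    }
}
```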

Monday, October 02, 2006

Static awstats and persistent perl

Sorry, not a Coldfusion post this time. I've been safely away from it recently.

There have been several security problems with awstats that have led to us not wanting to run it dynamically. The load that it creates can also hurt the experience of people browsing the stats. Therefore we run awstats to generate the content statically.

As we have to run awstats 20 times for each directory on our sites, this takes a lot of time just starting up perl and parsing all the libraries. I have now installed persistent perl on the log generating machine and changed awstats.pl to use pperl rather than perl. It is about 6 times faster and doesn't appear to have broken anything. Excellent.

Friday, September 22, 2006

Bad hash function causes problems with ColdFusion Structs

I was looking at the slightly complicated series of structs that are used in the cf_accelerate tag. According to discussions on the speed of structs, ColdFusion structures get really slow when you put more and more entries in them.

One of the commenters noted that by reversing the keys he could get a significant speedup. It seems that this is all to do with a bad choice of hash function deep in ColdFusion.

A hash table is a data structure that is designed for fast lookups of keys. It uses a hash function to guess where something will be, then goes to check. In most cases this finds what you are looking for first time, which is faster than having to search through a list of entries or use a complicated tree structure. Unfortunately, if two entries hash to the same place, it still has to fall back on looking through them one by one, and so loses all of the speed advantages.

Choosing a hash function is a compromise. On one hand, the hash function gets called a lot, often in speed-critical code, so it has to be fast (which is why I was surprised that java.net.URL uses DNS in its .equals method, but that is another topic). On the other hand, the aim of hash codes is to differentiate different inputs. At one extreme, a constant hash code fulfils all the requirements of a fast hash code, but it isn't very useful, as your hash table becomes as slow as a list.
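To make that extreme concrete, here is a toy chained hash table in Java that counts how many key comparisons it makes while inserting. It is a sketch for illustration only: the class, the method names and the bucket count are made up, and it is not how ColdFusion or java.util.HashMap is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

public class ChainedTableSketch {

    /** Builds the keys "xxxx0", "xxxx1", ... for the experiment. */
    static List<String> keys(int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add("xxxx" + i);
        }
        return result;
    }

    /** Inserts every key into a chained table and returns the number of key comparisons made. */
    static long comparisons(List<String> keys, int buckets, ToIntFunction<String> hash) {
        List<List<String>> table = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            table.add(new ArrayList<>());
        }
        long count = 0;
        for (String key : keys) {
            // The hash code picks the bucket; everything in that bucket must be checked.
            List<String> chain = table.get(Math.floorMod(hash.applyAsInt(key), buckets));
            boolean found = false;
            for (String existing : chain) {
                count++;
                if (existing.equals(key)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                chain.add(key);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> keys = keys(1000);
        // Constant hash code: every key lands in one chain, so the table behaves like a list.
        System.out.println("constant: " + comparisons(keys, 2048, k -> 0));
        // String.hashCode spreads the keys out, so far fewer comparisons are needed.
        System.out.println("hashCode: " + comparisons(keys, 2048, String::hashCode));
    }
}
```

With the constant hash code, inserting 1,000 distinct keys costs 0 + 1 + ... + 999 = 499,500 comparisons, which is exactly the cost of checking an unsorted list before each insert.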

Example of a key with the characters used in the hash code highlighted

The hash function chosen for the keys to structs in ColdFusion uses an evenly distributed selection of 4 of the letters from the key. These start with the first letter, then sample the rest of the key, as shown in the example. The green-underlined letters are used to form the hash code.

Example of a longer key with the characters used in the hash code highlighted

The key to why all of the benchmarks (and many common use cases) show that it is slow is that this sampling breaks down when you have a long, constant prefix on all of your structure keys. In this case (as seen in the longer example), the algorithm will hash all of your keys to the same place.

Some people use really short keys in their structs. These are nearly as bad, as all keys less than 4 characters long are hashed based only on their first character, so ABB will hash to the same thing as AAA.
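The two failure modes can be reproduced with a made-up sampling hash in Java. This is not ColdFusion's actual function, whose exact sampling positions are not given here. It only assumes the behaviour described above: four evenly spaced characters starting with the first, and the first character alone for keys shorter than 4.

```java
import java.util.HashSet;
import java.util.Set;

public class SampledHashSketch {

    /** Hypothetical sampling hash: looks at only four evenly spaced characters of the key. */
    static int sampledHash(String key) {
        int n = key.length();
        if (n == 0) {
            return 0;
        }
        if (n < 4) {
            return key.charAt(0); // short keys: first character only
        }
        int h = 0;
        for (int i = 0; i < 4; i++) {
            h = 31 * h + key.charAt(i * n / 4); // positions 0, n/4, n/2, 3n/4
        }
        return h;
    }

    /** Counts the distinct hash codes produced for prefix1 .. prefixN. */
    static int distinctCodes(String prefix, int count) {
        Set<Integer> codes = new HashSet<>();
        for (int i = 1; i <= count; i++) {
            codes.add(sampledHash(prefix + i));
        }
        return codes.size();
    }

    public static void main(String[] args) {
        System.out.println(sampledHash("AAA") == sampledHash("ABB"));     // true
        System.out.println(sampledHash("xxxx1") == sampledHash("xxxx9")); // true
        System.out.println(distinctCodes("xxxx", 100000));
    }
}
```

With this particular sampling, two of the four sampled characters always fall inside the constant prefix, so the 100,000 keys can never produce more than 121 different codes.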

So there are two ways that the chosen algorithm breaks down. Unfortunately both of these are commonly used by people choosing keys for structs. Given that the special algorithm that has been implemented in ColdFusion probably took a programmer time to implement, it had better be a lot faster than Java's built-in hashCode method on Strings in most cases, given that it trivially fails to differentiate keys in common situations.

To show this, let's take one of the examples quoted by a commenter.


<cfset s = structNew()>
<cfloop from="1" to="100000" index="idx">
<cfset s["xxxx#idx#"] = 0>
</cfloop>


He claims that this takes 6 minutes, compared to 1 second for a similar piece of code using a simple java.util.HashMap. When you look at the hash algorithm you start to see why. Although there are 100,000 items in the hash table, only 91 hash codes are used. So rather than the maybe 1 or 2 lookups that would be expected in a hash table to find where to put/find a key, there are up to about 1,100 (100,000 keys shared between 91 hash codes), averaging 550. That is, due to this badly designed hash function, lookups into the structs do on average 550 times as much work as they could.

If you reverse the keys then it is a lot faster, as it uses 1,010 hash codes. That is about 99 keys per hash code, so it normally does about 50 lookups for each entry. In contrast, the standard hashCode used by Java for Strings gives every one of the 100,000 keys a different hash code, allowing the keys to be found almost immediately.
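The claim about Java's own hashCode is easy to check with a few lines of Java. The class and method names below are made up; the keys are the same xxxx1 to xxxx100000 as in the CFML loop.

```java
import java.util.HashSet;
import java.util.Set;

public class StringHashCount {

    /** Counts the distinct String.hashCode values for prefix1 .. prefixN. */
    static int distinctHashCodes(String prefix, int count) {
        Set<Integer> codes = new HashSet<Integer>();
        for (int i = 1; i <= count; i++) {
            codes.add((prefix + i).hashCode());
        }
        return codes.size();
    }

    public static void main(String[] args) {
        // Prints the number of different hash codes among the 100,000 keys.
        System.out.println(distinctHashCodes("xxxx", 100000));
    }
}
```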

In summary, be very careful what you use for the keys of your structs if you are going to create very large structures then use them in speed critical code. It is probably safest not to do this and to use java.util.HashMap objects instead if you want speed. For everyday use with small structs and very different keys this will not be a problem to you, but if you do start seeing slowdown in your struct accesses then consider this a possible reason.

Wednesday, July 19, 2006

Consistency in logs

Looking through the logs today I got a little confused by the dates in my ColdFusion logs. At least I wasn't trying to grep to find what happened today. ColdFusion appears a little confused about where in the world it lives, as it mixes month/day and day/month dates in the same file. Maybe this is our fault, but it is a little confusing.


07/19 08:35:58 Information [main] - ColdFusion started
19/07 08:35:58 user ColdFusionStartUpServlet: ColdFusion MX: application services are now available
19/07 08:35:58 user CFSwfServlet: init
19/07 08:35:58 user CFCServlet: init
19/07 08:35:59 user FlashGateway: init
19/07 08:35:59 user CFFormGateway: init
19/07 08:35:59 user CFInternalServlet: init
Server coldfusion ready (startup time: 29 seconds)
07/19 09:40:49 Error [jrpp-23] - File not found: ...

Friday, March 17, 2006

Heap analysis


This is the memory use graph for one of our sites. Does it scream "memory leak" to you? What do I do next to work out what is taking up all of that memory? Could it be some sort of harmless ColdFusion caching (I suspect not)?

I like graphs and nice analysis. I don't like sites that go off and sulk without telling me why. More and more I am realising that debugging a site that doesn't break until many people use it is hard unless you have instrumented your system beforehand. You can't just restart it every 5 minutes to put a bit more debugging in, as the people using it get pissed off (Coldfusion taking about 30 seconds to restart doesn't help this).