How much wood would a woodchuck chuck?
Imagine, if you will, a core application’s code base that stretches back ten years, has around 3,500 classes, and 15,000 compiler warnings.
In its germination, the Java industry had confused ideas about interoperability, no standard collections, no O/R mapping frameworks, application servers (lightweight or otherwise), and no sensible OO patterns of enterprise architecture to be found lazing around in better bookstores.
Logging was a matter of System.out.println(). A couple of years later, someone decided to add a static logging class who Log.println() at least allowed directing the logs to named files, but no log levels were envisioned. As the code base grew, production support and new developers relied on log contents to help profile execution. This became so useful and popular, that at first service entry points and then nigh on every method logged its entry and exit, along with timings. Any logging levels that had been differentiated were made redundant because log entries were so voluminous that it was probably a Bad Idea to remove any of them lest the application logic no longer be trackable. Years later still, a few minor sporadic attempts to integrate Log4J and then Commons Logging were made. The homespun Log class was marked deprecated to discourage developers from using it, and hope they would instead move to Commons Logging.
Good intentions, all, but now facing a situation that the logging generates some 2GB every hour in a multi-day batch run, generating the method profiling logs account for a tidy fraction of the execution time, and code which struggles under complicated business logic implementation is made even more difficult to comprehend with cross-cutting complication like the multiple logging mechanisms sprinkled throughout.
Recently, a small discussion has started up looking at ways to deal with this. Options include: Remove the deprecated annotation on the in-house Log class - at least that reduces compiler warnings; Ask each developer to manually the logging for a class whenever it is modified - it might take a year but at least it gets done, but then the developers are too busy nutting out how to add the latest business functionality, the logging issue is a back-of-the-mind thing; Have the in-house Log class delegate to Commons Logging - but there was effectively only One Log before and knowing which method was logging was handled by visual inspection of the method profiling logs; Global, clever search-and-replace scripts to brute-force the code change.
Which option would you choose? Heavylifting is easier with a long plank than raw biceps. All of those options have merit in different circumstances, and in this one all of them can be used in degrees to get the best outcome for the developers, IT, and the business.
So, start with making the in-house Log class behaving more like Commons Logging. First of all, we’ll rename it from “Log” to “LegacyLog” to help with moving to org.apache.commons.logging.Log. It currently has a pattern of usage like this:
package legacy.code;
import legacy.common.util.logging.LegacyLog;
public class Order {
public Order() {
LegacyLog.initialise("Order");
LegacyLog.println("Order class constructed");
}
public void place() {
Profile.entry("Order.place");
doOrderPlacement();
Profile.exit("Order.place");
return;
}
}
The Profile class happens to use the Log class to log the entry to the method and the exit, with the exit including the number of milliseconds the method was running.
So, the string sent to Log.initialise is the unqualified class name. We want to transform it into something like this:
Log log = LogFactory.getLog(Order.class);
We could do a global search-and-replace to take the string, strip the double-quote marks, make it capitalised, and append a “.class” to it - or even just change them all to LegacyLog.initialise(this.getClass()); if we don’t entirely trust the content of the string anyway.
As LegacyLog is used in a static context, it’s rather hard to track that information when methods call other methods that initialise the stack. Let’s do something like this to store a Log for each class in LegacyLog’s static state:
private static Map<Integer,Class> logsForClasses = new HashMap<Integer,Class>(1000);
int hash = hash(classToLog);
if (!logsForClasses.containsKey(hash)) {
Log log = LogFactory.getLog(classToLog);
logsForClasses.put(hash, log);
}
private static int hash(Class clazz)
{
return clazz.hashCode();
}
How to get the classToLog? We only get unqualified class names in the argument to LegacyLog.initialise and we don’t entirely trust it. We could walk the stack, or use the sun.reflect.Reflection class to find our caller:
static Class getCallerClass(int skip) {
// I trust Reflection is more efficient than creating a Throwable and walking the stack trace...
Class caller = null;
// break out of this class first (always at least 2 deep in stack)
int i = 2;
caller = Reflection.getCallerClass(i);
while (hash(LegacyLog.class) == hash(caller)) {
i++;
caller = Reflection.getCallerClass(i);
}
if (skip > 0) {
caller = Reflection.getCallerClass(i + skip);
}
return caller;
}
static Class getCallerClass() {
return getCallerClass(0);
}
My comment about efficiency is that I hope Reflection.getCallerClass is using native internal JVM methods to pluck out the caller rather than me doing it myself with the expensive new Throwable().getStackTrace() technique. GNU’s version uses something called VMWalker, and based on my performance tests, I think it is (but you never know with Sun).
With that, we can now log out a message using the logger, selected by Log4J, of the calling class:
/**
* Legacy method
* Equivalent of getLog().info(message)
*/
public static void print(String message) {
getLog().info(message);
}
/**
* Migrating method.
* If you’re using this directly, you might as well go the whole way.
*/
public static Log getLog() {
return logsForClasses.get(hash(getCallerClass()));
}
The INFO log level has been used for legacy messages. This is to allow a reasonable default of WARN to be used for production runs and balance that against to much (ill-considered) log data. INFO is the lowest level that should be used in production, and this implementation suggests that it should only be switched on when necessary. DEBUG and TRACE are for development use and should not be on in production. This allows development-level log messages to be included in the code but not come close to being used in production runs.
log4j.rootLogger=WARN, A1 log4j.appender.A1=org.apache.log4j.ConsoleAppender log4j.appender.A1.layout=org.apache.log4j.PatternLayout # Print the date in ISO 8601 format #log4j.appender.A1.layout.ConversionPattern=%d [%t] %-5p %c [%F:%L] - %m%n log4j.appender.A1.layout.ConversionPattern=%d [%t] %-5p %c - %m%n # Print only messages of level DEBUG or above in the package legacy.common.util.logging log4j.logger.legacy.common.util.logging=DEBUG
As a not-insignificant aside, the layout patterns %F and %L don’t work - it reports the LegacyLog location, not the caller. It’s quite expensive anyway (creating a Throwable every log, and walking the stack adding about 200% to the logging execution time) and should be reserved for TRACE log levels (DEBUG if really necessary). Short of altering the LocationInfo class in Log4J (or chosen logger) to pop up the stack one more level (to skip LegacyLog), if the developer wants location information, they’ll have to not use the LegacyLog class and the journey will be complete.
In the full code, logged exceptions are show as ERROR, because although the old messy “log-and-rethrow” pattern is used, it is using code written back when a cause wasn’t passed on a rethrow, so the logged exception was the easy way to find out what was going on. Changing that behaviour is another exercise.
With this, the deprecation can be removed. Gradual improvement, as required, is fully supported and explicit changes to make legacy log messages more visible (and hopefully meaningful) are encouraged. Even global search-and-replace is made easier. The old LegacyLog.initialise can be replaced as described earlier, and with a few tweaks, the print method and its brethren can be inlined.
However much more “smelly” that is, it is a further avenue of pursuit. Looking at how Chainsaw can help clean up the originating logging source as well as be used as a harness for automated tests of logging systems is also of further interest. The omitted, but very long (I’ve got logs created for around 2500 classes) performance test may show that something like the GNU Trove collections out-perform the Java ones (http://trove4j.sourceforge.net/overview.shtml), but I’m yet to give them a whirl.
Anyway, the source will be here for a first start.




