JodaTime goodies – flexible parsing

Parsing dates in Java with JodaTime is easy.

1. instantiate a date time formatter

DateTimeFormatter dateTimeFormatter =
    DateTimeFormat.forPattern("yyyyMMdd HH:mm:ss.SSS");

 

2. …and parse the date time

dateTimeFormatter.parseDateTime("20120515 09:30:25.123");

 

3. …but when the date time format changes (say, to include a timezone), alarm bells.

dateTimeFormatter.parseDateTime("20120515 09:30:25.123 +0000");
// results in:
// java.lang.IllegalArgumentException:
// Invalid format: "20120515 09:30:25.123 +0000" is malformed at " +0000"

 

4. Fortunately the JodaTime API includes a DateTimeFormatterBuilder, which can be used to build a DateTimeFormatter customised with multiple printers and parsers.

DateTimeFormatter dateTimeFormatter =
    DateTimeFormat.forPattern("yyyyMMdd HH:mm:ss.SSS");

DateTimeFormatter dateTimeFormatterWithTimeZone =
    DateTimeFormat.forPattern("yyyyMMdd HH:mm:ss.SSS Z");

DateTimeFormatter optionalTimeZoneFormatter =
    new DateTimeFormatterBuilder()
        .append(null, // no printing is required
                new DateTimeParser[]{dateTimeFormatter.getParser(),
                                     dateTimeFormatterWithTimeZone.getParser()})
        .toFormatter();

 

5. Now the same DateTimeFormatter handles both date time formats:

optionalTimeZoneFormatter.parseDateTime("20120515 09:30:25.123");
optionalTimeZoneFormatter.parseDateTime("20120515 09:30:25.123 +0000");

JodaTime goodies – testing time-dependent logic

This post outlined a possible way to test time-dependent logic by abstracting the concept of current time behind a clock interface and providing a mock clock implementation for the tests. This is important because otherwise relying on the system clock to test, say, if a particular action is triggered every day at 12.00 might be tricky.

JodaTime makes it easier to implement the above. It already provides a MillisProvider interface (to abstract the concept of current time) which can be accessed through methods from the DateTimeUtils class, thus saving you from writing your own.


// fix the current time to 1000 millis
DateTimeUtils.setCurrentMillisFixed(1000);

// ... run time-dependent logic ...

// time is still at 1000 millis
long time = DateTimeUtils.currentTimeMillis();

// restore the real system time once done
DateTimeUtils.setCurrentMillisSystem();

Scoping JBehave tests with Spring

Rule number 1 while running JBehave tests (or any integration test, really) is to execute the test scenarios independently from each other, meaning the application state must be reset before each and every scenario run.

The code below presents a generic way to accomplish this reset while using JBehave and Spring. This approach is articulated around 2 components:

1) ScenarioContext

A class responsible for creating and cleaning up the data used by the test scenarios. This usually encompasses all kinds of static and reference data, plus any messages being enqueued/dequeued at the boundaries of the application (in this particular example the messages being enqueued are trade messages).

import javax.annotation.PostConstruct;
import java.util.ArrayList;
import java.util.List;

public class ScenarioContext {

    private final List<Trade> trades = new ArrayList<Trade>();

    public void addTrade(Trade trade){
        trades.add(trade);
    }

    @PostConstruct
    public void resetTrades(){
        trades.clear();
    }

    public Trade getFirstTrade(){
        return trades.get(0);
    }
}

2) ScenarioScope

This class is responsible for cleaning up the scenario context whenever a new test scenario is being run. It is defined as a custom scope Spring bean which creates a new instance of the object(s) under scope whenever JBehave is about to run a new scenario (the @BeforeScenario below is a JBehave annotation which allows the annotated method to run before a scenario).

import org.jbehave.core.annotations.BeforeScenario;
import org.springframework.beans.factory.ObjectFactory;
import org.springframework.beans.factory.config.Scope;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ScenarioScope implements Scope {

    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<String, Object>();

    @BeforeScenario
    public void startScenario(){
        cache.clear();
    }

    @Override
    public Object get(String name, ObjectFactory objectFactory) throws IllegalStateException {
        if (!cache.containsKey(name)){
            cache.putIfAbsent(name, objectFactory.getObject());
        }
        return cache.get(name);
    }

    // ... remaining Scope methods omitted

The ScenarioContext lifecycle is tied to the ScenarioScope in the Spring config file:


 <bean id="scenarioScope" class="foo.bar.ScenarioScope"/>

    <bean class="org.springframework.beans.factory.config.CustomScopeConfigurer">
        <property name="scopes">
            <map>
                <entry key="scenario" value-ref="scenarioScope"/>
            </map>
        </property>
    </bean>

    <bean id="scenarioContext" class="foo.bar.ScenarioContext" scope="scenario">
        <aop:scoped-proxy/>
    </bean>

Full code can be found here.

Implementing the builder pattern with Jackson

The builder pattern allows for the construction of an object step by step (property by property). This comes in handy when writing tests, as that is when we want to instantiate the object being tested in precisely the state deemed useful for the test.

When the object under test is a “Thing”, with two properties, name and description:

class Thing {

    private String name;

    private String description;

    public String toString(){
        return String.format("name:%s description:%s", name, description);
    }

    public Thing(){}

    public Thing(String name, String description){
        this.name = name;
        this.description = description;
    }

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }

    String getDescription() {
        return description;
    }

    void setDescription(String description) {
        this.description = description;
    }
}

The first approach to constructing a builder for the Thing is to expose a list of “with-er” methods (chained setters under another name), e.g. “withName(…)”, “withDescription(…)” and so on, where each method maps to a property of the Thing.

public class Builder {

    private String name;
    private String description;

    public Thing build(){
        return new Thing(name, description);
    }

    public Builder withName(String name){
        this.name = name;
        return this;
    }

    public Builder withDescription(String description){
        this.description = description;
        return this;
    }
}

and the associated test:

public class BuilderTest{

    @Test
    public void testBuilder(){
        Thing aThing = new Builder()
            .withName("aname")
            .withDescription("somedesc")
            .build();

        assertThat(aThing.getDescription(), is("somedesc"));
        assertThat(aThing.getName(), is("aname"));
    }
}

Second approach: use the Jackson processor to instantiate a Thing directly from a JSON string.

import org.codehaus.jackson.map.ObjectMapper; // Jackson 1.x; com.fasterxml.jackson.databind.ObjectMapper for Jackson 2.x

import java.io.IOException;

public class JsonBuilder {

    private Thing thing = null;

    public Thing build(){
        return thing;
    }

    public JsonBuilder add(String s) throws IOException {
        this.thing = new ObjectMapper().readValue(convertToJson(s), Thing.class);
        return this;
    }

    // turns "name:aname,description:somedesc" into {"name":"aname","description":"somedesc"}
    private String convertToJson(String nvps){
        String json = nvps.replaceAll("([A-Za-z0-9.]+)", "\"$1\"");
        json = "{" + json + "}";
        return json;
    }
}

and the test:


public class JsonBuilderTest {

    private final JsonBuilder builder = new JsonBuilder();
   
    @Test
    public void testJsonBuilder() throws Exception{
        Thing aThing = builder.add("name:aname,description:somedesc").build();

        assertThat(aThing.getDescription(), is("somedesc"));
        assertThat(aThing.getName(),is("aname"));
    }
}

See how, with Jackson, the code required to instantiate the object under test is less verbose and more compact than the alternative approach. The downside is slightly more fragile code (more prone to typos), since the JSON string cannot be checked at compile time.

Full source code for the samples above is here.

A JUnit rule to turn test logging on/off

Testing methods which log exceptions can result in a messy build log, peppered with stack traces and error messages, without any obvious way to discern whether these errors are intentionally triggered by the tests.

Example:
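
The original code isn’t shown here, so the sketch below reconstructs it from the stack trace that follows (class and method names are taken from that trace; the method bodies are assumptions):

import java.util.logging.Level;
import java.util.logging.Logger;
import org.junit.Test;

// Hypothetical class whose method logs the exceptions it catches.
class Item {
    private static final Logger LOGGER = Logger.getLogger(Item.class.getName());

    void function() {
        try {
            throw new IllegalArgumentException("boom");
        } catch (IllegalArgumentException e) {
            LOGGER.log(Level.SEVERE, "error", e); // logged, not rethrown
        }
    }
}

public class ItemTest {
    @Test
    public void testFunction() {
        new Item().function(); // passes, but pollutes the build log
    }
}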

The above test will succeed but also produce the following output in the logs:

SEVERE: error
java.lang.IllegalArgumentException: boom
at com.company.Item.function(Item.java:12)
at com.company.ItemTest.testFunction(ItemTest.java:12)

The above output is in this case undesirable, and can be hidden by using a JUnit rule which runs before the test to set the logging level to OFF, then restores the original level once the test is finished.
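
A minimal sketch of such a rule, using java.util.logging (the rule class name is hypothetical):

import java.util.logging.Level;
import java.util.logging.Logger;
import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

public class LoggingOffRule implements TestRule {

    private final Logger logger;

    public LoggingOffRule(String loggerName) {
        this.logger = Logger.getLogger(loggerName);
    }

    @Override
    public Statement apply(final Statement base, Description description) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                Level original = logger.getLevel(); // remember the current level
                logger.setLevel(Level.OFF);         // silence logging during the test
                try {
                    base.evaluate();                // run the test
                } finally {
                    logger.setLevel(original);      // restore the original level
                }
            }
        };
    }
}

Used in a test class as: @Rule public LoggingOffRule quiet = new LoggingOffRule("com.company");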

Output becomes:
Process finished with exit code 0

Synchronization vs AtomicReference performance test

Are “lock-free” structures necessarily more performant (quicker) than the traditional approach to synchronization, relying on locks? How does the number of threads impact performance? It’s time for a little speed test.


An AccountDate class (a simple wrapper around a Date) is updated 1,000,000 times in two scenarios:

– scenario 1: 20 threads each updating an accountDate instance 50,000 times.
– scenario 2: 2 threads each updating an accountDate instance 500,000 times.

Within each scenario, three different objects are used to execute the update.

– The SimpleAccountDate updater acts as a baseline. It does not use synchronization at all, which will (most probably) lead to a race condition.

– AccountDateSynchronized uses the synchronized keyword to serialize access by the threads to the content of the AccountDate.

– AccountDateAtomicRef uses a lock-free AtomicReference and CAS to update the underlying AccountDate (sketched below).
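
A minimal sketch of the CAS strategy (the class name comes from the post; this implementation is an assumption, not the benchmarked code):

import java.util.Date;
import java.util.concurrent.atomic.AtomicReference;

class AccountDateAtomicRef {

    private final AtomicReference<Date> date = new AtomicReference<Date>(new Date());

    void update(Date newDate) {
        Date current;
        do {
            current = date.get();                        // read the current value
        } while (!date.compareAndSet(current, newDate)); // retry until the swap succeeds
    }
}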

results with 20 threads

SimpleAccountDate in 14 ms race condition
AccountDateSynchronized in 68 ms OK
AccountDateAtomicRef in 119 ms OK

results with 2 threads

SimpleAccountDate in 13 ms race condition
AccountDateSynchronized in 63 ms OK
AccountDateAtomicRef in 60 ms OK

Unsurprisingly the SimpleAccountDate is very fast (but wrong). Performance under synchronization degrades only very slightly with the number of threads, while the CAS strategy does not scale nearly as well. This can be explained by the higher number of threads causing higher contention on a single accountDate object, with many threads simultaneously attempting to update this object, but only one thread at a time managing to do so.

The source code

A Linux command to group log messages by time

Given a log file of this form:

[11:29:22.271 INFO  pool-4-thread-1] Received: { Bladibla
[11:29:22.271 INFO  pool-4-thread-1] Received: { Bladibla
[11:29:22.271 INFO  pool-4-thread-1] Received: { Bladibla

To find out how many messages are received per minute: find the log lines containing the word ‘Received’, extract the hour+minute on each of these log lines, discard all duplicates and count the number of occurrences.

Command:
grep 'Received:' mylogfile.log | cut -c2-18 | awk '{print substr($0,0,length()-12)}' | uniq -c

Result will be something like:

100 08:31
93 08:32
91 08:33
73 08:34

i.e. 100 messages logged at 08:31, 93 at 08:32, etc.

This is the beauty of composability: a few simple, well-defined functions, free of side effects, which can be piped into one another to (easily) achieve (fairly elaborate) results.

No such thing as average latency

Latency is the time taken for a message to travel from one system to another. Consequently the average latency is the sum of all latencies over the total number of messages processed (i.e. the inverse of the throughput, which is the total number of messages processed over the total time taken to process them).

…Right ?


Wrong, in most cases. The above reasoning does not take into account the distribution of the latencies. The arithmetic mean / average when applied to a skewed distribution can be meaningless at best, and misleading at worst.

EXAMPLE

Two competing systems each process 200 messages in 1000 ms.

It takes 5 ms for System A to process each of the 200 messages.
Throughput = 200/1000 = 0.2 msg/ms
Latency = 1/Throughput = 5 ms

It takes 1 ms for System B to process each of 199 messages, and a further 801 ms to process the 200th message.
Throughput = 200/1000 = 0.2 msg/ms
Latency = 1/Throughput = 5 ms

…which system is “better” depends on the expectations of the client but clearly the average latency and throughput are identical even though these systems exhibit significantly different performance characteristics.

HOW TO FIX

Method 1)

The simplest and quickest method to get a more accurate representation of the “typical” latency is to take the median, which is better suited than the arithmetic mean for skewed distributions

median for system A = 5 ms
median for system B = 1 ms

Method 2)

Use a histogram. You can build your own or re-use an existing one. The code below uses the Histogram class which is part of the Disruptor package to print out the upper bound within which 99% of observations fall.

final long[] intervals = new long[] {1, 2, 5, 10, 50, 100, 1000};
Histogram h = new Histogram(intervals);
for (int i = 0; i < 200; i++){
   h.addObservations(5);
}
System.out.println("System A:" + h.getUpperBoundForFactor(0.99d) + " ms");

prints “System A:5 ms”

final long[] intervals = new long[] {1, 2, 5, 10, 50, 100, 1000};
Histogram h = new Histogram(intervals);
for (int i = 0; i < 199; i++){
   h.addObservations(1);
}
h.addObservations(801);
System.out.println("System B:" + h.getUpperBoundForFactor(0.99d) + " ms");

prints “System B:1 ms”

Update:

Method 3) Even better than methods 1 and 2 above – use the Codahale metrics library to get access to meters, histograms, timers (and more) without re-inventing the wheel.
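
A short sketch with the Codahale (now Dropwizard) metrics library, reproducing the System B numbers from method 2:

import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Snapshot;

public class LatencyHistogram {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        Histogram latencies = registry.histogram("latencies");

        for (int i = 0; i < 199; i++) {
            latencies.update(1);   // 199 messages processed in 1 ms each...
        }
        latencies.update(801);     // ...and one message in 801 ms

        Snapshot snapshot = latencies.getSnapshot();
        System.out.println("System B 99th percentile: " + snapshot.get99thPercentile() + " ms");
    }
}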

Scala breakout game code

A (very small) breakout game written in Scala/Swing.


Nail biting stuff…

It demonstrates a few of the features of Scala, such as for/yield comprehensions, “static methods” (or rather the equivalent thereof), flatMap, reactions, traits, etc. Possibly not the game of the year, but Scala does make it fun to write even the simplest piece of code.

Customized JBehave reports

By default JBehave lays out HTML report data tables horizontally, with headers on the first row and values on the second row below. In some cases though it may be more visually appealing to organise the data vertically instead, with headers and values in two separate columns.

Not to worry – JBehave can be heavily customized (heavily, but not easily… documentation is scarce). HTML tables in particular are rendered by a Freemarker macro, “renderTable”, located in ftl/jbehave-html-output.ftl. Simply edit this file and modify the macro below to swap headers and columns.

Original macro:

<#macro renderTable table>
<#assign rows=table.getRows()>
<#assign headers=table.getHeaders()>
<table>
<thead><tr>
<#list headers as header>
<th>${header?html}</th>
</#list>
</tr></thead>
<tbody>
<#list rows as row>
<tr>
<#list headers as header>
<#assign cell=row.get(header)>
<td>${cell?html}</td>
</#list>
</tr>
</#list>
</tbody>
</table>
</#macro>

becomes:


<#macro renderTable table>
<#assign rows=table.getRows()>
<#assign headers=table.getHeaders()>
<table>
<tbody>
<#list headers as header>
<tr>
<#assign cell=rows[0].get(header)>
<td>${header?html}</td>
<td>${cell?html}</td>
</tr>
</#list>
</tbody>
</table>
</#macro>

Copy the edited ftl file into your project resources as custom-html-output.ftl.

Next, instruct JBehave to look for this custom Freemarker file at generation time.


import org.jbehave.core.configuration.Keywords;
import org.jbehave.core.reporters.*;

import java.io.File;

public class CustomHtmlOutput extends HtmlTemplateOutput {

    public CustomHtmlOutput(File file, Keywords keywords){
        super(file, keywords, new FreemarkerProcessor(CustomHtmlOutput.class), "custom-html-output.ftl");
    }

    public static final Format FORMAT = new Format("HTML"){
        @Override
        public StoryReporter createStoryReporter(FilePrintStreamFactory factory, StoryReporterBuilder storyReporterBuilder){
            factory.useConfiguration(storyReporterBuilder.fileConfiguration("html"));
            return new CustomHtmlOutput(factory.getOutputFile(), storyReporterBuilder.keywords());
        }
    };
}


and modify the embedder configuration to use the above class.


Configuration configuration = injectedEmbedder().configuration();
configuration.useStoryReporterBuilder(configuration.storyReporterBuilder().withFormats(CustomHtmlOutput.FORMAT));
injectedEmbedder().runStoriesAsPaths(storyPaths());

An introduction to JBehave

JBehave is a BDD (Behaviour Driven Development) framework, and as such it is used to map between stories expressed in natural language and the underlying Java code used to test these stories.

Here is a concrete example of how this works.

1) First the story is written, usually following a given-when-then template.

Scenario: a trade with a positive price passes validation
Given a trade with a positive price
When the trade is validated
Then trade passes validation



2) Then a class is created which maps the text above to the yet-to-be-written production code (since we follow a TDD approach the tests are written first, the production code later).


@Component
class TradeStep {

    @Resource
    private TradeService tradeService;

    private Trade trade;

    private boolean validationResult;

    @Given("a trade with a positive price")
    public void aTradeWithAPositivePrice() {
        trade = new Trade(1);
    }

    @When("the trade is validated")
    public void theTradeIsValidated() {
        validationResult = tradeService.validate(trade);
    }

    @Then("trade passes validation")
    public void tradePassesValidation() {
        assertTrue(validationResult);
    }
}


3) Finally the code required to make this test pass is actually written. In the example above this would be when the domain object Trade and its associated service TradeService are implemented. Note that in this scenario the TradeService is a dependency injected by Spring.

public class Trade {

   int price;

   Trade (int price){
      this.price = price;
   }
}


import org.springframework.stereotype.Component;

@Component
public class TradeService {
   public boolean validate(Trade trade) {
      return (trade.price > 0);
   }
}


4) Last but not least – JBehave requires an entry point into the tests, a.k.a. an Embedder. This is a piece of code which indicates to JBehave where to look for the story files, how to handle failures, which reports to output, etc. Each of these behaviours can be easily customized.

There are several embedders to choose from, but in this instance we use a SpringAnnotatedEmbedderRunner because it provides Spring-based dependency injection.

@RunWith(SpringAnnotatedEmbedderRunner.class)
@UsingEmbedder(
        generateViewAfterStories = true,
        ignoreFailureInStories = false,
        ignoreFailureInView = false)
@UsingSpring(resources = {"spring-config.xml"})
@Configure(
        storyReporterBuilder = JBehaveRunner.CustomReportBuilder.class,
        pendingStepStrategy = FailingUponPendingStep.class)
@UsingSteps(instances = {TradeStep.class})
public class JBehaveRunner extends InjectableEmbedder {

    protected java.util.List<String> storyPaths(){
        return new StoryFinder().findPaths(codeLocationFromClass(this.getClass()), "*.story", "");
    }

    @Test
    public void run() throws Throwable {
        injectedEmbedder().runStoriesAsPaths(storyPaths());
    }

    public static class CustomReportBuilder extends StoryReporterBuilder {
        public CustomReportBuilder(){
            CrossReference crossReference = new CrossReference().withJsonOnly().withOutputAfterEachStory(true);

            this.withDefaultFormats()
                .withFailureTrace(true)
                .withFormats(HTML, CONSOLE)
                .withCrossReference(crossReference)
                .withCodeLocation(codeLocationFromClass(this.getClass()));
        }
    }
}

When JBehave runs, a test report will be generated for each story. It will look like so if all goes well (all green!):

[screenshot: a passing JBehave report]

If something goes wrong instead, the result will be:

[screenshot: a failing JBehave report]

Put together, all of the JBehave reports form a living documentation of the system. Any member of the team can check in real time what the expected behaviour of the system is, without having to dig into the code. If JBehave is hooked into the continuous integration build, which is highly recommended, these reports will never go out of date.

Calculating the big O of a priority queue insert

A PriorityQueue is a tree-like structure where the nodes are ordered according to a Comparator function, e.g. the lower the value of the node the higher up it will be in the tree, with the root node being the node with the minimum value.

[diagram: a priority queue]

New elements are initially inserted at the bottom of the tree, then “sift up” to their appropriate position in the tree. Each sift-up operation compares the value of the node to the value of its parent, and if the value of the node is smaller the two nodes are swapped.

The cost of the insert operation is then a function of the number of sift-up operations to execute… which is itself a function of the height of the tree.
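
As an illustration, a minimal sketch of an insert with sift-up on an array-backed min-heap (not the JDK implementation, just the idea):

// Insert 'value' into an array-backed min-heap already holding 'size' elements.
static void insert(int[] heap, int size, int value) {
    int i = size;                // start at the bottom of the tree
    heap[i] = value;
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (heap[i] >= heap[parent]) break; // heap property holds, stop sifting
        int tmp = heap[i];                  // otherwise swap with the parent...
        heap[i] = heap[parent];
        heap[parent] = tmp;
        i = parent;                         // ...and keep sifting up
    }
}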

[diagram: priority queue height]


How high is the tree then? In a balanced binary tree each parent has two children nodes, so:

at level 1, total number of nodes N = 1 (the root node)
at level 2, N = the root node’s two children + the root node = 2*1 + 1 = 3 nodes in total
at level 3, N = 2*2 + 2*1 + 1 = 7
at level 4, N = 2*2*2 + 2*2 + 2*1 + 1 = 15
….
at level h, N = the sum of 2^i, where i runs from 0 to h-1

The above is a geometric series with first term 1, ratio 2 and h terms, which can be calculated as: N = 2^h - 1

Finally, from the inverse relationship between logs and exponentials:
h = log_2(N + 1)

… and the big O of an insert in a priority queue is O(log N)

SparseArray vs HashMap

A sparse array in Java is a data structure which maps keys to values. Same idea as a Map, but different implementation:

  • A Map is represented internally as an array of lists, where each element in these lists is a key/value pair. Both the key and the value are object instances.
  • A sparse array is simply made of two arrays: an array of (primitive) keys and an array of (object) values. There can be gaps in the array indices, hence the term “sparse” array. Example source code here; a minimal sketch also follows below.
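
A minimal sketch of the idea, assuming sorted keys looked up by binary search (roughly how Android’s SparseArray works; growth and deletion are left out):

import java.util.Arrays;

class SparseArraySketch {

    private final int[] keys;       // primitive keys: no boxing
    private final String[] values;  // object values, parallel to the keys
    private int size;

    SparseArraySketch(int capacity) {
        keys = new int[capacity];
        values = new String[capacity];
    }

    // keys must be appended in ascending order for binarySearch to work
    void append(int key, String value) {
        keys[size] = key;
        values[size] = value;
        size++;
    }

    String get(int key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        return i >= 0 ? values[i] : null;
    }
}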

The main interest of the SparseArray is that it saves memory by using primitives instead of objects as keys. For instance the screenshot below (courtesy of VisualVM) shows the memory used when storing 1,000,000 elements in a sparse array of <int, String> vs a HashMap of <Integer, String>:

[screenshot: VisualVM memory comparison of the two structures]

The difference in size between both structures can be explained by:

– the difference in size of the key: an int is 4 bytes, while an Integer is typically 16 bytes (JVM-dependent).

– the overhead of a HashMap entry compared to an array element, i.e. a HashMap.Entry instance must keep track of the references for the key, the value and the next entry. Plus it also needs to store the hash of the entry as an int.

Performance gains from bitwise operations

An experiment to assess the performance gains from bitwise operations in Java.

Methodology: use the junit benchmark framework to measure the time elapsed with and without bit-twiddling for four operations (each executed several million times): multiply, divide, modulo and finding the next power of two.

Configuration: JDK 8 running on MacOS 10.7, Intel i5 processor.

Results

For each test two results are shown: time taken without bit-twiddling first, with bit-twiddling second.

1) Division

DivideBenchmark.testJVMDivision: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.04 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.39, time.warmup: 0.07, time.bench: 0.32
DivideBenchmark.testBitwiseDivision: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.03 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.33, time.warmup: 0.04, time.bench: 0.29

2) Multiplication

MultBenchmark.testJVMMultiply: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.02 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.22, time.warmup: 0.04, time.bench: 0.18
MultBenchmark.testBitwiseMultiply: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.02 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.18, time.warmup: 0.03, time.bench: 0.16

3) Modulo

ModuloBenchmark.testJVMModulo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.09 [+- 0.03], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 1.00, time.warmup: 0.20, time.bench: 0.80
ModuloBenchmark.testBitwiseModulo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.09 [+- 0.01], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.95, time.warmup: 0.10, time.bench: 0.84

4) Find the next power of 2

NextPowerOfTwoBenchmark.testJVMNextPowerOfTwo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 1.44 [+- 0.02], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 14.49, time.warmup: 1.50, time.bench: 13.00
NextPowerOfTwoBenchmark.testBitwisePowerOfTwo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.03 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.33, time.warmup: 0.05, time.bench: 0.28

Conclusion

There are no discernible performance gains from using bit-twiddling to execute multiplications, divisions or modulos. This is hardly surprising, as these simple operations should already be heavily optimised by the JVM.

Less heavily JVM-optimised code (e.g. finding the next power of two) shows large potential performance wins, which may, in some cases, justify the less readable code associated with bit-shift techniques.
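
For reference, the classic bit-twiddling version of “next power of two” looks something like this (a sketch, assuming 0 < x <= 2^30; Integer.highestOneBit offers related functionality in the JDK):

// Smear the highest set bit to the right, then add one to roll over
// to the next power of two.
static int nextPowerOfTwo(int x) {
    x--;            // so that an exact power of two maps to itself
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}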

Benchmark source code

Back to basics: anatomy of an ArrayList

The ArrayList is ubiquitous in Java programming and is often picked as the default List implementation. Is this always a good choice? Let’s take a peek at the source code for some of the key methods of ArrayList.java and find out.

Data structure

As the name implies an ArrayList is backed by an array, no surprises here.

Object[] elementData


To retrieve an element:

public E get(int index) {
   rangeCheck(index);
   return (E) elementData[index];
}

Pretty straightforward – check that the index parameter is valid and return the element located at this index. Because all elements are stored as Objects a cast from Object to the type parameter of the ArrayList is required beforehand.

To add an element:

public boolean add(E e) {
   ensureCapacityInternal(size + 1);
   elementData[size++] = e;
   return true;
}

First check that the array is large enough to accommodate the new element (if not, grow the array). Then add the element at the end of the array. Growing the array involves calculating a new capacity and copying the existing array into an array of bigger capacity, an expensive operation:

elementData = Arrays.copyOf(elementData, newCapacity);

The more elements in the array the more expensive the copy will get.

To remove an element

Removing is another potentially expensive operation as another array copy is required (unless the element removed happens to be the last element of the array). An element is removed by taking all the elements on its right hand side and copying them in place of where this element used to be.

public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // Let gc do its work

    return oldValue;
}

Note the increment of the modCount variable above. This variable is written to by any operation which structurally changes the content of the list (i.e. add, remove, clear…). And it is read by methods iterating over the content of the list, to detect whether any structural changes occurred. If that’s the case then the iteration may yield incorrect results, and a ConcurrentModificationException is thrown.
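
For instance, a minimal sketch of the fail-fast behaviour:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Structurally modifying the list while iterating over it trips the
// modCount check and throws a ConcurrentModificationException.
List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c"));
for (String s : list) {
    list.remove(s); // throws ConcurrentModificationException
}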

In summary:

An ArrayList’s ideal use case is when the number of structural operations is kept to a minimum, because 1) they don’t scale and 2) they might raise a ConcurrentModificationException.

Point 1) can be addressed by swapping the ArrayList for a LinkedList… faster structural operations, but much slower retrievals.

Point 2) can be remedied by using a CopyOnWriteArrayList instead. The tradeoff in this case is that any write operation (structural or otherwise) will involve an array copy and therefore impact performance.

Quick tip – link up tests and tracker issues

How do you associate a unit (or integration) test with the bug tracker issue it intends to fix?

The usual approach is to use comments in the test code:

//fix issue raised by bug tracker item 2554
public void someTest(){
       ...
}

This works… but can be fairly verbose. A cleaner approach is simply to annotate each test with the relevant tracker issue.

First create the annotation:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME) // keep the annotation visible to tools at runtime
public @interface Tracker{
   int value();
}

and then use it to tag each test with the relevant tracker number:

@Tracker(2554)
public void someTest(){
  ...
}

Concurrent != Parallel

It’s easy to mix up concurrency and parallelism. After all, both terms relate to the ability to run and progress multiple threads. There is a subtle distinction though:

Threads which run concurrently do not always run simultaneously (e.g. 2 threads on 1 core)… although to the end user they may well appear to – provided the CPU switches between them quickly enough.

Threads running in parallel do run simultaneously, eg. 2 threads on 2 separate cores.

Why is this important? Because the way objects are named can give a clue as to how they should be used. For instance:

– A ConcurrentModificationException can occur in a single threaded process.

Map<String, Integer> m = new HashMap<String, Integer>();
for (int i = 0; i < 1000; i++){
    m.put(String.valueOf(i), i);
}
for (Map.Entry entry : m.entrySet()){
    if (Math.random() < 0.5) {
        //pretty much guaranteed to throw a ConcurrentModificationException
        m.clear();
    }
}

– The concurrent CMS garbage collector will run (mostly) concurrently with the application threads. On the other hand, the parallel GC uses multiple threads simultaneously, none of which run concurrently with the user threads (they effectively stop the application world).

The curious case of the Map key with a unique hashcode

What would happen if a HashMap used a key which always returned the same hashcode? Answer in images.

The underlying structure of a HashMap is akin to an array, where each entry in the array is a linked list of key/value pairs. The hashcode of the key is used to quickly lookup one of the lists indexed by the array.

When the key produces well-distributed hashcodes, each entry in the array points to a list with a small number (ideally just one) of key/value pairs.

[diagram: well-distributed hashcodes]

If on the other hand the key always produces the same hashcode, then all array lookups will return the same list.

[diagram: a map where every key has the same hashcode]

Our HashMap has turned into a linked list… with the consequence that lookup times are also those of a linked list: O(n), versus O(1) for the original HashMap.
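
To see it in practice, a sketch of a key with a constant hashcode (BadKey is a hypothetical class, for illustration only):

import java.util.HashMap;
import java.util.Map;

public class ConstantHashDemo {

    // Every instance returns the same hashcode: all entries share one bucket.
    static class BadKey {
        private final int id;

        BadKey(int id) { this.id = id; }

        @Override
        public int hashCode() { return 42; }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<BadKey, String> m = new HashMap<BadKey, String>();
        for (int i = 0; i < 10000; i++) {
            m.put(new BadKey(i), "value" + i); // each put walks the single bucket
        }
        System.out.println(m.get(new BadKey(9999))); // lookups degrade to O(n) too
    }
}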

Mining the codebase with bash and sed

Ever wondered what the most commonly imported Apache classes in your projects are? Here’s a possible way to find out, using bash and sed.

find . -name '*.java' | xargs grep apache | grep import | sed 's/.*\(org\.apache.*;\).*/\1/p' | sort | uniq -c | sort

Explanation of the piped expressions above (by order of appearance)

  • find all Java file names recursively, starting from the current location
  • search these files for the lines containing the word apache and the word import
  • print out the text starting with “org.apache” and ending with “;”
  • sort in alphabetical order
  • remove duplicates and prefix each line with its number of occurrences
  • sort once more, from lowest to highest number of occurrences.

The end result will look something like this:

[screenshot: the sorted list of imports with their occurrence counts]

Java 8 Lambda: an example

There is no equivalent to C++ function pointers in Java, which means that it is not possible to pass a method as a parameter to another method (unless reflection is used, but I won’t go there)… you can pass an interface instead though.

For example – given two different methods add and multiply:


int add(int f1, int f2){
    return f1 + f2;
}

int multiply(int f1, int f2){
    return f1 * f2;
}

…and a method which accepts a Calculator interface as its parameter


interface Calculator {
    public int calculate(int i, int j);
}

void recordCalc(Calculator calculator, int i, int j){
    int result = calculator.calculate(i, j);
    //...store the result on disk
}

then passing add(i,j) or multiply(i,j) as a parameter to the recordCalc method is done like so:

Calculator add = new Calculator(){
   @Override
   public int calculate(int i, int j) {
      return add(i,j);
   }
} ;

Calculator multiply = new Calculator(){
   @Override
   public int calculate(int i, int j) {
      return multiply(i,j);
   }
} ;

recordCalc(multiply,3,3);
recordCalc(add,5,5);

i.e. we create two anonymous classes, each implementing Calculator in their own way, which are then passed to the recordCalc method as an interface. It is a bit ugly, there’s no denying it.

Enter stage left: Java 8, with lambda support. A lambda is basically a very compact way to implement anonymous classes. They’re represented using an arrow “->”, like so:

    Calculator add = (i, j) -> add(i,j);
    Calculator multiply = (i, j) -> multiply(i, j);
    recordCalc(multiply, 3, 3);
    recordCalc(add, 5, 5);

or even simpler

   recordCalc((i, j) -> add(i,j), 3, 3);
   recordCalc((i, j) -> multiply(i, j), 5, 5);

The initial 16-line implementation using anonymous classes is now reduced to a 2-liner. Shame there are another 6 months to wait before Java 8 is officially released.

Finally – if you’re using IntelliJ an anonymous class can be transformed into a lambda with one click of the mouse:

[screenshot: IntelliJ converting an anonymous class to a lambda]

Code Kata: the Fibonacci sequence

The Fibonacci sequence (named after the Italian mathematician Leonardo Fibonacci) is a sequence of numbers where each number is the sum of the previous two. This lends itself quite well to a recursive approach:


 int fibonacci(int term){
       if (term==0) return 0;
       if (term==1) return 1;
       return fibonacci(term - 1) + fibonacci(term - 2);
   }

pros: clean, concise code, very easy to read.

cons: totally useless for anything other than a very small sequence. The dual recursive calls on the last line are performance killers, e.g. it takes several thousand calls just to calculate fibonacci(20).

A better (as in faster) solution:

int fibonacci(int f1, int f2, int term){
    if (term == 0) return 0;
    if (term == 1) return 1;
    if (term-- > 2)
        return fibonacci(f1 + f2, f1, term);
    else
        return f1 + f2;
}

pros: much quicker. Computes fibonacci(2000) in under 400 microseconds on an Intel Core i5.

cons: calculating a sequence with a term greater than 10,000 is pretty much guaranteed to trigger a stack overflow error. This is because, in the absence of tail call optimization, each recursive method call involves allocating a new stack frame, eventually exceeding the JVM stack size.

This leads to a third implementation where the logic is changed to “flatten” the recursive calls into a while loop like so:

static int fibonacci(int f1, int f2, int term){
    if (term == 0) return 0;
    if (term == 1) return 1;
    while (term-- > 2) {
        int tmp = f1;
        f1 = f1 + f2;
        f2 = tmp;
    }
    return f1;
}

The code doesn’t quite flow as nicely as the first recursive implementation – but more importantly – it won’t trigger a stack overflow, and it’s significantly faster than either of the two recursive functions, e.g. fibonacci(2000) is calculated within around 100 micros.

Measuring elapsed time with the Apache StopWatch

When it comes to measuring elapsed time, the code written is often directly reliant on System.currentTimeMillis (or System.nanoTime), e.g. fairly similar to this:

long startTime = System.currentTimeMillis();
//processing here...
long endTime = System.currentTimeMillis();
long elapsedTime = endTime - startTime;

This works OK… but it’s a bit verbose. Also, getting the time in units other than milliseconds involves writing additional logic to translate the elapsed time into seconds, minutes, hours…

The Apache Commons library StopWatch will save writing all of this additional code. Less code means fewer bugs.

StopWatch sw = new StopWatch();
sw.start();
//processing here...
sw.stop();
System.out.println(sw.getTime()); // elapsed time in milliseconds

Programming pearl of the day (not)

Extract from live production code…


boolean flag = false;
boolean and = false;
boolean or = false;
....
public void and(){
   and = true;
   or = false;
}

....

public void or() {
   and = false;
   or = true;
}

“and” and “or” are not reserved keywords in Java, so they can be used as variable names. Is this a good idea though? The answer is a few lines below:



if ( or || (and || flag ) || (!and && !or) ){
   //some logic here
}
else
if (and || (!and && !or)){
   //more logic here

}

I’ll leave you to decide if it’s genius or madness. Although there’s probably a lesson to be learned somewhere about the importance of good variable names.

Testing mock arguments with Mockito

Mockito uses equals() to compare arguments by default, as in:

verify(myMock).handle("an argument");

This works well when comparing Strings, Integers, etc., but breaks down when the argument is an instance of a class which does not override Object’s equals() method, such as a Throwable:

//the class to mock
class ExceptionHandler{
   void handle(Throwable t){
     ...
   }
}

public class MyApp {

   private final ExceptionHandler exceptionHandler ;

   MyApp (final ExceptionHandler exceptionHandler){
      this.exceptionHandler = exceptionHandler;
   }

   public void runLogic(int param){
      exceptionHandler.handle(new Throwable(String.format("param is %d", param)));
   }
}

Two possible solutions in that case:

First method – using an argument captor

 ExceptionHandler mockExceptionHandler = mock(ExceptionHandler.class);
 ArgumentCaptor<Throwable> captor = ArgumentCaptor.forClass(Throwable.class);
 new MyApp(mockExceptionHandler).runLogic(12);
 verify(mockExceptionHandler).handle(captor.capture());
 Throwable throwable = captor.getValue();
 assertThat(throwable.getMessage(),is("param is 12"));

Second method – using a custom matcher

class ThrowableMatcher extends ArgumentMatcher<Throwable> {
   private String message;

   ThrowableMatcher (String message){
      this.message = message;
   }

   @Override
   public boolean matches(Object o) {
     return ((Throwable)o).getMessage().equals(message);
   }
}

...
 ExceptionHandler mockExceptionHandler = mock(ExceptionHandler.class);
 new MyApp(mockExceptionHandler).runLogic(12);
 verify(mockExceptionHandler).handle(argThat(new ThrowableMatcher("param is 12")));

Conciseness challenge

This post berates Java for being overly verbose compared to Scala. At times (most of the time?) this is justified; however two examples in particular stand out, which I could not resist trying to improve upon:

1. Loop Syntax

The “long” version from the post above:

ArrayList<Integer> x = new ArrayList<Integer>(3);
x.add(10);
x.add(11);
x.add(12);

for (Integer y : x)
{
    if ( y > 2 )
        System.out.println(y);
}

…this can easily be reduced to a couple of lines (using Guava’s newArrayList, statically imported from com.google.common.collect.Lists):

for (Integer y : newArrayList(10, 11, 12)) {
    if ( y > 2 ) System.out.println(y);
}

2. Checking for argument correctness in a constructor.

The original version:

class Person
{
    private String name;
    private int age;

    public Person(String n, int a)
    {
        if ( a < 18 ) {
            throw new IllegalArgumentException();
        }
        name = n;
        age = a;
    }
}

Using the Preconditions API from the Guava libraries (checkArgument, statically imported) and the @AllArgsConstructor annotation from Project Lombok, this turns into a slightly more compact version:

@AllArgsConstructor
class Person2
{
    private  String name;
    private  int age;

    static Person2 aPerson(String name, int age){
        checkArgument(age >= 18);
        return new Person2(name, age);
    }
}

Still not quite as terse as the equivalent code in Scala, but gets to the point whilst remaining easily readable.

Loading resources from the classpath in Java – the concise way

Loading content from the classpath (such as loading the content of a file into a string) is a fairly common task, but still – there’s no API in the Java 6 SDK which provides an easy/concise way to do it… (as far as I know, happy to be proven wrong).

The alternative is to use a little help from the Apache commons-io library. In 5 lines of code:

import org.apache.commons.io.IOUtils;
import java.io.IOException;
import java.net.URL;

....

public static String contentFromClasspath(String name, Class<?> clasz) throws IOException{
   URL resource = clasz.getResource(name);
   if (resource == null) {
      throw new IllegalArgumentException(String.format("Error opening %s from %s", name, clasz));
   }
   return IOUtils.toString(resource.openStream());
}

That’s assuming the resource opened is read-only, so there’s no need to close the InputStream; the OS will do it for us when the program exits.

Use it like so:


String xml = contentFromClasspath("sample.txt", this.getClass());

—-

Update:

For Java 6 the above is the most concise way to load content from the classpath – but from Java 7 onwards we can do better with the java.nio.file.Files API.


import static java.nio.charset.Charset.defaultCharset;
import static java.nio.file.FileSystems.getDefault;
import static java.nio.file.Files.readAllLines;

...

String path = this.getClass().getProtectionDomain().getCodeSource().getLocation().getPath();
System.out.println(readAllLines(getDefault().getPath(path, "sample.txt"), defaultCharset()));

2 lines (OK, they’re a bit long… but still). And the readAllLines method ensures the file is closed once read… who said Java was verbose?

Three apps for rooted Android devices

I installed CyanogenMod (version 7) on a Motorola Atrix about 3 weeks ago now, and I’m pleased to report it’s been rock-solid since. No stability issues whatsoever. No loss of functionality either – quite the opposite actually, because some Android apps require superuser rights, so they can only work on rooted devices.

And three of these apps in particular stand out.

AdFree: removes ads in the browser and in Android apps. (Adblock+ is a possible alternative, haven’t tried it.)


Adb wireless: enables wireless adb connections. Useful for Android devs.


AirDroid: manage your phone wirelessly from a browser on a remote machine. Works on non-rooted phones – but requires root access for the full set of functions, e.g. taking screenshots (like the ones illustrating this post).


CyanogenMod readiness test

Are you:

– tired of being stuck with Android 2.x because your phone manufacturer decided it was too much hassle to upgrade to the latest version of Android? Motorola, I’m looking at you.

– annoyed by the useless customizations built on top of the stock firmware from Google?

– proficient enough to install the Android SDK and fire up some basic commands like adb?

– brave enough to void your warranty (for sure) and risk losing your phone (a small chance if you follow the instructions laid out below, but still… a possibility)?

If the answer to all of the above is a resounding yes – then you should consider upgrading to CyanogenMod, like three million users (and counting). The procedure to do so is fairly simple: head over to this page and follow the instructions specific to your phone. I recommend sticking to a stable distribution of CyanogenMod when upgrading. Also, if you’re new to this it’s probably best to try it with an out-of-warranty phone first.

I’ve followed the steps above with an Atrix 4G, and it took about one hour to install CyanogenMod (as a total beginner to ROM flashing). One minor niggle: for a while I could not get my phone to boot into ClockworkMod recovery mode – it turns out it is necessary to reboot into recovery mode straight after having flashed ClockworkMod for the modification to stick. If the phone reboots normally instead, the stock Android recovery takes over.

The end result:

– on the plus side: much improved battery life and response time. All the useless pre-installed apps from the phone vendor are gone (this in itself makes the upgrade worth it).

– on the minus side: the latest stable version of CyanogenMod for the Atrix doesn’t ship with Android 4 – yet (although some of the latest unofficial ROMs do).

First steps with… Cucumber JVM

Cucumber is a tool used to support behaviour driven development. Originally written in Ruby, it now has a JVM version called, quite logically, Cucumber-JVM.

The basic idea (very summarized…) is to write acceptance tests for a new feature, together with the product owner, before the code is written. Then run the tests, see the tests fail, implement the missing behaviour, re-run the tests etc.. until the tests pass.

The key objective here is to involve the product owners as much as possible in writing the tests, which from experience can be tricky as they do not generally have a technical background. So it’s important for an acceptance test framework to generate tests with a syntax which is as close as possible to natural language.

Cucumber achieves this quite well; see below for an example Cucumber script (click to zoom in).

[screenshot: an example Cucumber script next to its step definitions]
The right panel defines the test scripts to execute, easily understandable by non-technical people. No messing around with HTML either, a big win compared with alternative frameworks such as Fitnesse or Concordion.

The left panel maps the test scripts to their associated JUnit tests. Full source code for this example is at: https://github.com/eleco/bdd

Cucumber outputs the test results in a nicely formatted page, like so:

[screenshot: a Cucumber report]

[screenshot: a Cucumber report with failing tests]

Testing time-dependent logic in Java

It’s astounding. Time is… fleeting (The Rocky Horror Picture Show)

Testing time-sensitive business logic is essentially about being able to change the current time in our tests – somehow – and then checking how this affects the behaviour of the domain object being tested.

The primitive and brute-force way to do this is to manipulate the computer system clock by manually changing the current time prior to each test… Crucially, this approach does not lend itself to running as part of an automated test suite, for obvious reasons.

The other (better) way is to use two different clocks: the production code can rely on the system clock, while the test code will depend on a custom clock, i.e. a clock which can be set up to return any particular time as the current time. Usually this custom clock will expose methods to advance/rewind the clock to specific points in time.
interface Clock {
   public DateTime now();
}

class SystemClock implements Clock{
   public DateTime now(){
      return new DateTime();
   }
}

class CustomClock implements Clock{
   private DateTime now;
   public CustomClock(DateTime now){
      this.now = now;
   }
   public DateTime now(){
      return now;
   }
   public void tick(){
      now = now.plusDays(1);
   }
}

Both clocks realize the “now()” method defined in the Clock interface. The difference is that the now() method from SystemClock is a simple wrapper around a new instance of a Joda DateTime, while the now() method from the CustomClock returns a dateTime attribute which can be modified through the tick() method to make time pass faster :)

The custom clock will be injected as a dependency of the test code, and the system clock as a dependency of the production code.

For a (somewhat contrived) example of how this plays out check out: https://github.com/eleco/customclock

Thread.sleep: avoid

Too often Thread.sleep is used to make the main application thread pause when it needs to wait for resources to be initialised on a secondary thread, like so:

while (resourceNotInitialized){
    Thread.sleep(someArbitraryNumberOfMilliSeconds);
}

While this solution has the advantage of being easy to understand, it also has one drawback: the number of milliseconds the main thread must sleep for is most likely wrong:

– either it’s too small, and the main thread will repeatedly awake too early, hogging CPU resources in the process (busy wait)
– or it’s too large, and the main thread will wake up long after all resources have been initialised, resulting in an application with sluggish behaviour (and irate users).

The proper way to handle this scenario is to coordinate both threads, either by using the wait and notify methods from the Object class, or alternatively with a CountDownLatch.
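
A minimal sketch of the CountDownLatch approach (class and variable names are illustrative):

import java.util.concurrent.CountDownLatch;

public class StartupCoordination {

    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch initialised = new CountDownLatch(1);

        new Thread(new Runnable() {
            public void run() {
                // ... initialise resources on the secondary thread ...
                initialised.countDown(); // signal that initialisation is complete
            }
        }).start();

        initialised.await(); // blocks until countDown() is called: no polling, no arbitrary sleep
        // ... resources are now safe to use ...
    }
}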

 

Top 10 Java interview questions

The 10 most frequently-asked Java interview questions, in my experience.
Mostly used in phone interviews to “weed out” the weakest candidates.

These questions are fairly simple so getting them wrong will raise a major red flag.

In no particular order:

– How to prevent concurrent access to a method

– Meaning of the volatile keyword

– Difference between String and StringBuffer

– Difference between ArrayList and Vector

– Relationship between equals and hashcode

– Difference between checked and unchecked exceptions

– Meaning of the final keyword

– Explain garbage collection; can it be forced

– Difference between an interface and an abstract class

– What’s a deadlock

Log4j code snippets

A few log4j-related code snippets that I tend to re-use from time to time… All pretty self-explanatory.

There is a FAQ at http://logging.apache.org/log4j/ which already touches on the subjects below, but in a fairly light manner, and without the support of any code.


1- how to change the log level dynamically



Level debugLogLevel = Level.toLevel("DEBUG");
Logger.getLogger(MyClass.class).setLevel(debugLogLevel);


2- add an appender at runtime

ConsoleAppender appender = new ConsoleAppender( new PatternLayout("%-5p [%t]: %m%n"));
Logger.getRootLogger().addAppender(appender);


3- how to reload the log4j configuration file at runtime

String configDir ="/path/to/config/directory";
LogManager.resetConfiguration();
String log4jConfigFile = configDir + java.io.File.separator +"log4j.xml";
DOMConfigurator.configure(log4jConfigFile);
logger.info("log4j initialized from " + log4jConfigFile);



4- how to direct the log output to different files.

In the configuration below, the log statements originating from the “com.firstpackage” package will be directed to FirstFile.log and also to the console. Log statements from “com.secondpackage” will go to SecondFile.log, and to the console.


 <appender name="FIRST_FILE" class="org.apache.log4j.DailyRollingFileAppender">
       <param name="File" value="FirstFile.log"/>
       <layout class="org.apache.log4j.PatternLayout">
       <param name="ConversionPattern" value="%d{HH:mm:ss,SSS} [%t] %-5p %c{1}: %m%n"/>
       </layout>
    </appender>

 <appender name="SECOND_FILE" class="org.apache.log4j.DailyRollingFileAppender">
       <param name="File" value="SecondFile.log"/>
       <layout class="org.apache.log4j.PatternLayout">
       <param name="ConversionPattern" value="%d{HH:mm:ss,SSS} [%t] %-5p %c{1}: %m%n"/>
       </layout>
    </appender>

 <appender name="CONSOLE_APPENDER" class="org.apache.log4j.ConsoleAppender">
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d{HH:mm:ss,SSS} [%t] %-5p %c{1}: %m%n"/>
        </layout>
    </appender>

<appender name="ASYNC_APPENDER_1" class="org.apache.log4j.AsyncAppender">
        <appender-ref ref="CONSOLE_APPENDER"/>
        <appender-ref ref="FIRST_FILE"/>
    </appender>
   
    <appender name="ASYNC_APPENDER_2" class="org.apache.log4j.AsyncAppender">
        <appender-ref ref="CONSOLE_APPENDER"/>
        <appender-ref ref="SECOND_FILE"/>
    </appender>

 <logger name="com.firstpackage" additivity="false">
        <level value="DEBUG" />
        <appender-ref ref="ASYNC_APPENDER_1" />       
    </logger>
   
    <logger name="com.secondpackage" additivity="false">
        <level value="DEBUG" />
        <appender-ref ref="ASYNC_APPENDER_2" />       
    </logger>


First steps with Awk

Awk is a Unix programming language specifically dedicated to the processing of text files.

While it’s been around for ages (it’s almost as old as Unix) it remains fairly unknown and/or unused compared to other utilities such as grep, vi, find… Strange really as it’s powerful and quite easy to use.

Example case: looking up and summarizing data in a log file.

Imagine the “usual” application log file of the form: [Date] [Thread #] [Log level] [log message]

Say some of the lines logged in there account for the time spent on one given algorithm:
13:42:07,019 [Thread-1] DEBUG Calculation: time spent on algo #12 is 831 ms

It would be interesting to parse all the lines containing the word algo to extract 1) the total time spent on algo calculation 2) the average time spent across all calculations.

In Java, coding such a parser from scratch would take about one full day for most developers (if not more):
– locate the file, open and read it, manage IO exceptions
– parse the lines (manage parsing exceptions)
– calculate the results and print them
– create a build script

By comparison the equivalent script can be written with Awk in minutes.

Proceeding step by step:

Step 1.
To print to screen all lines containing the term algo in the file myfile.log:

awk '/algo/ {print $0}' myfile.log

Note: Awk divides each line into columns, where a column is a block of text separated by whitespace.
E.g. in the format above [Date] [Thread] [Debug]… $0 will print the whole line, $1 will print the date, $2 the thread number, etc.

Step 2.
To count the number of lines with the term algo:

awk '/algo/ {nb++} END {printf "%d", nb}' myfile.log

[here nb is a local variable declared on the fly and used as a line counter]

Step 3.
To keep a running total of the time spent on algo calculations:

awk '/algo/ {total+=$9} END {printf "%d", total}' myfile.log

[assuming the time spent on a calculation is printed in the 9th column of the line]

Step 4.
Putting it all together – print the total time spent plus the average for each calculation:

awk '/algo/ {nb++;total+=$9} END {printf "%d %d", total, total/nb}' myfile.log



The best way to learn more about Awk is to try it….

– Awk should come as standard with all Linux distributions.

– On Windows it’s available via Cygwin. An alternative implementation, gawk (for GNU Awk), can also be found here.

How to copy a file in Java

Even the most basic of tasks – like copying a file – can be done in several different ways in Java… a testament to the richness of the platform (or its complexity!).

Four possible ways to copy a file below, all error handling left aside.

1- the complicated way, manipulating io streams.

   import java.io.FileInputStream;
   import java.io.FileOutputStream;
   import java.io.InputStream;
   import java.io.OutputStream;
   ....

   String orig ="file.xml";
   String dest = "file.xml.bak";
   InputStream in = new FileInputStream(orig);
   OutputStream out = new FileOutputStream(dest);
   byte[] buf = new byte[1024];
   int len;
   while ((len = in.read(buf)) > 0) {
      out.write(buf, 0, len);
   }
   in.close();
   out.close(); 

2- The ugly , non-portable way, using the operating system environment:


   Runtime.getRuntime().exec("OS dependent file copy command here"); 

3- The open-source way, using the Apache Commons library.


   import java.io.File;
   import org.apache.commons.io.FileUtils;
   ....
   String orig ="file.xml";
   String dest = "file.xml.bak";
   File fOrig = new File(orig);
   File fDest = new File(dest);
   FileUtils.copyFile(fOrig, fDest);

4- The novel way – using Java 7 and its revamped I/O API.

    import java.io.File;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import static java.nio.file.StandardCopyOption.COPY_ATTRIBUTES;
    import static java.nio.file.StandardCopyOption.REPLACE_EXISTING;
    ...
    String orig ="file.xml";
    String dest = "file.xml.bak";
    Path source = new File(orig).toPath();
    Path target = new File(dest).toPath();
    Files.copy(source, target, REPLACE_EXISTING, COPY_ATTRIBUTES);

Dependency graph in Netbeans 6.8

It seems that I discover something new in Netbeans every day… The “Show dependency graph” option associated with every Maven project in Netbeans 6.8 had escaped me so far.

This menu option (accessible by right-clicking on a Maven project in the “projects” window) generates a graph of all the project’s dependencies (libraries declared in the pom and their transitive dependencies).

Example below of the dependency graph of the libraries required by a project called, rather imaginatively, mavenproject1.

The graph will be rather hard to read for a large number of dependencies, but it’s nice to have nevertheless.

Debugging classpath issues

There’s nothing more frustrating than wasting time figuring out why some resources (e.g. configuration files for log4j, hibernate…) are not loaded correctly from the classpath.

The few lines of code below help narrow down these kinds of issues. Knowing what the classpath is at runtime, and being able to test whether it covers a specific file, is half the battle won already.

1- Print out the names of all files on the classpath

import java.io.File;
import java.util.Arrays;
...
String classpath = System.getProperty( "java.class.path" );
for (String path : classpath.split(System.getProperty("path.separator"))){
   File f = new File (path);
   //directories are expanded to show their content; jar files are printed as-is
   String resource = (f.isDirectory()?Arrays.asList( f.list()).toString():f.toString());
   System.out.println (resource);
}

2- Check whether a specific file is on the classpath


//note: with Class.getResourceAsStream the name must start with "/" for an absolute classpath lookup
String myResource = .... ;
InputStream is = getClass().getResourceAsStream(myResource);
System.out.println (myResource + " is " + (is==null?"not ":"") + "on the classpath");
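
A complementary check – a minimal sketch, using a hypothetical resource name – is to ask the ClassLoader where a resource is actually loaded from, which helps when duplicate copies of a file are suspected on the classpath:

String myResource = "log4j.properties"; //hypothetical resource name
//prints the jar or directory the resource is resolved from (or null if it cannot be found)
//note: ClassLoader.getResource takes names without a leading "/"
java.net.URL url = getClass().getClassLoader().getResource(myResource);
System.out.println (myResource + " loaded from " + url);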

(almost) painless profiling with TPTP 4.6.1 and Eclipse

The TPTP plugin for Eclipse in its versions 4.3, 4.4 and 4.5 was fairly bug-prone – to the point of being barely usable. In particular it had a frustrating tendency to lock up the entire IDE… The latest version, 4.6.1, is more stable, even though setting it up still requires quite a lot of work… I only put up with it (just) because I prefer using Eclipse over Netbeans as my main development tool.

Summary of the steps involved below (tested with Eclipse 3.5, TPTP 4.6.1, running on Windows XP).

1.) Download TPTP 4.6.1 from within the comfort of your Eclipse IDE.
Replace 4.6.0 in the page linked above by 4.6.1… the documentation is not quite up to date :/
Once the install is over and Eclipse is restarted a new “Profiling and logging” perspective should be available in the main menu.

2.) Download the agent controller separately
In theory this step is not really needed, as TPTP now contains its own Integrated Agent Controller (IAC).
In practice the standalone agent is more stable than the IAC.

3.) Unzip the agent controller on your local drive.

4.) Add the profiler DLLs to your path. From the command line:

set TPTP_AC_HOME=<path to your local agent controller installation>
set JAVA_PROFILER_HOME=%TPTP_AC_HOME%\plugins\org.eclipse.tptp.javaprofiler
set PATH=%JAVA_PROFILER_HOME%;%PATH%;%TPTP_AC_HOME%\bin

[more details on setting up the path here, check out section 3.3]

5.) Create a file called filters.txt (for example) where you’ll specify the classes which need to be profiled.

Content of the file:
com.myclasses* * INCLUDE
* * EXCLUDE

This will profile all the methods of all classes in the com.myclasses packages.
Note that filtering is essential if you don’t want to end up with hideously large profiling files.

6.) Run the application to be profiled
Add the following to the Java command line used to run the app so that all execution details are collected:
-agentlib:JPIBootLoader=JPIAgent:server=standalone,filters=filters.txt;CGProf:execdetails=true;

If all goes well a file named trace.trcxml (by default) will start collecting the profiling info in the current directory.

7.) Open the profiling view in Eclipse and import the generated trace file
(a popup menu will appear where you can select additional filters and specific statistics to be run on the trace file).
Be prepared to wait if the filters specified in step 5 weren’t restrictive enough.

8.) Once the import is finished, right-click on the profiling file and open it with the appropriate editor,
e.g. use ExecutionStatistics and ExecutionFlow if the profiling run used execdetails=true.


Caveat: profiling done this way doesn’t give real-time feedback on the behaviour of the app being profiled
(the trace.trcxml file generated needs to be fed into Eclipse repeatedly for up-to-date results).

On the plus side this method works for all kinds of processes, remote or local, libraries or main programs.

If all else fails there’s always the Netbeans profiler, which is very good, and free, or Yourkit (www.yourkit.com), which is excellent (but not free…). Another alternative is VisualVM, which offers both profiling and sampling capabilities.

All these tools work pretty much out of the box, in stark contrast to TPTP.

Google Collections’ computing map

“On demand computing”, as provided by Google Collections, turns a map into a computing map, which can be used as a basic cache for rare (but expensive) lookups.

Although the javadoc for this functionality is a good start, it can be opaque at times, especially for developers not entirely accustomed to the functional style of programming prevalent in the library.

The code below puts a computing map into context.

1. Computing map declaration.

//a ConcurrentMap...
ConcurrentMap<Key, Value> computingMap =
        //...enhanced to support soft/weak keys/values, timed expiration and...
        new MapMaker()
        //...on-demand computation...
        .makeComputingMap(
        //...passing in an anonymous instance implementing the Function interface...
        new Function<Key, Value>() {
            //...where the function transforming a key into a value is defined so that...
            public Value apply(Key key) {
                //...it delegates to a specialized, computationally expensive function.
                return createExpensiveValue(key);
            }
        });

2. Computationally expensive function used to generate a value from a key.

Value createExpensiveValue(Key key){
    Value computedValue = null;
    /*transform the input key into the computedValue here
    ...
    */
    return computedValue;
}

3. Retrieving values from the map

void retrieveValuesFromComputingMap() {
    Key k = new Key(...);
    //the map generates a value from the key and stores it.
    Value v = computingMap.get(k);
    //subsequent calls to retrieve the value associated with the key
    //will fetch it directly from the map, skipping any recomputation
}
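
Putting the three pieces together – a minimal, self-contained sketch, assuming Google Collections 1.0 (where MapMaker.makeComputingMap is available) is on the classpath; the “expensive” computation is faked with a simple string length:

import java.util.concurrent.ConcurrentMap;

import com.google.common.base.Function;
import com.google.common.collect.MapMaker;

public class ComputingMapDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> lengths = new MapMaker()
                .makeComputingMap(new Function<String, Integer>() {
                    public Integer apply(String key) {
                        //stands in for a rare but expensive lookup
                        System.out.println("computing value for " + key);
                        return key.length();
                    }
                });

        System.out.println(lengths.get("hello")); //first call: computes, caches and prints 5
        System.out.println(lengths.get("hello")); //second call: served from the map, no recomputation
    }
}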

ThreadLocal – an overview

Definition

As can be inferred from its name, the ThreadLocal class (javadoc here) provides thread-local variables, i.e. variables for which each thread has its own independent copy. ThreadLocals are fairly prevalent in technical frameworks as a means of storing transaction and security contexts on a per-thread/per-request basis:

-> ORM frameworks such as Hibernate and Ibatis use it to bind each thread to a session.

-> A search through the Koders website for ThreadLocal usages in Java projects returns more than 15000 hits…

Inner workings

Get/set operations on a ThreadLocal instance respectively read/write to a HashMap of {ThreadLocal instance, Object value} key/value pairs, where the HashMap instance accessed is a function of the current thread. Therefore a particular ThreadLocal instance can end up being associated with different object values, according to which thread is current (and which HashMap is being looked up).

Data structures involved

Thread
– Holds a reference to an instance of ThreadLocalMap.

ThreadLocal
– Defines ThreadLocalMap as a static nested class.
– Declares a custom hashcode, used as a key to locate entries in the ThreadLocalMap

ThreadLocalMap
– Maps  ThreadLocals to object values.

Operations

Creating a new instance of ThreadLocal:

ThreadLocal<String> threadLocal = new ThreadLocal<String>();

A ThreadLocal object is created and associated with a newly generated custom hashcode (used to locate entries in the ThreadLocalMap in near-constant time).

Setting a ThreadLocal instance to a specific Object value:

String aString="ThreadLocalTest";
threadLocal.set(aString);

– threadLocal retrieves the ThreadLocalMap referenced from the current thread.
– An entry for the {threadLocal,aString} pair is inserted into the ThreadLocalMap retrieved.

Getting an object from a ThreadLocal instance:

//outputs the value which has previously been set to threadLocal by the current thread...
//assuming we're on the same thread throughout this example, output value will be "ThreadLocalTest"
System.out.println (threadLocal.get());

– threadLocal retrieves the ThreadLocalMap referenced from the current Thread.
– The ThreadLocalMap just retrieved returns the entry mapping to threadLocal.
– The Object value (the “ThreadLocalTest” String) is extracted from the entry and returned to the caller.
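
Putting the three operations together – a minimal runnable sketch (class and variable names made up for the example) showing that each thread only ever sees its own copy:

public class ThreadLocalDemo {

    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<String>();

    public static void main(String[] args) {
        Runnable task = new Runnable() {
            public void run() {
                //each thread writes to its own copy...
                CONTEXT.set("value set by " + Thread.currentThread().getName());
                //...and reads back its own copy, never another thread's
                System.out.println(CONTEXT.get());
            }
        };
        new Thread(task, "thread-A").start();
        new Thread(task, "thread-B").start();
    }
}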

Things I like about JavaFX: built-in animation support

Animating graphics is a rather protracted process with “traditional” languages (Java, C++, C#…).

The logic needed to animate each object will involve (at minimum):

– saving the old position of the object
– calculating the new position as a function of time
– calling a graphics routine to erase the object at its old position
– calling a graphics routine to re-draw the object at its new position

… this can easily lead to a fair amount of code, and complexity (and therefore potential bugs).

JavaFX simplifies the whole process significantly.

1- The bind keyword keeps the data model and the graphical interface synchronized at all times. There is therefore no need to explicitly update the GUI when the underlying object’s position changes – JavaFX handles this for you. The snippet below draws a circle whose position will be updated whenever the value of the x or y field of the underlying object changes.

import javafx.scene.CustomNode;
import javafx.scene.Group;
import javafx.scene.Node;
import javafx.scene.shape.Circle;

class MyNode extends CustomNode {
    var x: Number;
    var y: Number;

    public override function create(): Node {
        //the Group returned here becomes the node's content
        Group {
            translateX: bind x
            translateY: bind y
            content: [ Circle { radius: 10 } ]
        }
    }
}

2- The Timeline class is used to specify the position of the animated object on screen at specific times. It’s as straightforward as specifying where the object should be drawn at specific intervals. JavaFX will interpolate between the specified times and positions, so there is no need to explicitly compute the object’s position – again, it’s all done for you. The Timeline below will update the y field to be 70 at time t=0 and 100 at time t=5. The values in between (times t=1,2,3,4…) will be interpolated.

import javafx.animation.Timeline;

public var timeline: Timeline = Timeline {
    keyFrames: [
        at (0s) { y => 70 },
        at (5s) { y => 100 }
    ]
}

Just the two features described above (and there is much, much more of course) make animation a breeze and, dare I say, fun!

Protocol piggybacking (Java-centric)

The main application protocols relevant to the Java ecosystem, and how they can be set up to maximise interoperability:

Network protocols

The network stack is organized in layers (as defined in the OSI model), with each layer reusing the services of the layer immediately below.

From bottom to top, the main layers and protocols associated:

– Network protocol layer (#3 in the OSI model): IP

– Transport protocol layer (#4 in the OSI model): TCP, UDP. The tradeoff here is performance (UDP) vs reliability (TCP).
Due to its inherent unreliability UDP is mostly used for video, gaming, chats, etc…
Note that it is possible to introduce some reliability on top of UDP, e.g. RUDP. TIBCO RV works on that principle.

– Application protocol layer (#7, topmost, in the OSI model): HTTP, HTTPS, FTP, SMTP, JDBC, RMI, SOAP…
(all on TCP due to the aforementioned reliability requirements)

Note that JMS, being a pure API, doesn’t quite fit anywhere in the above representation…

Piggybacking

Where it gets interesting is how these protocols can be combined to deliver enhanced functionalities:

– Pretty much everything can be tunnelled over HTTP: RMI/HTTP, JDBC/HTTP, SOAP/HTTP…
The idea here is to piggyback on HTTP to take advantage of its ubiquity and firewall-friendly properties.

– If HTTP is not available, it is still possible to tunnel through SMTP (although that would only suit asynchronous communications).

– JDBC/RMI:  access JDBC databases via RMI (and/or ODBC databases via the JDBC/ODBC bridge, via RMI).  See  http://rmijdbc.ow2.org/

– JDBC/SOAP:  In theory Java clients could use a SOAP<->JDBC bridge to access databases via SOAP/HTTP. Not sure if any such bridge exists.

– SOAP/JMS: improved reliability, guaranteed delivery of SOAP messages.

– RMI/JMS: reuse the RMI programming model with the added benefits provided by JMS (reliability, asynchronicity).

– SOAP/RMI:  using an existing RMI infrastructure to transport SOAP messages (will negatively affect bandwidth).

Windows Task Manager – Quick tip: showing the PID

By default the Windows Task Manager does not show the PID of a process… but this can easily be fixed:

1. Start the Windows task manager. This can be done either via the command line by running  “taskmgr”, or by right-clicking on the Windows taskbar and selecting the “Start Task Manager” option. You should see something like this:

[screenshot: Windows Task Manager]

2. Now click on the “Processes” tab,  select the “View” menu and pick the “Select Columns” option. Ensure that the PID checkbox is ticked.

[screenshot: the Select Columns dialog]

3. Click OK to go back to the processes tab, and the PID should be visible:

[screenshot: Processes tab with the PID column visible]

The PID can be used in conjunction with the netstat command to find out which process runs on a given port:

netstat -aon | find "1234" will show which PID is associated with port 1234; then use the Task Manager to look up the process associated with that PID.

Netbeans profiling

Netbeans 6.5 has a very useful Profiler which helps a lot in pinpointing exactly which method is the culprit when your app slows to a crawl:

– In Netbeans pick Menu Profile -> Profile Main Project to activate the profiler (which I believe is built on top of VisualVM).

– Rerun the app and replay the actions which caused it to freeze.

– At that point move on to the Netbeans “Profiler” panel and take a snapshot of the threads call tree.

– Sorting the call tree by time descending should make any bottleneck immediately obvious.

Example: below is the call tree associated with a Java app I’m currently working on.

[screenshot: profiler snapshot – call tree for all threads]

The refreshTicksHorizontal method (part of the JFreeChart API) seems to take a suspiciously long time to execute…

Poking around the JFreeChart forums, it soon became clear why.

JNLP-enabling an application with NetBeans

NetBeans (version 6.5) makes it very easy to setup an existing Swing application for delivery via Java Web Start:

  1. – In the “Projects” view, right-click on the project to be JNLP-enabled.
  2. – Select Application -> Web Start
  3. – Tick ‘Enable Web Start’ and ‘Self-signed’

That’s it. A clean build will now produce a .jnlp file in the project’s dist directory, and if all goes well the next run will now invoke the Web Start mechanism.


With this configuration NetBeans will use a different certificate for each project when signing the associated jars. Which is fine… until there’s a need to reuse a common set of jars from two different projects. This will cause Web Start to fail with: “JAR resources in JNLP file are not signed by same certificate”.

One possible solution is to use the extensions mechanism built into JNLP. See here and there for more details.

The other is to create your own certificate and use it to sign all jars.

– Install the NetBeans keystore plugin, part of the mobility pack module (Tools->Plugins-> pick the mobility module and restart NB).

– Activate the keystores manager (Tools-> Keystores)

– Add a new keystore:

[screenshot: adding a new keystore]

– Create a new key Pair alias for that keystore:

[screenshot: creating a key pair alias]

– Finally, don’t forget to set up your project to use the new keystore.
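
For reference, the keystore and key pair can also be created outside the IDE with the JDK’s keytool (the alias, keystore file name and validity below are only examples):

keytool -genkey -alias jnlpSigning -keystore myKeystore.ks -validity 365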

Mockito in a nutshell

What is it
As the name implies – it’s a mocking framework,
i.e. it allows for the creation of mock objects, to be used in place of “real” objects (often external dependencies such as databases, JMS servers…) when unit testing.

What’s good about it
The main features are listed on the Mockito website.
Two features stand out:
– ability to mock classes as well as interfaces
– lean API which makes for a shorter learning curve and more readable code when compared with existing mock frameworks such as EasyMock, JMockit…

How does it work

Class under Test:

class BusinessLogic {

   public void execute (ExternalSystem externalSystem) {

     //retrieve a value from an external system:
     //could be slow, unreliable
     String value = externalSystem.fetch();
   }
}

Unit-Test

import static org.mockito.Mockito.*;

class BusinessLogicTest {

   public void testExecute(){
      //Initialise mock system
      ExternalSystem mockSystem = mock (ExternalSystem.class);

      //make sure the fetch method of the mocked ExternalSystem will return "ABC"
      //(only needed if that value is critical to the test)
      stub(mockSystem.fetch()).toReturn("ABC");

      //exercise the class under test
      //will call the mocked system initialised above
      new BusinessLogic().execute(mockSystem);

      //check the mocked system has been called exactly once
      verify (mockSystem, times(1)).fetch();
    }
}

Bugfixing: “Current operation not available” in Eclipse

Symptoms: the following error message pops up in Eclipse while testing a new plugin (via the context menu “Run As -> Eclipse Application”): “Current operation not available”.

Diagnostic: Checking the Eclipse logs in eclipse\runtime-EclipseApplication\.metadata\.log shows:

!ENTRY org.eclipse.equinox.registry 4 1 2008-03-26 10:34:13.953
!MESSAGE Plug-in "Plugin" was unable to instantiate class "popup.actions.PluginAction".
!STACK 0
java.lang.NoClassDefFoundError: org/eclipse/jdt/core/JavaModelException
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Unknown Source)
at java.lang.Class.getConstructor0(Unknown Source)
at java.lang.Class.newInstance0(Unknown Source)
at java.lang.Class.newInstance(Unknown Source)

The plugin cannot find a class referenced in its source code (in this case org.eclipse.jdt.core.JavaModelException) and therefore fails to load.

Fix: edit the META-INF/MANIFEST.MF file associated with the plugin and add the necessary bundle under “Require-Bundle” (e.g. org.eclipse.jdt.core).
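
The resulting manifest entry would look something like this (the first two bundles are only examples of what may already be listed; note that continuation lines in a manifest must start with a single space):

Require-Bundle: org.eclipse.ui,
 org.eclipse.core.runtime,
 org.eclipse.jdt.core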

Avoiding deadlocks with Reentrantlock

This post showed how multiple threads synchronizing on two resources in different order usually leads to a deadlock. An obvious solution is to arrange for all threads to acquire locks in the same order.

Alternatively the synchronized blocks can be replaced by reentrant locks. The tryLock method of the ReentrantLock class attempts to acquire the lock, returning false immediately if the lock is held by another thread (instead of blocking forever, as would be the case with synchronized).

Example follows – livelock issues are not dealt with at this point (i.e. both threads, whilst never blocked, may still not get any work done as they keep colliding when trying to acquire the locks).

import java.util.concurrent.locks.ReentrantLock;

public class LockAcquisition {

	public static void main (String args[]){

		final ReentrantLock lockA = new ReentrantLock();
		final ReentrantLock lockB = new ReentrantLock();

		new Thread() {
			public void run (){
				while (1==1) {
					try {
						System.out.println (this + " acquiring lockA");
						if (lockA.tryLock()) {
							System.out.println (this + " acquired lockA");
							System.out.println (this + " acquiring lockB");
							if (lockB.tryLock()){
								System.out.println (this + " acquired lockB");
							}
						}
					}
					finally {
						if (lockB.isHeldByCurrentThread()) lockB.unlock();
						if (lockA.isHeldByCurrentThread()) lockA.unlock();
					}
				}
			}
		}.start();

		new Thread() {
			public void run (){
				while (1==1) {
					try {
						System.out.println (this + " acquiring lockB");
						if (lockB.tryLock()) {
							System.out.println (this + " acquired lockB");
							System.out.println (this + " acquiring lockA");
							if (lockA.tryLock()){
								System.out.println (this + " acquired lockA");
							}
						}
					}
					finally {
						if (lockA.isHeldByCurrentThread()) lockA.unlock();
						if (lockB.isHeldByCurrentThread()) lockB.unlock();

					}
				}
			}
		}.start();
	}
}

Thread stack traces with java 6 and jstack

Java 6 on Windows now supports the jstack utility as a means of printing stack traces (instead of relying solely on CTRL+BREAK).

Quick demo follows.


1) Write a program which is (pretty much) guaranteed to deadlock


public class Deadlock {

	public static void main (String args[]){

		final Object objA = new Object();

		final Object objB = new Object();

		new Thread() {
 		public void run (){
 			while (1==1) {
 				System.out.println ("Thread A synchronizing on objA");
 				synchronized (objA){
 					System.out.println ("Thread A synchronized on objA");
 					System.out.println ("Thread A synchronizing on objB");
 					synchronized (objB){
 						System.out.println ("Thread A synchronized on objB");
 					}
 				}
 			}
 		}
 	}.start();

		new Thread() {
 		public void run (){
 			while (1==1) {
 				System.out.println ("Thread B synchronizing on objB");
 				synchronized (objB){
 					System.out.println ("Thread B synchronized on objB");
 					System.out.println ("Thread B synchronizing on objA");
 					synchronized (objA){
 						System.out.println ("Thread B synchronized on objA");
 					}
 				}
 			}
 		}
 	}.start();

	}

}


2) Run the Deadlock program

[screenshot: Deadlock program output]

3) Get the deadlocked process id by running jps:

jps

4) Run jstack <pid> -> 1 deadlock detected.

[screenshot: jstack output reporting the deadlock]

Four stages of object-relational mapping

1) pure JDBC

JDBC code has to be written to manage all database activities: opening (and closing) database connections, mapping parameters to queries and updates, executing statements, fetching query results…

+ fine when working with only a few sql statements

– the amount of code required for each sql statement is substantial and often subject to copy-and-paste (see the sketch below).
– as always, the more lines of code written, the higher the probability of introducing bugs: in particular it’s not rare for connections to leak (opened and never closed) or for query parameters to be bound to the wrong values.
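
For illustration, a minimal sketch of the plumbing required for a single query (the JDBC URL, table and column names are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PureJdbcExample {
    public static void main(String[] args) throws SQLException {
        Connection conn = null;
        PreparedStatement stmt = null;
        ResultSet rs = null;
        try {
            //open the connection (hypothetical JDBC URL)
            conn = DriverManager.getConnection("jdbc:somedb://localhost/trades");
            //map the parameters to the query
            stmt = conn.prepareStatement("select amount from trade where id = ?");
            stmt.setLong(1, 42L);
            //execute the statement and fetch the results
            rs = stmt.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getLong("amount"));
            }
        } finally {
            //the bug-prone part: every close must happen, in order, even on failure
            if (rs != null) rs.close();
            if (stmt != null) stmt.close();
            if (conn != null) conn.close();
        }
    }
}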

2) in-house ORM framework

An in-house framework aims at solving the issues found in stage 1). Typically it performs the following:

+ hides low-level, bug-prone JDBC code behind a custom facade.
+ externalises SQL in configuration files, which allows the SQL to be modified without recompiling the code and makes for clearer Java code.

– the learning curve can be significant, as in-house documentation is rarely up to date (when it exists at all)
– the effort required to maintain and document the framework, and to keep it up to date with new releases of drivers/databases/the JDBC API, is substantial.

3) Ibatis

Ibatis is an open source ORM framework which automatically maps Java objects to SQL queries using XML configuration files.

+ (almost) totally eliminates the plumbing code needed for getting data in and out of the database.
+ transaction support
+ declarative caching
+ lazy loading
+ very good documentation available online / offline
+ fairly bug free

– As the number of sql statements grows, so does the complexity of the Ibatis configuration files (the Java side, on the other hand, stays thin – see the sketch below).
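
As an illustration of how thin the Java side stays, a minimal sketch using the Ibatis 2.x API (the SqlMapConfig.xml file name and the getTrade statement id are hypothetical, assumed to be defined in the sqlMap XML):

import java.io.Reader;

import com.ibatis.common.resources.Resources;
import com.ibatis.sqlmap.client.SqlMapClient;
import com.ibatis.sqlmap.client.SqlMapClientBuilder;

public class IbatisExample {
    public static void main(String[] args) throws Exception {
        //load the central Ibatis configuration from the classpath
        Reader reader = Resources.getResourceAsReader("SqlMapConfig.xml");
        SqlMapClient sqlMap = SqlMapClientBuilder.buildSqlMapClient(reader);
        //one line to execute the mapped statement and build the result object
        Object trade = sqlMap.queryForObject("getTrade", Long.valueOf(42));
        System.out.println(trade);
    }
}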

4) Hibernate

All the benefits of Ibatis, plus:

+ Autogeneration of sql statements
+ Advanced functionality (e.g. Shards)

– steep learning curve
– when using HQL: loss of control over how SQL queries are generated

Note that there’s no hierarchy of values here: Hibernate is not always preferable to raw JDBC (especially on very small projects).
It all depends on the context (although I find that Ibatis strikes a nice balance between Hibernate and pure-JDBC code).