Mining the codebase with bash and sed

Ever wondered which Apache classes are most commonly imported in your projects? Here’s one way to find out, using bash and sed.

find . -name '*.java' | xargs grep apache | grep import | sed -n 's/.*\(org\.apache.*;\).*/\1/p' | sort | uniq -c | sort -n

Explanation of the piped commands above, in order of appearance:

  • find the names of all Java files recursively, starting from the current location
  • search these files for the lines containing the word apache and the word import
  • print out the text starting with “org.apache” and ending with “;”
  • sort by alphabetical order
  • collapse duplicates and prefix each line with its number of occurrences
  • sort once more to order from the lowest to the highest number of occurrences.
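
The steps above can also be wrapped in a small shell function, which makes the pipeline easier to reuse and to read one stage at a time. This is only a sketch under a few assumptions: the function name count_apache_imports is made up for illustration, find and xargs are assumed to support -print0 and -0 (GNU and BSD versions do) so that file names containing spaces survive, and grep -h is used to drop the file name prefix from each match.

```shell
# Hypothetical helper (not from the original one-liner): count org.apache
# imports in all Java files under a directory (defaults to the current one).
count_apache_imports() {
  # 1. list Java files, NUL-separated so names with spaces are safe
  find "${1:-.}" -name '*.java' -print0 |
    # 2. keep only import lines that mention org.apache (-h: no file names)
    xargs -0 grep -h '^[[:space:]]*import .*org\.apache' |
    # 3. print just the "org.apache...;" part; -n avoids printing lines twice
    sed -n 's/.*\(org\.apache.*;\).*/\1/p' |
    # 4. group identical imports together
    sort |
    # 5. collapse duplicates, prefixing each line with its count
    uniq -c |
    # 6. order numerically, least used first
    sort -n
}
```

Calling count_apache_imports with no argument scans the current directory; passing a path such as src/main/java (an example path, adjust to your project) restricts the search to that tree. Appending "| tail -n 10" shows only the ten most frequently imported classes.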

The end result will look something like this:

[Screenshot: sample output of the command, one line per imported class prefixed with its count]


