Mining the codebase with bash and sed

Ever wondered which Apache classes are imported most often in your projects? Here’s one possible way to find out, using bash and sed.

find . -name '*.java' | xargs grep apache | grep import | sed -n 's/.*\(org\.apache.*;\).*/\1/p' | sort | uniq -c | sort -n

Explanation of the piped expressions above (in order of appearance):

  • find the names of all Java files recursively, starting from the current location
  • search these files for the lines containing both the word apache and the word import
  • print out the text starting with “org.apache” and ending with “;”
  • sort alphabetically, so that identical lines end up next to each other
  • collapse duplicate lines and prefix each remaining line with its number of occurrences
  • sort once more to order from lowest to highest number of occurrences.
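
The steps above can also be wrapped in a small shell function, which makes it easier to point the pipeline at a different directory. This is only a sketch: the function name count_apache_imports is made up for illustration, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD both do), so that file names containing spaces are handled correctly.

```shell
# count_apache_imports [DIR]
# Hypothetical helper (name chosen for this example): counts how often each
# org.apache class is imported by the .java files below DIR (default: .).
count_apache_imports() {
  local dir="${1:-.}"
  # -print0 / -0 pass file names NUL-separated, so spaces in paths are safe.
  find "$dir" -name '*.java' -print0 |
    # -h drops the file name prefix; keep only import lines mentioning org.apache.
    xargs -0 grep -h 'import.*org\.apache' |
    # -n plus the p flag prints each matching line exactly once;
    # [^;]* stops the capture at the first semicolon.
    sed -n 's/.*\(org\.apache[^;]*;\).*/\1/p' |
    # Group identical lines, count them, then order by count (lowest first).
    sort | uniq -c | sort -n
}
```

Calling count_apache_imports src/main/java would then print one line per imported class, with the most frequently imported class last; piping the result through tail -n 10 keeps only the ten most common ones.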

The end result will look something like this:

[Screenshot: sample output of the command above]

