Refactoring Applications Into an MVC Framework

I spoke yesterday at the Adobe CFSummit on refactoring procedural code into MVC frameworks, and gave the same presentation at NCDevCon last month. It was a great experience speaking at both conferences. Many thanks to the conference organizers and to everyone who came to my talks!

Here are the slides and demo code from my talk. Hope you find these useful. Feel free to leave a comment here or send me an email if you have questions.
Slides
Demo Code

Analysis of Wikimedia Logs for Traffic Load and Popularity using Apache Hadoop

I took a class on data center systems this past spring as part of my Master’s curriculum. For my final project, I analyzed public Wikimedia logs with Apache Hadoop to find traffic load and popularity patterns. I had a lot of fun doing this project and it was a great learning experience, so I figured I’d blog about it.

The goals of this project were:
1.) Perform temporal analysis on total number of requests per hour.
2.) Find the most popular Wikimedia project based on total views per hour per project.
3.) Find the top 10 most popular pages during a given day.
4.) Find the top 10 pages that returned the most content during a given day.
5.) Determine whether this data obeys Zipf’s law in terms of popularity.
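
For context on goal 5: Zipf’s law says that the number of requests for the page at popularity rank r is roughly proportional to 1/r^s, with s close to 1. On a log-log plot of count against rank, that shows up as a straight line with slope near -s. The following is a minimal Python sketch of such a check, not the code used in this project; the function name and the sample counts are made up for illustration.

```python
import math

def zipf_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    `counts` is any iterable of request counts; non-positive values are
    dropped and the rest are ranked from most to least popular.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical counts that follow count = 1200 / rank exactly.
sample = [1200 / r for r in range(1, 51)]
print(round(zipf_slope(sample), 3))
```

A fitted slope close to -1 is what Zipf’s law predicts; real page counts would only follow it approximately.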

The dataset for this project consisted of three days’ worth (January 1st, 2012 to January 3rd, 2012) of public Wikimedia log entries, which came to about 5.6GB of compressed data and about 20GB after decompression. Every page request reaches one of Wikimedia’s squid caching hosts, whether it is for editing or reading, and whether it is for a “special page” (such as a log of actions generated on the fly) or for an article from Wikipedia or one of the other projects. The request is then sent via UDP to a filter, which tosses requests from internal hosts as well as requests for wikis that aren’t among the general projects. This filter writes out the project name, the size of the page requested, and the title of the page requested.

Here are a few sample lines from one file:

fr.b Special:Recherche/Achille_Baraguey_d%5C%27Hilliers 1 624 20120101-0000
fr.b Special:Recherche/Acteurs_et_actrices_N 1 739 20120101-0000
fr.b Special:Recherche/Agrippa_d/%27Aubign%C3%A9 1 743 20120101-0000
fr.b Special:Recherche/All_Mixed_Up 1 730 20120101-0000
fr.b Special:Recherche/Andr%C3%A9_Gazut.html 1 737 20120101-0000


In the above, the first column “fr.b” is the project name. Projects without a period and a following suffix are Wikipedia projects. For the other projects, the following suffixes are used:

  • wikibooks: “.b”
  • wiktionary: “.d”
  • wikimedia: “.m”
  • wikipedia mobile: “.mw”
  • wikinews: “.n”
  • wikiquote: “.q”
  • wikisource: “.s”
  • wikiversity: “.v”
  • mediawiki: “.w”
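
The naming rule above can be captured in a small lookup. This is only an illustrative Python sketch; the function name `project_family` is hypothetical and is not part of the project code.

```python
# Suffix-to-project table copied from the list above.
SUFFIXES = {
    "b": "wikibooks",
    "d": "wiktionary",
    "m": "wikimedia",
    "mw": "wikipedia mobile",
    "n": "wikinews",
    "q": "wikiquote",
    "s": "wikisource",
    "v": "wikiversity",
    "w": "mediawiki",
}

def project_family(project):
    """Split a project code such as 'fr.b' into (prefix, project family)."""
    prefix, dot, suffix = project.partition(".")
    if not dot:
        # No period: a plain Wikipedia project, e.g. 'en'.
        return prefix, "wikipedia"
    return prefix, SUFFIXES.get(suffix, "unknown")

print(project_family("fr.b"))  # ('fr', 'wikibooks')
print(project_family("en"))    # ('en', 'wikipedia')
```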

The second column is the title of the page retrieved, the third is the number of requests, the fourth is the size of the content returned, and the fifth is the date and time the record was logged. There is a separate log file for each hour, so these are hourly statistics. For example, in the line:

en Main_Page 242332 4737756101 20120101-0000


We see that the main page of the English language Wikipedia was requested over 240 thousand times and 4737756101 bytes of data were transmitted from this page between 12am and 1am on Jan 1st, 2012. These are not unique visits.
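
To make the column layout concrete, here is a minimal Python sketch that parses one record into named fields. It assumes the five space-separated columns described above and that titles never contain spaces; the `Record` type and `parse_record` function are hypothetical names, not part of the project code.

```python
from collections import namedtuple
from datetime import datetime

Record = namedtuple("Record", "project title requests size logged_at")

def parse_record(line):
    """Parse one space-separated log line into a Record."""
    project, title, requests, size, stamp = line.split()
    return Record(
        project=project,
        title=title,
        requests=int(requests),
        size=int(size),
        # The timestamp column looks like 20120101-0000 (date, then hour and minute).
        logged_at=datetime.strptime(stamp, "%Y%m%d-%H%M"),
    )

rec = parse_record("en Main_Page 242332 4737756101 20120101-0000")
print(rec.requests, rec.size, rec.logged_at.hour)  # 242332 4737756101 0
```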

The original logs downloaded from the public Wikimedia website did not contain date and time information in a fifth column but the file name did. This data was added to each record in each of the files using a shell script. The following code was used to accomplish this:

for f in page*
do
  # Characters 12-24 of the file name hold the date and hour
  filedate=$(echo "$f" | cut -c12-24)
  awk -v filedate="$filedate" '{print $0" "filedate}' "$f" > newfile
  mv newfile $f
done


Adding this fifth column to every record made it easier to program the Hadoop jobs but, at the same time, it increased the size of the total dataset from about 20GB to about 27GB.

Depending on the type of output required, the Hadoop mappers emit a key built from some combination of project name, page name and date-time, with either the number of requests or the content size as the value. The reducers then sum all the values that share a key. Results obtained from Hadoop were sorted and truncated for presentation using the Linux sort and head commands.
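
The map, reduce and sort/head steps can be imitated in a few lines of plain Python. This is a rough sketch of the idea only: it does not use the Hadoop API and is not the Java code from the repository. The key chosen here (project name plus hour, as in goal 2) and all function names are assumptions for illustration.

```python
from collections import defaultdict

def map_record(line):
    """Mapper: emit ((project, hour), requests) for one log line."""
    project, _title, requests, _size, stamp = line.split()
    return (project, stamp), int(requests)

def reduce_counts(pairs):
    """Reducer: sum all values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def top_n(totals, n=10):
    """Stand-in for the sort and head step: largest totals first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

lines = [
    "en Main_Page 242332 4737756101 20120101-0000",
    "fr.b Special:Recherche/All_Mixed_Up 1 730 20120101-0000",
    "fr.b Special:Recherche/Acteurs_et_actrices_N 1 739 20120101-0000",
]
totals = reduce_counts(map_record(line) for line in lines)
print(top_n(totals, 2))
```

With these three sample lines, the English Wikipedia total comes first, followed by the combined fr.b total of 2 requests for the same hour. Swapping the key or the value column gives the other outputs described above.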

Click here for the Github repository of the code for this project.
Click here for the results I found after implementing the code above.
Here are the commands I used for running this code:

javac -classpath .:/usr/local/hadoop/hadoop-1.0.3/hadoop-core-1.0.3.jar WikiProjectMapper.java
javac -classpath .:/usr/local/hadoop/hadoop-1.0.3/hadoop-core-1.0.3.jar WikiProjectReducer.java
javac -classpath .:/usr/local/hadoop/hadoop-1.0.3/hadoop-core-1.0.3.jar WikiProject.java
jar -cvf WikiProject.jar *.class
hadoop jar WikiProject.jar WikiProject /user/wikilogs/page* /user/testOut
hadoop dfs -getmerge /user/testOut /home/



cfObjective 2013 – Practical Deployment with Git and Ant

I had the privilege of speaking again at cfObjective this year, and once again it was a great experience. I talked about Git and Apache Ant and how you can get these two cool technologies to work together. My intent was to give the audience an introduction to both technologies and present ways to integrate them. If you are new to Git, an earlier blog post I published might be helpful for getting started. Also, this is a great tutorial for tips on installing and getting started with Ant: Introduction to Ant Tutorial.

A big thanks to everyone who came to my talk! I would love feedback from everyone that was there and I would be happy to answer any questions – just leave a comment here or send me an email.

Here is the slide deck and demo code. Hope you find this useful.
Slide Deck
Demo Code
Github repository for demo

Apache Ant presentation for TACFUG

I did a short 15-20 minute presentation during the TACFUG meeting yesterday on Apache Ant. I really liked this tutorial for tips on installing and getting started with Ant: Introduction to Ant Tutorial. My demo was heavily derived from this tutorial :)

Also, I would highly recommend going through the illustrious Mr. Priest’s Ant wiki for tons of useful resources.

Here is the slide deck and the demo code:
Slide Deck
Demo Code

ColdFusion script for counting lines of code in directory

Recently I needed a tool to count the total number of lines of code I had written across all files in a particular directory. I looked around on the web but didn’t find anything that quite met my specifications, so I ended up writing a script myself. It turned out to be fairly straightforward, but I have put it up on GitHub in the hope that it will save some time for someone else who is looking for something similar:
https://github.com/anantunc/linesOfCode

You are welcome to use this code for any non-commercial purposes. Please contact me if you plan to use this commercially.

Getting started with Git

Here are a few useful resources, instructions and tips on getting started with Git. This is meant to be more like a personal repository of Git resources, but I figured I’d put it on my blog so that it might hopefully help some folks out there on the interwebs as well.

Create free account on https://github.com/

Download git: http://git-scm.com/downloads
If you want a nicer GUI client (for Windows): http://windows.github.com/

Setting up Git: https://help.github.com/articles/set-up-git

Open up Git Bash or Git Shell. Preliminary commands:

  • git config --global user.name "Your Name Here"
  • git config --global user.email "your_email@youremail.com"

Generate SSH key: https://help.github.com/articles/generating-ssh-keys
(Make sure you remember your passphrase)

Commands for committing files: http://gitref.org/basic/

  • git status (shows you the current status of the repository)
  • git add . (stages all new and modified files in the current directory so that they can be committed)
  • git status (shows you the current status of the repository)
  • git commit -m '[commit message]' (commits the staged changes with the commit message specified)

Commands for branching and merging files: http://gitref.org/branching/

  • git branch (lists all available branches)
  • git branch [branchname] (creates a new branch)
  • git checkout [branchname] (switches to that branch)
  • git branch -d [branchname] (deletes a branch)
  • git merge [branchname] (merges the named branch into the current branch)
  • git log --oneline (shows brief commit history of a branch)

Commands for pulling/pushing files from/to a remote repository: http://gitref.org/remotes/

  • Default remote-name: ‘origin’
  • git fetch [remote-name] (fetches any new work that has been pushed to that server since you cloned)
  • git pull [remote-name] [branchname] (fetches the remote branch and merges it into your current local branch)
  • git push [remote-name] [branchname] (uploads the commits on your local branch to the remote branch)
  • Other useful information about remote branches and workflow: http://git-scm.com/book/en/Git-Branching-Remote-Branches

Other useful Git commands: http://davidwalsh.name/git-commands

Interesting article on Git workflow: http://nvie.com/posts/a-successful-git-branching-model/

REALLY COOL Git tutorial (doesn’t work in IE): http://try.github.com/levels/1/challenges/1

A fairly objective comparison of Git versus Subversion: https://git.wiki.kernel.org/index.php/GitSvnComparison

Hope this helps!

NCDevCon 2012 – Design Patterns for everyday use (again)

I spoke and volunteered once again at NCDevCon 2012 this past weekend. Got a chance to catch up with a lot of old friends, renew acquaintances and make some new friends. I had a great time and based on what I’ve heard so far, I wasn’t the only one who enjoyed it.

Once again, thanks to everyone who came to my session on Design Patterns even though it was the last one on Sunday. I hope you guys managed to get something useful out of it. Here are the slides and the code samples, for your reference. Please feel free to contact me here or via email with any questions or feedback.

Slide Deck
Code Samples

cfObjective 2012 – Design Patterns for everyday use

Greetings interweb dwellers!

I presented at cfObjective this past Saturday on Design Patterns for everyday use. This talk was intended to be an introduction to design patterns and a demonstration of a few common design patterns in a ColdFusion application. The main point that I wanted people to take away was that design patterns are very useful but they are best practices and not recipes – it is important to determine if they really solve your particular design problem before you implement them. Don’t try to fit your problem to a pattern; see if a pattern solves your problem.

This was my first time in Minneapolis, my first time at cfObjective and my first time speaking to a crowd as large and geographically diverse as the one that attended my presentation there (a lot of firsts there, I guess). Anyhow, it was a great experience and I thoroughly enjoyed it! The conference was well organized, the quality of the content was superb and networking opportunities were abundant. I got to catch up with a bunch of old friends and met a lot of very smart people. Overall, a very fruitful conference experience.

If you were at my talk, thanks a lot for attending – I would love feedback from people that were there! Here are my slides and the code samples I used for my talk. Hope you find these helpful. I would be happy to answer any questions – just leave a comment here or send me an email.

Slide Deck
Code Samples
(Updated with slides from NCDevCon 2012)

NCDevCon 2011 – ColdFusion 9 and Apache Solr presentation

Greetings folks!

I did a presentation at NCDevCon back in September of last year (2011) on adding search functionality to web applications using Apache Solr with ColdFusion 9. Here are my slides and the demo code I used for that presentation. I suppose this is wayyyy overdue, but better late than never!

Hope you find these useful.

Slide Deck
Demo Code

© 2008-2014 Anant Pradhan | Sun Nov 23, 2014