Analysis of Wikimedia Logs for Traffic Load and Popularity using Apache Hadoop

I took a class in data center systems this past spring as part of my Master's curriculum. My final project for that class analyzed public Wikimedia logs to determine traffic load and popularity patterns using Apache Hadoop. I had a lot of fun doing this project, and it was a great learning experience, so I figured I'd blog about it. The goals of this project were: 1.) Perform temporal analysis on the total number of requests per hour. 2.) Find the most popular Wikimedia project based on total views per hour per project. 3.) Find the top 10 most popular pages during a given…

cfObjective 2013 – Practical Deployment with Git and Ant

I had the privilege of speaking again at cfObjective this year, and once again it was a great experience. I talked about Git and Apache Ant and how you can get these two cool technologies to work together. My intent was to give the audience an introduction to both technologies and present ways to integrate them. If you are new to Git, an earlier blog post I published might be helpful for getting started. Also, this is a great tutorial for tips on installing and getting started with Ant: Introduction to Ant Tutorial. A big thanks to everyone who came to my talk! I would love feedback…