MapReduce on AWS and Azure HDInsight

Running a series of analyses on the Google n-grams dataset using MapReduce techniques

Multiple analyses of the Google n-grams dataset, written in Apache Pig, to obtain statistical information. First, the interactive Pig shell was used over SSH to check the procedure step by step on a smaller dataset. Then the Pig script containing all the commands was uploaded to Amazon Elastic MapReduce (EMR).

In a similar project, I developed a Hadoop MapReduce program in Java to compute some metrics of a large social media graph (Friendster) and ran it on Microsoft Azure HDInsight.
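
The description above does not say which graph metrics were computed, so the following is only an illustrative sketch of the map, shuffle, and reduce flow on an edge list, using out-degree per node as an assumed example metric. It is plain Java with no Hadoop dependency, and it is not the project's actual code: the class name DegreeCountSketch, its method names, and the "source<TAB>target" input format are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the map -> shuffle -> reduce flow (no Hadoop classes used).
// Assumption: each input line is one directed edge, "source<TAB>target".
class DegreeCountSketch {

    // Map phase: emit (source, 1) for each well-formed edge line; skip anything else.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String[] parts = line.trim().split("\\s+");
        if (parts.length == 2) {
            out.add(Map.entry(parts[0], 1));
        }
        return out;
    }

    // Reduce phase: sum all counts emitted for one key (one node).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: group mapper output by key (the "shuffle"), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny hand-made edge list, not Friendster data.
        List<String> edges = List.of("1\t2", "1\t3", "2\t3");
        System.out.println(run(edges)); // prints {1=2, 2=1}
    }
}
```

In a real Hadoop job the map and reduce methods would live in Mapper and Reducer subclasses, and the framework would perform the grouping across the cluster. The in-memory driver here only stands in for that step so the sketch can run locally.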


Running MapReduce code in the cloud on AWS (EMR) and Azure HDInsight