Sum of Even and Odd Numbers in MapReduce Using Cloudera Distribution Hadoop (CDH)

Published: 18 February 2022

Prerequisites: Hadoop and MapReduce

Counting even and odd numbers and finding their sums is easy in any language, such as C, C++, Python, or Java. Hadoop MapReduce programs are also commonly written in Java, and this one is simple once you know the syntax. It is a basic MapReduce exercise, comparable to a "Hello World" program in other languages, so it is a good first job for learning how MapReduce code is executed. The steps below show how to write MapReduce code that counts and sums the even and odd numbers in a file.

Example:

Input:

1 2 3 4 5 6 7 8 9 

Output:

EVEN 20 // sum of even numbers
EVEN 4 // count of even numbers
ODD 25 // sum of odd numbers
ODD 5 // count of odd numbers
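Before moving to Hadoop, the expected output above can be checked with plain Java. This is a minimal sketch with no Hadoop dependency; the class name `EvenOddCheck` and method `summarize` are hypothetical, chosen only for illustration. It produces the lines in the order the job writes them: EVEN before ODD, and for each key the sum line before the count line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes the expected job output locally, without Hadoop
public class EvenOddCheck {

    // For each key (EVEN first, then ODD), returns the sum line followed by the count line
    public static List<String> summarize(String line) {
        int evenSum = 0, evenCount = 0, oddSum = 0, oddCount = 0;
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // a blank line yields one empty token
            }
            int number = Integer.parseInt(token);
            if (number % 2 != 0) {
                oddSum += number;
                oddCount++;
            } else {
                evenSum += number;
                evenCount++;
            }
        }
        List<String> out = new ArrayList<>();
        // Like the real job, emit nothing for a key that never occurred
        if (evenCount > 0) {
            out.add("EVEN\t" + evenSum);
            out.add("EVEN\t" + evenCount);
        }
        if (oddCount > 0) {
            out.add("ODD\t" + oddSum);
            out.add("ODD\t" + oddCount);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same input as the example above
        for (String l : summarize("1 2 3 4 5 6 7 8 9 ")) {
            System.out.println(l);
        }
    }
}
```

The arithmetic is the same as in the example: 2 + 4 + 6 + 8 = 20 over 4 even numbers, and 1 + 3 + 5 + 7 + 9 = 25 over 5 odd numbers.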

Steps:

  • First, open Eclipse, then select File -> New -> Java Project, name it EvenOdd, and click Finish.

  • Create three Java classes in the project and name them EODriver (containing the main function), EOMapper, and EOReducer.
  • Add two reference libraries to the project:

    Right-click the project, select Build Path, then click Configure Build Path.

    As shown in the figure above, the Add External JARs option is on the right-hand side. Click it and add the files listed below; you can find them under /usr/lib/:

    1. /usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.6.0-mr1-cdh5.13.0.jar
    2. /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.13.0.jar

Mapper Code: copy this program into the EOMapper Java class file.

// Importing libraries
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
  
public class EOMapper extends MapReduceBase implements Mapper<LongWritable,
                                                 Text, Text, IntWritable> {
  
    @Override
    // Map function
    public void map(LongWritable key, Text value, OutputCollector<Text, 
                                     IntWritable> output, Reporter rep)
  
    throws IOException
    {
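        // Split the line on runs of whitespace; trimming first avoids an
        // empty first token when the line starts with a space
        String data[] = value.toString().trim().split("\\s+");
  
        for (String num : data) 
        {
            // Skip the empty token produced by a blank line
            if (num.isEmpty())
                continue;
  
            int number = Integer.parseInt(num);
  
            // != 0 (rather than == 1) also treats negative odd numbers as odd,
            // because in Java -3 % 2 == -1
            if (number % 2 != 0)
            {
                // For odd numbers
                output.collect(new Text("ODD"), new IntWritable(number));
            }
            else 
            {
                // For even numbers
                output.collect(new Text("EVEN"), new IntWritable(number));
            }
        }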
    }
}
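The parity test deserves a closer look for negative input: in Java the `%` operator keeps the sign of the dividend, so `-3 % 2` is `-1`, and a check written as `number % 2 == 1` would send `-3` to EVEN. The standalone sketch below (hypothetical class `ParityDemo`, no Hadoop needed) compares the two checks for a few sample numbers.

```java
// Hypothetical standalone demo of choosing the mapper key (no Hadoop required)
public class ParityDemo {

    // Robust check: any non-zero remainder (1 or -1) means odd
    public static String keyFor(int number) {
        return (number % 2 != 0) ? "ODD" : "EVEN";
    }

    // Naive check, shown only for comparison: misses negative odd numbers
    public static String naiveKeyFor(int number) {
        return (number % 2 == 1) ? "ODD" : "EVEN";
    }

    public static void main(String[] args) {
        int[] samples = {7, -3, 0, -4};
        for (int n : samples) {
            System.out.println(n + " % 2 = " + (n % 2)
                    + " -> " + keyFor(n) + " (naive: " + naiveKeyFor(n) + ")");
        }
    }
}
```

Only `-3` is classified differently by the two checks; for the non-negative input in the example above both give the same result.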

Reducer Code: copy this program into the EOReducer Java class file.

// Importing libraries
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
  
public class EOReducer extends MapReduceBase implements Reducer<Text,
                                   IntWritable, Text, IntWritable> {
  
    @Override
    // Reduce Function
    public void reduce(Text key, Iterator<IntWritable> value,
     OutputCollector<Text, IntWritable> output, Reporter rep)
  
    throws IOException
    {
  
        // The same sum/count logic works for both keys, so there is no need
        // to branch on the key. (A test like key.equals("ODD") would always be
        // false anyway: Text.equals() only matches another Text object.)
        int sum = 0, count = 0;
        while (value.hasNext())
        {
            IntWritable i = value.next();
  
            // Sum and count of the numbers for this key (EVEN or ODD)
            sum += i.get();
            count++;
        }
  
        // First sum then count is printed
        output.collect(key, new IntWritable(sum));
        output.collect(key, new IntWritable(count));
    }
}
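Between the map and reduce phases, Hadoop groups all values that share a key and sorts the keys, so `reduce` is called once for EVEN and once for ODD. The sketch below imitates that flow in memory, using a `TreeMap` to stand in for the sorted grouping, so you can see what each `reduce` call receives. The class `LocalShuffle` and its methods are hypothetical; this is an illustration, not how Hadoop actually implements the shuffle. Also note that Hadoop does not guarantee the order of values within a key; here they simply follow input order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of map -> shuffle/sort -> reduce (no Hadoop)
public class LocalShuffle {

    // "Map" + "shuffle": key each number by parity and group values; TreeMap sorts keys
    public static TreeMap<String, List<Integer>> group(String[] tokens) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (String t : tokens) {
            if (t.isEmpty()) {
                continue;
            }
            int n = Integer.parseInt(t);
            String key = (n % 2 != 0) ? "ODD" : "EVEN";
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(n);
        }
        return groups;
    }

    // "Reduce": for each key, emit the sum first and then the count
    public static List<String> reduceAll(TreeMap<String, List<Integer>> groups) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            int sum = 0, count = 0;
            for (int v : e.getValue()) {
                sum += v;
                count++;
            }
            out.add(e.getKey() + "\t" + sum);
            out.add(e.getKey() + "\t" + count);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> groups = group("1 2 3 4 5 6 7 8 9".split("\\s+"));
        System.out.println(groups); // the input each reduce() call would receive
        reduceAll(groups).forEach(System.out::println);
    }
}
```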

Driver Code: copy this program into the EODriver Java class file.

// Importing libraries
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
  
public class EODriver extends Configured implements Tool {
  
    @Override
    public int run(String[] args) throws Exception
    {
        if (args.length < 2)
        {
            System.err.println("Usage: EODriver <input path> <output path>");
            return -1;
        }
  
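        // Build the job configuration on top of the one parsed by ToolRunner
        JobConf conf = new JobConf(getConf(), EODriver.class);
        conf.setJobName("EvenOdd");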
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(EOMapper.class);
        conf.setReducerClass(EOReducer.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
  
        JobClient.runJob(conf);
        return 0;
    }
  
    // Main Method
    public static void main(String args[]) throws Exception
    {
        int exitcode = ToolRunner.run(new EODriver(), args);
        // Return the job status to the shell (0 = success)
        System.exit(exitcode);
    }
}

  • Next, create a JAR file: right-click the project, click Export, select JAR file as the export destination, name the file EvenOdd.jar, click Next, and finally click Finish. Copy this file into the Cloudera workspace directory.

  • Open a terminal in CDH and change to the workspace directory with the “cd workspace/” command. Create a text file (EOFile.txt) containing the input numbers; it has to be copied into HDFS before the job can read it. Make sure you are in the same directory as the JAR file you just created.

    Run this command to copy the input file into HDFS:

    hadoop fs -put EOFile.txt EOFile.txt

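  • Run the JAR file using the syntax “hadoop jar JarFilename DriverClassName TextFileName OutputFolderName”, for example “hadoop jar EvenOdd.jar EODriver EOFile.txt EOOutput”. The output folder must not already exist in HDFS, otherwise the job fails.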

  • After the job finishes, the result is written to the EOOutput directory in HDFS. You can view it with the following command:
    hadoop fs -cat EOOutput/part-00000
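Because the reducer writes two lines per key (sum first, then count) and the default text output separates key and value with a tab, the part-00000 file can be turned back into a per-key summary. A minimal sketch, assuming you have first copied the file to the local file system (for example with `hadoop fs -get EOOutput/part-00000 .`) and that every line has the form key, tab, value; the class `OutputReader` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for a local copy of the job output (part-00000)
public class OutputReader {

    // Maps each key to {sum, count}; relies on the reducer writing sum before count
    public static Map<String, int[]> parse(List<String> lines) {
        Map<String, int[]> result = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t"); // key<TAB>value
            int value = Integer.parseInt(parts[1].trim());
            int[] pair = result.get(parts[0]);
            if (pair == null) {
                result.put(parts[0], new int[] {value, -1}); // first line for a key: sum
            } else {
                pair[1] = value;                             // second line for a key: count
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "part-00000";
        List<String> lines = Files.readAllLines(Paths.get(file));
        for (Map.Entry<String, int[]> e : parse(lines).entrySet()) {
            System.out.println(e.getKey() + ": sum=" + e.getValue()[0]
                    + ", count=" + e.getValue()[1]);
        }
    }
}
```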