TDSM 12.3

From The Data Science Design Manual Wikia
Revision as of 22:35, 12 December 2017 by Venkatkedar (talk | contribs)

1. Largest Number

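map(file_id, iterator numbers){

   // Track the largest number seen in this file
   max = Integer.MIN_VALUE
   while(numbers.hasNext()){
        num = numbers.next()
        if(num > max){
           max = num
        }
   }
   // Emit this file's maximum under the shared key 'max'
   emit('max', max)

}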


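reduce(key, iterator max_values){

   // max_values holds one per-file maximum for the key 'max'
   max = Integer.MIN_VALUE
   while(max_values.hasNext()){
        num = max_values.next()
        if(num > max){
           max = num
        }
   }
   // Emit the largest of the per-file maxima
   emit('overall_max', max)

}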


We are given a list of files, each containing a list of numbers. In MapReduce, the nodes work in parallel: each node picks a file and runs the map function, passing file_id as the key and an iterator over the integers in that file as the value. The map function finds the maximum in its file, pairs it with the key 'max', and emits the pair to the MapReduce framework, which routes key-value pairs across the network of nodes so that all values sharing a key reach the same reducer.

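The reduce function receives the single key 'max' together with max_values, an iterator over every value emitted under that key. Each element of max_values is the maximum of one file, so the largest of them is the largest number across all files. The reduce function finds that value and emits it under the key 'overall_max'.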
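
The two phases described above can be tried out locally with a small Python sketch. This is only an illustration under stated assumptions, not a real MapReduce job: run_job is a hypothetical single-process driver that stands in for the framework's shuffle step, and the names map_max and reduce_max are chosen here for clarity. The sketch also uses None instead of a minimum-integer sentinel, so an empty file emits nothing rather than a placeholder value.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def map_max(file_id: str, numbers: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit ('max', m), where m is the largest number in one file."""
    local_max = None
    for num in numbers:
        if local_max is None or num > local_max:
            local_max = num
    if local_max is not None:  # assumption: an empty file emits nothing
        yield ("max", local_max)


def reduce_max(key: str, max_values: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: emit the largest of the per-file maxima."""
    overall = None
    for num in max_values:
        if overall is None or num > overall:
            overall = num
    if overall is not None:
        yield ("overall_max", overall)


def run_job(files: Dict[str, List[int]]) -> Dict[str, int]:
    """Hypothetical local driver that imitates the framework in one process."""
    # Shuffle step: group every emitted value by its key.
    grouped: Dict[str, List[int]] = {}
    for file_id, numbers in files.items():
        for key, value in map_max(file_id, numbers):
            grouped.setdefault(key, []).append(value)

    # Reduce step: one reduce call per distinct key.
    results: Dict[str, int] = {}
    for key, values in grouped.items():
        for out_key, out_value in reduce_max(key, values):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    files = {"file_0": [3, 17, -4], "file_1": [42, 8], "file_2": [-10, -2]}
    print(run_job(files))  # {'overall_max': 42}
```

Because every mapper emits the same key, a single reduce call sees all the per-file maxima. Taking the maximum is associative and commutative, so the same function could also serve as a combiner on each node, although the solution above does not require one.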


2.