TDSM 12.3
1. Largest Number
map(file_id, iterator numbers){
    max = INTEGER.MIN_VALUE
    while(numbers.hasNext()):
        num = numbers.next()
        if(num > max):
            max = num
    end while
    emit('max', max)
}
reduce(key, iterator max_values){
    max = INTEGER.MIN_VALUE
    while(max_values.hasNext()):
        num = max_values.next()
        if(num > max):
            max = num
    end while
    emit('overall_max', max)
}
We are given a list of files, each containing a list of numbers. In MapReduce, the nodes work in parallel: each node picks a file and runs the map function, passing file_id as the key and an iterator over the integers in that file as the value. The map function finds the maximum of its file, pairs it with the key 'max', and emits the pair to the MapReduce framework, which groups the emitted pairs by key and distributes them across the network of nodes.
The reduce function receives the single key 'max' and the list max_values, whose elements are the maxima of the individual files. It then finds the maximum among max_values, which is the largest number across all files.
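The flow described above can be simulated in a single Python process. This is only a sketch: run_job is a hypothetical stand-in for the framework's shuffle step, the file names and numbers are made-up sample data, and skipping empty files is an assumption the pseudocode does not make.

```python
from collections import defaultdict


def map_max(file_id, numbers):
    """Map step: emit the largest number of one file under the key 'max'."""
    local_max = None
    for num in numbers:
        if local_max is None or num > local_max:
            local_max = num
    if local_max is not None:  # assumption: an empty file emits nothing
        yield ("max", local_max)


def reduce_max(key, max_values):
    """Reduce step: emit the largest of the per-file maxima."""
    yield ("overall_max", max(max_values))


def run_job(files, mapper, reducer):
    """Hypothetical driver: run the mapper per file, group by key, then reduce."""
    groups = defaultdict(list)
    for file_id, numbers in files.items():
        for key, value in mapper(file_id, iter(numbers)):
            groups[key].append(value)
    results = []
    for key, values in groups.items():
        results.extend(reducer(key, iter(values)))
    return results


# Made-up sample input: file_id -> numbers stored in that file.
files = {"f1": [3, 9, 2], "f2": [14, -5], "f3": [7]}
result = run_job(files, map_max, reduce_max)
print(result)  # [('overall_max', 14)]
```

Because every mapper uses the same key 'max', all per-file maxima (9, 14 and 7 here) end up in a single reduce call.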
2. Average
map(file_id, iterator numbers){
    sum = 0
    count = 0
    while(numbers.hasNext()):
        num = numbers.next()
        sum += num
        count += 1
    end while
    emit('avg', (sum, count))
}
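- Here the map output value is the tuple (sum, count) rather than the per-file average. Files may contain different numbers of elements, so the average of the per-file averages is generally not the overall average.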
reduce(key, iterator sum_count_tuples){
    sum = 0
    count = 0
    while(sum_count_tuples.hasNext()):
        sum_i, count_i = sum_count_tuples.next()
        sum = sum + sum_i
        count = count + count_i
    end while
    emit('overall_avg', (sum / count))
}
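A small Python sketch of the average job, under the same assumptions as a single-process simulation: the grouping loop stands in for the framework's shuffle, the sample files are made up, and the guard against an empty input is an addition that the pseudocode does not contain. It also shows why the mappers emit (sum, count) instead of a per-file average.

```python
from collections import defaultdict


def map_avg(file_id, numbers):
    """Map step: emit the (sum, count) of one file under the key 'avg'."""
    total, count = 0, 0
    for num in numbers:
        total += num
        count += 1
    yield ("avg", (total, count))


def reduce_avg(key, sum_count_tuples):
    """Reduce step: combine the partial sums and counts into one average."""
    total, count = 0, 0
    for sum_i, count_i in sum_count_tuples:
        total += sum_i
        count += count_i
    if count > 0:  # assumption: emit nothing when there are no numbers at all
        yield ("overall_avg", total / count)


# Made-up sample input with files of different sizes.
files = {"f1": [1, 2, 3], "f2": [10]}

# Stand-in for the shuffle: collect mapper output by key.
groups = defaultdict(list)
for file_id, numbers in files.items():
    for key, value in map_avg(file_id, iter(numbers)):
        groups[key].append(value)

result = []
for key, values in groups.items():
    result.extend(reduce_avg(key, iter(values)))
print(result)  # [('overall_avg', 4.0)]

# Averaging the per-file averages gives a different, incorrect answer.
per_file_averages = [sum(nums) / len(nums) for nums in files.values()]
naive = sum(per_file_averages) / len(per_file_averages)
print(naive)  # 6.0
```

The overall sum is 16 over 4 numbers, so the correct average is 4.0, while the average of the two per-file averages (2.0 and 10.0) is 6.0.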
3. Distinct
map(file_id, iterator numbers){
    while(numbers.hasNext()):
        num = numbers.next()
        emit(num, 1)
    end while
}
- In MapReduce, emit is not a return operation: it sends the key-value pair to the framework and execution continues, so it can be called inside a loop.
- Here the number itself is the key.
reduce(uniq_num, iterator values){
emit(uniq_num,1)
}
- Here values is a list of 1s, [1,1,...,1], with one entry for each occurrence of the number uniq_num, which is the key. The framework calls reduce once per distinct key, so the single emit in reduce outputs each unique number exactly once.
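The distinct job can be sketched in Python in the same way. Generators play the role of emit here, since yield also hands back a pair and then continues the loop. The grouping dictionary is a hypothetical stand-in for the framework's shuffle, and the sample files are made up.

```python
from collections import defaultdict


def map_distinct(file_id, numbers):
    """Map step: emit every number as a key with the dummy value 1."""
    for num in numbers:
        yield (num, 1)  # like emit, yield does not end the loop


def reduce_distinct(uniq_num, values):
    """Reduce step: called once per distinct key, so emit the key once."""
    yield (uniq_num, 1)


# Made-up sample input with duplicates within and across files.
files = {"f1": [4, 4, 7], "f2": [7, 1]}

# Stand-in for the shuffle: all values for the same number end up together.
groups = defaultdict(list)
for file_id, numbers in files.items():
    for key, value in map_distinct(file_id, iter(numbers)):
        groups[key].append(value)

results = []
for uniq_num, values in groups.items():
    results.extend(reduce_distinct(uniq_num, iter(values)))

distinct = sorted(key for key, _ in results)
print(distinct)  # [1, 4, 7]
```

The number 4 reaches its reduce call with values [1, 1] and 7 with [1, 1], yet each appears only once in the output because reduce ignores the values and emits the key a single time.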