Hadoop setPartitionerClass & setGroupingComparatorClass

- setPartitionerClass & setGroupingComparatorClass

* setPartitionerClass

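The partitioner decides which reduce task each record of a map task's output is sent to, and the partitioned data is then sorted by the key of the map task's output.

A partitioner used in MapReduce must extend org.apache.hadoop.mapreduce.Partitioner. Its two type parameters correspond to the key and value types of the Mapper's output. The class is registered on the job with Job.setPartitionerClass.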

Example:

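import org.apache.hadoop.mapreduce.Partitioner;

public class RecPartitioner<KEY, VALUE> extends Partitioner<KEY, VALUE> {

    @Override
    public int getPartition(KEY key, VALUE value, int numPartitions) {
        // Partition on the key's string form so equal keys always reach the same reducer.
        String strKey = key.toString();
        // Mask off the sign bit so the modulo result is never negative.
        return (strKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}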
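The reason for the `& Integer.MAX_VALUE` mask can be checked without a cluster. The following is a minimal sketch in plain Java, not Hadoop code: the class `PartitionIndexDemo` and its methods are hypothetical names introduced here, and the partition count of 4 is an assumed value. It applies the same arithmetic as `getPartition` to raw hash values.

```java
final class PartitionIndexDemo {

    // Same arithmetic as RecPartitioner.getPartition, applied to a raw hash value.
    static int partitionForHash(int hash, int numPartitions) {
        // Clearing the sign bit keeps the result in the range [0, numPartitions).
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    // Convenience overload for a String key (stand-in for key.toString()).
    static int partitionFor(String key, int numPartitions) {
        return partitionForHash(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4; // assumed number of reduce tasks
        int[] hashes = {42, -7, Integer.MIN_VALUE};
        for (int hash : hashes) {
            int naive = hash % numPartitions; // Java's % keeps the sign of the dividend
            int masked = partitionForHash(hash, numPartitions);
            System.out.println("hash=" + hash + " naive=" + naive + " masked=" + masked);
        }
    }
}
```

Without the mask, a negative hash code would yield a negative partition index, which is not a valid reducer number. Note that `Math.abs` is not a safe substitute, because `Math.abs(Integer.MIN_VALUE)` is still negative.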

* setGroupingComparatorClass

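With a group-key comparator, all records that share the same group key can be processed in a single Reducer group, that is, in one reduce() call. The comparator is registered on the job with Job.setGroupingComparatorClass.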

Example:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class RecGroupingComparator extends WritableComparator {

    protected RecGroupingComparator() {
        // true: let WritableComparator create Text instances to deserialize keys into.
        super(Text.class, true);
    }

    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable w1, WritableComparable w2) {
        Text t1 = (Text) w1;
        Text t2 = (Text) w2;
        String t1Key = t1.toString();
        String t2Key = t2.toString();
        // Keys that compare as equal here are handed to the same reduce() call.
        return t1Key.compareTo(t2Key);
    }
}
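The effect of grouping is easiest to see when the comparator looks at only part of the key. The following is a minimal sketch in plain Java, not Hadoop code. It assumes a composite key of the form "groupKey#secondaryField", which the example above does not use (it compares the whole Text value). The class `GroupingDemo`, its methods, and the sample keys are hypothetical. The sketch walks an already sorted key list and starts a new group whenever the grouping comparator reports a difference, which approximates how consecutive keys are batched into one reduce() call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class GroupingDemo {

    // Assumed key layout: the part before '#' is the group key.
    static String groupKey(String compositeKey) {
        int idx = compositeKey.indexOf('#');
        return idx < 0 ? compositeKey : compositeKey.substring(0, idx);
    }

    // Plays the role of the grouping comparator: compares group keys only.
    static final Comparator<String> GROUPING = Comparator.comparing(GroupingDemo::groupKey);

    // Starts a new group each time the grouping comparator sees a different key
    // than the previous one; the input must already be sorted.
    static List<List<String>> group(List<String> sortedKeys) {
        List<List<String>> groups = new ArrayList<>();
        String previous = null;
        for (String key : sortedKeys) {
            if (previous == null || GROUPING.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>(Arrays.asList("u2#09:00", "u1#10:30", "u1#08:15"));
        keys.sort(Comparator.naturalOrder()); // stand-in for the sort phase
        System.out.println(group(keys));
    }
}
```

Because grouping only merges adjacent keys, it relies on the sort order placing all keys with the same group key next to each other, and on the partitioner sending them to the same reducer in the first place.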
