Hadoop Serialization

Hadoop does not use Java's default serialization framework, for performance reasons. It has its own serialization format, Writables, which is fast and compact but not interoperable with non-Java code. For cases where that matters, Hadoop lets you plug in a custom serialization framework: add its class to the io.serializations property (defined in core-default.xml and overridden in core-site.xml).
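To make the "compact" claim concrete, here is a minimal sketch that uses only the JDK. It writes one int the way IntWritable.write() does (a single writeInt call) and then writes the same value as a boxed Integer through standard Java serialization. The class and method names are made up for this illustration, and the exact Java serialization size can vary, so the program prints it rather than assuming it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of Hadoop.
class SerializedSizeDemo {

    // Bytes needed for an int written as raw data, as IntWritable.write() does.
    static int writableStyleSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Bytes needed for the same value under standard Java serialization,
    // which also writes a stream header and class metadata.
    static int javaSerializationSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Integer.valueOf(value));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("Writable-style bytes:     " + writableStyleSize(163));
        System.out.println("Java serialization bytes: " + javaSerializationSize(163));
    }
}
```

The raw form is always four bytes for an int. The Java-serialized form is larger because it describes the class of the object in the stream as well as its value.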

The Writable interface provides two methods to handle serialization and deserialization:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable {
    /** Serialize the fields of this object to out. */
    void write(DataOutput out) throws IOException;

    /** Deserialize the fields of this object from in. */
    void readFields(DataInput in) throws IOException;
}
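As an illustration of how these two methods work together, the sketch below defines a custom type and round-trips it through a byte array. To stay runnable without Hadoop on the classpath, it declares a local Writable interface with the same two methods; in a real job you would import org.apache.hadoop.io.Writable instead. PageHits and WritableRoundTrip are hypothetical names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable,
// declared only so this sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type: a page URL and its hit count.
class PageHits implements Writable {
    private String url = "";
    private long hits;

    // No-argument constructor, so an empty instance can be created and then filled.
    PageHits() { }

    PageHits(String url, long hits) {
        this.url = url;
        this.hits = hits;
    }

    String getUrl() { return url; }
    long getHits() { return hits; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeLong(hits);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in the same order write() emitted them.
        url = in.readUTF();
        hits = in.readLong();
    }
}

class WritableRoundTrip {
    static byte[] serialize(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static PageHits deserialize(byte[] data) {
        PageHits result = new PageHits();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            result.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = serialize(new PageHits("/index.html", 42L));
        PageHits copy = deserialize(data);
        System.out.println(copy.getUrl() + " " + copy.getHits() + " (" + data.length + " bytes)");
    }
}
```

Note that nothing in the byte stream names the fields or their types. The reader has to know the layout, which is part of what keeps the format small.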

Unlike Java's String, which is immutable, Writable objects are mutable, so one instance can be reused for many values. The exception is NullWritable, an immutable singleton. Some of the important Writable types:
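Mutability is what allows a single object to be filled again and again through readFields(). The following sketch shows the idea with IntBox, a hypothetical stand-in that mimics the get/set/readFields shape of IntWritable so the example runs with the JDK alone. It decodes a whole stream of ints while allocating only one holder object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutable int holder such as IntWritable.
class IntBox {
    private int value;

    int get() { return value; }
    void set(int value) { this.value = value; }

    // Overwrites the current value with the next int from the stream.
    void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }
}

class ReuseDemo {
    // Encodes the given ints back to back, four bytes each.
    static byte[] encode(int... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes every int in data while allocating only one IntBox.
    static List<Integer> readAll(byte[] data) {
        List<Integer> values = new ArrayList<>();
        IntBox box = new IntBox();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                box.readFields(in);
                // Copy the primitive out before the next call overwrites the box.
                values.add(box.get());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(readAll(encode(1, 2, 3)));
    }
}
```

The flip side of this reuse is that holding on to the box itself, rather than copying its value out, would leave every stored reference pointing at whatever was read last.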

Primitive wrappers and basic types:

BooleanWritable, IntWritable, LongWritable, FloatWritable, DoubleWritable, Text, NullWritable, BytesWritable

Collection types:

ArrayWritable, ArrayPrimitiveWritable, TwoDArrayWritable, MapWritable, SortedMapWritable

Text is the Writable equivalent of Java's String, but it doesn't have String's rich API. You can convert a Text object to a String by calling its toString() method.

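BytesWritable is a wrapper for an array of bytes. It is mutable: its value can be changed with the set() method. getBytes() returns the backing array, which may be longer than the valid data, so use getCapacity() to get the capacity and getLength() to find out how many bytes it actually contains.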

To use ArrayWritable or TwoDArrayWritable, you must specify the type of element it will contain:


ArrayWritable writable = new ArrayWritable(Text.class);

Or you can subclass them:


public class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class);
    }
}

ArrayPrimitiveWritable is a wrapper for arrays of Java primitives. You don't need to set the type; it is detected automatically.

With MapWritable (or SortedMapWritable), you can mix different Writable types in a single map instance.
