Hadoop Serialization

Hadoop does not use Java’s default serialization framework, for performance reasons. It has its own serialization format, Writables, which is fast and compact but not easy to use from languages other than Java. For that reason Hadoop lets you plug in other serialization frameworks: you register additional frameworks by adding them to the io.serializations property (defined in core-default.xml and overridden in core-site.xml).
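
To make the compactness argument concrete, here is a minimal sketch that uses only the JDK (no Hadoop classes). It compares the number of bytes Java's built-in serialization produces for one boxed Integer with the bytes produced by DataOutput.writeInt, the kind of raw field write a Writable performs. The class name SizeComparison and its two helper methods are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration; not part of Hadoop.
class SizeComparison {

    // Size of a boxed Integer written with standard Java serialization.
    static int javaSerializedSize(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(Integer.valueOf(value));
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Size of the same int written as raw bytes through the DataOutput API.
    static int rawSize(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(value);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Java serialization:  " + javaSerializedSize(163) + " bytes");
        System.out.println("DataOutput.writeInt: " + rawSize(163) + " bytes");
    }
}
```

The raw write is always four bytes. The Java-serialized form is considerably larger because the stream also carries a header and a description of the class, whereas a Writable writes only the field values.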

The Writable interface provides two methods to handle serialization and deserialization:

public interface Writable {
  // Serialize the fields of this object to out
  void write(DataOutput out) throws IOException;
  // Deserialize the fields of this object from in
  void readFields(DataInput in) throws IOException;
}
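
As a rough sketch of how these two methods are used, the example below implements them for a small type and round-trips it through a byte array. So that it compiles without Hadoop on the classpath, it declares a local stand-in Writable interface with the same two methods; in a real job you would import org.apache.hadoop.io.Writable instead. PointWritable, toBytes and fromBytes are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring org.apache.hadoop.io.Writable (assumption: used
// here only so the sketch runs with the JDK alone).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical example type: a point with two int fields.
class PointWritable implements Writable {
    int x;
    int y;

    PointWritable() { }  // empty instance, to be populated by readFields

    PointWritable(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);  // fields are written in a fixed order
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readInt();  // and read back in the same order
        y = in.readInt();
    }

    // Serialize any Writable to a byte array.
    static byte[] toBytes(Writable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                w.write(out);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Overwrite the state of an existing instance from serialized bytes.
    static void fromBytes(Writable w, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = toBytes(new PointWritable(3, 4));
        PointWritable copy = new PointWritable();
        fromBytes(copy, data);
        System.out.println(data.length + " bytes, x=" + copy.x + ", y=" + copy.y);
    }
}
```

Running main prints "8 bytes, x=3, y=4": two ints, four bytes each, and nothing else. Note that readFields overwrites an existing object rather than creating a new one, which is why Writable types need to be mutable.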

All Writable objects are mutable except NullWritable, which is an immutable singleton that carries no data. (A Java String, by contrast, is immutable.) Some of the important Writable types:

Primitive Types:

BooleanWritable, IntWritable, LongWritable, FloatWritable, DoubleWritable, Text, NullWritable, BytesWritable

Collection types:

ArrayWritable, ArrayPrimitiveWritable, TwoDArrayWritable, MapWritable, SortedMapWritable

Text is the Writable equivalent of a Java String, but it doesn’t have String’s rich API. You can convert a Text object to a String by calling its toString() method.

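BytesWritable is a wrapper for an array of bytes. It is mutable; its value can be changed with the set() method. getBytes() returns the backing array, whose length is the capacity (also available from getCapacity()) and may be larger than the data it holds. To find out how many bytes are actually valid, use getLength().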

To use ArrayWritable or TwoDArrayWritable, you must specify the type of the elements it will contain:

ArrayWritable writable = new ArrayWritable(Text.class);

Alternatively, you can subclass them:

public class TextArrayWritable extends ArrayWritable {
  public TextArrayWritable() { super(Text.class); }
}

ArrayPrimitiveWritable is a wrapper for arrays of Java primitives. You don’t need to set the type; it is detected automatically when you call set().

With MapWritable (or SortedMapWritable), you can use different Writable types for keys and values in a single map instance.
