Hadoop supports many input/output file formats. In this blog we will see how to read a text file and save the results in SequenceFile format.
The input file is a CSV with the following fields:
user_id,song_id,listen_count,title,artist,song
Our requirement is to group the songs by artist, so the resulting file will have a single record for each artist containing all the songs sung by that artist.
Map:
First we need to ignore the record with byte offset 0 (the key passed to the map), as that record contains the header with the field names.
Then split the record on "," and write field 4 (artist) as the key and field 5 (song) as the value (the indexes are zero-based):
String[] tokens = value.toString().split(",");
context.write(new Text(tokens[4]), new Text(tokens[5]));
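Put together, a minimal sketch of the whole mapper could look like the one below. The class name SongMapper and the byte-offset check for skipping the header are my assumptions, not part of the original snippet.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: emits (artist, song) pairs from each CSV record.
public class SongMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Skip the header row, which starts at byte offset 0.
        if (key.get() == 0) {
            return;
        }
        String[] tokens = value.toString().split(",");
        // tokens[4] = artist, tokens[5] = song
        context.write(new Text(tokens[4]), new Text(tokens[5]));
    }
}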
Reducer:
On the reducer side we just need to collect all of an artist's songs and convert them into a single Writable:
ArrayList<Text> vs = new ArrayList<Text>();
for (Text song : values) {
    vs.add(new Text(song));
}
context.write(key, new Text(vs.toString()));
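Wrapped in a complete class, the reducer might look roughly like the sketch below. The class name SongReducer is an assumption; the copy via new Text(song) is there because Hadoop reuses the value object between iterations.

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: concatenates all songs of an artist into one record.
public class SongReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        ArrayList<Text> vs = new ArrayList<Text>();
        for (Text song : values) {
            // Copy each value before storing it, since Hadoop reuses the object.
            vs.add(new Text(song));
        }
        context.write(key, new Text(vs.toString()));
    }
}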
Now the reducer output will be saved in SequenceFile format. In the reducer you can use a better approach to store the song collection: instead of adding the songs to an ArrayList and converting it back to a Text, we can use an ArrayWritable.
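ArrayWritable has no no-argument constructor and does not serialize its element type, so the usual pattern is a small subclass that fixes the element type. A hedged sketch (TextArrayWritable is a hypothetical name):

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;

// Hypothetical subclass so the element type (Text) is known when the
// SequenceFile is read back.
public class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class);
    }
    public TextArrayWritable(Text[] values) {
        super(Text.class, values);
    }
}

In the reducer you would then build a Text[] from the collected songs, emit new TextArrayWritable(texts), and set job.setOutputValueClass(TextArrayWritable.class) in the driver.

Writing the output as a SequenceFile is just a matter of setting the output format in the job driver. Below is a minimal sketch, assuming the hypothetical SongMapper and SongReducer classes above and input/output paths passed on the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SongsByArtist {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "songs by artist");
        job.setJarByClass(SongsByArtist.class);
        job.setMapperClass(SongMapper.class);
        job.setReducerClass(SongReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Read plain text, write the grouped records as a SequenceFile.
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}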
The full project is available here.