Hadoop: Sequence Files

In this blog post we will see how to convert a text file to a sequence file and how to read a sequence file, using a Java program.

Sequence Files :

Data in a sequence file is stored as binary key-value pairs. Sequence files support compression out of the box: both record-level and block-level compression are supported, and the SequenceFile reader automatically detects the compression used and decompresses the data.
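To get a feel for the difference between the two compression levels, here is a small plain-Java sketch. It does not use Hadoop and does not produce the sequence file format; it uses java.util.zip's Deflater as a stand-in codec and compares compressing each value on its own (roughly what record-level compression does, since it compresses only the values, one record at a time) with compressing a batch of records together (roughly what block-level compression does). The class name, method names and the marks data are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

public class CompressionDemo {

    // Compress a byte array with Deflate and return the compressed size in bytes.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.size();
    }

    // "Record-level": every value is compressed separately.
    static int recordLevelSize(List<String> values) {
        int total = 0;
        for (String v : values) {
            total += deflatedSize(v.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // "Block-level": many records are concatenated and compressed together.
    static int blockLevelSize(List<String> values) {
        String block = String.join("\n", values);
        return deflatedSize(block.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            // hypothetical comma-separated marks, similar across records
            values.add((60 + i % 40) + "," + (50 + i % 50) + "," + (70 + i % 30));
        }
        System.out.println("record-level bytes: " + recordLevelSize(values));
        System.out.println("block-level bytes:  " + blockLevelSize(values));
    }
}
```

Small values compress poorly on their own because each one pays the codec's fixed overhead, which is why block compression usually wins on files with many short records. In Hadoop itself the choice is made when the writer is created, through a SequenceFile.CompressionType value (NONE, RECORD or BLOCK).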

Writing Sequence Files :

First we will see how to write a sequence file without any compression. The program takes two arguments, the source file and the destination file, and can be run with yarn jar:


yarn jar WriteSequenceFile-0.0.1-SNAPSHOT.jar hdfs:///user/anshumanthesniper7722/student_marks.txt hdfs:///user/anshumanthesniper7722/student_marks.seq

The source file is a key-value text file in which a TAB separates the key from the value, and the value is a comma-separated list:

key<TAB>value1,value2,value3
key<TAB>value1,value2,value3
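Splitting such a line into the key and its list of values is plain string handling. The sketch below is a hypothetical helper, independent of Hadoop: it splits on the first TAB only, then splits the value part on commas, and returns null for lines without a TAB so that the caller can skip them. The sample line "s101" with its marks is made-up data.

```java
import java.util.Arrays;

public class LineParser {

    // One parsed line: the key and its comma-separated values.
    public static final class Record {
        public final String key;
        public final String[] values;

        public Record(String key, String[] values) {
            this.key = key;
            this.values = values;
        }
    }

    // Parse "key<TAB>v1,v2,v3"; returns null if the line contains no TAB.
    public static Record parse(String line) {
        String[] kv = line.split("\t", 2);   // split on the first TAB only
        if (kv.length < 2) {
            return null;
        }
        return new Record(kv[0], kv[1].split(","));
    }

    public static void main(String[] args) {
        Record r = parse("s101\t85,92,78");
        System.out.println(r.key + " -> " + Arrays.toString(r.values));
    }
}
```

Running main prints `s101 -> [85, 92, 78]`. Splitting once with a limit of 2 also avoids calling split("\t") twice per line.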

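In the sequence file you can store the values either as comma-separated Text, exactly as in the source file, or as an ArrayWritable. To use ArrayWritable you need to extend it: ArrayWritable has no no-argument constructor, so a subclass whose no-argument constructor fixes the element class (Text here) is what lets Hadoop create value instances by reflection when the file is read back.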

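import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// ArrayWritable whose elements are always Text. The no-argument constructor
// lets Hadoop instantiate it by reflection when reading the sequence file.
public class TextArrayWritable extends ArrayWritable {

    public TextArrayWritable() {
        super(Text.class);
    }

    // Print the elements comma-separated, without a trailing comma
    @Override
    public String toString() {
        Writable[] items = get();
        if (items == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(items[i]);
        }
        return sb.toString();
    }
}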
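Why the element class has to be fixed up front becomes clearer from how the array is laid out on disk. ArrayWritable writes the number of elements followed by each element's own serialized bytes, with no per-element class information, so the reader must already know what class to instantiate for each element. The plain-Java sketch below mimics that layout with DataOutputStream; writeUTF is only a stand-in for Text's own encoding (Text actually uses a variable-length size prefix), and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ArrayLayoutDemo {

    // Mimics ArrayWritable.write: element count, then each element.
    static byte[] writeArray(String[] items) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(items.length);
        for (String item : items) {
            out.writeUTF(item);  // stand-in for Text.write
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Mimics ArrayWritable.readFields: the bytes carry no type information,
    // so the reader has to assume every element is a string (Text).
    static String[] readArray(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String[] items = new String[in.readInt()];
        for (int i = 0; i < items.length; i++) {
            items[i] = in.readUTF();
        }
        return items;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeArray(new String[] {"85", "92", "78"});
        System.out.println(data.length + " bytes -> " + String.join(",", readArray(data)));
    }
}
```

Running main prints `16 bytes -> 85,92,78`: 4 bytes for the count plus 2 length bytes and 2 character bytes per element.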

Now you can read the lines from the source file, split each one into its key and values, and append them to the destination sequence file.

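import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public static void main(String[] args) throws Exception {
    Path srcPath = new Path(args[0]);   // TAB-separated text file
    Path destPath = new Path(args[1]);  // sequence file to create
    Configuration conf = new Configuration();
    FileSystem fs = srcPath.getFileSystem(conf);

    Text key = new Text();
    TextArrayWritable value = new TextArrayWritable();
    SequenceFile.Writer seqWriter = null;
    BufferedReader br = null;
    try {
        br = new BufferedReader(new InputStreamReader(fs.open(srcPath), StandardCharsets.UTF_8));
        // Ask explicitly for no compression
        seqWriter = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(destPath),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(TextArrayWritable.class),
                SequenceFile.Writer.compression(SequenceFile.CompressionType.NONE));
        String line;
        while ((line = br.readLine()) != null) {
            // Split into key and value part once; skip lines without a TAB
            String[] kv = line.split("\t", 2);
            if (kv.length < 2) {
                continue;
            }
            key.set(kv[0]);
            String[] values = kv[1].split(",");
            Text[] writables = new Text[values.length];
            for (int i = 0; i < values.length; i++) {
                writables[i] = new Text(values[i]);
            }
            value.set(writables);
            seqWriter.append(key, value);
        }
    } finally {
        IOUtils.closeStream(br);
        IOUtils.closeStream(seqWriter);
    }
}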

Reading Sequence Files :

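Now we will see how to read a sequence file. The Writable classes used while writing the file (here TextArrayWritable) must be present on the classpath, because the reader instantiates them from the class names stored in the file header. To read the file, create a SequenceFile.Reader and call its next() method, passing instances of your key and value classes; next() fills them with the next record and returns false when the end of the file is reached.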

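import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    SequenceFile.Reader reader = null;
    try {
        // Compression (if any) is detected from the file header
        reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(new Path(args[0])));
        // Create key/value objects of the classes recorded in the header
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
        // next() refills the same key/value objects for every record
        while (reader.next(key, value)) {
            System.out.println("Key:" + key + ",Value:" + value);
        }
    } finally {
        IOUtils.closeStream(reader);
    }
}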
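One thing to keep in mind when consuming the records: next() does not create new objects, it overwrites the key and value instances you pass in. If you keep references to them, for example in a list, every entry ends up showing the last record. The plain-Java sketch below reproduces the pitfall with a hypothetical mutable Holder standing in for a reused Writable and a hypothetical next() standing in for the reader; with real Writables you would copy before keeping, for example with new Text(key) or WritableUtils.clone(value, conf).

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {

    // Hypothetical mutable object, standing in for a reused Writable.
    static final class Holder {
        String value;
    }

    // Stand-in for reader.next(key, value): overwrites the passed object.
    static boolean next(String[] records, int[] pos, Holder holder) {
        if (pos[0] >= records.length) {
            return false;
        }
        holder.value = records[pos[0]++];
        return true;
    }

    // Collects all records, either keeping the reused object or a copy of it.
    static List<String> collect(String[] records, boolean copy) {
        List<Holder> kept = new ArrayList<>();
        Holder current = new Holder();  // created once, like key/value in the reader
        int[] pos = {0};
        while (next(records, pos, current)) {
            if (copy) {
                Holder snapshot = new Holder();
                snapshot.value = current.value;
                kept.add(snapshot);
            } else {
                kept.add(current);      // the same object every time
            }
        }
        List<String> result = new ArrayList<>();
        for (Holder h : kept) {
            result.add(h.value);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] records = {"s101", "s102", "s103"};
        System.out.println("kept references: " + collect(records, false));
        System.out.println("kept copies:     " + collect(records, true));
    }
}
```

Keeping references yields `[s103, s103, s103]`, while copying yields `[s101, s102, s103]`. Reusing objects is deliberate in Hadoop: it avoids allocating a new key and value for every record.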

