Perl: Processing a CSV or Tab-Delimited File into a Hash of Hashes
Perl is well suited to processing data, largely because its powerful regular expression support makes it easy to find the data you’re looking for.
When you have a file in CSV (comma-delimited), tab-delimited, or some other delimited format, Perl is a good fit for reading the file and doing something with its data. In this example we will store the data in a hash of hashes.
Let’s jump right into the code.
First, we initialize an empty hash where we’ll store the data, then open the file for reading:
%file_data = ();
unless (open(IN, "tab_file.txt")) {
    print "ERROR: Could not open input file tab_file.txt: $!\n";
    exit;
}
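As an aside, the same open step is often written with a lexical filehandle and the three-argument form of open. The following is only a sketch of that variant, not the form used in the rest of this article, which keeps the IN filehandle. The temporary file and the names $in, $path, and $first_line are assumptions made so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway input file so the sketch is self-contained
# (in the article the file would be tab_file.txt).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "u1\tAlice\n";
close($tmp_fh) or die "Could not close $path: $!";

# Three-argument open with a lexical filehandle; die reports the error
# on STDERR and exits with a non-zero status.
open(my $in, '<', $path) or die "Could not open input file $path: $!";
my $first_line = <$in>;
close($in);

chomp($first_line);
print "$first_line\n";
```

Using die instead of print and exit sends the message to STDERR and signals failure to the calling shell, which can matter when the script runs as part of a larger pipeline.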
Now we will read in each line in the file:
while ($line = <IN>) {
Get rid of the ending newline:
chomp($line);
Here’s the key to getting the data on the line. We’re splitting the line (here on the tab character, but it can be any character) into parts and storing those parts into an array:
@line_info = split(/\t/, $line);
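One detail of split is worth seeing in isolation: by default it discards trailing empty fields, so a line that ends in empty columns produces a shorter array than expected. The sketch below uses a made-up sample line (an assumption, not data from the article) to compare the default behavior with a negative limit, which keeps every field.

```perl
use strict;
use warnings;

# Hypothetical sample line: three values followed by two empty fields.
my $line = "id42\tAlice\tParis\t\t";

# Default split: trailing empty fields are discarded.
my @default_fields = split(/\t/, $line);

# A negative limit keeps every field, including trailing empty ones.
my @all_fields = split(/\t/, $line, -1);

print scalar(@default_fields), " fields by default\n";
print scalar(@all_fields), " fields with limit -1\n";
```

If the number of columns matters later on, passing -1 as the third argument is one way to keep each row the same width.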
Let’s say the key field to the line’s information is in the first field, so we store that temporarily:
$key_fld = $line_info[0];
Now we’re going to iterate through the tab-delimited fields from the line and store them into a hash of hashes for easy retrieval later on. The first line here is converting the array into a reference. The second line is then iterating through that array reference. The line inside the loop stores each field into a hash of hashes: the first key field is the first field on the line, which we defined above, and the second key is just the number of the field. (This might not be the most practical way to store the data, but you get the idea.)
$line_info = \@line_info;
for ($i = 0; $i <= $#$line_info; $i++) {
    $file_data{$key_fld}{$i} = $line_info->[$i];
}
} # end of the while loop
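As noted above, numeric field positions may not be the most practical inner keys. One possible alternative, sketched here under the assumption that the first line of the input is a header row, is to key each inner hash by column name. The sample rows and the column names id, name, and city are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input: a header row followed by two data rows.
my @lines = ("id\tname\tcity", "u1\tAlice\tParis", "u2\tBob\tOslo");

# The first line supplies the column names.
my @headers = split(/\t/, shift @lines);

my %file_data;
for my $line (@lines) {
    my @line_info = split(/\t/, $line, -1);
    my $key_fld   = $line_info[0];
    # Hash slice: assign every field to its column name in one step.
    @{ $file_data{$key_fld} }{@headers} = @line_info;
}

# Fields can now be looked up by name instead of by position.
print "$file_data{u2}{city}\n";
```

The outer key is still the first field of each line, so the overall shape remains a hash of hashes; only the inner keys change.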
We close our input file, since we’re done reading it:
close(IN);
Now if we want to iterate through the hash of hashes we created with the data, we can do something like this:
foreach $key (keys %file_data) {
    print "Key field: $key\n";
    foreach $key2 (keys %{$file_data{$key}}) {
        print "- $key2 = $file_data{$key}{$key2}\n";
    }
}
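Perl does not return hash keys in any particular order, so the loop above may print rows and fields in a different order from the file. A minimal sketch of a sorted version follows. The sample data is hypothetical, and the output lines are collected in an array (@output, an illustrative name) before printing.

```perl
use strict;
use warnings;

# Hypothetical data in the same shape as %file_data.
my %file_data = (
    u2 => { 0 => 'u2', 1 => 'Bob',   2 => 'Oslo'  },
    u1 => { 0 => 'u1', 1 => 'Alice', 2 => 'Paris' },
);

my @output;
for my $key (sort keys %file_data) {
    push @output, "Key field: $key";
    # Inner keys are field numbers, so sort them numerically.
    for my $key2 (sort { $a <=> $b } keys %{ $file_data{$key} }) {
        push @output, "- $key2 = $file_data{$key}{$key2}";
    }
}
print "$_\n" for @output;
```

The numeric comparison matters once a line has ten or more fields, because the default string sort would place 10 before 2.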
And that’s it. We’re done!