com.ccg.io
Class TextTable

java.lang.Object
  extended by com.ccg.io.TextTable

public class TextTable
extends Object

Class used to parse ASCII tables. Many applications (spreadsheets in particular) can export data to an ASCII file as a series of rows and columns, where each row corresponds to one record in the table and the columns correspond to record fields. A common form of this export is known as tab separated values. This class helps parse the contents of such files and can handle leading header rows.

This class really deserves an overview document - as time permits I will get back to it.
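
As a rough illustration of the tab separated layout described above, the following sketch splits a few rows into fields with plain Java. It does not use TextTable and makes no claim about how TextTable is implemented; the class name TsvSplitSketch and the helper splitRow are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: plain-Java splitting of tab separated rows.
// This is not TextTable and does not claim to mirror its implementation.
class TsvSplitSketch {

    // Split one raw line into fields. Pattern.quote treats the separator
    // literally, and the limit of -1 keeps trailing empty fields.
    static String[] splitRow(String raw, String sep) {
        return raw.split(Pattern.quote(sep), -1);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Name\tDescription",   // leading header row
            "Paul\tDad",
            "Megan\tMom");
        String[] heads = splitRow(lines.get(0), "\t");
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = splitRow(line, "\t");
            for (int i = 0; i < fields.length; i++) {
                System.out.println(heads[i] + "=" + fields[i]);
            }
        }
    }
}
```

Each row becomes one record and each tab-delimited piece becomes one field, which is the structure the rest of this page assumes.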

Since:
1.0
Version:
$Revision: 1.1.1.1 $
Author:
$Author: pkb $

Field Summary
static int FROM_FIRST_ROW
          Constant used during construction to calculate column count.
 
Constructor Summary
TextTable()
          Default constructor. Initializes the object such that it expects to parse data from a TAB separated ASCII text file.
TextTable(int min_cols, int max_cols, boolean colHeads, String sep)
          Specify table constraints when constructing the object.
 
Method Summary
 int getFieldCount()
          The number of columns which were last parsed.
 String getFieldString(int pos, String def)
          Retrieve the data from a particular column.
 int getMaxColumns()
          Get the maximum number of columns each row in the table may have.
 int getMinColumns()
          Get the minimum number of columns each row in the table must have.
 int getTableColumn(String name)
          Determine the column index for one of the "columns" of data.
static int[] getTableMap(TextTable tt, String[][] headers)
          Determine the indexes of the columns we are interested in by column heading.
 boolean isNextRowColHeads()
          Determine whether the next row parsed will be treated as column headings.
static void main(String[] args)
          Main entry point into the application.
 boolean parseRow(String raw)
          Parse the fields from a single row.
 void setNextRowColHeads(boolean val)
          Set whether the next row parsed should be treated as column headings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

FROM_FIRST_ROW

public static int FROM_FIRST_ROW
Constant used during construction to calculate column count. This constant is typically used during the construction of the tables when you want to specify that the minimum or maximum number of columns in the table should be computed from the first row read in.

Since:
1.0
Constructor Detail

TextTable

public TextTable()
Default constructor. Initializes the object such that it expects to parse data from a TAB separated ASCII text file. This means that tab characters are expected to separate the fields on each line. In addition, the minimum and maximum number of columns will both be set automatically from the number of columns parsed from the first line in the file. It is also assumed that the first line of the file contains column headings.

Since:
1.0
See Also:
TextTable(int,int,boolean,String)

TextTable

public TextTable(int min_cols,
                 int max_cols,
                 boolean colHeads,
                 String sep)
Specify table constraints when constructing the object. This constructor allows you to specify various aspects of the ASCII data file which you intend to parse.

Parameters:
min_cols - The minimum number of columns which must be present in each valid data row. You can use the constant FROM_FIRST_ROW if you want it to be automatically determined from the first row parsed.
max_cols - The maximum number of columns which may be present in each valid data row. You can use the constant FROM_FIRST_ROW if you want it to be automatically determined from the first row parsed.
colHeads - Set this to true if the first line of the tab separated file contains column headings.
sep - Pass the string which separates each column in the table. For a tab separated value file (.tsv), you should pass "\t" - which is the tab character.
Since:
1.0
See Also:
parseRow(java.lang.String)
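
The idea of deriving the column limits from the first row can be sketched as follows. This is a hypothetical illustration, not TextTable's code: the class ColumnLimitSketch, the method accept, and the sentinel value -1 are all assumptions made for the example (this page does not state the actual value of FROM_FIRST_ROW).

```java
// Hypothetical sketch of "derive the column limits from the first row".
// The sentinel value and the class below are assumptions, not TextTable's code.
class ColumnLimitSketch {
    // Assumed sentinel; the real value of FROM_FIRST_ROW is not documented here.
    static final int FROM_FIRST_ROW_SKETCH = -1;

    private int minCols;
    private int maxCols;

    ColumnLimitSketch(int minCols, int maxCols) {
        this.minCols = minCols;
        this.maxCols = maxCols;
    }

    // Returns true when the field count lies within the limits. Any limit
    // still holding the sentinel is fixed to the count seen on this call.
    boolean accept(int fieldCount) {
        if (minCols == FROM_FIRST_ROW_SKETCH) minCols = fieldCount;
        if (maxCols == FROM_FIRST_ROW_SKETCH) maxCols = fieldCount;
        return fieldCount >= minCols && fieldCount <= maxCols;
    }

    public static void main(String[] args) {
        ColumnLimitSketch limits =
            new ColumnLimitSketch(FROM_FIRST_ROW_SKETCH, FROM_FIRST_ROW_SKETCH);
        System.out.println(limits.accept(3)); // first row fixes both limits at 3
        System.out.println(limits.accept(2)); // too few columns
        System.out.println(limits.accept(3)); // matches the first row
    }
}
```

With both limits taken from the first row, every later row must have exactly the same number of columns to be accepted.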
Method Detail

parseRow

public boolean parseRow(String raw)
Parse the fields from a single row. This method parses the fields contained in a single row from the tab separated file. If this method returns true, you can use getFieldCount() to determine the number of fields parsed and getFieldString(int, java.lang.String) to retrieve the contents of a particular field (column).

Note, if this is the first time this method is called and you indicated that the table contained column headings, then this method will return false. Your application needs to be aware of this when it processes the data.

Parameters:
raw - A single row (typically read from a tab separated file).
Returns:
true if we parsed data (false if this was the column heading line, or the row did not contain a valid set of data - see above for how to tell the difference).
Since:
1.0
See Also:
getFieldCount()
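
One way a caller might tell the heading row apart from a rejected row is to check the heading flag before each call. The sketch below uses a small stand-in class so that it runs on its own; RowParserSketch is hypothetical and only mimics the contract documented on this page. In particular, clearing the flag after the heading row is an assumption about behavior, not something this page confirms for TextTable.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in that mimics the documented parseRow contract.
// It is not the TextTable implementation; its behavior is an assumption.
class RowParserSketch {
    private final String sep;
    private boolean nextRowColHeads;
    private String[] fields = new String[0];

    RowParserSketch(boolean colHeads, String sep) {
        this.nextRowColHeads = colHeads;
        this.sep = sep;
    }

    boolean isNextRowColHeads() { return nextRowColHeads; }

    // Returns false for the heading row, true once data fields are stored.
    boolean parseRow(String raw) {
        String[] parts = raw.split(Pattern.quote(sep), -1);
        if (nextRowColHeads) {
            nextRowColHeads = false;   // assumed: flag clears after the headings
            fields = new String[0];    // no data fields for a heading row
            return false;
        }
        fields = parts;
        return true;
    }

    int getFieldCount() { return fields.length; }

    // Returns the field at pos, or def when pos is out of range.
    String getFieldString(int pos, String def) {
        return (pos >= 0 && pos < fields.length) ? fields[pos] : def;
    }

    public static void main(String[] args) {
        RowParserSketch tt = new RowParserSketch(true, "\t");
        String[] lines = { "Name\tDescription", "Paul\tDad" };
        for (String line : lines) {
            boolean wasHeading = tt.isNextRowColHeads(); // check before parsing
            if (tt.parseRow(line)) {
                System.out.println("data: " + tt.getFieldString(0, "?"));
            } else if (wasHeading) {
                System.out.println("heading row consumed");
            } else {
                System.out.println("rejected row");
            }
        }
    }
}
```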

getFieldCount

public int getFieldCount()
The number of columns which were last parsed. Each time the parseRow(java.lang.String) method is invoked, it will set the number of fields which it parsed from the row. It will return 0 if the last invocation of parseRow(java.lang.String) didn't parse out ANY data.

Returns:
Number of columns last parsed.
Since:
1.0
See Also:
parseRow(java.lang.String), getFieldString(int, java.lang.String)

getFieldString

public String getFieldString(int pos,
                             String def)
Retrieve the data from a particular column. This method allows one to fetch the data from a specific column of the last row parsed. The following code fragment demonstrates how one could display all of the columns just parsed:

static void foo(TextTable tt, String row) {
  if (tt.parseRow(row)) {
    int len = tt.getFieldCount();
    for (int i = 0; i < len; i++) {
      String val = tt.getFieldString(i,null);
      if (val != null) {
        System.out.println("col["+i+"]="+val);
      }
    }
  }
}
 

Parameters:
pos - Which column you want to get the data from.
def - The default value you would like returned if the data wasn't present in the row.
Returns:
The value of the data in the specified column, or your default value if not present.
Since:
1.0
See Also:
parseRow(java.lang.String)

main

public static void main(String[] args)
Main entry point into the application. This is an example command line application which reads an ASCII text table in one form and writes it out in another. You can reduce the number of columns written, change the order in which columns are written, and change the character used to separate the columns. You run it in the following manner:
 java com.ccg.io.TextTable [-tossHead] [-minCols N] [-maxCols N]
     [-inSep STR] [-outSep STR] [-colMap C0[,C1[...CN]]]
 

The recognized command line arguments are

-tossHead
Don't print the column headers in the output.
-minCols N
The minimum number of columns which each row must have in order to be accepted. Defaults to column count of first row.
-maxCols N
The maximum number of columns which each row must have in order to be accepted. Defaults to column count of first row.
-inSep STR
The character(s) which separate each column in the input table (default is TAB character).
-outSep STR
The character(s) which separate each column in the generated output table (default is TAB character).
-colMap C0[,C1[...CN]]
This is an interesting option. It allows you to choose which columns should appear in the generated output. For example, if you specify "-colMap 5,3,1", then only columns 5, 3, and 1 will appear in the output, in the order specified. Hence you'll end up with a 3 column output file with the data in a different order than originally entered. Note that the first column is column 0 (i.e. "Java array like" - 0,1,2,...).
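
The reordering that "-colMap 5,3,1" describes can be sketched in plain Java as below. This is an illustration only, not the code behind main; the class ColMapSketch and its helpers parseColMap and remap are hypothetical, and writing an empty string for a column missing from the row is an assumption.

```java
// Sketch of the column reordering that "-colMap 5,3,1" describes.
// Plain Java for illustration; not the code behind TextTable.main.
class ColMapSketch {

    // Parse a spec such as "5,3,1" into {5, 3, 1}.
    static int[] parseColMap(String spec) {
        String[] parts = spec.split(",");
        int[] map = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            map[i] = Integer.parseInt(parts[i].trim());
        }
        return map;
    }

    // Build an output row holding only the mapped columns, in map order.
    static String remap(String[] fields, int[] map, String outSep) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < map.length; i++) {
            if (i > 0) out.append(outSep);
            // Assumption: a column missing from the row is written as empty text.
            boolean present = map[i] >= 0 && map[i] < fields.length;
            out.append(present ? fields[map[i]] : "");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] fields = { "a", "b", "c", "d", "e", "f" };
        // Columns are zero based, so 5,3,1 selects "f", "d", "b".
        System.out.println(remap(fields, parseColMap("5,3,1"), ","));
    }
}
```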

Parameters:
args - Array of command line arguments.

getMinColumns

public int getMinColumns()
Get the minimum number of columns each row in the table must have.

Returns:
Minimum number of columns each row in table must have.
Since:
1.0

getMaxColumns

public int getMaxColumns()
Get the maximum number of columns each row in the table may have.

Returns:
Maximum number of columns each row in the table may have.
Since:
1.0

getTableMap

public static int[] getTableMap(TextTable tt,
                                String[][] headers)
Determine the indexes of the columns we are interested in by column heading.

Assume we read the header row of the following table into a TextTable object named 'tt':

 Name   Description
 Paul   Dad
 Megan  Mom
 Erik   Son
 Scott  Son
 

We'll now try to figure out which columns contain what fields. The following array of string arrays has three entries. The first entry is the list of names that we will allow to match the last name entry, the second entry is the list of names that we will allow to match the first name entry and the last list is the list of identifiers for the description column we allow.

 String[][] mHeaders = {
   { "LastName", "last" },
   { "FirstName", "first", "Name" },
   { "Description", "text" }
 };
 

When we invoke TextTable.getTableMap(tt,mHeaders), it will return an int[] telling us which column each of the headers in our map was found in. In this case it will return { -1, 0, 1 }, indicating that the last name column wasn't found (-1), the first name column was found at column 0 of our source, and the description column was found at column 1 of our source. We can pass these values to getFieldString(int, java.lang.String) as we process each line of the file.
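
The lookup described above can be sketched without TextTable as follows. This is a minimal illustration under stated assumptions: the class TableMapSketch and the method tableMap are hypothetical, matching is done with an exact string comparison, and when several headings could match the leftmost one wins. None of these details are confirmed for getTableMap itself.

```java
import java.util.Arrays;

// Sketch of mapping lists of accepted heading names to column indexes.
// Illustration only; not the implementation of TextTable.getTableMap.
class TableMapSketch {

    // For each list of accepted names, return the index of the first heading
    // that exactly equals one of them, or -1 when no heading matches.
    static int[] tableMap(String[] heads, String[][] headers) {
        int[] map = new int[headers.length];
        Arrays.fill(map, -1);
        for (int h = 0; h < headers.length; h++) {
            for (int c = 0; c < heads.length && map[h] < 0; c++) {
                for (String alias : headers[h]) {
                    if (alias.equals(heads[c])) {
                        map[h] = c;
                        break;
                    }
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String[] heads = { "Name", "Description" };
        String[][] mHeaders = {
            { "LastName", "last" },
            { "FirstName", "first", "Name" },
            { "Description", "text" }
        };
        // Prints [-1, 0, 1] for the table shown above.
        System.out.println(Arrays.toString(tableMap(heads, mHeaders)));
    }
}
```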

Parameters:
tt - The TextTable object which you just parsed the header row from (the first row of a tab separated file).
headers - The headers of the columns you are interested in as described above.
Returns:
An array of index values which can be used to later retrieve the proper data via the getFieldString(int, java.lang.String) method when you are processing the data.
Since:
1.0

getTableColumn

public int getTableColumn(String name)
Determine the column index for one of the "columns" of data. This method searches the column headings (if available) to find a column heading which exactly matches the string you passed. If we can't find a match, then -1 will be returned.

Parameters:
name - The heading at the top of the column (exact match required).
Returns:
-1 if not found or unable to determine, otherwise the column index to pass to getFieldString(int, java.lang.String) when fetching data.
Since:
1.0
See Also:
getTableMap(com.ccg.io.TextTable, java.lang.String[][])

setNextRowColHeads

public void setNextRowColHeads(boolean val)
Set whether the next row parsed should be treated as column headings.

Parameters:
val - New boolean value to assign. See isNextRowColHeads().

isNextRowColHeads

public boolean isNextRowColHeads()
Determine whether the next row parsed will be treated as column headings.

Returns:
Current boolean value assigned. See setNextRowColHeads(boolean).


Copyright 1998-2006. All Rights Reserved.