Tokenizing Java Source Code
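java.io.StreamTokenizer can be used to split a Java source file into tokens for simple parsing. It can be configured to recognize Java-style comments (// and /* */) and skip them. It also handles double- and single-quoted literals, translating common backslash escapes such as \n and \t inside them. It is not a complete Java lexer: multi-character operators such as == or ++ are returned one character at a time, and Unicode escapes (\uXXXX) are not processed.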
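These rules can be tried out without a source file by letting the tokenizer read from a StringReader. The following is a minimal sketch: the class JavaTokenDemo, its tokenize method and the "word:"/"num:"/"str:"/"char:" labels are hypothetical names chosen for illustration. Note the ordinaryChar('/') call. By default StreamTokenizer treats '/' as a comment character, so without that call a division operator would start a comment and the rest of the line would be lost.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class JavaTokenDemo {
    // Hypothetical helper: tokenizes Java-like text and returns one
    // labeled string per token ("word:", "num:", "str:" or "char:").
    static List<String> tokenize(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.ordinaryChar('/');         // lone '/' is division, not a comment
        st.wordChars('_', '_');       // allow '_' in identifiers
        st.slashSlashComments(true);  // skip // comments
        st.slashStarComments(true);   // skip /* */ comments

        List<String> tokens = new ArrayList<>();
        try {
            int token = st.nextToken();
            while (token != StreamTokenizer.TT_EOF) {
                switch (token) {
                    case StreamTokenizer.TT_NUMBER:
                        tokens.add("num:" + st.nval);
                        break;
                    case StreamTokenizer.TT_WORD:
                        tokens.add("word:" + st.sval);
                        break;
                    case '"':
                    case '\'':
                        // Quoted literal; escapes such as \t are already translated
                        tokens.add("str:" + st.sval);
                        break;
                    default:
                        tokens.add("char:" + (char) token);
                        break;
                }
                token = st.nextToken(); // advance at the end of the loop body
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "x = y / 2; // trailing comment\n/* block */ s = \"a\\tb\";";
        System.out.println(tokenize(src));
    }
}
```

For the input in main, the result is word:x, char:=, word:y, char:/, num:2.0, char:;, word:s, char:=, a string token, char:;. Both comments are dropped, the number comes back as a double, and the string token's sval is "a", a real tab character, then "b".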
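import java.io.FileReader;
import java.io.IOException;
import java.io.StreamTokenizer;

// Wrapper class so the example compiles on its own (the name is arbitrary)
public class TokenizeJavaSource {
    public static void main(String[] args) {
        // Create the tokenizer to read from a file; try-with-resources closes the reader
        try (FileReader rd = new FileReader("filename.java")) {
            StreamTokenizer st = new StreamTokenizer(rd);

            // Prepare the tokenizer for Java-style tokenizing rules
            st.parseNumbers();        // already the default; kept for clarity
            st.wordChars('_', '_');   // '_' and '$' may appear in Java identifiers
            st.wordChars('$', '$');
            st.eolIsSignificant(true);

            // Return whitespace as ordinary-character tokens; omit this call
            // if whitespace should be discarded
            st.ordinaryChars(0, ' ');

            // '/' is a comment character by default, which would swallow the rest
            // of the line after a division operator; make it ordinary instead
            st.ordinaryChar('/');
            // These calls cause // and /* */ comments to be discarded
            st.slashSlashComments(true);
            st.slashStarComments(true);

            // Parse the file: read the first token before the loop and each
            // following token at the end of the loop body, so none is skipped
            int token = st.nextToken();
            while (token != StreamTokenizer.TT_EOF) {
                switch (token) {
                    case StreamTokenizer.TT_NUMBER:
                        // A number was found; the value is in nval
                        double num = st.nval;
                        break;
                    case StreamTokenizer.TT_WORD:
                        // A word was found; the value is in sval
                        String word = st.sval;
                        break;
                    case '"':
                        // A double-quoted string was found; sval contains the contents
                        String dquoteVal = st.sval;
                        break;
                    case '\'':
                        // A single-quoted literal was found; sval contains the contents
                        String squoteVal = st.sval;
                        break;
                    case StreamTokenizer.TT_EOL:
                        // End of line character found
                        break;
                    default:
                        // A regular character was found; the value is the token itself
                        char ch = (char) st.ttype;
                        break;
                }
                token = st.nextToken();
            }
        } catch (IOException e) {
            // Report I/O errors (e.g. a missing file) instead of ignoring them
            e.printStackTrace();
        }
    }
}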
I believe there is a bug in this code: nextToken() is called once before the while loop and then again as the first statement inside it, so the first token is always discarded.
To be clear, the second nextToken() call should be the last statement inside the while loop. With that change, I think the code would behave as expected.
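Another way to avoid the problem is the common idiom of calling nextToken() in the while condition itself, so the call appears only once and no token can be read and then thrown away. The sketch below assumes only word tokens are of interest; TokenLoopDemo and words are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenLoopDemo {
    // Hypothetical helper: returns the word tokens of the input in order.
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> words = new ArrayList<>();
        try {
            int token;
            // nextToken() runs exactly once per iteration, in the loop
            // condition, so the first token is never skipped.
            while ((token = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (token == StreamTokenizer.TT_WORD) {
                    words.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("public class Foo { }"));
    }
}
```

For "public class Foo { }" this returns public, class and Foo; the braces are ordinary-character tokens and are skipped by the if. With an extra nextToken() call before the loop, "public" would be lost.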