Tokenizing Java Source Code

The StreamTokenizer class can be used for simple parsing of a Java source file into tokens. It can be configured to recognize Java-style comments and skip them. It also handles quoted strings, including backslash escapes inside them.
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.StreamTokenizer;

    // Create the tokenizer to read from a file; the reader is closed automatically
    try (FileReader rd = new FileReader("filename.java")) {
        StreamTokenizer st = new StreamTokenizer(rd);

        // Prepare the tokenizer for Java-style tokenizing rules
        st.parseNumbers();
        st.wordChars('_', '_');
        st.eolIsSignificant(true);

        // If whitespace is not to be discarded, make this call
        st.ordinaryChars(0, ' ');

        // '/' is a comment character by default; make it ordinary so that
        // a lone '/' (division) is not treated as the start of a comment
        st.ordinaryChar('/');

        // These calls cause comments to be discarded
        st.slashSlashComments(true);
        st.slashStarComments(true);

        // Parse the file
        int token = st.nextToken();
        while (token != StreamTokenizer.TT_EOF) {
            switch (token) {
            case StreamTokenizer.TT_NUMBER:
                // A number was found; the value is in nval
                double num = st.nval;
                break;
            case StreamTokenizer.TT_WORD:
                // A word was found; the value is in sval
                String word = st.sval;
                break;
            case '"':
                // A double-quoted string was found; sval contains the contents
                String dquoteVal = st.sval;
                break;
            case '\'':
                // A single-quoted string was found; sval contains the contents
                String squoteVal = st.sval;
                break;
            case StreamTokenizer.TT_EOL:
                // End of line character found
                break;
            default:
                // A regular character was found; the value is the token itself
                char ch = (char)st.ttype;
                break;
            }
            // Advance only after the current token has been handled,
            // so that the first token is not discarded
            token = st.nextToken();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
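
To try these tokenizing rules without a file on disk, here is a minimal, self-contained sketch that runs the same kind of loop over an in-memory string. The class name TokenizeSketch, the tokenize helper, and the NUM/WORD/STR/CHAR labels are illustrative assumptions and not part of the original example. It advances the tokenizer exactly once per loop iteration, and it leaves whitespace and line ends at their default (discarded) handling to keep the output short.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class used only for illustration.
class TokenizeSketch {

    // Returns one labelled string per token found in the given source text.
    static List<String> tokenize(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.wordChars('_', '_');        // allow underscores inside identifiers
        st.ordinaryChar('/');          // '/' is a comment character by default
        st.slashSlashComments(true);   // skip // comments
        st.slashStarComments(true);    // skip /* ... */ comments

        List<String> tokens = new ArrayList<>();
        try {
            // Calling nextToken() in the loop condition reads each token once.
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_NUMBER:
                        tokens.add("NUM:" + st.nval);
                        break;
                    case StreamTokenizer.TT_WORD:
                        tokens.add("WORD:" + st.sval);
                        break;
                    case '"':
                    case '\'':
                        // For quoted strings, ttype is the quote character
                        tokens.add("STR:" + st.sval);
                        break;
                    default:
                        // Any other character is returned as its own token
                        tokens.add("CHAR:" + (char) st.ttype);
                        break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String source = "String s_1 = \"hi\"; // note\nint n = 42; /* skip */";
        for (String token : tokenize(source)) {
            System.out.println(token);
        }
    }
}
```

Running main prints one labelled token per line, starting with WORD:String, WORD:s_1, CHAR:= and STR:hi; neither comment produces a token. Note that StreamTokenizer always reports numbers as a double in nval, so the literal 42 appears as NUM:42.0.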

Comments

17 Feb 2010 - 9:14am by maaxiim (not verified)

I believe there is a bug in this code, since you call nextToken outside the while loop and then immediately again inside the while loop, so the first token will always be discarded.

17 Feb 2010 - 9:16am by maaxiim (not verified)

Just to be clear, the second nextToken() call should be placed as the last line inside the while loop. I think that would act as expected.
