The `SimpleCxxLib` package

#include "tokenscanner.h"

`class TokenScanner`

This class divides a string into individual tokens. The typical use of the TokenScanner class is illustrated by the following pattern, which reads the tokens in the string variable input:

   TokenScanner scanner(input);
   while (scanner.hasMoreTokens()) {
      string token = scanner.nextToken();
      // ... process the token ...
   }

The TokenScanner class exports several additional methods that give clients more control over its behavior. Those methods are described individually in the documentation.

Constructor
TokenScanner() TokenScanner(str) TokenScanner(infile)	Initializes a scanner object.
Methods
addOperator(op)	Defines a new multicharacter operator.
addWordCharacters(str)	Adds the characters in `str` to the set of characters legal in a `WORD` token.
getChar()	Reads the next character from the scanner input stream.
getPosition()	Returns the current position of the scanner in the input stream.
getStringValue(token)	Returns the string value of a token.
getTokenType(token)	Returns the type of this token.
hasMoreTokens()	Returns `true` if there are additional tokens for this scanner to read.
ignoreComments()	Tells the scanner to ignore comments.
ignoreWhitespace()	Tells the scanner to ignore whitespace characters.
isWordCharacter(ch)	Returns `true` if the character is valid in a word.
nextToken()	Returns the next token from this scanner.
saveToken(token)	Pushes the specified token back into this scanner's input stream.
scanNumbers()	Controls how the scanner treats tokens that begin with a digit.
scanStrings()	Controls how the scanner treats tokens enclosed in quotation marks.
setInput(str) setInput(infile)	Sets the token stream for this scanner to the specified string or input stream.
ungetChar(ch)	Pushes the character `ch` back into the scanner stream.
verifyToken(expected)	Reads the next token and makes sure it matches the string `expected`.

Constructor detail

TokenScanner();
TokenScanner(string str);
TokenScanner(istream & infile);

Initializes a scanner object. The initial token stream comes from the specified string or input stream, if supplied. The default constructor creates a scanner with an empty token stream.

Usage:

TokenScanner scanner;
TokenScanner scanner(str);
TokenScanner scanner(infile);

Method detail

void setInput(string str);
void setInput(istream & infile);

Sets the token stream for this scanner to the specified string or input stream. Any previous token stream is discarded.

Usage:

scanner.setInput(str);
scanner.setInput(infile);

bool hasMoreTokens();

Returns true if there are additional tokens for this scanner to read.

Usage:

if (scanner.hasMoreTokens()) ...

string nextToken();

Returns the next token from this scanner. If nextToken is called when no tokens are available, it returns the empty string.

Usage:

token = scanner.nextToken();

void saveToken(string token);

Pushes the specified token back into this scanner's input stream. On the next call to nextToken, the scanner will return the saved token without reading any additional characters from the token stream.

Usage:

scanner.saveToken(token);
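
The push-back behavior can be pictured with a small standalone sketch. The class `PushbackTokens` below is hypothetical and is not part of the library; it stands in for a scanner whose tokens are already split. It also assumes that several saved tokens come back in last-in, first-out order, which the description above does not specify.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the push-back part of a scanner.
// Not the library's implementation; the LIFO order is an assumption.
class PushbackTokens {
public:
    explicit PushbackTokens(std::vector<std::string> tokens)
        : tokens_(std::move(tokens)) {}

    bool hasMoreTokens() const {
        return !saved_.empty() || next_ < tokens_.size();
    }

    // Returns a saved token first, without consuming any further input.
    std::string nextToken() {
        if (!saved_.empty()) {
            std::string token = saved_.back();
            saved_.pop_back();
            return token;
        }
        if (next_ < tokens_.size()) return tokens_[next_++];
        return "";  // no tokens left
    }

    void saveToken(const std::string& token) { saved_.push_back(token); }

private:
    std::vector<std::string> tokens_;  // pre-split input tokens
    std::size_t next_ = 0;             // index of the next unread token
    std::vector<std::string> saved_;   // tokens pushed back by the client
};
```

In this sketch, reading a token, saving it, and reading again returns the same token twice before the rest of the input continues.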

int getPosition() const;

Returns the current position of the scanner in the input stream. If saveToken has been called, this position corresponds to the beginning of the saved token. If saveToken is called more than once, getPosition returns -1.

Usage:

int pos = scanner.getPosition();

void ignoreWhitespace();

Tells the scanner to ignore whitespace characters. By default, the nextToken method treats whitespace characters (typically spaces and tabs) just like any other punctuation mark and returns them as single-character tokens. Calling

   scanner.ignoreWhitespace();

changes this behavior so that the scanner ignores whitespace characters.

Usage:

scanner.ignoreWhitespace();
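
To make the difference concrete, here is a minimal standalone sketch of the two behaviors. The function `tokenize` and its `ignoreWhitespace` flag are hypothetical names, and the sketch assumes that a word is simply a run of letters and digits; it does not reproduce the library's scanner.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative stand-in, not the library's scanner: a word is a run of
// letters and digits, and every other character is a one-character token.
std::vector<std::string> tokenize(const std::string& input, bool ignoreWhitespace) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char ch = static_cast<unsigned char>(input[i]);
        if (ignoreWhitespace && std::isspace(ch)) {
            i++;  // drop the whitespace character
        } else if (std::isalnum(ch)) {
            std::size_t start = i;
            while (i < input.size() &&
                   std::isalnum(static_cast<unsigned char>(input[i]))) {
                i++;
            }
            tokens.push_back(input.substr(start, i - start));
        } else {
            // Punctuation, or whitespace when it is not being ignored.
            tokens.push_back(std::string(1, input[i]));
            i++;
        }
    }
    return tokens;
}
```

For the input `a + b`, the sketch produces five tokens when whitespace is kept (`a`, a space, `+`, a space, `b`) and three tokens when it is ignored.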

void ignoreComments();

Tells the scanner to ignore comments. The scanner package recognizes both the slash-star (`/* ... */`) and slash-slash (`// ...`) comment formats from the C-based family of languages. Calling

   scanner.ignoreComments();

sets the scanner to ignore comments.

Usage:

scanner.ignoreComments();

void scanNumbers();

Controls how the scanner treats tokens that begin with a digit. By default, the nextToken method treats numbers and letters identically and therefore does not provide any special processing for numbers. Calling

   scanner.scanNumbers();

changes this behavior so that nextToken returns the longest substring that can be interpreted as a real number.

Usage:

scanner.scanNumbers();
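
The "longest substring" rule can be sketched as a standalone function. The name `longestNumber` is hypothetical, and the number grammar used here (digits, an optional fraction, an optional exponent that must contain at least one digit) is an assumption; the library may accept a different set of forms.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the longest prefix of text, beginning at start, that reads as a
// real number. Assumed grammar: digits [ '.' digits* ] [ e|E [+|-] digits ].
// Illustrative only; not the library's implementation.
std::string longestNumber(const std::string& text, std::size_t start = 0) {
    auto isDigitAt = [&](std::size_t i) {
        return i < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[i]));
    };
    std::size_t i = start;
    while (isDigitAt(i)) i++;           // integer part
    if (i == start) return "";          // a number must begin with a digit
    if (i < text.size() && text[i] == '.') {
        i++;                            // decimal point ("3." is accepted here)
        while (isDigitAt(i)) i++;       // fraction digits
    }
    if (i < text.size() && (text[i] == 'e' || text[i] == 'E')) {
        std::size_t j = i + 1;
        if (j < text.size() && (text[j] == '+' || text[j] == '-')) j++;
        if (isDigitAt(j)) {             // commit to the exponent only if digits follow
            while (isDigitAt(j)) j++;
            i = j;
        }
    }
    return text.substr(start, i - start);
}
```

Under these assumptions, `longestNumber("3.14abc")` yields `3.14`, while `longestNumber("2ex")` yields `2` because the `e` is not followed by a digit and so cannot extend the number.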

void scanStrings();

Controls how the scanner treats tokens enclosed in quotation marks. By default, quotation marks (either single or double) are treated just like any other punctuation character. Calling

   scanner.scanStrings();

changes this behavior so that nextToken returns a single token consisting of all characters through the matching quotation mark. The quotation marks are returned as part of the scanned token so that clients can differentiate strings from other token types.

Usage:

scanner.scanStrings();

void addWordCharacters(string str);

Adds the characters in str to the set of characters legal in a WORD token. For example, calling addWordCharacters("_") adds the underscore to the set of characters that are accepted as part of a word.

Usage:

scanner.addWordCharacters(str);

bool isWordCharacter(char ch) const;

Returns true if the character is valid in a word.

Usage:

if (scanner.isWordCharacter(ch)) ...

void addOperator(string op);

Defines a new multicharacter operator. Whenever nextToken is called while the input stream is positioned at operator characters, the scanner returns the longest possible operator string that can be read at that point.

Usage:

scanner.addOperator(op);
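
The longest-match rule can be illustrated with a standalone sketch. The function `longestOperator` is a hypothetical helper, and falling back to a single character when no registered operator matches is an assumption based on the default treatment of punctuation; it is not the library's code.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Returns the longest registered operator that begins at text[start].
// If none matches, falls back to the single character at that position.
// Hypothetical helper for illustration only.
std::string longestOperator(const std::set<std::string>& operators,
                            const std::string& text, std::size_t start = 0) {
    if (start >= text.size()) return "";
    std::string best(1, text[start]);
    for (const std::string& op : operators) {
        // compare() clips the substring at the end of text, so an operator
        // longer than the remaining input simply fails to match.
        if (op.size() > best.size() &&
            text.compare(start, op.size(), op) == 0) {
            best = op;
        }
    }
    return best;
}
```

With the operators `==`, `<<`, and `<<=` registered, the sketch reads `<<=` from the input `<<=1` and only `<<` from the input `<<1`.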

void verifyToken(string expected);

Reads the next token and makes sure it matches the string expected. If it does not, verifyToken throws an error.

Usage:

scanner.verifyToken(expected);

TokenType getTokenType(string token) const;

Returns the type of this token. This type will match one of the following enumerated type constants: EOF, SEPARATOR, WORD, NUMBER, STRING, or OPERATOR.

Usage:

TokenType type = scanner.getTokenType(token);

int getChar();

Reads the next character from the scanner input stream.

Usage:

int ch = scanner.getChar();

void ungetChar(int ch);

Pushes the character ch back into the scanner stream. The character must match the one that was read.

Usage:

scanner.ungetChar(ch);

string getStringValue(string token) const;

Returns the string value of a token. This value is formed by removing any surrounding quotation marks and replacing escape sequences by the appropriate characters.

Usage:

string str = scanner.getStringValue(token);
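
As a rough picture of that conversion, the following standalone sketch strips a matching pair of surrounding quotation marks and decodes a few escape sequences. The function `stringValue` is a hypothetical name, and the set of escapes handled (`\n`, `\t`, `\r`, with any other escaped character kept as itself) is an assumption; the library may support more.

```cpp
#include <cstddef>
#include <string>

// Illustrative only: removes matching surrounding quotes and decodes a
// small, assumed set of escape sequences.
std::string stringValue(const std::string& token) {
    std::size_t begin = 0;
    std::size_t end = token.size();
    // Strip the quotes only when the first and last characters match.
    if (end >= 2 && (token[0] == '"' || token[0] == '\'') &&
        token[end - 1] == token[0]) {
        begin = 1;
        end--;
    }
    std::string result;
    for (std::size_t i = begin; i < end; i++) {
        char ch = token[i];
        if (ch == '\\' && i + 1 < end) {
            ch = token[++i];  // look at the escaped character
            switch (ch) {
                case 'n': ch = '\n'; break;
                case 't': ch = '\t'; break;
                case 'r': ch = '\r'; break;
                default: break;  // \\, \", \' and others map to the character itself
            }
        }
        result += ch;
    }
    return result;
}
```

In this sketch, the token `"hello"` (including its quotation marks) becomes `hello`, and a token with no surrounding quotation marks is returned unchanged.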
