Class WildcardStringParser

java.lang.Object
com.twelvemonkeys.util.regex.WildcardStringParser

@Deprecated public class WildcardStringParser extends Object
Deprecated.
Will probably be removed in the near future
This class parses arbitrary strings against a supplied wildcard string mask. The wildcard characters are '*' and '?'.

String masks are treated as case sensitive.
A null string mask, as well as a null string to be parsed, leads to rejection.

This class is custom designed for wildcard string parsing and is several times faster than the implementation based on the Jakarta Regexp package.


This task is performed using regular expression techniques. The strings that can be generated with the wildcard characters stated above form a subset of the strings that can be generated with regular expressions.
The '*' corresponds to ([Union of all characters in the alphabet])*
The '?' corresponds to ([Union of all characters in the alphabet])
      These expressions are not suited for textual representation at all, I must say. Are there any math tags included in HTML?
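
The correspondence above can be illustrated with java.util.regex. This is only a sketch: the class WildcardToRegex and its methods are hypothetical names, not part of this package, and '.' with DOTALL stands in for the union of all characters, so it accepts more characters than the hard-coded alphabet of this class.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of WildcardStringParser. */
final class WildcardToRegex {

    /** Translates a wildcard mask into a java.util.regex pattern. */
    static Pattern toPattern(String mask) {
        StringBuilder regex = new StringBuilder();
        for (char c : mask.toCharArray()) {
            if (c == '*') {
                regex.append(".*"); // zero or more arbitrary characters
            } else if (c == '?') {
                regex.append('.');  // exactly one arbitrary character
            } else {
                // Every other character must match literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Null masks and null strings are rejected, as in the class description. */
    static boolean matches(String mask, String text) {
        if (mask == null || text == null) {
            return false;
        }
        return toPattern(mask).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*_28????.jp*", "gupu_280915.jpg")); // prints true
    }
}
```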

The complete meta-language for regular expressions is much larger. Because the wildcard language is only a small subset of it, building data structures for parsing is fairly straightforward: the number of rules for building these structures is quite limited, as stated below.

To put this in mathematical terms: the parser is a nondeterministic finite automaton representing the grammar stated by the string mask. The language accepted by this automaton is the set of all strings it accepts.
The formal automaton quintuple consists of:

  1. A finite set of states, depending on the wildcard string mask. For each character in the mask a state representing that character is created. The number of states therefore coincides with the length of the mask.
  2. An alphabet consisting of all legal filename characters, including the two wildcard characters '*' and '?'. This alphabet is hard-coded in this class. It contains {a .. å}, {A .. Å}, {0 .. 9}, {.}, {_}, {-}, {*} and {?}.
  3. A finite set of initial states, here only consisting of the state corresponding to the first character in the mask.
  4. A finite set of final states, here only consisting of the state corresponding to the last character in the mask.
  5. A transition relation that is a finite set of transitions satisfying some formal rules.
    This implementation, on the other hand, uses only ad-hoc rules, which start from an initial setup of the states as a sequence according to the string mask.
    Additionally, the following rules complete the building of the automaton:
    1. If the next state represents the same character as the next character in the string to test - go to this next state.
    2. If the next state represents '*' - go to this next state.
    3. If the next state represents '?' - go to this next state.
    4. If a '*' is followed by one or more '?', the last of these '?' states counts as a '*' state. Some extra checks regarding the number of characters read must be imposed in this case.
    5. If the next character in the string to test does not match the next state - go to the last state representing '*'. If there is none - rejection.
    6. If there is no subsequent state (final state) and the state represents '*' - acceptance.
    7. If there is no subsequent state (final state) and the end of the string to test is reached - acceptance.

    Disclaimer: This class does not build a finite automaton according to formal mathematical rules. A proper implementation would find the complete set of transition relations, decompose these into rules accepted by a deterministic finite automaton, and finally build this automaton for string parsing. Instead, this class is implemented ad hoc, based on the informal transition rules stated above. Its correctness therefore cannot be guaranteed before the class has been tested extensively... anyway, I think I have succeeded. Parsing faults must be reported to the author.
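
The informal rules above, in particular the fallback to the last '*' state on a mismatch, resemble the well-known two-pointer wildcard matching technique. The following is a minimal sketch of that technique, not the code of this class: WildcardMatcherSketch and its variable names are hypothetical, and the sketch does not restrict input to the hard-coded alphabet.

```java
/** Hypothetical sketch for illustration; not the implementation of this class. */
final class WildcardMatcherSketch {

    static boolean matches(String mask, String text) {
        if (mask == null || text == null) {
            return false; // null mask or null string: rejection
        }
        int m = 0;          // current position (state) in the mask
        int t = 0;          // current position in the string to test
        int starMask = -1;  // mask position of the last '*' seen
        int starText = -1;  // text position from which that '*' resumes
        while (t < text.length()) {
            if (m < mask.length() && mask.charAt(m) == '*') {
                // Enter the '*' state without consuming a character.
                starMask = m;
                starText = t;
                m++;
            } else if (m < mask.length()
                    && (mask.charAt(m) == '?' || mask.charAt(m) == text.charAt(t))) {
                // '?' or an equal character: advance both positions.
                m++;
                t++;
            } else if (starMask >= 0) {
                // Mismatch: return to the last '*' and let it absorb one more character.
                starText++;
                t = starText;
                m = starMask + 1;
            } else {
                return false; // mismatch and no '*' to fall back on: rejection
            }
        }
        // End of string reached: only trailing '*' states may remain.
        while (m < mask.length() && mask.charAt(m) == '*') {
            m++;
        }
        return m == mask.length();
    }

    public static void main(String[] args) {
        boolean accepted = matches("*_28????.jp*", "gupu_280915.jpg");
        System.out.println(accepted ? "Accepted!" : "Not accepted!"); // prints "Accepted!"
    }
}
```

Remembering only the most recent '*' is enough here because an earlier '*' never needs to be revisited once a later one has been reached.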

Examples of usage:
This example will print "Accepted!".

 WildcardStringParser parser = new WildcardStringParser("*_28????.jp*");
 if (parser.parseString("gupu_280915.jpg")) {
     System.out.println("Accepted!");
 } else {
     System.out.println("Not accepted!");
 }
 

Theories and concepts are based on the book Elements of the Theory of Computation, by Harry R. Lewis and Christos H. Papadimitriou, (c) 1981 by Prentice Hall.

Author:
Eirik Torske
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final char[]
    ALPHABET
    Deprecated.
    Field ALPHABET
    static final char
    FREE_PASS_CHARACTER
    Deprecated.
    Field FREE_PASS_CHARACTER
    static final char
    FREE_RANGE_CHARACTER
    Deprecated.
    Field FREE_RANGE_CHARACTER
  • Constructor Summary

    Constructors
    Constructor
    Description
    WildcardStringParser(String pStringMask)
    Deprecated.
    Creates a wildcard string parser.
    WildcardStringParser(String pStringMask, boolean pDebugging)
    Deprecated.
    Creates a wildcard string parser.
    WildcardStringParser(String pStringMask, boolean pDebugging, PrintStream pDebuggingPrintStream)
    Deprecated.
    Creates a wildcard string parser.
  • Method Summary

    Modifier and Type
    Method
    Description
    protected Object
    clone()
    Deprecated.
     
    boolean
    equals(Object pObject)
    Deprecated.
    Method equals
    protected void
    finalize()
    Deprecated.
     
    String
    getStringMask()
    Deprecated.
    Gets the string mask that was used when building the parser automaton.
    int
    hashCode()
    Deprecated.
    Method hashCode
    static boolean
    isFreePassCharacter(char pCharToCheck)
    Deprecated.
    Tests if a certain character is the designated "free-pass" character ('?').
    static boolean
    isFreeRangeCharacter(char pCharToCheck)
    Deprecated.
    Tests if a certain character is the designated "free-range" character ('*').
    static boolean
    isInAlphabet(char pCharToCheck)
    Deprecated.
    Tests if a certain character is a valid character in the alphabet that applies to this automaton.
    static boolean
    isWildcardCharacter(char pCharToCheck)
    Deprecated.
    Tests if a certain character is a wildcard character ('*' or '?').
    boolean
    parseString(String pStringToParse)
    Deprecated.
    Parses a string according to the rules stated above.
    String
    toString()
    Deprecated.
    Method toString

    Methods inherited from class java.lang.Object

    getClass, notify, notifyAll, wait, wait, wait
  • Field Details

    • ALPHABET

      public static final char[] ALPHABET
      Deprecated.
      Field ALPHABET
    • FREE_RANGE_CHARACTER

      public static final char FREE_RANGE_CHARACTER
      Deprecated.
      Field FREE_RANGE_CHARACTER
    • FREE_PASS_CHARACTER

      public static final char FREE_PASS_CHARACTER
      Deprecated.
      Field FREE_PASS_CHARACTER
  • Constructor Details

    • WildcardStringParser

      public WildcardStringParser(String pStringMask)
      Deprecated.
      Creates a wildcard string parser.
      Parameters:
      pStringMask - the wildcard string mask.
    • WildcardStringParser

      public WildcardStringParser(String pStringMask, boolean pDebugging)
      Deprecated.
      Creates a wildcard string parser.
      Parameters:
      pStringMask - the wildcard string mask.
      pDebugging - true will cause debug messages to be emitted to System.out.
    • WildcardStringParser

      public WildcardStringParser(String pStringMask, boolean pDebugging, PrintStream pDebuggingPrintStream)
      Deprecated.
      Creates a wildcard string parser.
      Parameters:
      pStringMask - the wildcard string mask.
      pDebugging - true will cause debug messages to be emitted.
      pDebuggingPrintStream - the java.io.PrintStream to which the debug messages will be emitted.
  • Method Details

    • isInAlphabet

      public static boolean isInAlphabet(char pCharToCheck)
      Deprecated.
      Tests if a certain character is a valid character in the alphabet that applies to this automaton.
    • isFreeRangeCharacter

      public static boolean isFreeRangeCharacter(char pCharToCheck)
      Deprecated.
      Tests if a certain character is the designated "free-range" character ('*').
    • isFreePassCharacter

      public static boolean isFreePassCharacter(char pCharToCheck)
      Deprecated.
      Tests if a certain character is the designated "free-pass" character ('?').
    • isWildcardCharacter

      public static boolean isWildcardCharacter(char pCharToCheck)
      Deprecated.
      Tests if a certain character is a wildcard character ('*' or '?').
    • getStringMask

      public String getStringMask()
      Deprecated.
      Gets the string mask that was used when building the parser automaton.
      Returns:
      the string mask used for building the parser automaton.
    • parseString

      public boolean parseString(String pStringToParse)
      Deprecated.
      Parses a string according to the rules stated above.
      Parameters:
      pStringToParse - the string to parse.
      Returns:
      true if and only if the string is accepted by the automaton.
    • toString

      public String toString()
      Deprecated.
      Method toString
      Overrides:
      toString in class Object
    • equals

      public boolean equals(Object pObject)
      Deprecated.
      Method equals
      Overrides:
      equals in class Object
      Parameters:
      pObject - the object to compare with.
    • hashCode

      public int hashCode()
      Deprecated.
      Method hashCode
      Overrides:
      hashCode in class Object
    • clone

      protected Object clone() throws CloneNotSupportedException
      Deprecated.
      Overrides:
      clone in class Object
      Throws:
      CloneNotSupportedException
    • finalize

      protected void finalize() throws Throwable
      Deprecated.
      Overrides:
      finalize in class Object
      Throws:
      Throwable
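
The four static character tests documented above (isInAlphabet, isFreeRangeCharacter, isFreePassCharacter and isWildcardCharacter) can be sketched as follows. The class WildcardChars is a hypothetical stand-in, and ALPHABET_SKETCH is an assumed ASCII-only subset of the alphabet described in the class documentation; the actual contents of ALPHABET are not reproduced here.

```java
/** Hypothetical stand-in for illustration; not the actual class. */
final class WildcardChars {

    static final char FREE_RANGE_CHARACTER = '*'; // matches any run of characters
    static final char FREE_PASS_CHARACTER = '?';  // matches one character

    // Assumption: an ASCII-only subset of the documented alphabet.
    static final String ALPHABET_SKETCH =
            "abcdefghijklmnopqrstuvwxyz"
            + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            + "0123456789._-*?";

    static boolean isFreeRangeCharacter(char c) {
        return c == FREE_RANGE_CHARACTER;
    }

    static boolean isFreePassCharacter(char c) {
        return c == FREE_PASS_CHARACTER;
    }

    static boolean isWildcardCharacter(char c) {
        return isFreeRangeCharacter(c) || isFreePassCharacter(c);
    }

    static boolean isInAlphabet(char c) {
        return ALPHABET_SKETCH.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isWildcardCharacter('*')); // prints true
        System.out.println(isInAlphabet('/'));        // prints false
    }
}
```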