Index: > A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Business Industries Finance Tax

Home > Earley parser


The Earley parser is a type of chart parser mainly used for parsing in computational linguistics, named after its inventor, Jay Earley .

Earley parsers are appealing because they can parse all context-free languages. The Earley parser executes in cubic time in the general case, and quadratic time for unambiguous grammars. It performs particularly well when the rules are written left-recursively .

1 Performing the Algorithm

To understand how Earley's algorithm executes, you have to understand dot notation. Given a production A -> BCD (where B, C, and D are symbols in the grammar, terminals or nonterminals), the notation A -> B • C D represents a condition in which B has already been parsed and the sequence C D is expected.

For every input position (which represents a position between tokens), the parser generates a state set. Each state is the cartesian product (that is, just the combination) of:

The state at input position k is called S(k). The parser is seeded with S(0) being the top-level rule. The parser then iteratively operates in three stages: prediction, scanning, and completion. In the following descriptions, α, β, and γ represent any sequence of terminals/nonterminals (including the null sequence), X, Y, and Z represent single nonterminals, and a represents a terminal symbol.

These steps are repeated until no more states can be added to the set. This is generally realized by making a queue of states to process, and performing the corresponding operation depending on what kind of state it is.

2 Example

The algorithm is hard to see from the abstract description above. It becomes much clearer how it operates once you see it in action. The output is a little verbose, but you should be able to follow it.

Let's say you have the following simple arithmetic grammar:

P -> S # the start rule S -> S + M | M M -> M * T | T T -> number

And you have the input:

2 + 3 * 4

Here are the generated state sets:

(state no.) Production (Origin) # Comment --------------------------------- == S(0): • 2 + 3 * 4 == (1) P -> • S (0) # start rule (2) S -> • S + M (0) # predict from (1) (3) S -> • M (0) # predict from (1) (4) M -> • M * T (0) # predict from (3) (5) M -> • T (0) # predict from (3) (6) T -> • number (0) # predict from (5) == S(1): 2 • + 3 * 4 == (1) T -> number • (0) # scan from S(0)(6) (2) M -> T • (0) # complete from S(0)(5) (3) M -> M • * T (0) # complete from S(0)(4) (4) S -> M • (0) # complete from S(0)(3) (5) S -> S • + M (0) # complete from S(0)(2) (6) P -> S • (0) # complete from S(0)(1) == S(2): 2 + • 3 * 4 == (1) S -> S + • M (0) # scan from S(1)(5) (2) M -> • M * T (2) # predict from (1) (3) M -> • T (2) # predict from (1) (4) T -> • number (2) # predict from (3) == S(3): 2 + 3 • * 4 == (1) T -> number • (2) # scan from S(2)(4) (2) M -> T • (2) # complete from S(2)(3) (3) M -> M • * T (2) # complete from S(2)(2) (4) S -> S + M • (0) # complete from S(2)(1) (5) S -> S • + M (0) # complete from S(0)(2) (6) P -> S • (0) # complete from S(0)(1) == S(4): 2 + 3 * • 4 == (1) M -> M * • T (2) # scan from S(3)(3) (2) T -> • number (4) # predict from (1) == S(5): 2 + 3 * 4 • == (1) T -> number • (4) # scan from S(4)(2) (2) M -> M * T • (2) # complete from S(4)(1) (3) M -> M • * T (2) # complete from S(2)(2) (4) S -> S + M • (0) # complete from S(2)(1) (5) S -> S • + M (0) # complete from S(0)(2) (6) P -> S • (0) # complete from S(0)(1)

And now the input is parsed, since we have the state (P -> S •, 0) (note that we also had it in S(3) and S(1); those were complete sentences).

3 See also

Parsing algorithms



Non User