CYK Algorithm

Cocke Younger Kasami ( CYK ) algorithm is a parsing algorithm for context-free grammar. CYK algorithm only operates on context-free grammar which is in CNF ( Chomsky Normal Form ). We must convert the context-free grammar to CNF before applying CYK algorithm. CYK algorithm is a bottom-up parsing approach. It is based on the dynamic programming paradigm.

Explanation

CYK algorithm basically checks which substring of the input string is recognized by the given CFG. First, we start with substrings of length 1, then substrings of length 2, and so on.

In CYK algorithm we use a 3D array dp[L, S, P] of type boolean.

  1. L denotes the length of the substring
  2. S denotes the starting index of the substring in the input string.
  3. P denotes the production.

Suppose str[] is the input string. If dp[L, S, P] is TRUE then it means that we can recognize the substring of length L from str[S to S+L-1] if we start from production P.

For substrings of length 1, we use set dp[1, S, P] to TRUE if there exists a production P -> a such that str[S] is equal to a.

For substrings of length 2 or more, we partition the substring into two parts in every possible way and check if there exists a production P -> AB such that A can derive the left part of the string and B can derive the right part. If so, then we can recognize the substring from production P.

Pseudo Code

 Let str[n] be the input string of length n
 Let v be the number of production in the grammar
 Let dp[n, n, v] be a boolean array. Initialize dp with FALSE

 /*
     In this loop, we are checking which substring of length 1
     is recognized by the given CFG
 */

 for s = 1 to n
     for each production in form P -> a ( a is a terminal )
         if str[s] is equal to a then
             set dp[1, s, P] = TRUE

/*
     In this loop, we are considering substrings of length l >= 2
     l is the current length of the substring
     s is the starting index of the substring
     p is the size of left partition
     p is used to partition the substring of length l in 2 parts in every possible way
     after partitioning, we check if there exists a production P->AB such that
     A recognize the left partition of length p starting from s and
     B recognize the right partition of length l-p starting from s + p
     if so, then it means that we can recognize the substring of length l
     starting from s from production P
 */

 for l = 2 to n
      for s = 2 to n - l + 1
         for p = 1 to l - 1
                for each production in form P -> AB
                   if dp[p, s, A] and dp[l - p, s + p, B] are TRUE then
                      set dp[l, s, P] = TRUE

 if dp[n, 1, Start Symbol] is TRUE then
      input string str is recognized by the given CFG
 otherwise
      input string str is not recognized by the given CFG 

C++ Implementation

Assumptions

  • The input grammar is in CNF form.
  • Non-terminals are denoted by uppercase alphabets.
  • Terminals are denoted by lowercase alphabets.
  • Each production is in the form “P->AB” or “P->a” without any blank spaces.
  • ‘S’ is the start symbol.
  • Each production should be written in a separate line.

Input Context-Free Grammar

S->AB
A->a
A->BB
B->AS
B->b

Program

 #include<iostream>
 #include<fstream>
 #include<string>
 #include<vector>
 using namespace std;
  
 /*
 function to read grammar from a file
 the function pushes each production in vector pro
 */
 void ReadGrammar(string file, vector<string>& pro)
 {
       string line;
       ifstream myfile;
       myfile.open(file);
       while (getline(myfile, line))
       {
             pro.push_back(line);
       }
 }
  
 /*
 function to check if we can derive string str[]
 using productions in pro
 */
 bool RecognizeString(vector<string>& pro, string str)
 {
  
       int n, v, s, p, i, l;
  
       n = str.length();
       v = pro.size();
  
       /*
       3D vector dp[n+1, n+1, 26]
       */
       vector< vector< vector<bool> > > dp(n + 1, vector< vector <bool> >(n + 1, vector<bool>(26, false)));
  
       /*
       the following loop recognize substrings of length 1
       */
       for (s = 0; s < n; ++s)
       {
             /* reading each production */
             for (i = 0; i < v; ++i)
             {
                   /*
                   if pro[i].length == 4 then it means that the production pro[i]
                   is in form P->a
                   pro[i][3] == str[s] checks if production P->a can recognize the character str[s]
                   */
                   if (pro[i].length() == 4 && pro[i][3] == str[s])
                   {
                         dp[1][s][pro[i][0] - 'A'] = true;
                   }
             }
       }
  
       /*
       the following loop recognize substrings of length > 1
       l is the length of the current substring
       */
       for (l = 2; l <= n; ++l)
       {
             /* s is the starting index of substring */
             for (s = 0; s < n - l + 1; ++s)
             {
                   /* p is the size of left partition of the substring */
                   for (p = 1; p <= l - 1; ++p)
                   {
                         /* reading each production */
                         for (i = 0; i < v; ++i)
                         {
                               /*
                               if pro[i].length() == 5 then it implies that the production is in form P->AB
                               dp[ p ][ s ][ pro[i][3]-'A'] checks if we can derive left partition using A
                               dp[ l-p ][ s+p ][ pro[i][4]-'A' ] checks if we can derive the right partition using B
                               */
                               if (pro[i].length() == 5 && dp[p][s][pro[i][3] - 'A'] && dp[l - p][s + p][pro[i][4] - 'A'])
                               {
                                     dp[l][s][pro[i][0] - 'A'] = true;
                               }
                         }
                   }
             }
       }
  
       return dp[n][0]['S' - 'A'];
  
 }
  
 int main()
 {
  
       vector<string> pro;
       string str;
  
       ReadGrammar("text.txt", pro);
  
       cout << "Context-free Grammar" << endl;
  
       for (int i = 0; i < pro.size(); ++i)
       {
             cout << pro[i] << endl;
       }
  
       cout << "Enter a String: ";
       cin >> str;
  
       if (RecognizeString(pro, str))
             cout << "The input string can be derived using given CFL" << endl;
       else
             cout << "The input string cannot be derived using given CFL" << endl;
  
 }

Output

 Context-free Grammar
 S->AB
 A->a
 A->BB
 B->AS
 B->b
 Enter a String: aabbb
 The input string can be derived using given CFL

Read

References

Leave a Comment

Your email address will not be published. Required fields are marked *