Cocke Younger Kasami ( CYK ) algorithm is a parsing algorithm for context-free grammar. CYK algorithm only operates on context-free grammar which is in CNF ( Chomsky Normal Form ). We must convert the context-free grammar to CNF before applying CYK algorithm. CYK algorithm is a bottom-up parsing approach. It is based on the dynamic programming paradigm.

**Explanation**

CYK algorithm basically checks which substring of the input string is recognized by the given CFG. First, we start with substrings of length 1, then substrings of length 2, and so on.

In CYK algorithm we use a 3D array dp[L, S, P] of type boolean.

- L denotes the length of the substring
- S denotes the starting index of the substring in the input string.
- P denotes the production.

Suppose str[] is the input string. If dp[L, S, P] is **TRUE** then it means that we can recognize the substring of length L from str[S to S+L-1] if we start from production P.

For substrings of length 1, we use set dp[1, S, P] to **TRUE** if there exists a production **P -> a** such that str[S] is equal to **a**.

For substrings of length 2 or more, we partition the substring into two parts in every possible way and check if there exists a production **P -> AB** such that **A** can derive the left part of the string and B can derive the right part. If so, then we can recognize the substring from production **P**.

## Pseudo Code

Letstr[n]be the input string of length n Letvbe the number of production in the grammar Letdp[n, n, v]be a boolean array. InitializedpwithFALSE/* In this loop, we are checking which substring of length 1 is recognized by the given CFG */fors = 1tonforeach production in formP -> a(ais a terminal )ifstr[s]is equal toathenset dp[1, s, P] = TRUE/* In this loop, we are considering substrings of length l >= 2 l is the current length of the substring s is the starting index of the substring p is the size of left partition p is used to partition the substring of length l in 2 parts in every possible way after partitioning, we check if there exists a production P->AB such that A recognize the left partition of length p starting from s and B recognize the right partition of length l-p starting from s + p if so, then it means that we can recognize the substring of length l starting from s from production P */for l = 2tonfor s = 2ton - l + 1for p = 1tol - 1foreach production in formP -> ABifdp[p, s, A]anddp[l - p, s + p, B]areTRUEthenset dp[l, s, P] = TRUEif dp[n, 1, Start Symbol]isTRUEthen input stringstris recognized by the given CFG otherwise input stringstris not recognized by the given CFG

## C++ Implementation

*Assumptions*

- The input grammar is in CNF form.
- Non-terminals are denoted by uppercase alphabets.
- Terminals are denoted by lowercase alphabets.
- Each production is in the form “P->AB” or “P->a” without any blank spaces.
- ‘S’ is the start symbol.
- Each production should be written in a separate line.

**Input Context-Free Grammar**

```
S->AB
A->a
A->BB
B->AS
B->b
```

**Program**

```
#include<iostream>
#include<fstream>
#include<string>
#include<vector>
using namespace std;
/*
function to read grammar from a file
the function pushes each production in vector pro
*/
void ReadGrammar(string file, vector<string>& pro)
{
string line;
ifstream myfile;
myfile.open(file);
while (getline(myfile, line))
{
pro.push_back(line);
}
}
/*
function to check if we can derive string str[]
using productions in pro
*/
bool RecognizeString(vector<string>& pro, string str)
{
int n, v, s, p, i, l;
n = str.length();
v = pro.size();
/*
3D vector dp[n+1, n+1, 26]
*/
vector< vector< vector<bool> > > dp(n + 1, vector< vector <bool> >(n + 1, vector<bool>(26, false)));
/*
the following loop recognize substrings of length 1
*/
for (s = 0; s < n; ++s)
{
/* reading each production */
for (i = 0; i < v; ++i)
{
/*
if pro[i].length == 4 then it means that the production pro[i]
is in form P->a
pro[i][3] == str[s] checks if production P->a can recognize the character str[s]
*/
if (pro[i].length() == 4 && pro[i][3] == str[s])
{
dp[1][s][pro[i][0] - 'A'] = true;
}
}
}
/*
the following loop recognize substrings of length > 1
l is the length of the current substring
*/
for (l = 2; l <= n; ++l)
{
/* s is the starting index of substring */
for (s = 0; s < n - l + 1; ++s)
{
/* p is the size of left partition of the substring */
for (p = 1; p <= l - 1; ++p)
{
/* reading each production */
for (i = 0; i < v; ++i)
{
/*
if pro[i].length() == 5 then it implies that the production is in form P->AB
dp[ p ][ s ][ pro[i][3]-'A'] checks if we can derive left partition using A
dp[ l-p ][ s+p ][ pro[i][4]-'A' ] checks if we can derive the right partition using B
*/
if (pro[i].length() == 5 && dp[p][s][pro[i][3] - 'A'] && dp[l - p][s + p][pro[i][4] - 'A'])
{
dp[l][s][pro[i][0] - 'A'] = true;
}
}
}
}
}
return dp[n][0]['S' - 'A'];
}
int main()
{
vector<string> pro;
string str;
ReadGrammar("text.txt", pro);
cout << "Context-free Grammar" << endl;
for (int i = 0; i < pro.size(); ++i)
{
cout << pro[i] << endl;
}
cout << "Enter a String: ";
cin >> str;
if (RecognizeString(pro, str))
cout << "The input string can be derived using given CFL" << endl;
else
cout << "The input string cannot be derived using given CFL" << endl;
}
```

**Output**

```
Context-free Grammar
S->AB
A->a
A->BB
B->AS
B->b
Enter a String: aabbb
The input string can be derived using given CFL
```

**Read**

**References**