Natural Language Processing (NLP): A New Age in Computer Science and Engineering


By Kishorjit Nongmeikapam

Someone once said that Computer Science and Engineering develops so fast that, had other branches of engineering moved at the same speed, we would already have explored the universe. Perhaps automobile engineering would have given us a flying car by now! It was only from 1937 into the 1940s that IBM took the credit for designing the commercial programmable modern computer, and the real engineering of Computer Science began. There has been no looking back since then. The pace has become so fast that genuine interaction with machines in our own human natural language is now almost a reality.
At first we instructed computers through programs. But we humans are so lazy and so fond of comfort that we keep looking for new gadgets and methods that make things easier and more accessible. One ultimate solution for such interaction is our natural language. Instead of relying on long algorithms and programming knowledge, computer scientists began looking for ways to give commands to a computer or machine in natural language, i.e. the human language we speak. This was the beginning of a new way of thinking and a new area in Computer Science and Engineering: Natural Language Processing was born.
Natural Language Processing (NLP) is an area of Computer Science concerned with the interaction between human natural language and the computer. The interaction can take the form of voice (e.g. speech recognition), text or images. Technically, NLP draws on computer science, including Artificial Intelligence (AI) and Soft Computing, together with linguistic knowledge. Linguistic knowledge of a human natural language is needed to verify the input and the output of the system being built: the syntax and semantics of the language structure are checked against linguistic rules.
Work on Natural Language Processing had started even earlier, but Alan Turing's 1950 paper “Computing Machinery and Intelligence” is generally taken as its beginning. That paper proposed what is now called the Turing test as a criterion of intelligence. The origin of NLP thus coincided with the development of AI. Artificial Intelligence, or AI for short, is the field of computer science that studies ways and techniques to make machines exhibit intelligence. Work in NLP uses AI techniques so that machines can learn and process a natural language.
Is there any difference between Computational Linguistics (CL) and Natural Language Processing (NLP)?
The answer is ‘YES’. In Computational Linguistics, computational techniques are used to better understand linguistics as a discipline, while in NLP the aim is for a computer to understand human language. On the surface the goals of CL and NLP may look similar, but in depth they differ. To make the distinction clear: NLP does not explicitly care whether it makes new contributions to linguistics, and computational linguistics does not explicitly care whether it makes it easier for computers to understand natural languages. People working in NLP generally have no formal background in Linguistics and often know relatively little about language as such. CL people have a formal background in Linguistics, and CL is often taught in Linguistics Departments along with the necessary computing skills. In short, the motive of CL is to understand more about language, while the motive of NLP is to achieve specific performance goals in a computational context, e.g. specific computer applications, or as an abstract problem in machine learning. In NLP, the work of a linguist is to verify the correctness of the language generated by the machine. The only similarity between CL and NLP is that both use the computer as one of their tools.
What do we really do in NLP?
Natural Language Processing (NLP) has many applications that we could consider and discuss. To name a few: Stemmer, Morphological Analyzer, Part of Speech (POS) Tagger, Chunking and Parsing, Spell Checker, Word Sense Disambiguation (WSD), Named Entity Recognition (NER), Optical Character Recognition (OCR), Question Answering (QA), Information Retrieval (IR), Automatic Summarization, Sentiment or Emotion Analysis, Transliteration, Speech Recognition, Proof Reading, Machine Translation (MT), etc. Let’s talk about some of the more interesting ones.
Stemmer: The main entries in a standard dictionary are generally root words, each followed by its derived or inflected forms. Such a lexicon is the basic starting point for learning any language, and in the same way a machine needs to identify the root of a word before it can carry out most other NLP tasks. Take, for example, the Manipuri word “pusinhənjərəmgədəbənidəko”, which means “(I wish I) myself would have caused to bring in (the article)”. Here ten suffixes are attached to a single verbal root: “pu” is the verbal root meaning “to carry”, followed by “sin” (in or inside), “hən” (causative), “jə” (reflexive), “rəm” (perfective), “gə” (associative), “də” (particle), “bə” (infinitive), “ni” (copula), “də” (particle) and “ko” (endearment or wish). The target is to recover the root “pu” (example taken from Nonigopal Singh, N: A Meitei Grammar of Roots and Affixes, A Thesis, Unpublished, Manipur University, Imphal (1987)). A system that identifies such roots automatically is called a stemmer.
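To make the idea concrete, here is a minimal sketch of a suffix-stripping stemmer in Python. It assumes a hand-made suffix list copied from the example above, with the schwa written as “ə”. It repeatedly removes the longest matching suffix from the end of the word and always keeps at least two characters as the root. The names `SUFFIXES` and `strip_suffixes` are hypothetical. A practical Manipuri stemmer would likely also need rules about which suffixes may follow which.

```python
# Hypothetical suffix inventory, copied from the Manipuri example above
# (schwa written as "ə"); a real stemmer would need a much larger list.
SUFFIXES = ["sin", "hən", "jə", "rəm", "gə", "də", "bə", "ni", "ko"]


def strip_suffixes(word, suffixes=SUFFIXES, min_root=2):
    """Repeatedly strip the longest matching suffix from the end of `word`.

    Returns the remaining root and the removed suffixes in left-to-right order.
    """
    ordered = sorted(suffixes, key=len, reverse=True)  # try longer suffixes first
    removed = []
    stripped = True
    while stripped:
        stripped = False
        for suffix in ordered:
            # Only strip if at least `min_root` characters remain as the root.
            if word.endswith(suffix) and len(word) - len(suffix) >= min_root:
                word = word[: -len(suffix)]
                removed.append(suffix)
                stripped = True
                break
    removed.reverse()  # suffixes were collected right-to-left
    return word, removed


root, affixes = strip_suffixes("pusinhənjərəmgədəbənidəko")
print(root)     # pu
print(affixes)  # ['sin', 'hən', 'jə', 'rəm', 'gə', 'də', 'bə', 'ni', 'də', 'ko']
```

Because the suffixes are matched blindly, a root that happens to end in one of these strings would be over-stripped. The minimum root length is only a crude guard against that.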
Compiling a new dictionary for a language is limited at first, because the words an expert can list are limited to those he or she can recall. Now imagine a process assisted by a computer. Suppose we have a corpus (a large collection of written texts) with millions of words, and a system that can identify each unique word and its root in that corpus. The resulting word list will be huge, far beyond what a single human mind can hold. For British English there is the British National Corpus (BNC) of about 100 million words. For American English there is the Brown University Standard Corpus of Present-Day American English (the Brown Corpus) of about one million words. The Oxford English Corpus holds some 2.5 billion words of real 21st-century English, and the news agency Reuters has also released corpus collections with millions of words. Identifying the unique words in such large corpora greatly helps in building a lexicon or dictionary that includes almost every word in use. This would hardly be possible without the computer's help.
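The first step toward such a computer-assisted word list is counting the distinct word forms in a corpus. The Python sketch below assumes plain-text English input and uses a crude regular-expression tokenizer. Finding each word's root would be a further step, for example with a stemmer. The function name `build_word_list` is illustrative.

```python
import re
from collections import Counter


def build_word_list(text):
    """Return (word, frequency) pairs for every distinct word form, most frequent first."""
    # Crude English tokenizer: runs of letters, optionally with one apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens).most_common()


sample = "The cat sat. The cats sat on the mat."
print(build_word_list(sample))
# [('the', 3), ('sat', 2), ('cat', 1), ('cats', 1), ('on', 1), ('mat', 1)]
```

For a real corpus the text would be read from files. Note that `cat` and `cats` are still listed separately, which is exactly where a stemmer would come in.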
Once a lexicon covering as many words as possible has been built, teaching the language to anyone, and to the machine, becomes a much easier task.
Transliteration: The Govt. of Manipur, especially the Department of Education and the University, is concerned about the change of script for the Manipuri language. This is because the Bengali script used so far is being phased out and replaced by Meitei Mayek. Transliteration addresses exactly this problem: it is the automatic conversion of text from one script to another. A large number of books have been written in the Bengali script, and transliterating them manually may take another 10 to 20 years, or even more. So what about a system that can transliterate a big book with thousands of words within a few seconds? This is certainly possible, and many languages in the world already have such systems. The Manipuri Transliteration software is being designed at Manipur Institute of Technology, Manipur University, with a small grant from the Department of Science and Technology, Govt. of Manipur. Fig. 1 shows a screenshot of this simple Manipuri Transliteration software. The software will be very helpful in the future, including to print media that publish newspapers in multiple scripts.

Fig 1: Manipuri Transliteration
Such a system will greatly reduce manual effort and save a great deal of time.
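At its simplest, transliteration is a table look-up from the characters of one script to the characters of another. The Python sketch below assumes a tiny, hypothetical table covering five Bengali consonants and the ten digits, mapped onto the Unicode Meetei Mayek block (U+ABC0 to U+ABFF). It is not how the MIT software works. A usable transliterator would also have to handle vowel signs, conjuncts and the final (lonsum) letters.

```python
# Tiny, hypothetical Bengali-script -> Meetei Mayek table (NOT a complete mapping).
BENGALI_TO_MEETEI = {
    "\u0995": "\uABC0",  # BENGALI LETTER KA -> MEETEI MAYEK LETTER KOK
    "\u09AE": "\uABC3",  # BENGALI LETTER MA -> MEETEI MAYEK LETTER MIT
    "\u09A8": "\uABC5",  # BENGALI LETTER NA -> MEETEI MAYEK LETTER NA
    "\u09B2": "\uABC2",  # BENGALI LETTER LA -> MEETEI MAYEK LETTER LAI
    "\u09AA": "\uABC4",  # BENGALI LETTER PA -> MEETEI MAYEK LETTER PA
}
# Bengali digits U+09E6..U+09EF correspond one-to-one to Meetei Mayek digits U+ABF0..U+ABF9.
for digit in range(10):
    BENGALI_TO_MEETEI[chr(0x09E6 + digit)] = chr(0xABF0 + digit)

TABLE = str.maketrans(BENGALI_TO_MEETEI)


def transliterate(text):
    """Replace each mapped character; anything not in the table passes through unchanged."""
    return text.translate(TABLE)


print(transliterate("\u09E7\u09EF\u09EE\u09ED"))  # the year 1987 in Meetei Mayek digits
```

The table-driven design keeps the mapping separate from the code. Extending coverage mostly means adding rows, although context-dependent rules (such as choosing a final letter form) need more than a one-to-one table.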
Machine Translation: Many people talk about the Look East Policy. Let’s ask, “What is its feasibility?” The special target is business and commerce. What will happen if you can’t communicate with your business or commerce partners? Language will be the main barrier, so what is the solution? The solution lies in machine translation. By Machine Translation we mean the translation of one language into another by a machine. In science fiction movies we often see gadgets that can translate a language and let people communicate. This field is turning that fiction into reality.

Fig 2: Google Translator
Fig. 2 shows a simple example: a screenshot of the popular Google Translate, which translates from one language to another automatically! Simple to use but complex to develop, such a tool can fit into a small handheld mobile phone and let us interact with people in any part of the world. So why worry about travelling without an interpreter when we carry this gadget with us!
Manipuri translation systems may bring people in the state and the country closer together, and may also help make the Look East Policy a success!
Speech recognition: Some mobile phones these days already have this facility: to unlock one, we simply speak in front of it. In some secure rooms, too, a person must speak in his or her own voice to open the room or locker. The technology has advanced to the point where what you say can be converted to text, and vice versa. Maybe one day we won’t need to go to school to learn to read or write a language; all we may need is to speak it. Isn’t it surprising? If you want to write, just speak in front of the gadget and it will automatically convert your voice into text; or type something and it will automatically read it aloud. Add Machine Translation, and it becomes easy to converse with someone you would love to talk to but whose language you don’t know.
Spell Checker: Those who usually type documents on a computer sometimes feel that their spelling has become weak or that they get confused about how a word is spelled. This happens because the editors we type in usually have a spell checker. If we mistype a word in MS Word, it is either corrected automatically or marked with a red underline warning us that it is wrong, and we may be offered similar words as suggestions. All of this is done by what we call a spell checker.
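Spell checkers commonly find the “nearby words” they suggest with an edit distance such as the Levenshtein distance. This is the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. The following is a minimal Python sketch, assuming a tiny hypothetical dictionary. Production spell checkers typically add much larger word lists and word-frequency information, which this sketch does not model.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance edits, closest first; [] if word is known."""
    if word in dictionary:
        return []
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]


DICTIONARY = {"language", "machine", "manage", "message"}  # hypothetical word list
print(suggest("langauge", DICTIONARY))  # ['language']
```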
Part of speech (POS) tagger: We have been talking about Machine Translation, but is it possible if the machine cannot produce a correctly structured sentence? Is it possible with direct word-to-word translation? The answer is a big ‘NO’. The reason is that in English the structure of a sentence is ‘Subject-Verb-Object’, whereas in Manipuri it is ‘Subject-Object-Verb’. So, to perform most NLP applications, we need to identify the part of speech of each word automatically.
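The Python sketch below shows why POS tags matter for this word-order problem. A dictionary look-up tagger labels each word, and the first verb is then moved after the object, turning an English Subject-Verb-Object clause into Subject-Object-Verb order. The lexicon, tag names and functions are hypothetical, and the words themselves are not translated. Real POS taggers are usually statistical or neural models trained on annotated corpora, and real reordering needs proper parsing.

```python
# Hypothetical toy lexicon; a real tagger learns tags from an annotated corpus.
LEXICON = {"tomba": "NOUN", "reads": "VERB", "eats": "VERB",
           "a": "DET", "the": "DET", "book": "NOUN", "rice": "NOUN"}


def pos_tag(tokens):
    """Dictionary look-up tagger; unknown words default to NOUN."""
    return [(t, LEXICON.get(t.lower(), "NOUN")) for t in tokens]


def svo_to_sov(tagged):
    """Move the first verb to the end of a simple Subject-Verb-Object clause."""
    for i, (_, tag) in enumerate(tagged):
        if tag == "VERB":
            return tagged[:i] + tagged[i + 1:] + [tagged[i]]
    return tagged  # no verb found: leave the order unchanged


tagged = pos_tag("Tomba reads a book".split())
print(tagged)  # [('Tomba', 'NOUN'), ('reads', 'VERB'), ('a', 'DET'), ('book', 'NOUN')]
print([w for w, _ in svo_to_sov(tagged)])  # ['Tomba', 'a', 'book', 'reads']
```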
Question Answering (QA): The era of what we call virtual reality is coming soon. Maybe one day there won’t be any physical university at a particular location, and we will attend classes or lectures in our own living room. What about the exam? Even the exam will take place in your room, with your laptop or computer switched on. Who will correct the answer scripts? The COMPUTER! Correcting multiple-choice questions by computer is of course easy, but what about short-answer, long-answer or descriptive questions? Yes, this is very much possible too. A Professor at IIIT, Hyderabad has already undertaken a project with the future Virtual University in mind and reported above 90% accuracy when the computer corrects descriptive answers.
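The IIIT system itself is not described here, so the following only illustrates one simple idea for grading descriptive answers. Both the student's answer and a model answer are treated as bags of words, and their cosine similarity is turned into marks. The function names and the 10-mark scale are assumptions. A serious grader would also have to deal with synonyms, word order and meaning.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; word order is ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def grade(student_answer, model_answer, max_marks=10):
    """Scale the similarity to the model answer into marks, rounded to one decimal."""
    similarity = cosine_similarity(bag_of_words(student_answer), bag_of_words(model_answer))
    return round(similarity * max_marks, 1)


model = "Plants make food from sunlight"
print(grade("Plants make food", model))  # 7.7
```

Because word order is ignored, an answer that merely lists the right keywords scores as well as a well-argued one. That is why this can only be a starting point.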
One simple example is Google Search: you type your query and are shown a list of many answers as links.
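One classic data structure behind a search box is the inverted index, which maps each word to the documents that contain it. The following toy Python sketch assumes three made-up documents and ranks them by how many distinct query words they contain. Real search engines such as Google use far richer ranking signals, which are not modelled here.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(query, index):
    """Rank documents by the number of distinct query words they contain (ties by id)."""
    hits = defaultdict(int)
    for word in set(tokenize(query)):
        for doc_id in index.get(word, ()):
            hits[doc_id] += 1
    return sorted(hits, key=lambda d: (-hits[d], d))


docs = {  # hypothetical documents
    "d1": "Manipur University is in Imphal",
    "d2": "Machine translation for Manipuri",
    "d3": "Imphal weather today",
}
index = build_index(docs)
print(search("Manipur University Imphal", index))  # ['d1', 'd3']
```

Looking up each query word in the index avoids scanning every document, which is what makes this structure practical for large collections.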
Automatic Summarisation: Again taking Google Search as an example, each link comes with a few lines summarizing the linked page or website. This really helps by giving us the chunk of text that matters most in the whole document. If you are busy enough to prefer reading only the summary, this is the option for you.
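Search-result snippets can be produced in different ways. A classic starting point for extractive summarization is to score each sentence by how frequent its words are in the whole document, then keep the top-scoring sentences in their original order. The following is a minimal Python sketch, assuming a naive sentence splitter and a small hypothetical stop-word list.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "for", "on"}  # hypothetical


def content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=1):
    """Keep the num_sentences sentences whose words are most frequent in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())  # naive sentence splitter
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)  # average word frequency

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(best[:num_sentences]))  # original order


text = ("NLP helps computers understand language. Machine translation is one NLP "
        "application. The weather was pleasant.")
print(summarize(text, 2))
# NLP helps computers understand language. Machine translation is one NLP application.
```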
Sentiment or Emotion Analysis: Why emotion? Why sentiment? Our future is with robots. Can you believe this? We have to. Industries and factories are increasingly replacing manual labour with robots or robotic arms. For example, expensive Mercedes cars are not assembled by hand; most of the fitting is done by robotic arms. Japan, as we have been hearing since childhood, is very advanced in this field. The central nervous system of a robot is a computer. The only problem with a robot is that it has no emotional sense such as love, dejection or sympathy. Computer scientists these days have started thinking about infusing sentiments or emotions into robots. Think what will happen next if this succeeds.
Another area is forensic science. Suppose someone dies suddenly in suspicious circumstances after writing something just before his or her death. If we have a mechanism to identify the sentiment or emotion of what he or she wrote, it can help us decide whether the case is suicide or murder, because a man or woman who was emotionally happy just a few minutes earlier has only a small chance of committing suicide!
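One simple approach to sentiment analysis is lexicon-based: count positive and negative words, and flip a word's polarity when it follows a negator such as “not”. The following is a minimal Python sketch with tiny hypothetical word lists. A real system would use a large curated lexicon or a trained classifier, and a score like this is far too crude to settle a forensic question by itself.

```python
import re

# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"happy", "love", "glad", "hope", "excited", "joy"}
NEGATIVE = {"sad", "alone", "hopeless", "afraid", "pain", "tired"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Positive minus negative word count; a preceding negator flips a word's sign."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1 or 0
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def label(text):
    s = sentiment_score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


print(label("I am not happy"))  # negative
```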
Proof reading: We are not perfect! This is a saying we have always heard and followed. So what we have written may not be a hundred per cent correct, and it has to go for proofreading. Most leading book publishers and newspaper houses do their own proofreading or sometimes outsource it. Either way, it is time-consuming and needs a lot of manual supervision. So, let’s think of the computer doing this job. This is very much possible and saves time. Everything from spelling mistakes to sentence correction can easily be undertaken by the computer.
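Beyond spelling, some proofreading checks can be written as simple pattern rules. The Python sketch below, with rules chosen purely for illustration, flags repeated words, sentences that start with a lowercase letter, and spaces before punctuation. Real grammar checkers combine many more rules with statistical models.

```python
import re


def proofread(text):
    """Return (issue, snippet) pairs found by a few illustrative pattern rules."""
    issues = []
    # Rule 1: the same word twice in a row, e.g. "is is".
    for m in re.finditer(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE):
        issues.append(("repeated word", m.group(0)))
    # Rule 2: a sentence (text start, or after . ! ?) beginning with a lowercase letter.
    for m in re.finditer(r"(?:^|[.!?]\s+)([a-z]\w*)", text):
        issues.append(("sentence should start with a capital letter", m.group(1)))
    # Rule 3: whitespace before a punctuation mark.
    for m in re.finditer(r"\s+([,.;:!?])", text):
        issues.append(("space before punctuation", m.group(0)))
    return issues


for issue, snippet in proofread("the computer is is fast . It works."):
    print(f"{issue}: {snippet!r}")
```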
Where do we stand now? We are in an era where computers are ready to take over almost all of our manual work, especially its cognitive part. Among these developments, Natural Language Processing, one of the growing research areas in Computer Science and Engineering, would add an altogether new dimension.

(The Author is Asst. Professor, Dept. of Computer Sc. and Engg., MIT, Manipur University)

