Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with how humans and computers interact through human language. It aims to enable computers to understand the content of documents, including the contextual nuances of the language within them, by processing and analysing large amounts of natural language data and deriving insights from it.
In other words, NLP enables computers to read, edit and summarise text as well as generate their own “speech.” Consequently, one can extract useful information and insights from documents and present that knowledge in human language for a variety of applications, such as recognising specific concepts mentioned in a text, categorising the information in documents and emulating human speech.
NLP can perform many different types of tasks.
1| Question Answering
Question answering is one of the most popular research problems in NLP. Its applications include chatbots, information retrieval and dialog systems. It serves as a powerful tool for automatically answering questions that humans ask in natural language, with the help of either a pre-structured database or a collection of natural language documents.
Models: Models like BiDAF, BERT, and XLNet can be used for question-answering projects.
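To make the "collection of natural language documents" case concrete, here is a minimal sketch in Python. It is not how BiDAF, BERT or XLNet work: it simply returns the passage that shares the most words with the question. The `answer` function, the stop-word list and the sample passages are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "is", "of", "in", "what", "who", "which", "does", "do", "to"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def answer(question, passages):
    """Return the passage that shares the most non-stop-word tokens with the question."""
    question_words = set(tokenize(question)) - STOP_WORDS

    def overlap(passage):
        return len(question_words & set(tokenize(passage)))

    return max(passages, key=overlap)

# Illustrative document collection (made up for this example).
passages = [
    "BERT is a language model based on the transformer architecture.",
    "Extractive summarization selects high-ranking sentences from a document.",
    "Sentiment analysis classifies the emotions expressed in a text.",
]

print(answer("What does extractive summarization select?", passages))
```

A neural model would go one step further and extract the exact answer span from the retrieved passage; this sketch stops at picking the passage.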
2| Text Classification
Text Classification, or Text Categorization, is the technique of analyzing text and assigning it to specific groups. It also makes it possible to compare approaches that use linguistic information with approaches based on plain word matching.
Models: BERT, XLNet, and RoBERTa can be used for text classification.
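As a rough illustration of assigning text to groups, the sketch below trains a tiny multinomial Naive Bayes classifier with add-one smoothing using only the Python standard library. Naive Bayes is swapped in here for simplicity and is far simpler than BERT, XLNet or RoBERTa. The function names, labels and training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def train(examples):
    """Count words per label and documents per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability under Naive Bayes."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training data for illustration only.
examples = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the new phone has a faster processor", "tech"),
    ("the laptop ships with more memory", "tech"),
]
model = train(examples)
print(classify("a goal in the final match", model))
```

With only four training sentences the classifier is a toy, but the same train and classify split applies to larger labelled datasets.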
3| Text Summarization
Text summarization is one of the most efficient ways to condense and interpret textual information. Text summarization methods fall into two main types: extractive summarization and abstractive summarization. Extractive summarization selects high-ranking sentences from a document based on word and sentence features and combines them into a summary. Abstractive summarization, on the other hand, aims to understand the main concepts in a document and then express those concepts in natural language.
Models: BERTSumExt, BERTSumAbs, and UniLM (s2s-ft) can be used for text summarization.
Dataset: BBC News Summary, Large-Scale Chinese Short Text Summarization Dataset, etc.
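The extractive approach can be sketched in a few lines of Python. This is a simplified frequency-based ranking, not BERTSumExt: each sentence is scored by the average document-wide frequency of its words, and the top-scoring sentences are returned in their original order. The `summarize` function, the stop-word list and the sample text are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real systems use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "from", "was", "is", "and", "to", "in"}

def content_words(text):
    """Return lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text, num_sentences=1):
    """Pick the highest-ranked sentences and return them in document order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)

# Made-up text for illustration.
text = (
    "Summarization shortens documents. "
    "Extractive summarization selects sentences from documents. "
    "The weather was pleasant yesterday."
)
print(summarize(text, num_sentences=2))
```

An abstractive system would instead generate new sentences, which requires a trained language model and is beyond a short sketch like this one.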
4| Sentiment Analysis
Sentiment Analysis is the technique of identifying the human sentiment implied in a text and classifying the emotions it expresses using text analysis methods. This technique has gained significant traction due to the growth of social media platforms like Facebook and Instagram. Its applications include market research, brand monitoring and customer service.
Models: Models like Dependency Parser, BERT, and RoBERTa can be used for sentiment analysis.
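For intuition, the simplest form of sentiment analysis counts positive and negative words. The sketch below is a lexicon-based baseline with basic negation handling, not one of the models listed above. The word lists and the `sentiment` function are hypothetical and deliberately tiny.

```python
import re

# Hypothetical mini-lexicons; real lexicons contain thousands of entries.
POSITIVE = {"good", "great", "love", "excellent", "happy", "helpful"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow", "rude"}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Label text as 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Flip the polarity when the previous word is a negation ("not good").
        if i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The support team was great and helpful"))
print(sentiment("The delivery was not good"))
```

A baseline like this misses sarcasm and context, which is one reason pretrained models such as BERT and RoBERTa are preferred in practice.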
5| Sentence Similarity
Sentence similarity plays an important part in text-related research and applications in areas such as text mining and dialogue systems. It has proven to be among the best techniques for improving retrieval effectiveness, for example in the named page finding task, where titles are used to represent documents.
Models: BERT, GloVe, etc. can be used for sentence similarity projects.
Dataset: Paraphrase Adversaries from Word Scrambling (PAWS)
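A common baseline for sentence similarity is the cosine similarity between bag-of-words vectors. The sketch below uses raw word counts rather than BERT or GloVe embeddings, so it is only an approximation of what those models do; the function names are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(sentence):
    """Represent a sentence as a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two sentences."""
    va, vb = vectorize(a), vectorize(b)
    dot = sum(va[word] * vb[word] for word in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

print(cosine_similarity("The cat sat on the mat", "A cat sat on a mat"))
print(cosine_similarity("Flights from New York to Florida", "Flights from Florida to New York"))
```

Because a bag of words ignores word order, the second pair is scored as essentially identical even though the two sentences mean different things. Word-scrambled pairs of the kind collected in PAWS are hard for exactly this reason, which is why order-aware models are needed for this task.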
6| Speech Recognition
Speech Recognition is the technique of identifying spoken words or phrases and converting them into a machine-readable format. Speech recognition has gained attention in recent years with the dramatic improvements in acoustic modelling yielded by deep feedforward networks.
Models: BERT, RoBERTa, etc. can be used in speech recognition projects, typically to rescore or correct the transcripts produced by an acoustic model.
7| Neural Machine Translation
Neural machine translation is one of the most popular approaches in NLP research. Neural machine translation aims at building a single neural network that can be jointly tuned to maximize translation performance.
Models: BERT, RNN Encoder-Decoder, etc.
8| Document Summarization
Document Summarization is the technique of helping readers catch the main points of a long document with less effort. It also serves as a preprocessing step for some text mining tasks such as document classification. Summaries can be of two types: abstract-based and extract-based. An extract-based summary consists of sentences extracted from the document. In contrast, an abstract-based summary may contain words and phrases that do not appear in the original document.
Models: A Hidden Markov Model can be used for document summarization.
Dataset: 20 Newsgroups dataset.