How to Build an NLP Pipeline

Mahfooz Ahamed
3 min readJan 25, 2022


  • Building an NLP pipeline starts with raw text: we analyze it, process it by extracting relevant words and meaning, extract features that capture context, and finally build a model that can infer the intent behind a sentence. In practice, the workflow is rarely strictly linear; earlier stages are often revisited.
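As a minimal sketch, the three stages can be chained as plain Python functions (the function names and stub bodies here are illustrative, not from any particular library):

```python
def text_processing(raw_text):
    # Clean the raw text and split it into tokens.
    return raw_text.lower().split()

def feature_extraction(tokens):
    # Turn tokens into numeric features (here: simple word counts).
    features = {}
    for token in tokens:
        features[token] = features.get(token, 0) + 1
    return features

def modelling(features):
    # A real pipeline would feed the features to a trained model here.
    return "some prediction"

# The pipeline is just function composition over the raw text.
prediction = modelling(feature_extraction(text_processing("Raw input text")))
```

Each stage consumes the previous stage's output, which is why problems found during feature extraction or modelling often send us back to tweak the text processing step.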

Text Processing

  • A natural first question: why do we need to process text at all? Why not feed it to the model directly? To answer that, let's look at where this text comes from.
  • Most text comes from web pages such as Wikipedia, from dialogue spoken in a movie, or even from a speech given by our favorite motivational speaker.
  • In the case of web pages, the text is embedded inside HTML tags, and we must retain only the meaningful text before extracting relevant features from it.
  • There may also be URLs, symbols, etc., which carry no meaning for our task and need to be removed.
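A minimal cleaning pass over an HTML snippet, using only the standard library (the regexes are a rough sketch; a real project would use a proper HTML parser such as BeautifulSoup):

```python
import re

def clean_text(raw_html):
    """Strip HTML tags, URLs, and non-letter symbols, then normalize case."""
    text = re.sub(r"<[^>]+>", " ", raw_html)            # remove HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[^A-Za-z\s]", " ", text)            # remove symbols/digits
    return " ".join(text.lower().split())               # collapse whitespace

page = '<p>Visit <a href="https://example.com">our site</a> today!!!</p>'
print(clean_text(page))  # visit our site today
```

Only the human-readable words survive; the tags, the link target, and the punctuation are all gone before feature extraction begins.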

Feature Extraction

  • Now that we have processed the text and obtained relevant data, can we directly build the model? Not quite. Computers are machines that process data in a particular encoding, such as binary.
  • They cannot understand the English we speak. Computers have no standard representation for words: internally, text is a sequence of ASCII or Unicode values, which captures neither meaning nor context. Building a good model therefore requires extracting proper features from the processed data, and the right features depend entirely on the task we want to accomplish. Words can be represented in different forms, for example as a graph of relations, as in WordNet.
  • We can also encode words as numeric vectors, which is how text generation and machine translation systems represent them; word2vec and GloVe are well-known examples of such representations. There are many ways of representing text as features.
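A sketch of turning tokens into vectors, here a simple bag-of-words count representation over a small vocabulary (learned embeddings like word2vec or GloVe would replace these sparse counts with dense vectors that capture meaning):

```python
def build_vocab(sentences):
    """Map each unique word to a fixed index."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def bag_of_words(sentence, vocab):
    """Represent a sentence as a count vector over the vocabulary."""
    vec = [0] * len(vocab)
    for word in sentence.split():
        if word in vocab:
            vec[vocab[word]] += 1
    return vec

corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)                # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3}
print(bag_of_words("the cat sat", vocab))  # [1, 1, 1, 0]
```

The model never sees words, only these fixed-length numeric arrays, which is exactly why this stage exists.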

Modelling

  • In this stage, we build a model, such as a machine learning or deep learning model, based on our requirements.
  • We use the data we have to train the model. The training data gives the model experience, and the model is said to learn from this experience. When new, unseen data arrives in the future, the model can predict an outcome, say the next word or the sentiment of a sentence.
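A toy illustration of the train-then-predict idea for sentiment, using word-count scores in pure Python (the labelled sentences are made up for the example; a real model would come from scikit-learn or a neural network):

```python
from collections import Counter

# Tiny hand-labelled "training set" (illustrative only).
train_data = [
    ("i love this movie", "pos"),
    ("what a great film", "pos"),
    ("i hate this movie", "neg"),
    ("what a terrible film", "neg"),
]

# "Training": count how often each word appears under each label.
word_counts = {"pos": Counter(), "neg": Counter()}
for sentence, label in train_data:
    word_counts[label].update(sentence.split())

def predict(sentence):
    """Predict the label whose training words best overlap the sentence."""
    scores = {
        label: sum(counts[word] for word in sentence.split())
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)

print(predict("i love this film"))  # pos
```

Even though "i love this film" never appears verbatim in the training data, the model generalizes from the experience encoded in the counts, which is the essence of learning from data.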

Conclusion

Thanks for reading! I hope this walk through the NLP pipeline proves useful to anyone interested in NLP.
