Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyse large amounts of natural language data.
A goal of NLP is to create autonomous systems able to understand the contents of documents, including the contextual nuances of the language within them (a much harder problem than it may first appear). Such technology can then accurately extract information and insights, as well as categorise and organise the documents themselves.
Challenges in NLP frequently involve speech recognition, natural language understanding, and natural language generation. Understanding the context is essential. Context can be linguistic, involving the linguistic environment of a language item, or situational, involving extralinguistic elements that contribute to the construction of meaning.
Applying NLP in Finance
“Stocks Erase Gains as Stimulus Targeted for Cuts: Markets Wrap”, “Euro-Area Inflation Jumps to Decade-High 3% in Test for ECB”, “Alphabet Surges With 65% Gain in Longest Rally Since 2009”. These are just a few examples of the myriad of articles to be found on Bloomberg’s official website. Staying ahead of the herd in the financial markets requires access to the most relevant and up-to-date information.
In the age of meme stock trading, it is essential to follow social media as well as mainstream financial news outlets. Who remembers when Redditors banded together to artificially inflate the value of meme stocks such as GameStop, nearly bankrupting Melvin Capital? How can investors and traders possibly sift through all this information quickly enough to take advantage of price movements? Natural Language Processing (NLP) provides the answer.
Bloomberg, a leading financial news and data provider, has successfully incorporated NLP into its products and services to perform several functions, including:
The strategies deployed to address these scenarios can be categorised as follows:
Text processing
The aforementioned functions require the extraction of information from the textual data following low-level pre-processing (i.e. tokenisation, chunking and parsing). The output is then used to perform the following tasks:
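To make the low-level pre-processing steps concrete, here is a minimal sketch in plain Python of tokenisation followed by a crude chunking pass that groups candidate ticker symbols. The function names and the all-caps heuristic are illustrative assumptions, not Bloomberg's actual pipeline; production systems use trained tokenisers and parsers.

```python
import re

def tokenise(text):
    """Split raw text into word and punctuation tokens (a simplified scheme)."""
    return re.findall(r"\w+|[^\w\s]", text)

def chunk_tickers(tokens):
    """Group consecutive all-caps alphabetic tokens as candidate ticker chunks."""
    chunks, current = [], []
    for tok in tokens:
        if tok.isupper() and tok.isalpha():
            current.append(tok)
        else:
            if current:
                chunks.append(" ".join(current))
                current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

headline = "ECB Holds Rates as AAPL and MSFT Rally"
tokens = tokenise(headline)
print(tokens)          # word-level tokens
print(chunk_tickers(tokens))  # candidate tickers/entities
```

The output of such a stage (tokens plus candidate entity chunks) is what downstream tasks like entity linking and sentiment analysis consume.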
In addition to these functions, Bloomberg is also researching the classification of news articles into news stories and opinion pieces. These classifications would allow investors and traders to contextualise the articles and improve their investment decisions.
Processing of structured/semi-structured data
In addition to texts, many articles and financial reports contain structured/semi-structured data such as tables and figures. Bloomberg has developed the following technologies to process this data:
Connecting text to other artifacts
The textual information in articles is linked to entities, e.g. stock tickers or people, as well as notable events or related stories:
Simplifying client interactions
The previous modules are designed to extract relevant information for Bloomberg’s clients, e.g. traders and investors, but they can also facilitate the interactions clients have with the system:
NLP for other Areas of Finance
We have seen that Bloomberg uses NLP methods to perform a wide variety of news- and expert-support functions. Nonetheless, these techniques have financial applications beyond news services and business intelligence. For example, sentiment analysis and topic detection could be employed to create technical indicators for low-frequency algorithmic trading, where timespans are long enough for social media to exert a meaningful influence on price action.
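As a toy illustration of such an indicator, the sketch below turns a stream of per-article sentiment scores into buy/hold/sell signals using a rolling average. The window size and thresholds are arbitrary assumptions for demonstration; a real strategy would calibrate them against historical data.

```python
from collections import deque

def rolling_sentiment_signal(scores, window=3, buy=0.2, sell=-0.2):
    """Map per-article sentiment scores in [-1, 1] to trading signals
    based on a rolling average. Thresholds are illustrative only."""
    recent = deque(maxlen=window)
    signals = []
    for s in scores:
        recent.append(s)
        avg = sum(recent) / len(recent)
        if avg > buy:
            signals.append("buy")
        elif avg < sell:
            signals.append("sell")
        else:
            signals.append("hold")
    return signals

# Sentiment turning from positive to negative over five articles
print(rolling_sentiment_signal([0.5, 0.4, 0.1, -0.6, -0.7]))
# → ['buy', 'buy', 'buy', 'hold', 'sell']
```

The smoothing window is what makes this a low-frequency signal: a single negative article does not flip the position, but a sustained shift in coverage does.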
NLP could also be applied to risk assessments for loans, for example by measuring the attitude, entrepreneurial mindset and emotions of both lender and borrower. This is particularly beneficial when the borrower in question has no past loan repayment history.
The two main libraries used for NLP are NLTK and spaCy. While both have been used extensively by the NLP community, their intended uses differ: NLTK was originally designed for researchers, whereas spaCy is aimed at developers. This has significant implications for their usage and functionality.
For each task, NLTK offers a choice of several algorithms, while spaCy implements a single, continuously updated state-of-the-art approach; as a result, the two libraries achieve similar performance. In addition, NLTK's output is typically a list of strings, whereas spaCy returns document objects whose tokens carry linguistic annotations. Finally, spaCy supports word embeddings while NLTK does not.
The low-level functions, such as tokenisation, chunking and parsing, are all supported by both spaCy and NLTK. Moreover, both provide named-entity recognition (NER) and sentiment-analysis functionality. More advanced tasks, e.g. news search, would require further development on top of this existing functionality. In the case of topic modelling, spaCy would need to be used in conjunction with gensim.
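The spaCy API illustrates the object-based output mentioned above. The sketch below uses a blank English pipeline (which requires no trained-model download) purely to show tokenisation; real applications would load a trained pipeline such as `en_core_web_sm` to get tagging, parsing and NER as well.

```python
import spacy

# A blank English pipeline provides rule-based tokenisation
# without downloading a trained model.
nlp = spacy.blank("en")
doc = nlp("Bloomberg incorporates NLP into its products.")

# spaCy returns a Doc object whose tokens are objects with attributes,
# rather than a plain list of strings.
tokens = [t.text for t in doc]
print(tokens)
```

Each token object also exposes attributes such as `t.is_alpha` or `t.like_num`, which is the practical difference from NLTK's string-based output.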
Some practitioners may prefer to use Hugging Face for specific tasks (e.g. sentence classification, extractive question answering, language modelling, NER, summarisation and translation). This library is ideal for those looking to train and fine-tune transformer models on domain-specific datasets.
What systems are required to run NLP?
Running NLP workloads effectively requires high-performance hardware. BSI offers various types of AI-optimised servers running on GPUs, all of which can be configured to your exact specifications. You can find out more information on these solutions here or by getting in touch on +44 207 352 7007.
This article was provided by our AI researcher Gregory Kell.
DELL Titanium Partner
We are a DELL Titanium Partner and have received the DELL-EMC Partner of the Year EMEA award.
Our range of DELL technology solutions for AI can be viewed here.
To learn more...
Our AI technology solutions can be viewed here and our AI inception programme here.
Get in touch to discover how we could optimise your business with AI.