Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyse large amounts of natural language data.

A goal of NLP is to create an autonomous solution which is able to understand the contents of documents, including the contextual nuances of the language within them (much harder than it may first appear). The technology can then accurately extract information and insights, as well as categorise and organise the documents themselves.

Challenges in NLP frequently involve speech recognition, natural language understanding, and natural language generation. Understanding the context is essential. Context can be linguistic, involving the linguistic environment of a language item, or situational, involving extralinguistic elements that contribute to the construction of meaning.



“Colourless green ideas sleep furiously.”

Take this example from Noam Chomsky in his 1957 Syntactic Structures. It is grammatically correct, yet nonsensical.

“When she arrived home, Nancy watched TV.”

In pragmatic or situational context, this sentence has a cataphoric (preemptive) use of the pronoun she. Without the presence of the subsequent linguistic elements of the sentence, one would be unable to know that “she” refers to Nancy.

“I moved here.”

Deictic terms, like "here", are generally understood to be relative to the location of the speaker, rather than the reader.



Applying NLP in Finance

“Stocks Erase Gains as Stimulus Targeted for Cuts: Markets Wrap”, “Euro-Area Inflation Jumps to Decade-High 3% in Test for ECB”, “Alphabet Surges With 65% Gain in Longest Rally Since 2009”. These are just a few examples of the myriad of articles to be found on Bloomberg’s official website. Staying ahead of the herd in the financial markets requires access to the most relevant and up-to-date information.

In the age of meme stock trading, it is essential to follow social media as well as mainstream financial news outlets. Who remembers when Redditors banded together to artificially inflate the value of meme stocks such as GameStop, and nearly caused Melvin Capital to go bankrupt? How can investors and traders possibly sift through all this information quickly enough to take advantage of price movements? Natural Language Processing (NLP) provides the answer.


NLP at Bloomberg

Bloomberg, a leading financial news and data provider, has successfully incorporated NLP into its products and services to perform several functions, including:

  • Answer clients’ queries and extract relevant information from news articles.
  • Assist lawyers in uncovering the underlying argumentation in cases that supports a particular decision.
  • Update reporters with news about sectors or companies for which they are responsible.


The strategies deployed to address these scenarios can be categorised as follows:

  • Text processing.
  • Processing of structured/semi-structured data.
  • Connecting text to other artifacts, e.g. people or stock tickers.
  • Simplifying client interactions.


    Text processing

    The aforementioned functions require information to be extracted from textual data after low-level pre-processing (e.g. tokenisation, chunking and parsing). The output is then used to perform the following tasks:

  • Named Entity Recognition (NER): detects people, companies, tickers and organisations in news reports and social media. Recent Bloomberg research in this area includes identifying named entities as they are typed and applying NER in multi-domain settings.
  • Sentiment Analysis: predicts whether a news story is positive for a company. A machine learning model (e.g. Naïve Bayes, LSTM or BERT) could be trained on a labelled dataset such as that of SemEval-2017 Task 4: Sentiment Analysis in Twitter.
  • Topic Classification: tags documents with normalised topics to make retrieval and monitoring straightforward.
  • Fact/Relation Extraction: selects specific information to ease ingestion flow.
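    As a rough illustration of the sentiment analysis task above, the following sketch trains a Naïve Bayes classifier on a handful of headlines. It is a minimal example under stated assumptions, not Bloomberg's method: the training headlines are invented, the class and function names are hypothetical, and tokenisation is reduced to a regular expression.

```python
import math
import re
from collections import Counter, defaultdict


def tokenise(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenise(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenise(text):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy headlines, for illustration only (not a real dataset).
TRAIN = [
    ("Shares surge after record profit", "positive"),
    ("Company beats earnings forecast", "positive"),
    ("Stock rallies on strong sales growth", "positive"),
    ("Shares plunge after profit warning", "negative"),
    ("Company misses earnings forecast", "negative"),
    ("Stock tumbles on weak sales", "negative"),
]

model = NaiveBayesSentiment().fit(*zip(*TRAIN))
print(model.predict("Shares surge on strong growth"))
print(model.predict("Stock tumbles after weak profit warning"))
```

    A production system would be trained on a much larger labelled corpus and would more likely use one of the neural models mentioned above. The sketch only shows the shape of the task: tokenise, count, and pick the most probable label.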


    In addition to these functions, Bloomberg is also researching the classification of news articles into news stories and opinion pieces. These classifications would allow investors and traders to contextualise the articles and improve their investment decisions.

    Processing of structured/semi-structured data

    In addition to texts, many articles and financial reports contain structured/semi-structured data such as tables and figures. Bloomberg has developed the following technologies to process this data:

  • Table detection and segmentation: module that recognises and isolates the tables. Once the tables have been identified, they can be retrieved using keyword or table queries or with natural language questions. Additionally, web tables can be employed to update knowledge bases.
  • Figure understanding: extracts information from scatter plots.


    Connecting text to other artifacts

    The textual information in articles is linked to entities, e.g. stock tickers or people, as well as to notable events or related stories:

  • Market-moving news indicators: detect news headlines that are crucially important and tag them. Bloomberg has researched methods for connecting candidate news stories to notable events. This is framed as a retrieval task in which, given a notable news story, candidate stories are ranked by relevance.
  • Related stories: highlight additional relevant information, potentially using retrieval methods similar to those described in “Identifying Notable News Stories”.
  • Bloomberg has also been researching how co-occurring entities could be used to explain why a given entity from a knowledge graph (KG) is trending in news stories.
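    The retrieval framing above, where candidate stories are ranked by their relevance to a notable story, can be sketched with TF-IDF vectors and cosine similarity. This is a generic baseline chosen for illustration, not the method used in Bloomberg's research. The function names are hypothetical, and the first and third candidate headlines are invented (the other two headlines are the ones quoted earlier in this article).

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenised = [tokenise(doc) for doc in documents]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        term_freq = Counter(tokens)
        vectors.append({
            # Smoothed IDF: terms found in every document keep a non-zero weight.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(notable_story, candidates):
    """Return (candidate, score) pairs sorted by decreasing similarity."""
    vectors = tfidf_vectors([notable_story] + candidates)
    query, rest = vectors[0], vectors[1:]
    scored = [(cand, cosine(query, vec)) for cand, vec in zip(candidates, rest)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


notable = "Euro-Area Inflation Jumps to Decade-High 3% in Test for ECB"
candidates = [
    "ECB officials debate response to rising euro-area inflation",  # invented
    "Alphabet Surges With 65% Gain in Longest Rally Since 2009",
    "Oil prices steady as traders await inventory data",            # invented
]
ranked = rank_candidates(notable, candidates)
for headline, score in ranked:
    print(f"{score:.3f}  {headline}")
```

    Word overlap is a crude proxy for relevance. A real system would more plausibly compare learned embeddings or use entity and event metadata, but the ranking interface would look much the same.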


    Simplifying client interactions

    The previous modules are designed to extract relevant information for Bloomberg’s clients, e.g. traders and investors, but they can also facilitate the interactions clients have with the system:

  • Natural language query interface: allows people to ask plain English questions, e.g. ‘What is Dell Technologies market cap?’ and get precise answers.
  • News search and ranking: returns candidate articles based on search queries. As mentioned above, Bloomberg is researching table retrieval through keyword and table queries. In addition, the current Key News Themes (NSTM) system combines information retrieval with summarisation to respond to user queries with concise digests.
  • Internal help system: automatic routing systems direct incoming queries to the appropriate experts. In the future, user queries could be addressed by chatbots, as Bloomberg is pursuing research in that area (“A Practical Two-step Approach to Assist Enterprise Question-Answering Live Chat”).
  • Automatic answering: detects and answers frequently occurring client inquiries. Methods for analysing subthreads within conversations can enhance the performance of FAQ-based retrieval systems.
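    To make the last two items more concrete, here is a small sketch of FAQ-based retrieval with a fallback to human routing. It is an assumption-laden toy rather than a description of Bloomberg's systems: the FAQ entries, the stop-word list, the queue name and the 0.3 threshold are all invented, and similarity is plain Jaccard overlap between content words.

```python
import re

# Hypothetical FAQ entries, for illustration only.
FAQ = {
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "How do I export a chart to Excel?": "Open the chart menu and choose 'Export'.",
    "How do I set up a news alert for a company?": "Open the company page and select 'Create alert'.",
}
STOPWORDS = {"a", "an", "the", "do", "i", "my", "to", "for", "how", "can", "is", "of"}


def content_words(text):
    """Return the set of lower-cased words in the text, minus stop words."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def answer_or_route(query, threshold=0.3):
    """Answer from the FAQ if a question matches well enough, else route to a person."""
    query_words = content_words(query)
    best_question = max(FAQ, key=lambda q: jaccard(query_words, content_words(q)))
    score = jaccard(query_words, content_words(best_question))
    if score >= threshold:
        return ("answer", FAQ[best_question])
    return ("route", "human-expert-queue")  # made-up queue name


print(answer_or_route("I forgot my password, how can I reset it?"))
print(answer_or_route("What is the margin requirement on futures?"))
```

    The threshold controls the trade-off between answering automatically and handing over to an expert. In practice it would be tuned on logged conversations rather than fixed by hand.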


    NLP for other Areas of Finance

    As shown above, Bloomberg uses NLP methods to perform a wide variety of news-based and expert-support functions. These techniques also have financial applications beyond news services and business intelligence. For example, sentiment analysis and topic detection could be employed to create technical indicators for low-frequency algorithmic trading, which operates over timespans where social media has a greater influence on price action.
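    One way such a sentiment-based indicator might look is sketched below: daily sentiment scores are smoothed with a moving average and mapped to a signal. Everything here is an assumption made for illustration, including the function names, the three-day window, the 0.2 threshold and the invented scores. It is not a trading strategy.

```python
from collections import deque


def rolling_mean(values, window):
    """Simple moving average; emits None until a full window is available."""
    buffer, total, out = deque(), 0.0, []
    for value in values:
        buffer.append(value)
        total += value
        if len(buffer) > window:
            total -= buffer.popleft()
        out.append(total / window if len(buffer) == window else None)
    return out


def sentiment_signal(daily_scores, window=3, threshold=0.2):
    """Map smoothed sentiment to 'buy', 'sell' or 'hold' for each day."""
    signals = []
    for avg in rolling_mean(daily_scores, window):
        if avg is None:
            signals.append("hold")  # not enough history yet
        elif avg > threshold:
            signals.append("buy")
        elif avg < -threshold:
            signals.append("sell")
        else:
            signals.append("hold")
    return signals


# Invented daily net-sentiment scores in [-1, 1], e.g. the share of
# positive posts minus the share of negative posts about one stock.
scores = [0.1, 0.4, 0.5, 0.3, -0.4, -0.6, -0.5]
signals = sentiment_signal(scores)
print(signals)
```

    With these inputs the signal moves from hold to buy once a full window of positive days is available, and to sell after sentiment turns negative. The smoothing window is what makes the indicator low-frequency: a longer window reacts more slowly but is less sensitive to a single noisy day.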

    NLP could also be applied to risk assessments for loans, e.g. by measuring the attitude, entrepreneurial mindset and emotions of both the lender and the borrower. This is particularly beneficial when the borrower in question has no loan repayment history.


    Python Libraries for NLP

    The two main libraries used for NLP are NLTK and spaCy. While they have both been used extensively by the NLP community, their intended use is somewhat different: NLTK was originally designed for researchers but spaCy is aimed at developers. This has significant implications for their usage and functionality.

    For each task, NLTK offers several algorithms, while spaCy implements a single, regularly updated algorithm that its developers consider state of the art. In practice, the two libraries are reported to perform similarly. In addition, NLTK functions typically return strings or lists of strings, whereas spaCy returns objects (a Doc made up of Token objects) that carry linguistic annotations alongside the text. Finally, spaCy has built-in support for word embeddings, whereas NLTK does not.

    Low-level functions such as tokenisation, chunking and parsing are supported by both spaCy and NLTK, and both implement NER. For sentiment analysis, NLTK ships a rule-based analyser (VADER), whereas spaCy relies on extensions or a custom-trained text classifier. More advanced tasks, e.g. news search, require further development on top of the existing functionality. For topic modelling, spaCy would need to be used in conjunction with a library such as gensim.

    Some practitioners may prefer the Hugging Face Transformers library for specific tasks (e.g. sentence classification, extractive question answering, language modelling, NER, summarisation and translation). This library is well suited to those looking to train and fine-tune transformer models on domain-specific datasets.

    What systems are required to run NLP

    To run NLP workloads effectively, particularly when training or fine-tuning large models, you need high-performance hardware. BSI offers various types of AI-optimised servers running on GPUs, which can all be configured to your exact specifications. You can find out more information on these solutions here or by getting in touch on +44 207 352 7007.


    This article was provided by our AI researcher Gregory Kell.


