You will understand the concept of entity analysis within Pulsar.
You will learn about the different types of entities that can be detected within Pulsar.
You will gain insights into how TRAC uses entity detection to identify relevant information in posts and articles that your searches have collected.
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP). As the name suggests, it is the process of identifying and classifying named entities in a given text. These entities can be words or phrases that represent the names of people, organisations, places, products, and so on. For example, in the sentence "Barack Obama was born in Honolulu, Hawaii, on August 4, 1961," a Named Entity Recognition system would identify and classify the following entities:
"Barack Obama" as a Person
"Honolulu" as a Location
"Hawaii" as a Location
"August 4, 1961" as a Date
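To make the idea concrete, here is a minimal Python sketch that reproduces the classification above. It is a toy under stated assumptions: the `GAZETTEER` lookup table, the `DATE_PATTERN` expression, and the `tag_entities` function are all hypothetical names invented for this illustration. Production NER systems rely on trained statistical models rather than fixed lists, and this is not how Pulsar implements entity detection.

```python
import re

# Hypothetical lookup table for illustration only; real NER systems use
# trained models rather than a fixed list of names.
GAZETTEER = {
    "Barack Obama": "Person",
    "Honolulu": "Location",
    "Hawaii": "Location",
}

# Simple pattern for dates written like "August 4, 1961".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def tag_entities(text):
    """Return (entity, label) pairs in order of appearance in the text."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "Date"))
    # Sorting by start offset restores reading order.
    return [(entity, label) for _, entity, label in sorted(found)]


sentence = "Barack Obama was born in Honolulu, Hawaii, on August 4, 1961."
entities = tag_entities(sentence)
for entity, label in entities:
    print(f'"{entity}" as a {label}')
```

A fixed list like this cannot handle names it has never seen or resolve ambiguous ones, which is why real systems classify entities from context instead.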
Entity classification can therefore help you extract important information from large volumes of unstructured text. On Pulsar, this is done by identifying and classifying named entities referenced in tweets, posts on Reddit, Facebook, and Instagram, news articles, and even broadcast or podcast transcripts. Entity classification also addresses disambiguation, so it helps distinguish entities with similar names. For example, you can confidently distinguish between "Apple" the company and "Apple" the fruit. On Pulsar, we also apply Sentiment and Emotion Analysis to the entities we identify, so that you can understand the sentiment and emotion associated with a given entity.
Currently, entities on TRAC can be classified as People or Organisations, and we can analyse entities in the following languages: Arabic, English, Chinese, Danish, Dutch, Finnish, French, German, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, and Swedish.
What insights can I uncover through Entities?
We surface entity classification in the Content Insights section of TRAC and provide you with several ways to understand Entities.
Entities Treemap by Data Source
The Treemap visualisation shows the channels where individuals or organisations are discussed most frequently. The size of each tile represents how prevalent the entity is on a particular channel. It also helps you identify whether certain entities are over-indexed or under-indexed on some data sources. For instance, the example screenshot shows that Elizabeth Warren is a highly discussed entity across all channels. However, the Treemap highlights that she over-indexes by 2.8% on X compared to other data sources.
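One way to reason about over-indexing is to compare an entity's share of mentions on a single data source with its share across all data sources. The Python sketch below does this with made-up sample data. The `over_index` function and the percentage-point definition it uses are assumptions for illustration only; the exact formula TRAC applies is not described here.

```python
from collections import Counter


def over_index(mentions, entity):
    """Return {source: percentage-point difference} for one entity.

    `mentions` is a non-empty list of (source, entity) pairs, one per
    mention. A positive value means the entity takes a larger share of
    mentions on that source than it does across all sources combined.
    """
    per_source_total = Counter(source for source, _ in mentions)
    per_source_entity = Counter(s for s, e in mentions if e == entity)
    overall_share = sum(per_source_entity.values()) / len(mentions)
    return {
        source: round(100 * (per_source_entity[source] / total - overall_share), 1)
        for source, total in per_source_total.items()
    }


# Made-up sample data, not taken from Pulsar.
sample = (
    [("X", "Elizabeth Warren")] * 6 + [("X", "Other")] * 4
    + [("News", "Elizabeth Warren")] * 4 + [("News", "Other")] * 6
)
scores = over_index(sample, "Elizabeth Warren")
print(scores)
```

In this sample the entity accounts for half of all mentions overall, but 60% of mentions on X, so it over-indexes on X and under-indexes on News by the same margin.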
Entities Sentiment Word Cloud
When entities are displayed in a sentiment word cloud, you can see the most common entities in a search and the sentiment associated with them. The larger an entity appears and the more central it is in the graph, the greater the number of posts and articles discussing it. This simple visualisation shows not only which entities are popular, but also the overall sentiment about that particular person or company.
Entities Emotion Word Cloud
When entities are displayed in an emotions word cloud, you can see the most common entities in a search and the emotion associated with them. The larger an entity appears and the more central it is in the graph, the greater the number of posts and articles discussing it. Like the Sentiment Word Cloud above, this visualisation shows not only which entities are popular, but also how people explicitly feel about that particular person or company.
When entities are displayed in a segments or network chart, you can start to identify groups of people or organisations that tend to be associated or discussed together in the same conversation. This can help you understand how your brand or client is associated with certain organisations or individuals. Related entities are grouped together into distinct segments, and the clustering algorithm we apply determines the relevance and importance of each person or organisation within a given segment, helping you uncover the dominant people or organisations in a dataset.
Analysing entities displayed in a Stream graph enables you to track how the discussion around particular individuals or organisations evolves over time. This means you can quickly identify when your client or organisation was mentioned, as well as when the conversation about them subsided, and possibly resumed. By correlating this information with relevant media events, you can gain valuable insights into why certain entities were included in the discourse.
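The data behind a chart like this is essentially a count of mentions per entity per time period. The Python sketch below shows one way to build such a series from posts that have already been tagged with entities. The `daily_mentions` function, the input shape, and the sample names are hypothetical and chosen only for illustration; they do not describe TRAC's internals.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_mentions(posts):
    """Map each entity to a {day: number of posts mentioning it} counter.

    `posts` is a list of (day, [entities]) pairs, one pair per post.
    """
    series = defaultdict(Counter)
    for day, entities in posts:
        for entity in set(entities):  # count each entity once per post
            series[entity][day] += 1
    return series


# Made-up sample data for illustration.
posts = [
    (date(2024, 1, 1), ["Acme", "Jane Doe"]),
    (date(2024, 1, 1), ["Acme", "Acme"]),
    (date(2024, 1, 2), ["Jane Doe"]),
]
series = daily_mentions(posts)
for entity, counts in series.items():
    print(entity, dict(counts))
```

Plotting each entity's counts as a band over time gives the rise-and-fall shape that makes it easy to see when a conversation started, subsided, or resumed.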
Sometimes known as a chord diagram, the Entities Bundle chart is a graphical representation of the relationships between the different people or organisations in a dataset. It visualises the inter-relationships and flows between entities as arcs, or chords, that connect them. Each person or organisation is represented as a segment around the perimeter of the circle, and the chords connecting them represent the degree of overlap or connection between the entities. The bigger a segment is, the more connections that organisation or person has with other entities in the bundle chart.
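A chord diagram is typically drawn from co-occurrence counts: how often each pair of entities appears in the same post. The Python sketch below computes those pair counts and the total connections per entity, which would correspond to chord thickness and segment size respectively. The function names and sample data are hypothetical, and counting co-occurrence per post is an assumption; the measure of connection TRAC uses may differ.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(posts):
    """Count how many posts mention each unordered pair of entities.

    `posts` is a list of entity lists, one list per post.
    """
    pairs = Counter()
    for entities in posts:
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for a, b in combinations(sorted(set(entities)), 2):
            pairs[(a, b)] += 1
    return pairs


def connection_totals(pairs):
    """Sum the pair counts touching each entity (its segment size)."""
    totals = Counter()
    for (a, b), count in pairs.items():
        totals[a] += count
        totals[b] += count
    return totals


# Made-up sample data for illustration.
posts = [["Brand A", "Brand B"], ["Brand A", "Brand B", "Brand C"], ["Brand C"]]
pairs = co_occurrence(posts)
totals = connection_totals(pairs)
print(pairs)
print(totals)
```

The same pair counts could also serve as edge weights for a network chart, where a clustering step groups entities that are frequently discussed together.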