February 2021
New on TRAC: Sample the Twitter conversation, get the full picture

TRAC - What's New

Written by Linda Maruta
Updated over a week ago

Want to understand a large conversation topic, but worried this will eat into your monthly data allowance? Fear not: today we are pleased to announce sampling for Realtime Twitter data on TRAC, an efficient way to get a representative sample of a conversation without collecting enormous amounts of unnecessary data in your search.

Sampling your Twitter data collects a random selection of Tweets that match your query, rather than the entire set of results. The sample percentage, which you choose, must be between 1 and 100, and the sampling is applied to the entire query that you set up.

What does sampling mean?

The sample operator first reduces the scope of the Twitter firehose to the percentage you select, and then your query is applied to that sampled subset. For example, if you use SAMPLE 25 in Boolean, each Tweet has a 25% chance of being in the sample.
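To illustrate the idea, here is a minimal Python sketch of percentage sampling followed by a query filter. It is not Pulsar's or Twitter's implementation: the function names, the keyword match standing in for a Boolean query, and the seeded random generator are all assumptions made for illustration.

```python
import random


def sample_stream(tweets, percent, rng):
    """Keep each tweet independently with a `percent`% chance (hypothetical helper)."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # rng.random() is in [0, 1), so percent=100 keeps every tweet.
    return [t for t in tweets if rng.random() * 100 < percent]


def matches_query(tweet, keyword):
    """Stand-in for a Boolean query: a case-insensitive keyword match."""
    return keyword.lower() in tweet.lower()


def sampled_search(tweets, keyword, percent, seed=0):
    """Sample the stream first, then apply the query to the sampled subset."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    sampled = sample_stream(tweets, percent, rng)
    return [t for t in sampled if matches_query(t, keyword)]
```

With 10,000 matching Tweets in the stream and a sample size of 25, a call such as sampled_search(tweets, "coffee", 25) would return roughly a quarter of them. The exact count varies from run to run because each Tweet is kept or dropped independently.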

How do I sample my data using the Wizard?

To sample your data using the Pulsar Wizard, go to the Target section in search setup and drag the slider to your desired sample size. Sampling is available in increments of 5. If you want to collect 100% of the available Realtime Twitter data, you do not need to do anything in this step: the slider is set to 100% by default.

How do I sample my data using Pulsar’s Boolean editor?

To sample your data using Boolean, add the SAMPLE operator at the end of your Boolean expression, followed by the sample size you want (for example, SAMPLE 25). The sample size can be any number between 1 and 100. If you want to collect 100% of the available Realtime Twitter data, you can either specify SAMPLE 100 or leave the operator out entirely: Pulsar will automatically assume you want 100% of the data.
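If you build Boolean expressions programmatically, a small helper along these lines could append the operator for you. This is a sketch only: with_sample is a hypothetical function, not part of any Pulsar tooling, the query text is a placeholder, and restricting the sample size to whole numbers is an assumption.

```python
def with_sample(boolean_query, percent=100):
    """Append a SAMPLE clause to a Boolean query string (hypothetical helper)."""
    # Assumption: sample sizes are whole numbers between 1 and 100.
    if not isinstance(percent, int) or not 1 <= percent <= 100:
        raise ValueError("sample size must be a whole number between 1 and 100")
    query = boolean_query.strip()
    if percent == 100:
        # Leaving SAMPLE out is treated as 100%, so no clause is needed.
        return query
    return f"{query} SAMPLE {percent}"


print(with_sample("coffee OR tea", 25))  # prints: coffee OR tea SAMPLE 25
```

Leaving out the clause at 100 keeps the expression identical to an unsampled search, which matches the default behaviour described above.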

Sampling is currently available for Realtime Twitter data only, for both existing and new searches. Twitter historical data and all other data sources on Pulsar will continue to offer access to 100% of the data available.
