02. Top mentions
mention_histogram_alt_python is the mirror of example 01, but for @user mentions instead of #hashtags. Internally it runs advertools.extract_mentions on each partition across the workers; advertools normalizes mentions to lowercase, so @Alice and @ALICE collapse into the same @alice bucket.
What you'll see
Loaded 10 tweets.
Top 5 mentions:
    Mentions  Frequency
     @kaggle          5
     @openai          4
@huggingface          3
       @nasa          2
        @bbc          1
The code
The inline dataset is deliberately small. The rows shown here carry two mentions each, except the last, which has none, so the frequencies are easy to verify by eye:
_ROWS = [
    ('2022-01-01T00:00:00', 'morning tools roundup @kaggle @openai'),
    ('2022-01-01T01:00:00', 'new model leaderboard @kaggle @huggingface'),
    ('2022-01-01T02:00:00', 'climate dashboard @kaggle @nasa'),
    ('2022-01-01T03:00:00', 'fine-tuning report @openai @huggingface'),
    # ...5 more rows...
    ('2022-01-01T09:00:00', 'product launch'),
]
The work is identical to example 01 except for the analytic call:
from whistlerlib import Context

ctx = Context('processes', 'localhost', 8786)

# csv_path is the temporary CSV created by the tempfile setup in the full file.
ds = ctx.load_csv(
    filen=csv_path,
    meta={
        'column_mapping': {'date_column': 'Date', 'text_column': 'text'},
        'file_encoding': 'utf-8',
    },
    num_partitions=2,
)
print(f'Loaded {ds.tweet_count()} tweets.')

# The only line that differs from example 01: the mention histogram call.
histogram = ds.mention_histogram_alt_python(k=5)
print(histogram.to_string(index=False))
mention_histogram_alt_python(k=5) ships a map_partitions closure that calls advertools.extract_mentions per partition; the scheduler merges the partial frequency tables and returns the top-5 as a pandas DataFrame.
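The count-per-partition, then merge, then take-top-k pattern can be sketched with the standard library alone. This is an illustration under stated assumptions, not whistlerlib's implementation: the simplified regex stands in for advertools.extract_mentions, the helper names count_mentions and top_k are hypothetical, and the two partitions are built by hand from the four rows shown in the dataset excerpt.

```python
import re
from collections import Counter

# Assumed, simplified mention pattern; advertools ships its own regex.
MENTION_RE = re.compile(r'@\w+')


def count_mentions(texts):
    """Count lowercased mentions in one partition of tweet texts."""
    counts = Counter()
    for text in texts:
        counts.update(m.lower() for m in MENTION_RE.findall(text))
    return counts


def top_k(partitions, k):
    """Merge the per-partition counters and return the k most frequent mentions."""
    total = Counter()
    for part in partitions:
        total.update(count_mentions(part))
    return total.most_common(k)


# Two hand-made partitions holding the four rows visible in the excerpt.
partitions = [
    ['morning tools roundup @kaggle @openai',
     'new model leaderboard @kaggle @huggingface'],
    ['climate dashboard @kaggle @nasa',
     'fine-tuning report @openai @huggingface'],
]
result = top_k(partitions, k=3)
```

Because each partition is reduced to a small counter before anything is combined, only the partial frequency tables need to be moved around, not the tweet texts themselves.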
The full file (including the tempfile setup and CLI shim) is at examples/02-mention-histogram/example.py.
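For readers who want a csv_path without opening the full file, the tempfile step might look roughly like the sketch below. The header names Date and text come from the column_mapping in the snippet above; the helper name write_rows_to_csv and the rest of the code are assumptions, not a copy of example.py.

```python
import csv
import os
import tempfile

# Two of the rows from the excerpt; the real _ROWS has ten.
_ROWS = [
    ('2022-01-01T00:00:00', 'morning tools roundup @kaggle @openai'),
    ('2022-01-01T09:00:00', 'product launch'),
]


def write_rows_to_csv(rows):
    """Write (date, text) rows to a temporary CSV file and return its path."""
    fd, path = tempfile.mkstemp(suffix='.csv')
    with os.fdopen(fd, 'w', newline='', encoding='utf-8') as handle:
        writer = csv.writer(handle)
        writer.writerow(['Date', 'text'])  # header names from column_mapping
        writer.writerows(rows)
    return path


# The caller is responsible for deleting the file when it is no longer needed.
csv_path = write_rows_to_csv(_ROWS)
```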
Differences from example 01
- The result columns are ['Mentions', 'Frequency'] (capital-M, capital-F) rather than ['tag', 'freq']. This mirrors the underlying advertools field names. Phase 2 standardized the alt-python hashtag path to use ['tag', 'freq'] for consistency with the R-bridge; the mention path kept the advertools names because they're already used widely.
- Mentions are returned in lowercase regardless of original casing.
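If downstream code expects the hashtag-style column names, the mention result can be renamed on the caller's side. The sketch below assumes the result is an ordinary pandas DataFrame with the two columns named above; the stand-in frame is built from the sample output, and the helper name to_tag_freq is hypothetical rather than part of whistlerlib.

```python
import pandas as pd

# Stand-in for the mention histogram; values copied from the sample output.
mentions = pd.DataFrame({
    'Mentions': ['@kaggle', '@openai', '@huggingface', '@nasa', '@bbc'],
    'Frequency': [5, 4, 3, 2, 1],
})


def to_tag_freq(histogram):
    """Return a copy whose columns use the hashtag-style names ['tag', 'freq']."""
    return histogram.rename(columns={'Mentions': 'tag', 'Frequency': 'freq'})


unified = to_tag_freq(mentions)
```

rename returns a new DataFrame, so the original frame keeps its advertools-style column names.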
Run it
# From examples/02-mention-histogram/, bring up a local Dask cluster, run the example, tear it down.
docker compose -f ../../docker/docker-compose.yml up -d
python example.py
docker compose -f ../../docker/docker-compose.yml down