Quickstart

The minimum-viable Whistlerlib workflow: connect to a running Dask cluster, load a CSV of tweets, and compute the top-k hashtags as a distributed histogram.

What you'll see

Loaded 10 tweets.
Top 5 hashtags:
       tag  freq
   #cdmx     3
#política     2
 #méxico     2
#noticias     2
 #ciencia     1

How it works

Context('processes', host, port) opens a Dask client against the scheduler exposed by the master service in docker/docker-compose.yml.
load_csv(...) wraps dask.dataframe.read_csv and returns a TweetDataset over a Dask DataFrame partitioned across the cluster's workers.
hashtag_histogram_alt_python(k=5) ships a map_partitions closure to each worker. Each worker runs advertools-style hashtag extraction on its slice, the scheduler merges the partial frequency tables, and the top 5 by frequency are returned as a pandas DataFrame.
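
The per-partition count followed by a merge can be sketched in plain Python, with no Dask or Whistlerlib involved. This is only an illustration of the pattern, not Whistlerlib's implementation: the names count_hashtags and top_k_hashtags, the regex, and the lowercasing of tags are all assumptions made for the example, and each inner list stands in for one worker's partition.

```python
import re
from collections import Counter

# Assumed pattern for illustration: '#' followed by word characters.
# In Python 3, \w matches Unicode letters, so accented tags are kept whole.
HASHTAG_RE = re.compile(r"#\w+")


def count_hashtags(texts):
    """Count hashtags in one partition (the work a single worker would do)."""
    counts = Counter()
    for text in texts:
        # Lowercasing is an assumption here, so '#CDMX' and '#cdmx' merge.
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


def top_k_hashtags(partitions, k):
    """Merge the partial counts from every partition and keep the k most frequent."""
    partials = [count_hashtags(part) for part in partitions]
    merged = sum(partials, Counter())
    return merged.most_common(k)


# Two hypothetical partitions of tweet text.
partitions = [
    ["Lluvia en #CDMX", "#cdmx y #ciencia"],
    ["Debate de #política en #cdmx", "#política"],
]
top = top_k_hashtags(partitions, k=2)
print(top)
```

Because each partial table is small compared with the raw text, only the counts need to be combined at the end, which is what makes the histogram cheap to distribute.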

Run it

docker compose -f ../../docker/docker-compose.yml up -d
python example.py
docker compose -f ../../docker/docker-compose.yml down

Or via pytest:

uv run pytest -m docker tests/integration/test_01_quickstart_hashtag_histogram.py
