Quickstart
The minimum-viable Whistlerlib workflow: connect to a running Dask cluster, load a CSV of tweets, and compute the top-k hashtags as a distributed histogram.
What you'll see
Loaded 10 tweets.
Top 5 hashtags:
tag freq
#cdmx 3
#política 2
#méxico 2
#noticias 2
#ciencia 1
How it works
Context('processes', host, port) opens a Dask client against the scheduler exposed by the master service in docker/docker-compose.yml.
load_csv(...) wraps dask.dataframe.read_csv and returns a TweetDataset over a Dask DataFrame partitioned across the cluster's workers.
hashtag_histogram_alt_python(k=5) ships a map_partitions closure to each worker. Each worker runs advertools-style hashtag extraction on its slice, the partial frequency tables are merged, and the top 5 hashtags by frequency are returned as a pandas DataFrame.
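The map-then-merge pattern described above can be sketched in plain Python with no cluster involved. This is a minimal illustration under stated assumptions, not Whistlerlib's implementation: each inner list stands in for one Dask partition, count_partition plays the role of the per-worker map_partitions closure, and top_k_hashtags merges the partial tables and keeps the k most frequent tags. The function names, the simplified #\w+ regex, the lowercasing, and the sample tweets are all hypothetical; advertools uses its own, more thorough hashtag pattern.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical, simplified hashtag pattern (NOT advertools' actual regex).
# For str patterns, \w is Unicode-aware, so "#política" is captured whole.
HASHTAG_RE = re.compile(r"#\w+")


def count_partition(texts):
    """Map step: build a partial frequency table for one partition's tweets."""
    counts = Counter()
    for text in texts:
        # Assumption: tags are lowercased so "#CDMX" and "#cdmx" count together.
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


def top_k_hashtags(partitions, k=5):
    """Reduce step: merge the partial tables and keep the k most frequent tags."""
    partials = [count_partition(p) for p in partitions]  # one table per "worker"
    merged = reduce(lambda a, b: a + b, partials, Counter())
    return merged.most_common(k)


if __name__ == "__main__":
    # Made-up tweets split into two "partitions"; not the example's real CSV.
    partitions = [
        ["Sismo en #CDMX #noticias", "#ciencia y #méxico"],
        ["Debate de #política en #cdmx", "#política #noticias #méxico #cdmx"],
    ]
    print("tag freq")
    for tag, freq in top_k_hashtags(partitions, k=5):
        print(tag, freq)
```

Adding Counter objects is associative, so the partial tables can be merged in any order or as a tree, which is what allows the merge step to be distributed rather than funneled through a single process.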
Run it
docker compose -f ../../docker/docker-compose.yml up -d
python example.py
docker compose -f ../../docker/docker-compose.yml down
Or via pytest:
uv run pytest -m docker tests/integration/test_01_quickstart_hashtag_histogram.py