wordcloud_map()

wordcloud_mapper.wordcloud_map(df, nuts_codes, words, word_counts, scale=2.0, rendering_quality=1, colour_func='random', colour_hue=None, min_font_size=4, max_font_size=None, max_words=200, relative_scaling=0.5, prefer_horizontal=0.9, repeat=False, border_scale='01M', border_sharpness=100, nuts_year=2021, coord_system=3857, shapefiles_path=None)

Create a wordcloud map, in which each NUTS region is filled with its own wordcloud, using data from a DataFrame.

Parameters
df : DataFrame

DataFrame object containing columns with NUTS codes, words and word counts.

nuts_codes : str

Name of the column in the DataFrame containing the NUTS codes.

words : str

Name of the column in the DataFrame containing the words.

word_counts : str

Name of the column in the DataFrame containing the word counts.
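The three column parameters suggest a long-format table, with one row per region and word plus that word's count. The sketch below mimics such rows with plain dictionaries, so it runs without pandas. It groups the rows into one word-frequency mapping per NUTS region and keeps at most max_words words for each region. The helper frequencies_by_region, the column names and the counts are all made up for illustration. This is not how wordcloud_map is implemented.

```python
from collections import defaultdict


def frequencies_by_region(rows, nuts_codes, words, word_counts, max_words=200):
    """Group long-format rows into {nuts_code: {word: count}}.

    Hypothetical helper: `rows` is a list of dicts standing in for DataFrame
    rows, and the three name arguments mirror wordcloud_map's column parameters.
    """
    grouped = defaultdict(dict)
    for row in rows:
        region = row[nuts_codes]
        word = row[words]
        # Sum the counts if the same word appears more than once for a region.
        grouped[region][word] = grouped[region].get(word, 0) + row[word_counts]

    result = {}
    for region, freqs in grouped.items():
        # Keep only the max_words most frequent words of each region.
        top = sorted(freqs.items(), key=lambda item: item[1], reverse=True)[:max_words]
        result[region] = dict(top)
    return result


# Illustrative rows with invented counts.
rows = [
    {"nuts": "DE1", "word": "Bosch", "count": 120},
    {"nuts": "DE1", "word": "SAP", "count": 80},
    {"nuts": "DE2", "word": "Siemens", "count": 150},
    {"nuts": "DE1", "word": "Bosch", "count": 5},
]
print(frequencies_by_region(rows, "nuts", "word", "count", max_words=1))
# {'DE1': {'Bosch': 125}, 'DE2': {'Siemens': 150}}
```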

scale : float (default = 2.0)

Size of the produced figure, as a multiplier of matplotlib's default figure size. If scale = 1.0, the default figure size is kept. If scale > 1.0, the figure grows by that factor (e.g. 1.5 means 50% bigger). If scale < 1.0, it shrinks by that factor (e.g. 0.5 means 50% smaller).

rendering_quality : int (default = 1)

Rendering quality of the words in the wordcloud. Higher values produce sharper words but take longer to run.

colour_func : str (default = "random")

Colour function used to set each word's luminosity in the HSL colour system. Available values:
"random" assigns each word within a region a random luminosity between 0 and 50.
"frequency" sets each word's luminosity in proportion to its relative frequency (word count). For example, if the most frequent word A has a count of 100 and the second most frequent word B has a count of 50, word A receives luminosity 50 and word B receives luminosity 25. Produces the best results when relative_scaling = 1.
"rank" sets each word's luminosity according to its rank. For example, with 5 words, the most frequent word receives luminosity 50, the second most frequent receives 40, and so on. Produces the best results when relative_scaling = 0.
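The three luminosity rules can be sketched as follows. "frequency" scales 50 by each word's count relative to the top word. For "rank", the description only gives 50, 40, "and so on" for 5 words, so the evenly spaced steps of 50 / n used here are an assumption. The helper luminosities is hypothetical and is not the library's colour function.

```python
import random


def luminosities(counts, colour_func, seed=None):
    """Return {word: luminosity} on the 0-50 scale described for colour_func.

    Hypothetical helper; `counts` maps each word in one region to its count.
    """
    ranked = sorted(counts, key=counts.get, reverse=True)  # most frequent first
    if colour_func == "random":
        rng = random.Random(seed)
        return {w: rng.uniform(0, 50) for w in ranked}
    if colour_func == "frequency":
        top = counts[ranked[0]]
        # Proportional to the word's count relative to the most frequent word.
        return {w: 50 * counts[w] / top for w in ranked}
    if colour_func == "rank":
        n = len(ranked)
        # Assumed: evenly spaced steps of 50 / n from the top-ranked word down.
        return {w: 50 * (n - i) / n for i, w in enumerate(ranked)}
    raise ValueError(f"unknown colour_func: {colour_func!r}")


print(luminosities({"A": 100, "B": 50}, "frequency"))
# {'A': 50.0, 'B': 25.0}
print(luminosities({"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}, "rank"))
# {'a': 50.0, 'b': 40.0, 'c': 30.0, 'd': 20.0, 'e': 10.0}
```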

colour_hue : int or None (default = None)

Sets a single hue in the HSL colour system for all regions. Must be an integer between 0 and 360.
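The hue set by colour_hue and the luminosity values used by colour_func (0 to 50) are both HSL components. The sketch below uses Python's standard colorsys module to show what those numbers mean. At full saturation, luminosity 50 is the pure hue and 0 is black, so lower luminosity gives darker words. The helper hsl_to_rgb255 and the choice of 100% saturation are assumptions; the saturation wordcloud_map uses is not documented here.

```python
import colorsys


def hsl_to_rgb255(hue, saturation, luminosity):
    """Convert HSL (hue in degrees, saturation/luminosity in percent) to 0-255 RGB."""
    if not 0 <= hue <= 360:
        raise ValueError("hue must be between 0 and 360")
    # colorsys uses the H, L, S argument order with every component scaled to 0-1.
    r, g, b = colorsys.hls_to_rgb(hue / 360, luminosity / 100, saturation / 100)
    return tuple(round(c * 255) for c in (r, g, b))


# Assumed full saturation: luminosity 50 gives the pure hue, 0 gives black.
print(hsl_to_rgb255(0, 100, 50))  # (255, 0, 0)
print(hsl_to_rgb255(0, 100, 0))   # (0, 0, 0)
```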

min_font_size : int (default = 4)

Smallest font size to use. Word placement will stop when there is no more room to fit words of this size.

max_font_size : int or None (default = None)

Maximum font size for the largest word. If None, a relative sizing based on the height of the image is used.

max_words : int (default = 200)

Maximum number of words to include in the wordcloud of each region.

relative_scaling : float or "auto" (default = 0.5)

Importance of relative word frequencies for font size. With relative_scaling = 0, only the ranking of words is considered. With relative_scaling = 1, a word that is twice as frequent will be twice as large. In datasets with highly uneven word frequencies, relative_scaling = 1 might lead to very few words being fitted, so a value of around 0.5 often looks better. If relative_scaling = "auto", it is set to 0.5, or to 0 if repeat = True.
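To make the effect of relative_scaling concrete, the sketch below uses a simplified model. Each word's font size is the previous word's size multiplied by relative_scaling * (frequency / previous frequency) + (1 - relative_scaling). This model is an assumption and is not taken from wordcloud_mapper's source. In a real layout, sizes also shrink when words stop fitting, which is how rank still matters when relative_scaling = 0. The "auto" resolution follows the description above. Both helper names are hypothetical.

```python
def resolve_relative_scaling(relative_scaling, repeat):
    """Resolve "auto" as described: 0 if repeat is True, otherwise 0.5."""
    if relative_scaling == "auto":
        return 0.0 if repeat else 0.5
    if not 0 <= relative_scaling <= 1:
        raise ValueError("relative_scaling must be between 0 and 1")
    return float(relative_scaling)


def font_sizes(frequencies, first_size, relative_scaling):
    """Font sizes for frequencies sorted from most to least frequent (simplified model)."""
    rs = relative_scaling
    sizes = [first_size]
    for prev, freq in zip(frequencies, frequencies[1:]):
        # rs = 1: size follows the frequency ratio; rs = 0: size is left unchanged.
        factor = rs * (freq / prev) + (1 - rs)
        sizes.append(sizes[-1] * factor)
    return sizes


print(font_sizes([100, 50], 40, 1.0))  # [40, 20.0]
print(font_sizes([100, 50], 40, 0.0))  # [40, 40.0]
print(font_sizes([100, 50], 40, 0.5))  # [40, 30.0]
```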

prefer_horizontal : float (default = 0.9)

Fraction of attempts in which horizontal placement is tried, as opposed to vertical placement. If prefer_horizontal = 1, no words will be placed vertically. If prefer_horizontal < 1, the algorithm will try rotating a word if it doesn't fit.
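The sketch below shows one way to read prefer_horizontal. For each word, a random draw decides which orientation is tried first. If prefer_horizontal < 1, the other orientation serves as a fallback when the word doesn't fit. The helper orientation_attempts is hypothetical. Treating horizontal as the fallback when vertical is tried first is an assumption.

```python
import random


def orientation_attempts(prefer_horizontal, rng):
    """Return the orientations to try, in order, when placing one word."""
    # rng.random() lies in [0, 1), so prefer_horizontal = 1 always starts horizontal.
    first = "horizontal" if rng.random() < prefer_horizontal else "vertical"
    if prefer_horizontal >= 1:
        return [first]  # words are never placed vertically
    other = "vertical" if first == "horizontal" else "horizontal"
    return [first, other]  # rotate the word if the first orientation doesn't fit


rng = random.Random(0)
trials = [orientation_attempts(0.9, rng)[0] for _ in range(10_000)]
print(trials.count("horizontal") / len(trials))  # close to 0.9
```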

repeat : bool (default = False)

Whether to repeat already-placed words until max_words or min_font_size is reached.

border_scale : str (default = "01M")

Level of detail of the regions' borders (i.e. the polygon shapefiles), given as one of the scale codes of the official NUTS shapefiles (e.g. "01M" is 1:1 million). Lower values (e.g. "03M") give more detailed polygon shapes and thus longer running times. Higher values (e.g. "60M") give less detailed polygon shapes and thus shorter running times. Available values: "60M", "20M", "10M", "03M" or "01M". For a visual explanation, see https://raw.githubusercontent.com/ropengov/giscoR/master/img/README-example-1.png

border_sharpness : float or int (default = 100)

Sharpness of the regions' border lines. Higher values produce sharper border lines but can take considerably longer to run. Increase the value if you need to zoom into the map. The value corresponds to the DPI (dots per inch) used when generating the mask images.
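Why higher border_sharpness values cost more time can be seen from pixel arithmetic. The pixel size of an image is its size in inches times its DPI, so the pixel count grows with the square of the DPI. The sketch assumes the masks are rendered at the figure's size (matplotlib's stock default of 6.4 x 4.8 inches times scale), with border_sharpness as the DPI. Both are assumptions, and mask_pixel_size is a hypothetical helper.

```python
# Assumed: matplotlib's stock default rcParams["figure.figsize"] of 6.4 x 4.8 inches.
DEFAULT_FIGSIZE = (6.4, 4.8)


def mask_pixel_size(scale, border_sharpness, default_figsize=DEFAULT_FIGSIZE):
    """Pixel dimensions of a mask rendered at `border_sharpness` DPI."""
    width_in = default_figsize[0] * scale   # figure width in inches
    height_in = default_figsize[1] * scale  # figure height in inches
    return (round(width_in * border_sharpness), round(height_in * border_sharpness))


low = mask_pixel_size(2.0, 100)   # (1280, 960)
high = mask_pixel_size(2.0, 300)  # (3840, 2880)
print(low, high)
print((high[0] * high[1]) / (low[0] * low[1]))  # 9.0: 3x the DPI, 9x the pixels
```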

nuts_year : int (default = 2021)

Year of the NUTS classification, e.g. 2021, 2016, 2013, 2010, 2006 or 2003.

coord_system : int (default = 3857)

4-digit EPSG code (a unique identifier for a coordinate reference system). Available values: 4326 (WGS84, coordinates in decimal degrees), 3035 (ETRS89 Lambert Azimuthal Equal-Area projection centred at 52°N, 10°E, coordinates in meters), 3857 (WGS84 Web Mercator Auxiliary Sphere, coordinates in meters).
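The difference between degree-based and metre-based coordinate systems can be seen by projecting a point from EPSG:4326 to EPSG:3857 with the spherical Web Mercator formulas. The sketch below is illustrative only, and the helper name to_web_mercator is hypothetical. It does not show how wordcloud_map reprojects shapefiles. EPSG:3035 needs the full ellipsoidal Lambert Azimuthal Equal-Area formulas, for which a library such as pyproj is the usual choice.

```python
import math

# Radius of the sphere used by Web Mercator (the WGS84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lon_deg, lat_deg):
    """Project EPSG:4326 longitude/latitude (degrees) to EPSG:3857 x/y (metres)."""
    if abs(lat_deg) >= 90:
        raise ValueError("Web Mercator cannot represent the poles")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


x, y = to_web_mercator(180.0, 0.0)
print(round(x, 2))  # 20037508.34, the eastern edge of the EPSG:3857 x-range
```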

shapefiles_path : str or None (default = None)

Path to local shapefiles, read instead of downloading them from GISCO's database (the default behaviour). Useful when internet access is limited. Works with .shp or .zip files. To get local files, visit: https://ec.europa.eu/eurostat/de/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts

Returns
matplotlib.figure.Figure

The wordcloud map as a matplotlib Figure object.
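Several parameters of wordcloud_map accept only a few documented values. A hypothetical pre-flight check like the one below could catch a typo before any shapefiles are downloaded. The value lists are copied from the parameter descriptions. Because nuts_year is described with examples ("e.g."), rejecting other years is a conservative assumption rather than documented behaviour. Nothing here claims that wordcloud_map performs these checks itself.

```python
# Values listed in the parameter descriptions; nuts_year lists examples only.
BORDER_SCALES = ("60M", "20M", "10M", "03M", "01M")  # least to most detailed
NUTS_YEARS = (2003, 2006, 2010, 2013, 2016, 2021)
COORD_SYSTEMS = (4326, 3035, 3857)
COLOUR_FUNCS = ("random", "frequency", "rank")


def check_options(border_scale="01M", nuts_year=2021, coord_system=3857,
                  colour_func="random", colour_hue=None):
    """Hypothetical pre-flight check; raises ValueError on an undocumented value."""
    if border_scale not in BORDER_SCALES:
        raise ValueError(f"border_scale must be one of {BORDER_SCALES}")
    if nuts_year not in NUTS_YEARS:
        raise ValueError(f"nuts_year should be one of {NUTS_YEARS}")
    if coord_system not in COORD_SYSTEMS:
        raise ValueError(f"coord_system must be one of {COORD_SYSTEMS}")
    if colour_func not in COLOUR_FUNCS:
        raise ValueError(f"colour_func must be one of {COLOUR_FUNCS}")
    if colour_hue is not None and not 0 <= colour_hue <= 360:
        raise ValueError("colour_hue must be between 0 and 360")


check_options()  # the documented defaults pass
```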

resize_map()

wordcloud_mapper.resize_map(fig, scale)

Resize the matplotlib figure by a given scaling factor.

Parameters
fig : matplotlib.figure.Figure

A matplotlib Figure object.

scale : float

Rescaling factor, applied as a multiplier of the figure's current size. If scale > 1.0, the figure gets bigger by that factor (e.g. 1.5 means 50% bigger). If scale < 1.0, it gets smaller by that factor (e.g. 0.5 means 50% smaller). If scale = 1.0, no change is made.
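If scale multiplies both the width and the height, which is an assumption since only "size" is stated, the figure's area changes by scale squared. So 1.5 means 50% wider and 50% taller, or 2.25 times the area. In matplotlib, a Figure's current size can be read with Figure.get_size_inches(). The stdlib-only helper resize below is hypothetical and works on a plain (width, height) tuple.

```python
def resize(size_inches, scale):
    """Return (width, height) with both dimensions multiplied by `scale`."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    width, height = size_inches
    return (width * scale, height * scale)


old = (12.8, 9.6)
new = resize(old, 1.5)
area_ratio = (new[0] * new[1]) / (old[0] * old[1])
print(round(area_ratio, 2))  # 2.25, i.e. 1.5 ** 2
```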

load_companies()

wordcloud_mapper.load_companies(country='DEU')

Load a dummy dataset for either Germany ("DEU") or Italy ("ITA"). Each dataset contains the names of the 100 companies with the largest estimated number of employees in each German state (NUTS 1) or each Italian region (NUTS 2). The data were obtained from the 2019 Global Company Dataset, published publicly by People Data Labs.

Parameters
country : str, optional (default = "DEU")

If country = "DEU", loads the German dataset. If country = "ITA", loads the Italian dataset.

Returns
DataFrame

The DataFrame corresponding to the chosen country.
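The two datasets use different NUTS levels (states for Germany, regions for Italy). When you combine them with your own data, it can help to check the level of a NUTS code. A code is a 2-letter country prefix followed by one character per level, so its level is its length minus 2. Note that the codes use the 2-letter prefixes DE and IT, while country takes 3-letter codes. The helpers and the DATASET_NUTS_LEVEL mapping below are hypothetical. They only restate the levels given in the description, and the column names of the returned DataFrame are not assumed.

```python
# NUTS level of the regions in each bundled dataset, as stated in the description.
DATASET_NUTS_LEVEL = {"DEU": 1, "ITA": 2}


def nuts_level(code):
    """NUTS level of a code: 2-letter country prefix plus one character per level."""
    return len(code) - 2


def check_country(country="DEU"):
    """Return the expected NUTS level for a load_companies country code."""
    if country not in DATASET_NUTS_LEVEL:
        raise ValueError("country must be 'DEU' or 'ITA'")
    return DATASET_NUTS_LEVEL[country]


# DE1 is Baden-Württemberg (NUTS 1); ITC4 is Lombardia (NUTS 2).
print(nuts_level("DE1"), check_country("DEU"))   # 1 1
print(nuts_level("ITC4"), check_country("ITA"))  # 2 2
```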