analysis module

cranetoolbox.analysis.countOccurences.aggregate_counts(data, main_variants: List[str], date_format: str) → pandas.core.frame.DataFrame

Create a DataFrame with the daily counts of each keyword.

Parameters
  • data (list(dict())) – List of dictionaries, each containing a date, a boolean indicator for the presence of each keyword, and a ‘total’ column set to 1.

  • main_variants (list(str)) – List of the main variants of the keywords.

  • date_format (str) – String defining the format of dates in the dataset.

Returns

A DataFrame with counts for each keyword and each day.

Return type

pandas.DataFrame
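The aggregation can be pictured with the minimal sketch below. It is not the library's code: it assumes each dictionary stores its date string under a hypothetical "date" key and one boolean per main variant under that variant's name, and the function name aggregate_counts_sketch is made up for illustration. Dates are parsed with date_format, reduced to the calendar day, and the indicators are summed per day.

```python
from typing import Dict, List, Union

import pandas as pd


def aggregate_counts_sketch(
    data: List[Dict[str, Union[str, bool, int]]],
    main_variants: List[str],
    date_format: str,
) -> pd.DataFrame:
    """Hypothetical re-creation: sum per-tweet indicators into daily counts."""
    df = pd.DataFrame(data)
    # Parse the (assumed) "date" strings with the given format, keep only the day.
    df["day"] = pd.to_datetime(df["date"], format=date_format).dt.date
    # Cast booleans to 0/1 so that summing yields a count per keyword.
    df[main_variants] = df[main_variants].astype(int)
    # One row per day (sorted), one column per keyword, plus the tweet total.
    return df.groupby("day")[main_variants + ["total"]].sum()


records = [
    {"date": "2020-03-01 10:00:00", "covid": True, "mask": False, "total": 1},
    {"date": "2020-03-01 18:30:00", "covid": True, "mask": True, "total": 1},
    {"date": "2020-03-02 09:15:00", "covid": False, "mask": True, "total": 1},
]
counts = aggregate_counts_sketch(records, ["covid", "mask"], "%Y-%m-%d %H:%M:%S")
print(counts)
```

Because every record carries ‘total’ = 1, the same groupby sum that counts keywords also yields the number of tweets per day.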

cranetoolbox.analysis.countOccurences.count_keywords(input_paths: List[str], keywords: Dict[str, List[str]], date_format: str) → pandas.core.frame.DataFrame

Search all tweets for keywords and count their occurrences per day.

Parameters
  • input_paths (list(str)) – The list of the paths to the input files.

  • keywords (dict(str, list(str))) – The dictionary of keywords with their variants.

  • date_format (str) – String defining the format of dates in the dataset.

Returns

A DataFrame with the number of occurrences of each keyword for each day.

Return type

pandas.DataFrame
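The input file format is not documented here, so the sketch below skips file reading and takes already-loaded (date string, preprocessed text) pairs instead of input_paths. The function name, the whitespace tokenization, and the rule that a tweet counts once per keyword are all assumptions made for illustration, not the library's behaviour.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

import pandas as pd


def count_keywords_sketch(
    tweets: Iterable[Tuple[str, str]],
    keywords: Dict[str, List[str]],
    date_format: str,
) -> pd.DataFrame:
    """Hypothetical: count, per day, the tweets mentioning each keyword."""
    daily = defaultdict(Counter)
    for raw_date, text in tweets:
        day = datetime.strptime(raw_date, date_format).date()
        tokens = set(text.lower().split())  # assumption: whitespace tokens
        daily[day]["total"] += 1
        for main, variants in keywords.items():
            # A tweet counts once per keyword, however many variants it contains.
            if any(v.lower() in tokens for v in [main, *variants]):
                daily[day][main] += 1
    columns = [*keywords, "total"]
    days = sorted(daily)
    # Counter returns 0 for keywords never seen on a given day.
    return pd.DataFrame(
        [[daily[d][c] for c in columns] for d in days], index=days, columns=columns
    )


tweets = [
    ("2020-03-01", "Wear a mask"),
    ("2020-03-01", "coronavirus cases rise"),
    ("2020-03-02", "masks and COVID"),
]
daily_counts = count_keywords_sketch(
    tweets, {"covid": ["coronavirus", "covid19"], "mask": ["masks"]}, "%Y-%m-%d"
)
print(daily_counts)
```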

cranetoolbox.analysis.countOccurences.counts_to_freq(keyword_counts: pandas.core.frame.DataFrame, keywords: Dict[str, List[str]]) → pandas.core.frame.DataFrame

For each day, divide the count for each keyword by the daily total.

Parameters
  • keyword_counts (pandas.DataFrame) – DataFrame with the number of occurrences of each keyword for each day.

  • keywords (dict(str, list(str))) – The dictionary of keywords with their variants.

Returns

A DataFrame with the count and frequency of each keyword for each day.

Return type

pandas.DataFrame
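A minimal sketch of the division, assuming keyword_counts has one count column per main keyword plus a "total" column (as in the aggregate_counts description). The "_freq" column suffix and the function name are hypothetical choices for this example.

```python
from typing import Dict, List

import pandas as pd


def counts_to_freq_sketch(
    keyword_counts: pd.DataFrame, keywords: Dict[str, List[str]]
) -> pd.DataFrame:
    """Hypothetical: add a '<keyword>_freq' column next to each count column."""
    result = keyword_counts.copy()
    for main in keywords:
        # Share of the day's tweets that mention this keyword.
        result[f"{main}_freq"] = result[main] / result["total"]
    return result


counts = pd.DataFrame(
    {"covid": [2, 0], "mask": [1, 1], "total": [2, 1]},
    index=["2020-03-01", "2020-03-02"],
)
freqs = counts_to_freq_sketch(counts, {"covid": ["coronavirus"], "mask": ["masks"]})
print(freqs)
```

Keeping the raw counts alongside the frequencies matches the documented return value, which holds both the count and the frequency of each keyword.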

cranetoolbox.analysis.countOccurences.detect_keywords(text: str, keywords: Dict[str, List[str]]) → Dict[str, bool]

Look for each keyword (including its variants) in a tweet.

Parameters
  • text (str) – The preprocessed text of the tweet.

  • keywords (dict(str, list(str))) – The dictionary of keywords with their variants.

Returns

A dictionary indicating the presence or absence of each keyword.

Return type

dict(str, bool)
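One way such a check could work is sketched below, assuming a case-insensitive, whole-word match of the main keyword or any of its variants. The matching rule and the function name are assumptions; the library may match differently. Note that this tokenization only handles single-word variants: a multi-word variant such as "face mask" would never match.

```python
import re
from typing import Dict, List


def detect_keywords_sketch(text: str, keywords: Dict[str, List[str]]) -> Dict[str, bool]:
    """Hypothetical: whole-word, case-insensitive match of each keyword or variant."""
    # Split the tweet into lower-case word tokens (letters, digits, underscore).
    tokens = set(re.findall(r"\w+", text.lower()))
    return {
        main: any(variant.lower() in tokens for variant in [main, *variants])
        for main, variants in keywords.items()
    }


flags = detect_keywords_sketch(
    "New coronavirus cases reported today",
    {"covid": ["coronavirus", "covid19"], "mask": ["masks"]},
)
print(flags)  # {'covid': True, 'mask': False}
```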

cranetoolbox.analysis.countOccurences.get_keywords(path: str) → Dict[str, List[str]]

Load the keywords and their variants.

Parameters

path (str) – Path to the JSON file with the keywords.

Returns

The dictionary with the keywords and their variants.

Return type

dict(str, list(str))
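Given the documented return type, the JSON file is presumably an object mapping each main keyword to a list of its variants. The sketch below loads such a file with the standard json module and checks that shape; the example keywords, the validation, and the function name are illustrative assumptions.

```python
import json
import os
import tempfile
from typing import Dict, List


def get_keywords_sketch(path: str) -> Dict[str, List[str]]:
    """Hypothetical: load a JSON object mapping each keyword to its variants."""
    with open(path, encoding="utf-8") as f:
        keywords = json.load(f)
    # Fail early if the file does not have the assumed shape.
    if not isinstance(keywords, dict) or not all(
        isinstance(v, list) for v in keywords.values()
    ):
        raise ValueError("expected a JSON object mapping keywords to lists of variants")
    return keywords


# Example file with the assumed layout, written to a temporary location.
example = {"covid": ["coronavirus", "covid19"], "mask": ["masks", "facemask"]}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(example, tmp)
keywords = get_keywords_sketch(tmp.name)
os.remove(tmp.name)
print(keywords)
```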

cranetoolbox.analysis.countOccurences.get_tweet_counts(path: str) → pandas.core.frame.DataFrame

Load the DataFrame with the daily tweet counts.

Parameters

path (str) – Path to the file with the number of tweets per day.

Returns

DataFrame with the number of tweets for each day in the dataset.

Return type

pandas.DataFrame

cranetoolbox.analysis.countOccurences.transform_date_format(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Extract the date part of the “timestamp” column of a DataFrame into a new “day” column.

Parameters

df (DataFrame) – A DataFrame with a “timestamp” column containing pandas datetime objects.

Returns

df with a new “day” column holding the date part (without the time of day) of the “timestamp” column.

Return type

DataFrame
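With pandas datetime values, the date part is available through the .dt.date accessor. The sketch below is a hedged illustration, not the library's code. It copies the input first, which is a choice made here; whether the real function modifies df in place is not stated. The function name is hypothetical.

```python
import pandas as pd


def transform_date_format_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: add a 'day' column with the date part of 'timestamp'."""
    df = df.copy()  # sketch choice: leave the caller's DataFrame untouched
    # .dt.date drops the time of day, giving datetime.date objects.
    df["day"] = df["timestamp"].dt.date
    return df


tweets = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2020-03-01 10:00", "2020-03-01 23:59", "2020-03-02 00:01"])}
)
with_day = transform_date_format_sketch(tweets)
print(with_day)
```

Tweets posted late on one day and just after midnight land on different “day” values, which is what a per-day groupby needs.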