analysis module
-
cranetoolbox.analysis.countOccurences.aggregate_counts(data, main_variants: List[str], date_format: str) → pandas.core.frame.DataFrame
Create a DataFrame with daily keyword counts.
- Parameters
- Returns
A DataFrame with counts for each keyword and each day.
- Return type
pandas.DataFrame
-
cranetoolbox.analysis.countOccurences.count_keywords(input_paths: List[str], keywords: Dict[str, List[str]], date_format: str) → pandas.core.frame.DataFrame
Search all tweets for keywords and count their occurrences per day.
- Parameters
- Returns
A DataFrame with the number of occurrences of each keyword for each day.
- Return type
pandas.DataFrame
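The per-day counting described above can be illustrated with a minimal sketch. It is not the cranetoolbox implementation: the function name `daily_keyword_counts_sketch`, the in-memory `tweets` DataFrame with "timestamp" and "text" columns, and the case-insensitive substring matching are all assumptions made for illustration. It also assumes that `keywords` maps each main keyword to its list of variants and that `date_format` is a `strftime` pattern.

```python
import re

import pandas as pd


def daily_keyword_counts_sketch(
    tweets: pd.DataFrame, keywords: dict, date_format: str = "%Y-%m-%d"
) -> pd.DataFrame:
    """Hypothetical helper: count, per day, the tweets that mention each keyword."""
    # One boolean column per main keyword: True if any variant appears in the text.
    flags = pd.DataFrame(
        {
            main: tweets["text"].str.contains(
                "|".join(re.escape(v) for v in variants), case=False, regex=True
            )
            for main, variants in keywords.items()
        }
    )
    # Format the timestamp as a day label (assumed meaning of date_format).
    flags["day"] = tweets["timestamp"].dt.strftime(date_format)
    # Summing booleans per day gives the number of matching tweets.
    return flags.groupby("day").sum().reset_index()


tweets = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-03-01 08:00", "2020-03-01 09:00", "2020-03-02 10:00"]
        ),
        "text": ["The virus spreads", "Masks and viruses", "Nothing here"],
    }
)
keywords = {"virus": ["virus", "viruses"], "mask": ["mask", "masks"]}
counts = daily_keyword_counts_sketch(tweets, keywords)
```

Note that this sketch counts tweets containing a keyword, not the total number of times the keyword appears; the library may define the count differently.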
-
cranetoolbox.analysis.countOccurences.counts_to_freq(keyword_counts: pandas.core.frame.DataFrame, keywords: Dict[str, List[str]]) → pandas.core.frame.DataFrame
For each day, divide the count for each keyword by the daily total.
- Parameters
- Returns
A DataFrame with the count and frequency of each keyword for each day.
- Return type
pandas.DataFrame
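The count-to-frequency step can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper name `counts_to_freq_sketch`, the `_freq` column suffix, and the "total" column holding the daily total are hypothetical, since the page does not say where the daily total comes from.

```python
import pandas as pd


def counts_to_freq_sketch(
    keyword_counts: pd.DataFrame, keywords: dict, total_col: str = "total"
) -> pd.DataFrame:
    """Hypothetical helper: add a '<keyword>_freq' column for each keyword."""
    out = keyword_counts.copy()  # leave the caller's DataFrame untouched
    for main in keywords:
        # Frequency = keyword count for the day / daily total (assumed column).
        out[f"{main}_freq"] = out[main] / out[total_col]
    return out


keyword_counts = pd.DataFrame(
    {
        "day": ["2020-01-01", "2020-01-02"],
        "virus": [5, 10],
        "mask": [1, 0],
        "total": [50, 40],  # assumed: daily total stored alongside the counts
    }
)
keywords = {"virus": ["virus", "viruses"], "mask": ["mask", "masks"]}
freq = counts_to_freq_sketch(keyword_counts, keywords)
```

The result keeps the original count columns and adds the frequencies next to them, which matches the documented return value of "count and frequency" per keyword and day.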
-
cranetoolbox.analysis.countOccurences.detect_keywords(text: str, keywords: Dict[str, List[str]]) → Dict[str, bool]
Look for each keyword (with variants) in a tweet.
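One way such a lookup could work is shown below. The matching rule (case-insensitive, whole-word) and the name `detect_keywords_sketch` are assumptions for illustration; the page does not specify how cranetoolbox matches variants. The sketch assumes each dictionary key is a main keyword and each value is the list of its variants.

```python
import re
from typing import Dict, List


def detect_keywords_sketch(text: str, keywords: Dict[str, List[str]]) -> Dict[str, bool]:
    """Hypothetical helper: report, per main keyword, whether any variant occurs."""
    found = {}
    for main, variants in keywords.items():
        if not variants:
            # No variants to search for, so nothing can match.
            found[main] = False
            continue
        # Escape each variant and require word boundaries on both sides.
        pattern = r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b"
        found[main] = re.search(pattern, text, flags=re.IGNORECASE) is not None
    return found


result = detect_keywords_sketch(
    "Wear your masks outside!",
    {"mask": ["mask", "masks"], "virus": ["virus", "viruses"]},
)
# result == {"mask": True, "virus": False}
```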
-
cranetoolbox.analysis.countOccurences.get_keywords(path: str) → Dict[str, List[str]]
Load the keywords and their variants.
-
cranetoolbox.analysis.countOccurences.get_tweet_counts(path: str) → pandas.core.frame.DataFrame
Load the DataFrame with the daily tweet counts.
- Parameters
path (str) – Path to the file with the number of tweets per day.
- Returns
DataFrame with the number of tweets for each day in the dataset.
- Return type
pandas.DataFrame
-
cranetoolbox.analysis.countOccurences.transform_date_format(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Add the date of the “timestamp” column of a DataFrame to a “day” column.
- Parameters
df (DataFrame) – A DataFrame with a “timestamp” column containing pandas datetime objects.
- Returns
df with a new column “day” that corresponds to the date version of the “timestamp” column.
- Return type
DataFrame
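The transformation described above can be approximated with the pandas `.dt.date` accessor. The helper name `add_day_column_sketch` is hypothetical, and working on a copy is an assumption; the library function may modify `df` in place.

```python
import pandas as pd


def add_day_column_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive a 'day' column from a datetime 'timestamp' column."""
    out = df.copy()  # assumption: return a copy instead of mutating the input
    # .dt.date drops the time of day and keeps a datetime.date per row.
    out["day"] = out["timestamp"].dt.date
    return out


tweets = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-03-01 08:15:00", "2020-03-01 23:59:59", "2020-03-02 00:00:01"]
        )
    }
)
with_day = add_day_column_sketch(tweets)
```

Here the first two rows share the same "day" value, so a later group-by on "day" would place them in the same daily bucket.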