
The dataset contains retail prices for meat, fruit, and vegetable products collected between December 2020 and March 2023. The data are tabular, with one row per price observation and several columns describing each entry.

The columns are:
- date: date of price collection, in DD/MM/YYYY format (e.g., 03/12/2020).
- price: retail price in euros (EUR), with a decimal point (.) as separator.
- product_id: unique identifier assigned to each product.
- store_id: anonymized unique identifier of the store where the price was recorded.
- region: Italian region where the store is located (e.g., Calabria, Lazio).
- product: full commercial name of the product, including quantity or weight (e.g., "arance navelina italia calibro 1.5 kg").
- COICOP5: product classification at the 5-digit level of the COICOP nomenclature (e.g., "Oranges").
- COICOP4: higher-level COICOP category (e.g., "Fruit", "Meat", "Vegetable").

Units and Notes:
- Currency: all prices are in euros (EUR).
- Quantities: the quantity or weight is part of the product field (e.g., "1.5 kg", "500 g").
- Date format: dates are in DD/MM/YYYY format.
- COICOP classification: assigned through manual annotation combined with rule-based categorization using domain-specific keywords.

File Information:
- Format: CSV (.csv), UTF-8 encoded, comma-separated.
- Each row corresponds to one product observation at a specific store on a specific date.
- The cleaned version contains no missing values.

This structure supports analyses of regional price variation, comparisons across product categories, and time-series studies of price dynamics in the Italian retail food market.
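Because pack weights are embedded in the product field, raw prices of differently sized packs are not directly comparable. The following is a minimal sketch, assuming pack sizes appear as a number followed by "kg", "g" or "gr", that extracts the weight and derives a price per kilogram. The helper names `parse_quantity` and `add_unit_price` and the regular expression are hypothetical, not part of the dataset; multipacks (e.g. "6 x 125 g") and piece counts are not handled.

```python
import re
from typing import Optional

import pandas as pd

# Hypothetical pattern (an assumption, not the dataset's documented format):
# a number with a dot or comma decimal, then a weight unit "kg", "g" or "gr".
# Multipacks ("6 x 125 g") and piece counts are NOT handled by this sketch.
_WEIGHT_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(kg|g|gr)\b", re.IGNORECASE)

# Grams per unit, used to convert every match to kilograms.
_GRAMS = {"kg": 1000, "g": 1, "gr": 1}


def parse_quantity(product: str) -> Optional[float]:
    """Return the pack weight in kg found in a product name, or None if absent."""
    matches = _WEIGHT_RE.findall(product)
    if not matches:
        return None
    # Take the last match: the pack size usually ends the product name.
    value, unit = matches[-1]
    return float(value.replace(",", ".")) * _GRAMS[unit.lower()] / 1000


def add_unit_price(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 'quantity_kg' and 'price_per_kg' columns.

    Rows whose product name has no parsable weight get NaN in both columns.
    """
    out = df.copy()
    out["quantity_kg"] = out["product"].map(parse_quantity).astype(float)
    out["price_per_kg"] = out["price"] / out["quantity_kg"]
    return out
```

After loading the CSV, `df = add_unit_price(df)` followed by `df.groupby(["region", "COICOP5"])["price_per_kg"].median()` gives a size-adjusted regional comparison; rows left as NaN are worth inspecting before drawing conclusions.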
# ------------------------------------------------------------
# Sample Code for Dataset Analysis
# ------------------------------------------------------------

# Required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv("Variations_Food_Prices_Italian_Supermarkets.csv")

# Convert date column: dates are stored as DD/MM/YYYY, so the format is given
# explicitly (the default parser would read 03/12/2020 as 12 March 2020)
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Define category colors
category_colors = {"Fruit": "blue", "Vegetable": "green", "Meat": "red"}

# ------------------------------------------------------------
# Geographic distribution of unique products by region
# ------------------------------------------------------------
geo = df.groupby(["region", "COICOP4"])["product_id"].nunique().reset_index()
pivot_geo = geo.pivot(index="region", columns="COICOP4", values="product_id").fillna(0)
pivot_geo["Total"] = pivot_geo.sum(axis=1)
pivot_geo = pivot_geo.sort_values("Total", ascending=False).drop(columns="Total")
# Fix the column order; reindex fills a missing category with 0 instead of raising
pivot_geo = pivot_geo.reindex(columns=["Fruit", "Meat", "Vegetable"], fill_value=0)

# Stacked bars, colored consistently with category_colors
pivot_geo.plot(kind="bar", stacked=True, figsize=(10, 6),
               color=[category_colors[c] for c in pivot_geo.columns])
plt.ylabel("Number of Unique Products")
plt.title("Geographic Distribution by Region and Category (Sorted)")
plt.xticks(rotation=45, ha="right")
plt.legend(title="Category")
plt.tight_layout()
plt.show()

# ------------------------------------------------------------
# Basic analysis: average price trend over time (by COICOP4)
# ------------------------------------------------------------
price_trend = df.groupby(["date", "COICOP4"])["price"].mean().reset_index()

plt.figure(figsize=(10, 5))
sns.lineplot(data=price_trend, x="date", y="price", hue="COICOP4", palette=category_colors)
plt.title("Average Price Over Time by COICOP4 Category")
plt.xlabel("Date")
plt.ylabel("Average Price (€)")
plt.legend(title="Category")
plt.tight_layout()
plt.show()
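The sample code plots daily averages, which can be noisy. For month-to-month price dynamics, a minimal sketch with hypothetical helper names `monthly_mean_price` and `monthly_change` aggregates prices to calendar months with `pd.Grouper` and computes the percentage change per COICOP4 category. It assumes the `date` column has already been converted to datetime, as in the sample code. Note that a mean of raw prices mixes pack sizes and a changing product assortment, so the result is a rough indicator rather than a proper price index.

```python
import pandas as pd


def monthly_mean_price(df: pd.DataFrame) -> pd.DataFrame:
    """Mean price per calendar month (rows) and COICOP4 category (columns).

    Assumes df["date"] is already a datetime column.
    """
    return (
        df.groupby([pd.Grouper(key="date", freq="MS"), "COICOP4"])["price"]
        .mean()
        .unstack("COICOP4")
    )


def monthly_change(df: pd.DataFrame) -> pd.DataFrame:
    """Month-over-month percentage change of the monthly mean price per category.

    The first month of each category is NaN; gaps are not forward-filled.
    """
    return monthly_mean_price(df).pct_change(fill_method=None) * 100
```

With the dataset loaded, `monthly_change(df).plot()` shows how each category's average moved from one month to the next.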
Web Scraping, Economics, Price Analysis, Food Prices, Retail Prices, Supermarkets
