
This project is divided into two folders: the main experimental project and frequent item mining.

----------------------------------------------------------------------------------------------

# Main Experimental Project

# Project Structure

## Folders:

* **data**: Contains the real and synthetic datasets used in the experiments, including raw datasets and intermediate files generated during data cleaning.
* **src**: Contains all experiment code.
* **src/protocols**: Implements three existing ULDP mechanisms (`uRR`, `uRAP`, `uHR`) and three new mechanisms proposed in this work (`uSS`, `uOUE`, `uOLH`).

## Python Files:

### Miscellaneous:

* `data_preprocessing.py`: Loads datasets into memory.
* `tools_function.py`: Provides two functions to compute `MSE` and `MAE`.

### Theoretical MSE Calculation:

* `theoretical_mse.py`: Computes the theoretical MSE of each mechanism according to the paper’s formulas.
* `withWithout_z.py`: Computes the theoretical MSE when `z=0`.

### ULDP Mechanisms (in `src/protocols`):

* `old_HR.py`: Existing ULDP mechanism `uHR`.
* `old_URAP.py`: Existing ULDP mechanism `uRAP`.
* `old_uRR.py`: Existing ULDP mechanism `uRR`.
* `uOLH.py`: Proposed mechanism `uOLH`.
* `uOUE.py`: Proposed mechanism `uOUE`.
* `uSS.py`: Proposed mechanism `uSS`.

### Data Cleaning:

* `Get_census_dataset.py`: Generates the Census dataset.
* `Get_Foursquare_dataset.py`: Generates the Foursquare dataset.
* `Get_normal_dataset.py`: Generates the Normal dataset.

### Main Experiment Scripts:

* `exp1_impact_of_epsilon.py`: Experiment 1 – Impact of varying `epsilon` on mechanism utility.
* `exp1_impact_of_sensitive.py`: Experiment 1 – Impact of varying the sensitive data ratio on mechanism utility.
* `exp2_collaborative_sampling.py`: Experiment 2 – Impact of collaborative sampling on results.
* `exp3_without_z.py`: Experiment 3 – Impact of setting `z=0` on results.
* `exp4_mae.py`: Experiment 4 – Results when using `MAE` as the utility metric.

### Experiment Visualization:

* `experiment_visualization.py`: Plots experimental results. Before running it:
  1. Manually fill in the MSE/MAE values obtained from the other experiments.
  2. Ensure the output directory exists.

## Usage:

Run any main experiment script with:

```bash
python .py
```

----------------------------------------------------------------------------------------------

# Frequent Item Mining

# Project Structure:

## Folders:

* **census-data**: Contains the real dataset used for the frequent item mining task, including the full data file `census.txt` and the sensitive subset file `census_sen.txt`.
* **foursquare-data**: Contains the real dataset used for the frequent item mining task, including the full data file `foursquare.txt` and the sensitive subset file `foursquare_sen.txt`.
* **results**: Stores the data outputs from experiments.
* **ULDP**: Contains the six perturbation mechanism implementations under evaluation, namely `old_HR.py`, `old_URAP.py`, `old_uRR.py`, `uOLH.py`, `uOUE.py`, and `uSS.py`. These scripts are called by the main experiment drivers.

## Scripts:

Main experiment drivers (`exp--.py`):

* `exp-uhr-cens.py`: Runs the `uHR` mechanism on the Census dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.
* `exp-uhr-fours.py`: Runs the `uHR` mechanism on the Foursquare dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.
* `exp-uolh-cens.py`: Runs the `uOLH` mechanism on the Census dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.
* `exp-uolh-fours.py`: Runs the `uOLH` mechanism on the Foursquare dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.
* `exp-uoue-cens.py`: Runs the `uOUE` mechanism on the Census dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.
* `exp-uoue-fours.py`: Runs the `uOUE` mechanism on the Foursquare dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.
* `exp-urap-cens.py`: Runs the `uRAP` mechanism on the Census dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.
* `exp-urap-fours.py`: Runs the `uRAP` mechanism on the Foursquare dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.
* `exp-urr-cens.py`: Runs the `uRR` mechanism on the Census dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.
* `exp-urr-fours.py`: Runs the `uRR` mechanism on the Foursquare dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.
* `exp-uss-cens.py`: Runs the `uSS` mechanism on the Census dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.
* `exp-uss-fours.py`: Runs the `uSS` mechanism on the Foursquare dataset for frequent item mining and accuracy evaluation; saves results to the `results` folder.

## Utility modules:

* `data.py`: Loads data files and preprocesses datasets. Called by the main experiment scripts.
* `F1score.py`: Implements the F1 score evaluation metric. Called by the main experiment scripts.
* `NDCG.py`: Implements the NDCG evaluation metric. Called by the main experiment scripts.
* `ULDPFIM.py`: Implements the frequent item mining task: generates candidate itemsets from the full dataset and mines frequent items from those candidates. Called by the main experiment scripts.

## Environment:

* See `environment.yml`.
* Run experiments by executing:

```bash
python3 exp--.py
```

For example:

```bash
python3 exp-uhr-cens.py
```
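
For readers unfamiliar with the two accuracy metrics named under the utility modules, the following is a minimal sketch of how F1 score and NDCG are commonly computed for top-k frequent item mining. It is not the code in `F1score.py` or `NDCG.py`: the function names `f1_score` and `ndcg`, their signatures, and the relevance rule (an item at true rank `r`, counted from 0, gets relevance `k - r`) are all assumptions made for illustration.

```python
import math


def f1_score(true_items, mined_items):
    """F1 between the true top-k item set and the mined item set (hypothetical helper)."""
    true_set, mined_set = set(true_items), set(mined_items)
    hits = len(true_set & mined_set)
    if hits == 0:
        return 0.0
    precision = hits / len(mined_set)
    recall = hits / len(true_set)
    return 2 * precision * recall / (precision + recall)


def ndcg(true_ranking, mined_ranking):
    """NDCG of a mined ranking against the true ranking (hypothetical helper).

    Assumption: the item at true rank r (0-based) has relevance k - r,
    and items outside the true top-k have relevance 0.
    """
    k = len(true_ranking)
    relevance = {item: k - rank for rank, item in enumerate(true_ranking)}

    def dcg(ranking):
        # Position p (0-based) is discounted by log2(p + 2).
        return sum(
            relevance.get(item, 0) / math.log2(pos + 2)
            for pos, item in enumerate(ranking)
        )

    ideal = dcg(true_ranking)
    return dcg(mined_ranking[:k]) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    truth = ["a", "b", "c"]
    mined = ["a", "c", "d"]
    print("F1:", f1_score(truth, mined))
    print("NDCG:", ndcg(truth, mined))
```

F1 only checks whether the right items were found, while NDCG also rewards placing them in the right order, which is why the two metrics are often reported together for this task.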
