
doi: 10.25560/90251
handle: 10044/1/90251
This thesis proposes an Inverse Discrete Choice Modelling (IDCM) framework for enriching anonymous big datasets with socio-demographic information, with predictable and interpretable enrichment performance. It addresses the research gaps in existing socio-demographic enrichment methods, which fail to account for the underpinning microeconomic behaviour theory. As a result, the IDCM framework is applicable to, and transferable between, any enrichment context in which the behaviours of respondents can be obtained and the socio-demographic information of the data is necessary yet unavailable. Specifically, the IDCM approach postulates that a discrete choice model (DCM) which characterises the dependence between people's socio-demographic attributes and their behaviour patterns can be inverted to estimate the explanatory socio-demographic information. Correspondingly, the IDCM performance theory expresses the enrichment performance of the IDCM approach as a function of the assumed re-calibrated constant. Moreover, the IDCM performance theory links the estimated enrichment performance to a newly developed metric of enrichment efficiency, using the assumed re-calibrated constant as a pivot, so as to characterise how the enrichment performance varies with changes in the data condition of the enriched sample. Two empirical applications are conducted to validate the ability of the IDCM performance theory to forecast and interpret the IDCM enrichment performance under various data conditions. While the IDCM approach performs comparably to logistic regression and support vector machines, the demonstrated ability of the IDCM performance theory in the comparative analysis against these two supervised machine learning methods supports the transferability of the IDCM framework.
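To make the idea of "inverting" a DCM concrete, the following is a minimal sketch under stated assumptions, not the thesis's own formulation. It assumes a multinomial logit model over three alternatives whose utilities depend on a single binary socio-demographic attribute `z`, and it reads "inversion" as applying Bayes' rule: the posterior over `z` for an anonymous individual is proportional to a prior times the likelihood of that individual's observed choices. The coefficients `ASC` and `BETA_Z`, the prior `prior_z1`, and the conditional-independence assumption across choice occasions are all hypothetical choices made for illustration.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit choice probabilities from systematic utilities."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical fitted DCM: alternative-specific constants and a coefficient
# on a binary socio-demographic attribute z (0 or 1). Illustrative values only.
ASC = [0.0, 0.5, -0.3]
BETA_Z = [0.0, -1.2, 0.8]

def choice_probs_given_z(z):
    """P(choice = j | z) under the assumed logit model."""
    return mnl_probs([a + b * z for a, b in zip(ASC, BETA_Z)])

def invert_dcm(choices, prior_z1=0.5):
    """Posterior P(z | observed choices) via Bayes' rule.

    Assumes the observed choices (alternative indices) are conditionally
    independent given z; prior_z1 is a hypothetical population share of z = 1.
    """
    log_post = {}
    for z, prior in ((0, 1.0 - prior_z1), (1, prior_z1)):
        probs = choice_probs_given_z(z)
        log_post[z] = math.log(prior) + sum(math.log(probs[c]) for c in choices)
    m = max(log_post.values())
    unnorm = {z: math.exp(lp - m) for z, lp in log_post.items()}
    total = sum(unnorm.values())
    return {z: v / total for z, v in unnorm.items()}

# An anonymous individual repeatedly choosing alternative 2, which the
# hypothetical model makes far more attractive when z = 1.
posterior = invert_dcm([2, 2, 0, 2])
predicted_z = max(posterior, key=posterior.get)
print(posterior, predicted_z)
```

Working in log space keeps the product of many small choice probabilities from underflowing. With no observed choices, the function simply returns the prior, which is one way to see that any enrichment signal comes entirely from behaviour.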
