
We present the domain name graph (DNG), a formal expression that can keep track of CNAME chains and characterize the dynamic and diverse nature of DNS mechanisms and deployments. We develop a framework called Service-Flow map (SFMap) that works on top of the DNG. SFMap estimates the hostname of an HTTPS server when given a pair of client and server IP addresses. It can statistically estimate the hostname even when the associated DNS queries are unobserved, for example because of caching mechanisms. Through extensive analysis using real packet traces, we demonstrate that the SFMap framework achieves good estimation accuracy and can outperform the state-of-the-art technique called DN-Hunter. We also identify the optimized setting of the SFMap framework. The experimental results suggest that the success of SFMap lies in the fact that it can complement incomplete DNS information by leveraging the graph structure. To cope with large-scale measurement data, we introduce techniques to make the SFMap framework scalable. We validate the effectiveness of the approach using large-scale traffic data collected at a gateway point of Internet access links.

Adoption of SSL/TLS to protect the privacy of web users has become increasingly common. In fact, as of September 2015, more than 68% of the top-1M websites deployed SSL/TLS to encrypt their traffic. The transition from HTTP to HTTPS has brought a new challenge for network operators who need to understand the hostnames of encrypted web traffic for various reasons. To meet the challenge, this work develops a novel framework called SFMap, which estimates the names of HTTPS servers by statistically analyzing the DNS queries and responses that precede them. The SFMap framework introduces the domain name graph, which can characterize the highly dynamic and diverse nature of DNS mechanisms.
This complexity arises from how DNS ecosystems are now deployed and implemented: canonical name (CNAME) tricks used by CDNs, dynamic and diverse DNS TTL settings, and incomplete and unpredictable measurements caused by the existence of various DNS caching instances. First, we demonstrate that SFMap achieves good estimation accuracy and outperforms a state-of-the-art approach. We also aim to identify the optimized setting of the SFMap framework. Next, based on the preliminary analysis, we introduce techniques to make the SFMap framework scalable to large-scale traffic data. We validate the effectiveness of the approach using large-scale Internet traffic.
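
To make the idea of following CNAME chains from a server IP address back to a queried hostname concrete, the following is a minimal sketch in Python. It is not the SFMap algorithm itself: the class name `DomainNameGraph`, its methods, the per-client graph layout, and the frequency-based scoring are all assumptions made for illustration, and TTL handling is omitted entirely.

```python
from collections import defaultdict


class DomainNameGraph:
    """Toy per-client graph (hypothetical, for illustration only).

    Each edge points from a DNS answer value back to the name that owned
    the record, so a server IP can be traced back through CNAME chains.
    """

    def __init__(self):
        # parents[client][node] -> {parent name: number of observations}
        self.parents = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def add_response(self, client_ip, answers):
        """Record one observed DNS response for a client.

        answers is a list of (owner, rtype, value) tuples, for example
        ("www.example.com", "CNAME", "edge.cdn.example.net").
        """
        for owner, rtype, value in answers:
            if rtype in ("CNAME", "A", "AAAA"):
                self.parents[client_ip][value.lower()][owner.lower()] += 1

    def estimate(self, client_ip, server_ip):
        """Return the most frequently observed root hostname, or None."""
        graph = self.parents.get(client_ip)
        if graph is None:
            return None
        scores = defaultdict(int)
        # Each stack entry: (node, weakest edge count so far, nodes on this path)
        stack = [(server_ip, None, frozenset())]
        while stack:
            node, weight, seen = stack.pop()
            if node in seen:  # guard against CNAME loops
                continue
            preds = graph.get(node)
            if not preds:
                # A node with no parent is treated as the queried hostname.
                if node != server_ip:
                    scores[node] += weight
                continue
            for parent, count in preds.items():
                w = count if weight is None else min(weight, count)
                stack.append((parent, w, seen | {node}))
        if not scores:
            return None
        return max(scores, key=scores.get)


if __name__ == "__main__":
    dng = DomainNameGraph()
    dng.add_response("10.0.0.1", [
        ("www.example.com", "CNAME", "edge.cdn.example.net"),
        ("edge.cdn.example.net", "A", "203.0.113.7"),
    ])
    print(dng.estimate("10.0.0.1", "203.0.113.7"))
```

Scoring each candidate by the weakest edge on its path is only one simple choice. The point of the sketch is that a graph lets an estimate be produced when several hostnames share the same CDN edge, or when only part of a chain was observed in a later response.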
