Algorithm-dependent generalization bounds for multi-task learning

Liu, T.; Tao, D.; Song, M.; Maybank, S. (2017)

Tasks are often collected for multi-task learning (MTL) because they share similar feature structures. Based on this observation, in this paper we present novel algorithm-dependent generalization bounds for MTL by exploiting the notion of algorithmic stability. We focus on the performance of one particular task and on the average performance over multiple tasks by analyzing the generalization ability of a common parameter that is shared in MTL. When focusing on one particular task, with the help of a mild assumption on the feature structures, we interpret the function of the other tasks as a regularizer that produces a specific inductive bias. The algorithm for learning the common parameter, as well as the predictor, is thereby uniformly stable with respect to the domain of the particular task and has a generalization bound with a fast convergence rate of order O(1/n), where n is the sample size of the particular task. When focusing on the average performance over multiple tasks, we prove that a similar inductive bias exists under certain conditions on the feature structures. Thus, the corresponding algorithm for learning the common parameter is also uniformly stable with respect to the domains of the multiple tasks, and its generalization bound is of order O(1/T), where T is the number of tasks. These theoretical analyses naturally show that the similarity of feature structures in MTL leads to specific regularizations for prediction, which enable the learning algorithms to generalize quickly and correctly from only a few examples.
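
For context, here is a minimal sketch of how uniform stability yields rates of this kind; the notation below (the stability coefficient β, the risks R and R̂) is standard rather than taken from the paper itself. A learning algorithm A is uniformly β-stable if removing any single example i from the training sample S changes its loss ℓ on every test point z by at most β:

\[
\sup_{z}\,\bigl|\ell(A_{S}, z) - \ell(A_{S^{\setminus i}}, z)\bigr| \le \beta \qquad \text{for all } i = 1, \dots, n .
\]

For such an algorithm, the expected gap between the true risk R and the empirical risk \hat{R} is bounded by a constant multiple of β (2β under the classical removal-based definition of uniform stability):

\[
\mathbb{E}_{S}\bigl[R(A_{S}) - \hat{R}(A_{S})\bigr] \le 2\beta .
\]

Hence, establishing that the MTL algorithm for the common parameter is uniformly stable with β = O(1/n) with respect to the domain of one task, or with β = O(1/T) with respect to the domains of all tasks, directly yields the fast convergence rates stated in the abstract.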