publication . Preprint . Conference object . 2009

RIOT: I/O-Efficient Numerical Computing without SQL

Zhang, Y.; Herodotou, H.; Yang, J.
Open Access English
  • Published: 09 Sep 2009
Abstract
R is a numerical computing environment that is widely popular for statistical data analysis. Like many such environments, R performs poorly for large datasets whose sizes exceed that of physical memory. We present our vision of RIOT (R with I/O Transparency), a system that makes R programs I/O-efficient in a way transparent to the users. We describe our experience with RIOT-DB, an initial prototype that uses a relational database system as a backend. Despite the overhead and inadequacy of generic database systems in handling array data and numerical computation, RIOT-DB significantly outperforms R in many large-data scenarios, thanks to a suite of high-level, in...
Subjects
free text keywords: Computer Science - Databases