High Performance Fortran.
http://hpff.rice.edu/index.htm.
O.S. Bagge, K.T. Kalleberg, M. Haveraaen & E. Visser (2003):
Design of the CodeBoost transformation system for domain-specific optimisation of C++ programs.
In: Source Code Analysis and Manipulation, 2003,
pp. 65–74,
doi:10.1109/SCAM.2003.1238032.
Olav Beckmann, Alastair Houghton, Michael Mellor & Paul H.J. Kelly (2004):
Runtime Code Generation in C++ as a Foundation for Domain-Specific Optimisation.
In: Domain-Specific Program Generation,
LNCS 3016.
Springer Berlin / Heidelberg,
pp. 77–210,
doi:10.1007/978-3-540-25935-0_17.
Geoffrey Belter, Elizabeth R. Jessup, Ian Karlin & Jeremy G. Siek (2009):
Automating the generation of composed linear algebra kernels.
In: SC '09,
doi:10.1145/1654059.1654119.
Guy E. Blelloch (1996):
Programming parallel algorithms.
Commun. ACM 39(3),
pp. 85–97,
doi:10.1145/227234.227246.
G. Bradski & M. Muja (2010):
BiGG Detector.
http://www.ros.org/wiki/bigg_detector.
Zoran Budimlic, Michael Burke, Vincent Cavé, Kathleen Knobe, Geoff Lowney, Ryan Newton, Jens Palsberg, David M. Peixotto, Vivek Sarkar, Frank Schlimbach & Sagnak Tasirlar (2010):
Concurrent Collections.
Scientific Programming 18(3-4),
pp. 203–217,
doi:10.3233/SPR-2011-0305.
Cristiano Calcagno, Walid Taha, Liwen Huang & Xavier Leroy (2003):
Implementing Multi-stage Languages Using ASTs, Gensym, and Reflection.
In: GPCE'03,
pp. 57–76.
Jacques Carette, Oleg Kiselyov & Chung-chieh Shan (2009):
Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages.
J. Funct. Program. 19,
pp. 509–543,
doi:10.1017/S0956796809007205.
Bryan Catanzaro, Michael Garland & Kurt Keutzer (2011):
Copperhead: compiling an embedded data parallel language.
In: PPoPP'11.
ACM,
New York, NY, USA,
pp. 47–56,
doi:10.1145/1941553.1941562.
H. Chafi, Z. DeVito, A. Moors, T. Rompf, A. K. Sujeeth, P. Hanrahan, M. Odersky & K. Olukotun (2010):
Language Virtualization for Heterogeneous Parallel Computing.
In: Onward!'10,
doi:10.1145/1869459.1869527.
H. Chafi, A. K. Sujeeth, K. J. Brown, H. Lee, A. R. Atreya & K. Olukotun (2011):
A domain-specific approach to heterogeneous parallelism.
In: PPoPP'11,
doi:10.1145/1941553.1941561.
B.L. Chamberlain, D. Callahan & H.P. Zima (2007):
Parallel Programmability and the Chapel Language.
Int. J. High Perform. Comput. Appl. 21(3),
pp. 291–312,
doi:10.1177/1094342007078442.
Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R. Henry, Robert Bradshaw & Nathan Weizenbaum (2010):
FlumeJava: easy, efficient data-parallel pipelines.
In: PLDI '10.
ACM,
New York, NY, USA,
pp. 363–375,
doi:10.1145/1806596.1806638.
Philippe Charles, Christian Grothoff, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun & Vivek Sarkar (2005):
X10: an object-oriented approach to non-uniform cluster computing.
SIGPLAN Not. 40(10),
pp. 519–538,
doi:10.1145/1103845.1094852.
Cliff Click (2011):
Fixing the Inlining Problem.
http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem.
J.W. Cooley & J.W. Tukey (1965):
An algorithm for the machine calculation of complex Fourier series.
Mathematics of computation 19(90),
pp. 297–301,
doi:10.1090/S0025-5718-1965-0178586-1.
Duncan Coutts, Roman Leshchinskiy & Don Stewart (2007):
Stream fusion: from lists to streams to nothing at all.
In: ICFP,
pp. 315–326,
doi:10.1145/1291151.1291199.
Olivier Danvy & Andrzej Filinski (1990):
Abstracting control.
In: Proc. LFP'90,
pp. 151–160,
doi:10.1145/91556.91622.
Olivier Danvy & Andrzej Filinski (1992):
Representing Control: A Study of the CPS Transformation.
Mathematical Structures in Computer Science 2(4),
pp. 361–391,
doi:10.1017/S0960129500001535.
Jeffrey Dean & Sanjay Ghemawat (2004):
MapReduce: Simplified Data Processing on Large Clusters.
In: OSDI'04,
pp. 137–150.
Christopher Earl, Matthew Might & David Van Horn (2010):
Pushdown Control-Flow Analysis of Higher-Order Programs.
CoRR abs/1007.4268.
Matteo Frigo (1999):
A Fast Fourier Transform Compiler.
In: PLDI,
pp. 169–180,
doi:10.1145/301631.301661.
Samuel Z. Guyer & Calvin Lin (1999):
An annotation language for optimizing software libraries.
In: PLAN'99: 2nd conference on Domain-specific languages.
ACM,
New York, NY, USA,
pp. 39–52,
doi:10.1145/331960.331970.
C. Hofer, K. Ostermann, T. Rendel & A. Moors (2008):
Polymorphic embedding of DSLs.
In: GPCE'08,
doi:10.1145/1449913.1449935.
P. Hudak (1996):
Building domain-specific embedded languages.
ACM Computing Surveys 28,
doi:10.1145/242224.242477.
N.D. Jones, C.K. Gomard & P. Sestoft (1993):
Partial evaluation and automatic program generation.
Prentice-Hall.
Simon Peyton Jones, R. Leshchinskiy, G. Keller & M. M. T. Chakravarty (2008):
Harnessing the Multicores: Nested Data Parallelism in Haskell.
In: FSTTCS'08,
pp. 383–414,
doi:10.4230/LIPIcs.FSTTCS.2008.1769.
Guy L. Steele Jr. (2005):
Parallel Programming and Parallel Abstractions in Fortress.
In: IEEE PACT'05,
p. 157,
doi:10.1109/PACT.2005.34.
Ken Kennedy, Bradley Broom, Arun Chauhan, Rob Fowler, John Garvin, Charles Koelbel, Cheryl McCosh & John Mellor-Crummey (2005):
Telescoping Languages: A System for Automatic Generation of Domain Languages.
Proceedings of the IEEE 93(3),
pp. 387–408,
doi:10.1109/JPROC.2004.840447.
Oleg Kiselyov, Kedar N. Swadi & Walid Taha (2004):
A methodology for generating verified combinatorial circuits.
In: EMSOFT,
pp. 249–258,
doi:10.1145/1017753.1017794.
John McCarthy (1963):
A Basis For A Mathematical Theory Of Computation.
In: Computer Programming and Formal Systems.
North-Holland,
pp. 33–70,
doi:10.1016/S0049-237X(08)72018-4.
Chris J. Newburn, Byoungro So, Zhenying Liu, Michael D. McCool, Anwar M. Ghuloum, Stefanus Du Toit, Zhi-Gang Wang, Zhaohui Du, Yongjian Chen, Gansha Wu, Peng Guo, Zhanglin Liu & Dan Zhang (2011):
Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language.
In: CGO '11,
pp. 224–235,
doi:10.1109/CGO.2011.5764690.
M. Odersky (2011):
Scala.
http://www.scala-lang.org.
Markus Püschel, José M. F. Moura, Bryan Singer, Jianxin Xiong, Jeremy Johnson, David A. Padua, Manuela M. Veloso & Robert W. Johnson (2004):
Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms.
IJHPCA 18(1),
pp. 21–45,
doi:10.1177/1094342004041291.
Tiark Rompf, Ingo Maier & Martin Odersky (2009):
Implementing first-class polymorphic delimited continuations by a type-directed selective CPS-transform.
In: ICFP,
pp. 317–328,
doi:10.1145/1596550.1596596.
Tiark Rompf & Martin Odersky (2010):
Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs.
In: GPCE'10,
doi:10.1145/1868294.1868314.
T. Sheard & S.P. Jones (2002):
Template meta-programming for Haskell.
ACM SIGPLAN Notices 37(12),
pp. 60–75,
doi:10.1145/636517.636528.
O. Sinnen (2007):
Task scheduling for parallel systems.
Wiley series on parallel and distributed computing.
Wiley-Interscience,
doi:10.1002/0470121173.
A. K. Sujeeth, H. Lee, K. J. Brown, T. Rompf, Michael Wu, A. R. Atreya, M. Odersky & K. Olukotun (2011):
OptiML: an Implicitly Parallel Domain-Specific Language for Machine Learning.
In: ICML'11.
Walid Taha (1999):
Multi-Stage Programming: Its Theory and Applications.
Technical Report.
Oregon Graduate Institute School of Science & Engineering.
Walid Taha (2000):
A Sound Reduction Semantics for Untyped CBN Multi-stage Computation. Or, the Theory of MetaML is Non-trivial (Extended Abstract).
In: PEPM,
pp. 34–43,
doi:10.1145/328690.328697.
Walid Taha & Tim Sheard (2000):
MetaML and multi-stage programming with explicit annotations.
Theor. Comput. Sci. 248(1-2),
pp. 211–242,
doi:10.1016/S0304-3975(00)00053-0.
The Khronos Group:
OpenCL 1.0.
http://www.khronos.org/opencl/.
D. Vandevoorde & N.M. Josuttis (2003):
C++ Templates: The Complete Guide.
Addison-Wesley Professional.
Dimitrios Vardoulakis & Olin Shivers (2010):
CFA2: A Context-Free Approach to Control-Flow Analysis.
In: ESOP '10,
pp. 570–589,
doi:10.1007/978-3-642-11957-6_30.
Todd L. Veldhuizen (1996):
Expression templates.
In: C++ Gems.
SIGS Publications, Inc., New York, NY.
Todd L. Veldhuizen (1998):
Arrays in Blitz++.
In: ISCOPE,
pp. 223–230,
doi:10.1007/3-540-49372-7_24.
Todd L. Veldhuizen (2004):
Active Libraries and Universal Languages.
Indiana University Computer Science.
http://osl.iu.edu/~tveldhui/papers/2004/dissertation.pdf.
Philip Wadler (1990):
Deforestation: Transforming Programs to Eliminate Trees.
Theor. Comput. Sci. 73(2),
pp. 231–248,
doi:10.1016/0304-3975(90)90147-A.
R. Clinton Whaley, Antoine Petitet & Jack Dongarra (2001):
Automated empirical optimizations of software and the ATLAS project.
Parallel Computing 27(1-2),
pp. 3–35,
doi:10.1016/S0167-8191(00)00087-9.