Cloudy
Spectral Synthesis Code for Astrophysics
n_pointer< T, N, ALLOC, lgBC > Class Template Reference

#include <container_classes.h>

Detailed Description

template<class T, int N, mem_layout ALLOC, bool lgBC>
class n_pointer< T, N, ALLOC, lgBC >

The n_pointer and const_n_pointer classes below define indexing into the multi_arr.

NB NB – The design of these classes is CRUCIAL for CPU performance! Any redesign should be thoroughly tested for CPU time regressions on a broad range of platforms and compilers!

The primary goal of the design of these classes should be to make life as easy as possible for the compiler, NOT the programmer! The design should ensure that the methods are FULLY INLINED. Without that, the resulting code will be severely crippled!

Below are a total of 8 specializations for each of these classes. These could be combined into 2 specializations for each class (one for N > 1 and one for N == 1), but it turns out that the resulting methods are too complicated for most compilers to inline. Hence we decided to split the methods up into simpler specializations. As they are written below, nearly every compiler can inline them (at least for the versions without bounds-checking).
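The idea of splitting one accessor into several small specializations can be sketched as follows. This is a minimal illustration, not the code in container_classes.h: the class last_index and the function checked_access_throws are hypothetical, only the parameter name lgBC is taken from the template signature above, and throwing std::out_of_range on a failed check is an assumption made for the example.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical 1-D accessor. lgBC mirrors the template parameter name used
// by n_pointer; everything else here is illustrative only.
template<class T, bool lgBC>
class last_index;

// Unchecked specialization: a single pointer addition, trivial to inline.
template<class T>
class last_index<T,false>
{
    T* p_;
public:
    last_index(T* p, std::size_t /*n*/) : p_(p) {}
    T& operator[](std::size_t k) const { return *(p_ + k); }
};

// Bounds-checked specialization: stores the extent and tests the index
// before dereferencing (throwing is an assumption of this sketch).
template<class T>
class last_index<T,true>
{
    T* p_;
    std::size_t n_;
public:
    last_index(T* p, std::size_t n) : p_(p), n_(n) {}
    T& operator[](std::size_t k) const
    {
        if( k >= n_ )
            throw std::out_of_range("last_index: index out of range");
        return *(p_ + k);
    }
};

// Returns true if both versions read the same element and only the
// checked version rejects an out-of-range index.
bool checked_access_throws()
{
    long buf[4] = { 10, 20, 30, 40 };
    last_index<long,false> fast(buf, 4);
    last_index<long,true> safe(buf, 4);
    if( fast[2] != 30 || safe[2] != 30 )
        return false;
    try { (void)safe[4]; }
    catch( const std::out_of_range& ) { return true; }
    return false;
}
```

Because each specialization contains only the code for its own case, the compiler never has to see a branch on lgBC inside operator[], which is presumably what keeps the unchecked path small enough to inline reliably.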

This is the situation on 2008 jan 26:

Linux IA32/AMD64      ARPA   C      ARPA/BC   C/BC
  g++ 4.3.x           OK     OK     OK        OK     (tested on prerelease)
  g++ 4.2.x           OK     OK     OK        OK
  g++ 4.1.x           OK     OK     OK        OK
  g++ 4.0.x           OK     OK     OK        OK
  g++ 3.4.x           ??     OK     ??        ??
  g++ 3.3.x           ??     OK     ??        ??

  icc 10.1            OK     OK     OK        OK
  icc 10.0            OK     OK     OK        OK
  icc 9.1             OK     OK     OK        OK
  icc 9.0             OK     OK     OK        OK

  Sun Studio 12       OK     OK     OK        OK

  Pathscale 2.3.1     OK     OK     --        --     (only tested on AMD64)

  Portland 6.1-5      --     --     --        --     (only tested on IA32)

Linux IA64
  g++ 4.2.x           OK     OK     OK        OK
  g++ 3.3.x           ??     OK?    ??        ??

  icc 10.1            OK     OK     --        --
  icc 10.0            OK     OK     --        --
  icc 9.1             OK     OK     --        --
  icc 9.0             OK     OK     OK        OK
  icc 8.1             OK     OK     OK        OK

Windows IA32
  Visual Studio 8     OK     OK     --        --
  icl 10.1            OK     OK     --        --

Solaris Ultrasparc
  g++ 4.3.x           OK     OK     OK        OK     (tested on prerelease)
  g++ 4.2.x           OK     OK     OK        OK
  g++ 4.1.x           OK     OK     OK        OK
  g++ 4.0.x           OK     OK     OK        OK
  g++ 3.4.x           ??     ??     ??        ??
  g++ 3.3.x           ??     ??     ??        ??

  Sun Studio 12       OK     OK     OK        OK

Notes:

  • The tests were carried out on the following simple test code:

    long test(multi_arr<long,6>& arr) { return arr[2][3][4][5][6][7]; }

    The tests were done both on ARPA_TYPE and C_TYPE arrays, with and without bounds-checking enabled (BC indicates bounds-checking enabled). The resulting (optimized) assembly was inspected by eye.

  • "OK" indicates that all methods were inlined and the resulting assembly looked optimal; "??" indicates that all methods were inlined, but the assembly looked sub-optimal (i.e. not all overhead was optimized away); "--" indicates that at least some of the methods were not inlined at all.

Indexing a multi_arr works as follows. When the compiler encounters arr[i][j][k], it will first emit a call to multi_arr::operator[] with i as its argument; this operator will construct a temporary n_pointer as follows:

arr[i]  -->  n_pointer<T,3>(*p,st[],*v)::operator[](i) -> n_pointer<T,2>

Here p is the base pointer to the data, st contains the strides of a C_TYPE array (which are not used in ARPA_TYPE arrays) and v points to the tree_vec with the bounds-checking information. The operator[] of the n_pointer will update p and v using the value of i. This will only take care of the first index. At this stage the compiler still needs to emit code for the remainder: n_pointer<T,2>[j][k]. The operator[] of the n_pointer<T,2> will take care of the next index:

n_pointer<T,2>::operator[](j)  -->  n_pointer<T,1>

So another temporary n_pointer is emitted, and p and v are updated again by the operator[] using j. Now the compiler still needs to emit code for: n_pointer<T,1>[k]; n_pointer<T,1> is special since it will absorb the last index, so it should not return yet another n_pointer but a (const) reference to the data item itself:

n_pointer<T,1>::operator[](k)  -->  T& *(p + k)
const_n_pointer<T,1>::operator[](k)  -->  const T& *(p + k)  
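The chain of temporaries described above can be sketched in a heavily simplified form. The example below is an illustration under stated assumptions, not Cloudy's implementation: mini_pointer, mini_arr3 and demo are hypothetical names, only a C_TYPE-like layout with strides is modelled, and the bounds-checking pointer v and the const variant are left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for n_pointer (stride-based layout only,
// no bounds checking). Names and members are illustrative, not Cloudy's.
template<class T, int N>
class mini_pointer
{
    T* p_;                    // current position in the flat data block
    const std::size_t* st_;   // st_[0] is the stride of the index handled here
public:
    mini_pointer(T* p, const std::size_t* st) : p_(p), st_(st) {}
    // Absorb one index and hand the rest to a proxy of one lower dimension.
    mini_pointer<T,N-1> operator[](std::size_t i) const
    {
        return mini_pointer<T,N-1>(p_ + i*st_[0], st_ + 1);
    }
};

// N == 1 absorbs the last index and returns a reference to the element.
template<class T>
class mini_pointer<T,1>
{
    T* p_;
public:
    mini_pointer(T* p, const std::size_t*) : p_(p) {}
    T& operator[](std::size_t k) const { return *(p_ + k); }
};

// Hypothetical 3-dimensional container owning one contiguous block.
template<class T>
class mini_arr3
{
    std::vector<T> data_;
    std::size_t st_[2];       // strides of the first two indices
public:
    mini_arr3(std::size_t d0, std::size_t d1, std::size_t d2)
        : data_(d0*d1*d2), st_{ d1*d2, d2 } {}
    // arr[i] builds the N == 3 proxy and immediately applies i to it.
    mini_pointer<T,2> operator[](std::size_t i)
    {
        return mini_pointer<T,3>(data_.data(), st_)[i];
    }
    T* raw() { return data_.data(); }
};

// Writes through the proxy chain and reads back through the flat offset.
long demo()
{
    mini_arr3<long> arr(2, 3, 4);
    arr[1][2][3] = 42;
    return arr.raw()[1*12 + 2*4 + 3];
}
```

Every operator[] here is a one-line function that only adjusts a pointer, so once all of them are inlined the whole expression arr[1][2][3] should reduce to a single address computation, which is the outcome the notes above check for in the generated assembly.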

The documentation for this class was generated from the following file: