class: titlepage .title[ Introduction to Scientific Programming ] .subtitle[ `IPS-PROD` - N. Dubray - ENSIIE - 2017 ] .row[ `$$\int_{-\infty}^\infty f(x)e^{-x^2} dx \simeq \sum_{i=0}^{n-1}w_if(x_i)$$` ] .footnote[ [:book:](../toc/index.html) ] --- layout: true class: animated fadeIn middle numbers .footnote[ `IPS-PROD` - N. Dubray - ENSIIE - 2017 - [:book:](../toc/index.html) ] --- # These slides .hcenter.w20[] .block.hcenter[ * You can **freely** download, modify and use these slides. * Do not hesitate to make some **pull requests** if you want to share your modifications. * Repository: [http://pedago-etu.ensiie.fr/noel.dubray/lectures.git](http://pedago-etu.ensiie.fr/noel.dubray/lectures.git). ] --- # Lecturer ## N. Dubray - [`noel.dubray@gmail.com`](mailto:noel.dubray@gmail.com) * *job:* researcher at the CEA (french alternative energies and atomic energy commission) * *field:* theoretical nuclear physics * *research subject:* microscopic description of the fission process * *skills:* scientist / developer / HPC user / sysadmin .hcenter.shadow.w50[] .hcenter[Example of a simulated fissioning nucleus] --- # Modules ## :arrow_right: IPS: Introduction to Scientific Programming (S3) * `IPS.DEV`: Scientific application development * `IPS.PROD`: Scientific data production, post-processing and visualization ## :arrow_right: PSA: Advanced Scientific Programming (S4) * `PSA.???`: TBA * `PSA.???`: TBA ## Evaluation * project presentation [~10min] + questions [~10min] ## TPs timeline .mermaid[ graph LR A("Week #1
Initialization") --> B B("Week #2
Project") --> C C("Week #3
Project") --> D D("Week #4
Project") --> E E("Week #5
Project") --> F("Week #6
Presentation") style F fill:#090 ] --- # Data visualization ## 1D/2D plotting tools presented in this course * Python: **`Matplotlib`** * Javascript: **`Plotly.js`** * Standalone: `gnuplot`, **`xmgrace`** ## 3D plotting tools presented in this course * Renderer: **`POV-Ray`** .animated.infinite.flash[:arrow_left:] * Modeler + renderer: **`Blender`** * Data visualization: **`Paraview`** --- # POV-Ray .hcenter.w40[] ## History * very old project (~1980) * maintained by **The POV-Team** * name means "Persistence of Vision Ray Tracer" ## Key points * ray-tracer * Turing-complete scene description language * procedural textures * radiosity .block[ * Repository: [https://github.com/POV-Ray/povray](https://github.com/POV-Ray/povray) * Website: [http://www.povray.org](http://www.povray.org) * License: `AGPLv3` ] --- # Ray-tracing ## What is ray-tracing ? :arrow_right: a method to generate nearly photo-realistic images by modeling light rays. ## Key points * very good reproduction of shadows, reflections, refractions, scattering, etc... * **very high computational cost** * can easily use **parallelization** ## Main types of ray-tracing * eye-based (historical) * light-based * bidirectional path tracing * photon mapping (for caustics, global illumination...) * etc... --- # POV-Ray - General ## Workflow 1. describe the scene in a file 2. let POV-Ray interpret the file and generate an image ## Object types * POV-Ray default objects * user-defined objects ## Object properties * how it deflects / reflects light * texture / material / color * etc... --- # POV-Ray - Example .row[ .column.w48[ ## .hcenter[`[example_00.pov]`] ```pov #include "colors.inc" background { color White } camera { location <0, 2, -3> look_at <0, 1, 2> } sphere { <0, 1, 2>, 2 texture { pigment { color Red } } } light_source { <2, 4, -3> color White} ``` ] .column.w48.middle[ ## .hcenter[Shell session] ```shell $ povray +A0.0001 -W800 -H600 +P +Q11 example_00.pov ``` ] ] .hcenter.w40.border.shadow[] --- # POV-Ray - Source code ## Syntax * **very close from `C`** * case-sensitive * whitespaces are ignored * **order of objects** is ignored (but not properties) * comments use `/* ... */` and `//` like `C` * include other files with `#include "..."` ## Vectors `
` * can refer to a point in 2D-space `
` * can refer to a point in 3D-space `
` * can refer to a color `rgb
` with `\(0\le \{r,g,b\} \le 1\)` * can refer to a transparent color `rgbf
` with `\(0\le \{r,g,b,f\} \le 1\)` * can refer to a normal vector `
` --- # POV-Ray - Initial scene .row[ .column.w48.middle[ ```pov camera { location <1,4,-5> look_at <0,1,0> angle 45 } ``` ] .column.w48[ ## Camera :arrow_right: defines the "eye" of the observer ] ] .row[ .column.w48.middle[ ```pov light_source { <-2,7,-3> color rgb <1,1,1> } ``` ] .column.w48[ ## Light :arrow_right: emits some light ] ] .row[ .column.w48.middle[ ```pov background { rgb <1,1,1> // rgbf <1,1,1,0> } ``` ] .column.w48[ ## Background :arrow_right: defines the default color of rays ] ] --- # POV-Ray - Plane .row[ .column.w40.middle[ ```pov object { plane { <0,1,0> // normal axis 0 // where does the plane cut the axis pigment { checker // checker pattern color rgb <0.7,0.7,0.8> // light gray color rgb <0.5,0.5,0.5> // dark gray scale 2 } } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Sphere .row[ .column.w40.middle[ ```pov object { sphere { <0,1,0>, 1 // center, radius } pigment { color rgb <0,1,0> // pure green } finish { phong 1.0 reflection 0.2 } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Torus .row[ .column.w40.middle[ ```pov object { torus { 0.8, 0.2 // big radius, small radius } pigment { color rgb <0,0,1> // pure blue } finish { phong 1.0 reflection 0.2 } rotate <90,0,0> // rotation translate <0,1,0> // translation } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Box .row[ .column.w40.middle[ ```pov object { box { <-1,0,-1>, <1,2,1> // corners } pigment { color rgb <1,0,0> // pure red } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Cylinder .row[ .column.w40.middle[ ```pov object { cylinder { <0,0,0> // 1st center <0,2,0> // 2nd center 1 // radius } pigment { color rgb <1,1,0> // pure yellow } finish { phong 1.0 reflection 0.2 } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Cone .row[ .column.w40.middle[ ```pov object { cone { <0,0,0> // 1st center 1 // 1st radius <0,2,0> // 2nd center 0.1 // 2nd radius } pigment { color rgb <1,0,1> // pure magenta } finish { phong 1.0 reflection 0.2 } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Lathe .row[ .column.w40.middle[ ```pov object { lathe { cubic_spline // type of spline 6 // number of points <1.0,0> // -+ <1.0,0> // | <1.0,2> // | spline <0.3,2> // | <0.3,0> // | <0.3,0> // -+ } pigment { color rgb <0,1,1> // pure cyan } finish { phong 1.0 reflection 0.2 } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Prism .row[ .column.w40.middle[ ```pov object { prism { linear_spline // type of spline 0, 2 // origin, height 10 // number of points <-1,-1> // -+ <-1,1> // | <1,1> // | outer spline <1,-1> // | <-1,-1> // -+ <-0.8,-0.8> // -+ <-0.8,0.8> // | <0.8,0.8> // | inner spline <0.8,-0.8> // | <-0.8,-0.8> // -+ } pigment { color rgb <1,0,0> // pure red } finish { phong 1.0 reflection 0.2 } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Height field .row[ .column.w40.middle[ ```pov #declare fn_Hills = function { pigment { bumps warp { turbulence 0.5 } scale 0.2 } } object { height_field { function 16, 16 { fn_Hills (x, y, z).red } } pigment { gradient y color_map { [0.0 color rgb <1,1,1>] [0.5 color rgb <1,1,1>] [1.0 color rgb <1,0,0>] } } finish { phong 1.0 reflection 0.2 } translate <-0.5, -0.5, -0.5> scale <2, 0.5, 2> translate <0,1,0> } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Height field .row[ .column.w40.middle[ ```pov #declare fn_Hills = function { pigment { bumps warp { turbulence 0.5 } scale 0.2 } } object { height_field { * function 100, 100 { fn_Hills (x, y, z).red } * smooth } pigment { gradient y color_map { [0.0 color rgb <1,1,1>] [0.5 color rgb <1,1,1>] [1.0 color rgb <1,0,0>] } } finish { phong 1.0 reflection 0.2 } translate <-0.5, -0.5, -0.5> scale <2, 0.5, 2> translate <0,1,0> } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Blob .row[ .column.w40.middle[ ```pov object { blob { threshold 0.6 // isovalue sphere { <.75, 0, 0> // center 1 // radius 1 // strength } sphere { <-.375, .64952, 0>, 1, 1 } sphere { <-.375, -.64952, 0>, 1, 1 } rotate <90,0,0> translate <0,1,0> scale 1.2 } pigment { color rgb <0,1,0> // pure green } finish { phong 1.0 reflection 0.2 } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Text .row[ .column.w40.middle[ ```pov object { text { ttf "timrom.ttf" "ENSIIE" // font 0.3 // thickness 0 // offset translate <-1.6,-0.35,-0.15> scale 1.0 translate <0,1,0> } pigment { agate // pattern type color_map { [0.00 color rgb <1,0,0>] [0.25 color rgb <1,1,0>] [0.50 color rgb <0,1,0>] [0.75 color rgb <0,1,1>] [1.00 color rgb <0,0,1>] } scale 2.0 } finish { phong 1.0 reflection 0.2 } } }``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Transparency .row[ .column.w40.middle[ ```pov object { text { ttf "timrom.ttf" "ENSIIE" // font 0.3 // thickness 0 // offset translate <-1.6,-0.35,-0.15> scale 1.0 translate <0,1,0> } pigment { * rgbf <1,0,0,0.9> } finish { phong 1.0 reflection 0.2 * ambient 0.1 * diffuse 0.5 * specular 0.5 } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Union of objects .row[ .column.w40.middle[ ```pov object { union { box { <-1,0,-1>, <1,2,1> } sphere { <0,2,0>, 1.0 } scale 0.7 } pigment { color rgb <0,1,0> // pure green } finish { phong 1.0 reflection 0.2 } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Intersection of objects .row[ .column.w40.middle[ ```pov object { intersection { box { <-1,0,-1>, <1,2,1> } sphere { <0,2,0>, 1.0 } scale 0.7 } pigment { color rgb <0,0,1> // pure blue } finish { phong 1.0 reflection 0.2 } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Difference of objects .row[ .column.w40.middle[ ```pov object { difference { box { <-1,0,-1>, <1,2,1> } sphere { <0,2,0>, 1.0 } scale 0.7 } pigment { color rgb <1,0,0> // pure red } finish { phong 1.0 reflection 0.2 } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Merge objects .row[ .column.w40.middle[ ```pov object { merge { box { <-1,0,-1>, <1,2,1> } sphere { <0,2,0>, 1.0 } scale 0.7 } pigment { color rgb <1,1,0> // pure yellow } finish { phong 1.0 reflection 0.2 } } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Union / Merge comparison .row[ .column.w48.middle[ ## .hcenter[Union] .hcenter.shadow[] ] .column.w48.middle[ ## .hcenter[Merge] .hcenter.shadow[] ] ] --- # POV-Ray - Density .row[ .column.w40.middle[ ```pov object { box { <-1,-1,-1>, <1,1,1> * hollow } pigment { color rgbf <1,1,1,1> } interior { media { emission 0.75 scattering {1, 0.5} * density { spherical color_map { [0.0 rgb <0,0,0.5>] [0.5 rgb <0.8, 0.8, 0.4>] [1.0 rgb <1,1,1>] } } } } translate <0,1,0> * translate <0,0.001,0> } ``` ] .column.w55.middle[ .hcenter.shadow[] ] ] --- # POV-Ray - Nuclear Local Density .hcenter.shadow.w80[] --- # POV-Ray - `visu.pov` :arrow_right: file available [here](../tools/povray/visu.pov). This scene uses a density file called `example.df3`. ```pov [...] box { <0,0,0><1,1,1> hollow pigment { rgbf 1 } interior { media { emission 0.7 absorption 1.0 intervals 10 density { * density_file df3 "example.df3" scale
interpolate INTERPOLATE color_map { [0.0 rgb <0.0,0.0,0.0>] [0.1 rgb <0.0,0.0,2.0>] [1.0 rgb <1.0,1.0,1.0>] } } } } scale <10,10,20> translate <-5,-5,-10> rotate y*90 } [...] ``` --- # POV-Ray - `df3` format ## Reference [POV-Ray df3 format](http://povray.org/documentation/3.7.0/r3_4.html#r3_4_7_1_8_1). .block[ ## Header The df3 format consists of a 6 byte header of three 16-bit integers with high order byte first. These three values give the x,y,z size of the data in pixels (or more appropriately called voxels). ## Data The header is followed by x*y*z unsigned integer bytes of data with a resolution of 8, 16 or 32 bit. The data are written with high order byte first (big-endian). The resolution of the data is determined by the size of the df3-file. That is, if the file is twice (minus header, of course) as long as an 8 bit file then it is assumed to contain 16 bit ints and if it is four times as long 32 bit ints. ] --- # POV-Ray - `df3` writer ```C++ std::string cubeToDf3(const arma::cube &m) { std::stringstream ss(std::stringstream::out | std::stringstream::binary); int nx = m.n_rows; int ny = m.n_cols; int nz = m.n_slices; ss.put(nx >> 8); ss.put(nx & 0xff); ss.put(ny >> 8); ss.put(ny & 0xff); ss.put(nz >> 8); ss.put(nz & 0xff); double theMin = 0.0; double theMax = m.max(); for (uint k = 0; k < m.n_slices; k++) { for (uint j = 0; j < m.n_cols; j++) { for (uint i = 0; i < m.n_rows; i++) { uint v = 255 * (fabs(m(i, j, k)) - theMin) / (theMax - theMin); ss.put(v); } } } return ss.str(); } ``` where `m` is an `arma::cube` representing the density to be plotted. **Mandatory dimensions for `m`**: .hcenter[ | Axis | x | y | z | | :--: | :----: | :----: | :----: | | min | -10 fm | -10 fm | -20 fm | | max | 10 fm | 10 fm | 20 fm | | nbp | 32 | 32 | 64 | ] --- # POV-Ray - Conclusions .hcenter.shadow.w60[] :+: rather powerful ray-tracer :+: very easy to generate scenes :+: very easy to automate mass-production of renderings :+: good introduction to ray-tracers :warning: not a modeler :warning: may seem old-school at some points :arrow_right: **My advice:** use it as your first ray-tracer, then switch to **Blender**, especially if you need a modeler. --- # Data visualization ## 1D/2D plotting tools presented in this course * Python: **`Matplotlib`** * Javascript: **`Plotly.js`** * Standalone: `gnuplot`, **`xmgrace`** ## 3D plotting tools presented in this course * Renderer: **`POV-Ray`** :+: * Modeler + renderer: **`Blender`** .animated.infinite.flash[:arrow_left:] * Data visualization: **`Paraview`** --- # Blender - `raw` writer ```C++ std::string cubeToRaw(const arma::cube &m) { std::stringstream ss(std::stringstream::out | std::stringstream::binary); double theMin = 0.0; double theMax = m.max(); for (uint k = 0; k < m.n_slices; k++) { for (uint j = 0; j < m.n_cols; j++) { for (uint i = 0; i < m.n_rows; i++) { uint v = 255 * (fabs(m(i, j, k)) - theMin) / (theMax - theMin); ss.put(v); } } } return ss.str(); } ``` where `m` is an `arma::cube` representing the density to be plotted. **Mandatory dimensions for `m`**: .hcenter[ | Axis | x | y | z | | :--: | :----: | :----: | :----: | | min | -10 fm | -10 fm | -20 fm | | max | 10 fm | 10 fm | 20 fm | | nbp | 32 | 32 | 64 | ] --- # Scientific programming ## Scientific programming tools presented in this course * linear algebra libraries (usable from `C++`) * `Boost`, `LAPACK`, `BLAS`, `MKL`, `GSL`, `Eigen`, **`Armadillo`** :+: * scientific programming with Python * **introduction to Python** :+: * **`NumPy`** * presenting results * write beautiful math: ** .latex[L
a
T
e
X] ** :+: * structure the documentation: **Markdown** :+: * write some slides: **remark.js** :+: ## Scientific programming techniques presented in this course * numerical integration * **Gaussian quadrature** :+: * optimization of sum calculation .animated.infinite.flash[:arrow_left:] --- # Optimization of sum calculation - Sum types ## 1. Simple sum `$$ V \equiv \sum_{i_1}^{I_1}\ldots \sum_{i_N}^{I_N} \prod_{j=1}^{M} F_j[i_1,\ldots,i_N] $$` :arrow_right: By convention, the `\(F_j[\ldots]\)` values are pre-calculated and stored: `$$ \forall j \in [1..\,M], F_j[i_1,\ldots,i_N] \equiv F(x_{1,j}[i_1], \ldots, x_{N,j}[i_N]) $$` and `$$ \forall i \in [1..\,N], \forall j \in [1..\,M], x_{i,j} \equiv \{x_{i,j}[1],\ldots,x_{i,j}[I_i]\}. $$` ## Example `$$ V \equiv \sum_{i=0}^{100}\sum_{j=0}^{100} \frac{x[i]+y[j]}{\sqrt{1 + x[i] + y[j]}} $$` --- # Optimization of sum calculation - Sum types ## 2. "Multiple" sum `$$ \forall k_1 \in [1..\,K_1],\ldots, \forall k_L \in [1..\,K_L], V[k_1,\ldots,k_L] \equiv \sum_{i_1}^{I_1}\ldots \sum_{i_N}^{I_N} \prod_{j=1}^{M} F_j[i_1,\ldots,i_N, k_1,\ldots, k_L]. $$` Using the notation `$$ \coprod_{k}^{K} \equiv \left(\forall k \in [1..\,K]\right), $$` one can re-write the previous expression as `$$ \coprod_{k_1}^{K_1}\ldots \coprod_{k_L}^{K_L} V[k_1,\ldots,k_L] \equiv \sum_{i_1}^{I_1}\ldots \sum_{i_N}^{I_N} \prod_{j=1}^{M} F_j[i_1,\ldots,i_N, k_1,\ldots, k_L]. $$` ## Example `$$ \coprod_i^{10}\coprod_j^{10} V[i,j] \equiv \sum_{k=0}^{100}\sum_{l=0}^{100} \frac{x[i]+y[j]}{\sqrt{1 + x[k] + y[l]}} $$` --- # Optimization of sum calculation - Sum types ## 3. "Multiple", "variable range" sum `$$ \coprod_{k_1}^{K_1}\ldots \coprod_{k_L}^{K_L[k_1,\ldots,k_{L-1}]} V[k_1,\ldots,k_L] \equiv \sum_{i_1}^{I_1[k_1,\ldots,k_L]}\ldots \sum_{i_N}^{I_N[k_1,\ldots,k_L,i_1,\ldots,i_{N-1}]} \prod_{j=1}^{M} F_j[i_1,\ldots,i_N, k_1,\ldots, k_L]. $$` ## Example `$$ \coprod_i^{10}\coprod_j^{i + 10} V[i,j] \equiv \sum_{k=0}^{i + 1}\sum_{l=0}^{j + k + 1} \frac{x[i]+y[j]}{\sqrt{1 + x[k] + y[l]}} $$` --- # Optimization of sum calculation - Sum types ## 4. "Multiple", "variable-range", "block-structured" sum `$$ \coprod_{k_1}^{K_1}\ldots \coprod_{k_L}^{K_L[k_1,\ldots,k_{L-1}]} V[k_1,\ldots,k_L] \equiv \sum_{i_1}^{I_1[k_1,\ldots,k_L]}\ldots \sum_{i_N}^{I_N[k_1,\ldots,k_L,i_1,\ldots,i_{N-1}]} \prod_{j=1}^{M} F_j[i_1,\ldots,i_N, k_1,\ldots, k_L] $$` with `$$ \prod_{j=1}^M F_j[i_1,\ldots,i_N, k_1,\ldots, k_L] = G[i_1,\ldots,i_N, k_1,\ldots, k_L] \left(\prod_{l=1}^{L} \delta_l(i_1,\ldots,i_N, k_1,\ldots, k_L)\right). $$` ## Example `$$ \begin{eqnarray} \coprod_i^{10}\coprod_j^{i + 10} V[i,j] &\equiv& \sum_{k=0}^{i + 1}\sum_{l=0}^{j + k + 1} \frac{(1-\delta_{lk})\left(x[i]+y[j]\right)}{\sqrt{1 + x[k] + y[l]}}\\ &=& \sum_{k=0}^{i + 1}\sum_{\substack{l=0\\l\ne k}}^{j + k + 1} \frac{x[i]+y[j]}{\sqrt{1 + x[k] + y[l]}} \end{eqnarray} $$` --- # Optimization of sum calculation - Notations * we sum up all `\(\sum\)` and `\(\coprod\)` dependencies in the upper symbol list * we do the same for all functions ## Example `$$ \begin{eqnarray} \sum_{i=F(a)}^{G(b,c)} &\rightarrow& \sum_i^{(abc)}\\ \coprod_{j=H(d)}^{J(d,f)} &\rightarrow& \coprod_j^{(df)}\\ F[a,c,d] &\rightarrow& F^{(acd)} \end{eqnarray} $$` --- # Optimization of sum calculation - Example `$$ \coprod_a^A\coprod_b^B V[a,b]\equiv \sum_c^C\sum_d^{D(c)}\sum_e^E F[a,b] G[d,e] \delta_{ae} $$` ## Raw implementation * `\(\sum\)` and `\(\coprod\)` are replaced by `for` loops * `\(\delta_{xy}\)` are replaced by `if` tests ```C++ //=== Version 0 === for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { double sum = 0.0; for (uint c = 0; c < C; c++) { for (uint d = 0; d < D[c]; d++) { for (uint e = 0; e < E; e++) { if (a != e) continue; sum += F(a, b) * G(d, e); } // e } // d } // c V(a, b) = sum; } // b } // a ``` --- # Optimization of sum calculation - Remove deltas ## Use `\(\delta_{xy}\)` to remove loops `$$ \coprod_a^A\coprod_b^B V[a,b]\equiv \sum_c^C\sum_d^{D(c)}\color{magenta}{\sum_e^E} F[a,b] G[d,e] \color{magenta}{\delta_{ae}} $$` `$$ \iff \coprod_a\coprod_b V^{(ab)}= \sum_c\sum_d^{(c)} F^{(ab)} G^{(ad)} $$` .row[ .column.w48.middle[ ```C++ //=== Version 0 === for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { double sum = 0.0; for (uint c = 0; c < C; c++) { for (uint d = 0; d < D[c]; d++) { * for (uint e = 0; e < E; e++) { * if (a != e) continue; sum += F(a, b) * G(d, e); } // e } // d } // c V(a, b) = sum; } // b } // a ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ //=== Version 1 === for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { double sum = 0.0; for (uint c = 0; c < C; c++) { for (uint d = 0; d < D(c); d++) { sum += F(a, b) * G(d, a); } // d } // c V(a, b) = sum; } // b } // a ``` ] ] --- # Optimization of sum calculation - Loop order :arrow_right: Move loops **as deep (right) as possible** (except in this case, see later) `$$ \coprod_a\coprod_b V^{(ab)} = \color{magenta}{\sum_c}\sum_d^{(c)} F^{(ab)} G^{(ad)} $$` `$$ \iff \color{magenta}{V=0}, \color{magenta}{\coprod_c}\coprod_a\coprod_b V^{(ab)} \color{magenta}{\mathrel{+}=} \sum_d^{(c)} F^{(ab)} G^{(ad)} $$` .row[ .column.w48.middle[ ```C++ //=== Version 1 === for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { double sum = 0.0; * for (uint c = 0; c < C; c++) { for (uint d = 0; d < D(c); d++) { sum += F(a, b) * G(d, a); } // d } // c V(a, b) = sum; } // b } // a ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ //=== Version 2 === *for (uint a = 0; a < A; a++) * for (uint b = 0; b < B; b++) * V(a, b) = 0.0; *for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { for (uint d = 0; d < D(c); d++) { * V(a, b) += F(a, b) * G(d, a); } // d } // b } // a } // c ``` ] ] --- # Optimization of sum calculation - Function order :arrow_right: Move functions **as high (left) as possible** `$$ V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= \sum_d^{(c)} \color{magenta}{F^{(ab)}} G^{(ad)} $$` `$$ \iff V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= \color{magenta}{F^{(ab)}} \sum_d^{(c)} G^{(ad)} $$` .row[ .column.w48.middle[ ```C++ //=== Version 2 === for (uint a = 0; a < A; a++) for (uint b = 0; b < B; b++) V(a, b) = 0.0; for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { * for (uint d = 0; d < D(c); d++) * { * V(a, b) += F(a, b) * G(d, a); * } // d } // b } // a } // c ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ //=== Version 3 === for (uint a = 0; a < A; a++) for (uint b = 0; b < B; b++) V(a, b) = 0.0; for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { * double sum = 0.0; * for (uint d = 0; d < D(c); d++) * sum += G(d, a); * V(a, b) += sum * F(a, b); } // b } // a } // c ``` ] ] --- # Optimization of sum calculation - Pre-sum :arrow_right: Identify pre-summable quantities `$$ V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} \underbrace{\color{magenta}{\sum_d^{(c)} G^{(ad)}}}_{H^{(ac)}} $$` `$$ \iff V=0, \coprod_c\coprod_a\left[\color{magenta}{H^{(ac)}=\sum_d^{(c)} G^{(ad)}}\right]\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} \color{magenta}{H^{(ac)}} $$` .row[ .column.w48.middle[ ```C++ //=== Version 3 === for (uint a = 0; a < A; a++) for (uint b = 0; b < B; b++) V(a, b) = 0.0; for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { * for (uint b = 0; b < B; b++) * { * double sum = 0.0; * for (uint d = 0; d < D(c); d++) * sum += G(d, a); * V(a, b) += sum * F(a, b); * } // b } // a } // c ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ //=== Version 4 === for (uint a = 0; a < A; a++) for (uint b = 0; b < B; b++) V(a, b) = 0.0; for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { * double Hac = 0.0; * for (uint d = 0; d < D[c]; d++) * Hac += G(d, a); * for (uint b = 0; b < B; b++) * V(a, b) += F(a, b) * Hac; } // a } // c ``` ] ] --- # Optimization of sum calculation - Vectorization :arrow_right: Identify vector, matrix and cube operations `$$ \color{magenta}{V=0}, \coprod_c\coprod_a\left[\color{magenta}{H^{(ac)}=\sum_d^{(c)} G^{(ad)}}\right]\color{magenta}{\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} H^{(ac)}} $$` .row[ .column.w48.middle[ ```C++ //=== Version 4 === *for (uint a = 0; a < A; a++) * for (uint b = 0; b < B; b++) * V(a, b) = 0.0; for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { * double Hac = 0.0; * * for (uint d = 0; d < D[c]; d++) * Hac += G(d, a); * for (uint b = 0; b < B; b++) * V(a, b) += F(a, b) * Hac; } // a } // c ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ //=== Version 5 === *V = arma::zeros(A, B); for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { * double Hac = * arma::sum(G.col(a).rows(0, D[c] - 1)); * V.row(a) += F.row(a) * Hac; } // a } // c ``` ]] --- # Optimization of sum calculation - Speed-up `$$ \coprod_a^A\coprod_b^B V[a,b]\equiv \sum_c^C\sum_d^{D(c)}\sum_e^E F[a,b] G[d,e] \delta_{ae} $$` ## Initialization ```C++ //=== Common initialization === uint A = 109; uint B = 119; uint C = 129; arma::ivec D(C); for (uint c = 0; c < C; c++) D(c) = c + 1; uint E = A; arma::mat F; F.randu(A, B); arma::mat G; G.randu(D.max(), E); arma::mat V; ``` --- # Optimization of sum calculation - Initial execution time `$$ \coprod_a^A\coprod_b^B V[a,b]\equiv \sum_c^C\sum_d^{D(c)}\sum_e^E F[a,b] G[d,e] \delta_{ae} $$` ## Raw implementation * `\(\sum\)` and `\(\coprod\)` are replaced by `for` loops * `\(\delta_{xy}\)` are replaced by `if` tests ```C++ //=== Version 0 === for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { double sum = 0.0; for (uint c = 0; c < C; c++) { for (uint d = 0; d < D[c]; d++) { for (uint e = 0; e < E; e++) { if (a != e) continue; sum += F(a, b) * G(d, e); } // e } // d } // c V(a, b) = sum; } // b } // a ``` .alert.hcenter[**4.7808s**] --- # Optimization of sum calculation - Remove deltas ## Use `\(\delta_{xy}\)` to remove loops `$$ \coprod_a^A\coprod_b^B V[a,b]\equiv \sum_c^C\sum_d^{D(c)}\color{magenta}{\sum_e^E} F[a,b] G[d,e] \color{magenta}{\delta_{ae}} $$` `$$ \iff \coprod_a\coprod_b V^{(ab)}= \sum_c\sum_d^{(c)} F^{(ab)} G^{(ad)} $$` .row[ .column.w48.middle[ ```C++ //=== Version 0 === for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { double sum = 0.0; for (uint c = 0; c < C; c++) { for (uint d = 0; d < D[c]; d++) { * for (uint e = 0; e < E; e++) { * if (a != e) continue; sum += F(a, b) * G(d, e); } // e } // d } // c V(a, b) = sum; } // b } // a ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ //=== Version 1 === for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { double sum = 0.0; for (uint c = 0; c < C; c++) { for (uint d = 0; d < D(c); d++) { sum += F(a, b) * G(d, a); } // d } // c V(a, b) = sum; } // b } // a ``` ] ] .row[ .column.w48.middle[ .alert.hcenter[**4.7808s**] ] .column.w48.middle[ .alert.hcenter[**0.6951s** (speed-up 6.88)] ] ] --- # Optimization of sum calculation - Loop and function order :arrow_right: Move loops **as deep (right) as possible** (except in this case, see later) `$$ \coprod_a\coprod_b V^{(ab)} = \color{magenta}{\sum_c}\sum_d^{(c)} F^{(ab)} G^{(ad)} $$` `$$ \iff \color{magenta}{V=0}, \color{magenta}{\coprod_c}\coprod_a\coprod_b V^{(ab)} \color{magenta}{\mathrel{+}=} \sum_d^{(c)} F^{(ab)} G^{(ad)} $$` .row[ .column.w48.middle[ ```C++ //=== Version 1 === for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { double sum = 0.0; * for (uint c = 0; c < C; c++) { for (uint d = 0; d < D(c); d++) { sum += F(a, b) * G(d, a); } // d } // c V(a, b) = sum; } // b } // a ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ //=== Version 2 === *for (uint a = 0; a < A; a++) * for (uint b = 0; b < B; b++) * V(a, b) = 0.0; *for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { for (uint d = 0; d < D(c); d++) { * V(a, b) += F(a, b) * G(d, a); } // d } // b } // a } // c ``` ] ] .row[ .column.w48.middle[ .alert.hcenter[**0.6951s**] ] .column.w48.middle[ .alert.hcenter[**0.4261s** (speed-up 1.63)] ] ] --- # Optimization of sum calculation - Loop and function order :arrow_right: Move functions **as high (left) as possible** `$$ V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= \sum_d^{(c)} \color{magenta}{F^{(ab)}} G^{(ad)} $$` `$$ \iff V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= \color{magenta}{F^{(ab)}} \sum_d^{(c)} G^{(ad)} $$` .row[ .column.w48.middle[ ```C++ //=== Version 2 === for (uint a = 0; a < A; a++) for (uint b = 0; b < B; b++) V(a, b) = 0.0; for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { * for (uint d = 0; d < D(c); d++) * { * V(a, b) += F(a, b) * G(d, a); * } // d } // b } // a } // c ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ //=== Version 3 === for (uint a = 0; a < A; a++) for (uint b = 0; b < B; b++) V(a, b) = 0.0; for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { * double sum = 0.0; * for (uint d = 0; d < D(c); d++) * sum += G(d, a); * V(a, b) += sum * F(a, b); } // b } // a } // c ``` ] ] .row[ .column.w48.middle[ .alert.hcenter[**0.4261s**] ] .column.w48.middle[ .alert.hcenter[**0.1447s** (speed-up 2.94)] ] ] --- # Optimization of sum calculation - Pre-sum :arrow_right: Identify pre-summable quantities `$$ V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} \underbrace{\color{magenta}{\sum_d^{(c)} G^{(ad)}}}_{H^{(ac)}} $$` `$$ \iff V=0, \coprod_c\coprod_a\left[\color{magenta}{H^{(ac)}=\sum_d^{(c)} G^{(ad)}}\right]\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} \color{magenta}{H^{(ac)}} $$` .row[ .column.w48.middle[ ```C++ //=== Version 3 === for (uint a = 0; a < A; a++) for (uint b = 0; b < B; b++) V(a, b) = 0.0; for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { * for (uint b = 0; b < B; b++) * { * double sum = 0.0; * for (uint d = 0; d < D(c); d++) * sum += G(d, a); * V(a, b) += sum * F(a, b); * } // b } // a } // c ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ //=== Version 4 === for (uint a = 0; a < A; a++) for (uint b = 0; b < B; b++) V(a, b) = 0.0; for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { * double Hac = 0.0; * for (uint d = 0; d < D[c]; d++) * Hac += G(d, a); * for (uint b = 0; b < B; b++) * V(a, b) += F(a, b) * Hac; } // a } // c ``` ] ] .row[ .column.w48.middle[ .alert.hcenter[**0.1447s**] ] .column.w48.middle[ .alert.hcenter[**0.0027s** (speed-up 53.59)] ] ] --- # Optimization of sum calculation - Vectorization :arrow_right: Identify vector, matrix and cube operations `$$ \color{magenta}{V=0}, \coprod_c\coprod_a\left[\color{magenta}{H^{(ac)}=\sum_d^{(c)} G^{(ad)}}\right]\color{magenta}{\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} H^{(ac)}} $$` .row[ .column.w48.middle[ ```C++ //=== Version 4 === *for (uint a = 0; a < A; a++) * for (uint b = 0; b < B; b++) * V(a, b) = 0.0; for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { * double Hac = 0.0; * * for (uint d = 0; d < D[c]; d++) * Hac += G(d, a); * for (uint b = 0; b < B; b++) * V(a, b) += F(a, b) * Hac; } // a } // c ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ //=== Version 5 === *V = arma::zeros(A, B); for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { * double Hac = * arma::sum(G.col(a).rows(0, D[c] - 1)); * V.row(a) += F.row(a) * Hac; } // a } // c ``` ]] .row[ .column.w48.middle[ .alert.hcenter[**0.0027s**] ] .column.w48.middle[ .alert.hcenter[**0.0019s** (speed-up 1.42)] ] ] --- # Optimization of sum calculation - Overall speed-up :arrow_right: Best speed-ups are achieved when **reducing the height of the loop stack**. :arrow_right: There is an important **readability / performance trade-off**. .row[ .column.w48.middle[ ```C++ //=== Version 0 === for (uint a = 0; a < A; a++) { for (uint b = 0; b < B; b++) { double sum = 0.0; for (uint c = 0; c < C; c++) { for (uint d = 0; d < D[c]; d++) { for (uint e = 0; e < E; e++) { if (a != e) continue; sum += F(a, b) * G(d, e); } // e } // d } // c V(a, b) = sum; } // b } // a ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ //=== Version 5 === V = arma::zeros(A, B); for (uint c = 0; c < C; c++) { for (uint a = 0; a < A; a++) { double Hac = arma::sum(G.col(a).rows(0, D[c] - 1)); V.row(a) += F.row(a) * Hac; } // a } // c ``` ] ] .row[ .column.w48.middle[ .alert.hcenter[**4.7808s**] ] .column.w48.middle[ .alert.hcenter[**0.0019s** (speed-up 2516.21)] ] ] --- # Optimization of sum calculation - Other techniques ## Linearization :arrow_right: Transform multiple simple loops into a single loop. :arrow_right: Use a **simple formula** to calculate the indices. .row[ .column.w48.middle[ ```C++ double sum = 0.0; for (uint a = 0; a < A; a++) for (uint b = 0; b < B; b++) { sum += F(a, b); } ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ double sum = 0.0; for (uint i = 0; i < A * B; i++) { uint a = i / B; uint b = i - a * B; sum += F(a, b); } ``` ] ] --- # Optimization of sum calculation - Other techniques ## Two-pass linearization .vspace[] .block.hcenter[ .hcenter[When a simple formula to calculate the indices does not exist,] .hcenter[:arrow_right: **store the indices**.] ] .vspace[] .row[ .column.w48.middle[ ```C++ double sum = 0.0; for (uint a = A0; a < A1; a++) for (uint b = B0(a); b < B1(a); b++) for (uint c = C0(a, b); c < C1(a, b); c++) for (uint d = D0(a, b, c); d < D1(a, b, c); d++) { sum += F(a, b, c) * G(b, c, d); } ``` ] .column.middle[ :arrow_right: ] .column.w48.middle[ ```C++ arma::imat ind(4,0); for (uint a = A0; a < A1; a++) for (uint b = B0(a); b < B1(a); b++) for (uint c = C0(a, b); c < C1(a, b); c++) for (uint d = D0(a, b, c); d < D1(a, b, c); d++) { arma::ivec v = {a, b, c, d}; ind.insert_cols(ind.n_cols, v); } double sum = 0.0; for (uint i = 0; i < ind.n_cols; i++) sum += F(ind(0, i), ind(1, i), ind(2, i)) * G(ind(1, i), ind(2, i), ind(3, i)); ``` ] ] ## Three-pass :arrow_right: The same with a fixed size for `ind`, to avoid the costly `.insert_cols`. ## :warning: Caution with multi-pass techniques :warning: This can be useful for the **small**, **deep** loops only. No need to linearize all loops. :arrow_right: To be efficient, it must **factorize some calculations**. --- # Optimization of sum calculation - Conclusions .noflex[ ## Remember these points: * push the heaviest loops as deep as possible * factorize functions when possible * **pre-sum** everything ! (if possible) * **vectorize** ! (Armadillo, special `asm` instructions, etc...) * be smart with the indices (pick-up table, linearization, two-pass, etc...) :arrow_right: One can do **much better** than the `-O3` flag of the compiler. .rightf.shadow.w35[  ] ## To go further... * check the data structure alignment * explore hardware cache levels optimization * look up "cache invalidation" * use multi-threading * use multi-processing * use GPUs * use `asm` * ... ] --- # Scientific programming ## Scientific programming tools presented in this course * linear algebra libraries (usable from `C++`) * `Boost`, `LAPACK`, `BLAS`, `MKL`, `GSL`, `Eigen`, **`Armadillo`** :+: * scientific programming with Python * **introduction to Python** :+: * **`NumPy`** * presenting results * write beautiful math: ** .latex[L
a
T
e
X] ** :+: * structure the documentation: **Markdown** :+: * write some slides: **remark.js** :+: ## Scientific programming techniques presented in this course * numerical integration * **Gaussian quadrature** :+: * optimization of sum calculation :+: --- # Valgrind ## Key points * `Valgrind` is a **virtual machine** * it uses JIT compilation techniques * it allows to easily instrument binary code (with or without debug symbols) * there is no need to source access nor recompilation * it provides different tools for binary code profiling * it is language-agnostic * it has a **huge impact** on execution times (x20) ## Tools provided * `Memcheck`: default tool, checks validity, adressability, memory leaks... * `None`: get the traceback of a segfault * `Helrind`: detects race conditions in multithreaded code * `Callgrind`: logs and profiles the function call tree * `Cachegrind`: profiles cache use * `exp-sgcheck`, `exp-dhat`, `exp-bbv`, etc... .block[ * Repository: [https://sourceware.org/git/?p=valgrind.git](https://sourceware.org/git/?p=valgrind.git) * Website: [http://valgrind.org](http://valgrind.org) * License: `GPLv2` ] --- # Valgrind - Memcheck example ## .hcenter[test.c] ```C #include "stdlib.h" #include "stdio.h" #define SIZE 3 int main(int argc, char **argv) { int* table = (int*)malloc(sizeof(int) * SIZE); for (int i = 0; i < SIZE; i++) { table[i] = i; } for (int i = 0; i <= SIZE; i++) { printf("table[%d]=%d\n", i, table[i]); } return 0; } ``` ## .hcenter[Shell session] ```shell $ gcc test.c -o test $ ./test table[0]=0 table[1]=1 table[2]=2 table[3]=0 ``` --- # Valgrind - Memcheck example ## .hcenter[Shell session] ```shell $ valgrind ./test ==6434== Memcheck, a memory error detector ==6434== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==6434== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==6434== Command: ./test ==6434== *table[0]=0 *table[1]=1 *table[2]=2 ==6434== Invalid read of size 4 ==6434== at 0x108756: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test) ==6434== Address 0x520104c is 0 bytes after a block of size 12 alloc'd ==6434== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6434== by 0x108708: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test) ==6434== *table[3]=0 ==6434== ==6434== HEAP SUMMARY: ==6434== in use at exit: 12 bytes in 1 blocks ==6434== total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated ==6434== ==6434== LEAK SUMMARY: ==6434== definitely lost: 12 bytes in 1 blocks ==6434== indirectly lost: 0 bytes in 0 blocks ==6434== possibly lost: 0 bytes in 0 blocks ==6434== still reachable: 0 bytes in 0 blocks ==6434== suppressed: 0 bytes in 0 blocks ==6434== Rerun with --leak-check=full to see details of leaked memory ==6434== ==6434== For counts of detected and suppressed errors, rerun with: -v ==6434== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) ``` --- # Valgrind - Memcheck example ## .hcenter[Shell session] ```shell $ valgrind ./test ==6434== Memcheck, a memory error detector ==6434== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==6434== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==6434== Command: ./test ==6434== table[0]=0 table[1]=1 table[2]=2 ==6434== Invalid read of size 4 ==6434== at 0x108756: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test) ==6434== Address 0x520104c is 0 bytes after a block of size 12 alloc'd ==6434== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6434== by 0x108708: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test) ==6434== table[3]=0 ==6434== ==6434== HEAP SUMMARY: ==6434== in use at exit: 12 bytes in 1 blocks ==6434== total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated ==6434== ==6434== LEAK SUMMARY: ==6434== definitely lost: 12 bytes in 1 blocks ==6434== indirectly lost: 0 bytes in 0 blocks ==6434== possibly lost: 0 bytes in 0 blocks ==6434== still reachable: 0 bytes in 0 blocks ==6434== suppressed: 0 bytes in 0 blocks ==6434== Rerun with --leak-check=full to see details of leaked memory ==6434== ==6434== For counts of detected and suppressed errors, rerun with: -v *==6434== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) ``` --- # Valgrind - Memcheck example ## .hcenter[Shell session] ```shell $ valgrind ./test ==6434== Memcheck, a memory error detector ==6434== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==6434== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==6434== Command: ./test ==6434== table[0]=0 table[1]=1 table[2]=2 *==6434== Invalid read of size 4 *==6434== at 0x108756: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test) *==6434== Address 0x520104c is 0 bytes after a block of size 12 alloc'd *==6434== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) *==6434== by 0x108708: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test) ==6434== table[3]=0 ==6434== ==6434== HEAP SUMMARY: ==6434== in use at exit: 12 bytes in 1 blocks ==6434== total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated ==6434== ==6434== LEAK SUMMARY: ==6434== definitely lost: 12 bytes in 1 blocks ==6434== indirectly lost: 0 bytes in 0 blocks ==6434== possibly lost: 0 bytes in 0 blocks ==6434== still reachable: 0 bytes in 0 blocks ==6434== suppressed: 0 bytes in 0 blocks ==6434== Rerun with --leak-check=full to see details of leaked memory ==6434== ==6434== For counts of detected and suppressed errors, rerun with: -v ==6434== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) ``` --- # Valgrind - Memcheck example ## .hcenter[Shell session] ```shell $ valgrind ./test ==6434== Memcheck, a memory error detector ==6434== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==6434== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==6434== Command: ./test ==6434== table[0]=0 table[1]=1 table[2]=2 ==6434== Invalid read of size 4 ==6434== at 0x108756: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test) ==6434== Address 0x520104c is 0 bytes after a block of size 12 alloc'd ==6434== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6434== by 0x108708: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test) ==6434== table[3]=0 ==6434== ==6434== HEAP SUMMARY: *==6434== in use at exit: 12 bytes in 1 blocks ==6434== total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated ==6434== ==6434== LEAK SUMMARY: *==6434== definitely lost: 12 bytes in 1 blocks ==6434== indirectly lost: 0 bytes in 0 blocks ==6434== possibly lost: 0 bytes in 0 blocks ==6434== still reachable: 0 bytes in 0 blocks ==6434== suppressed: 0 bytes in 0 blocks ==6434== Rerun with --leak-check=full to see details of leaked memory ==6434== ==6434== For counts of detected and suppressed errors, rerun with: -v ==6434== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) ``` --- # Valgrind - Memcheck example ## .hcenter[test.c] ```C #include "stdlib.h" #include "stdio.h" #define SIZE 3 int main(int argc, char **argv) { int* table = (int*)malloc(sizeof(int) * SIZE); for (int i = 0; i < SIZE; i++) { table[i] = i; } for (int i = 0; i <= SIZE; i++) { printf("table[%d]=%d\n", i, table[i]); } return 0; } ``` ## .hcenter[Shell session] ```shell *$ gcc -g test.c -o test $ ./test table[0]=0 table[1]=1 table[2]=2 table[3]=0 ``` --- # Valgrind - Memcheck example ## .hcenter[Shell session] ```shell $ valgrind ./test =6522== Memcheck, a memory error detector ==6522== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==6522== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==6522== Command: ./test ==6522== table[0]=0 table[1]=1 table[2]=2 ==6522== Invalid read of size 4 *==6522== at 0x108756: main (test.c:17) ==6522== Address 0x520104c is 0 bytes after a block of size 12 alloc'd ==6522== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6522== by 0x108708: main (test.c:8) ==6522== table[3]=0 ==6522== ==6522== HEAP SUMMARY: ==6522== in use at exit: 12 bytes in 1 blocks ==6522== total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated ==6522== ==6522== LEAK SUMMARY: ==6522== definitely lost: 12 bytes in 1 blocks ==6522== indirectly lost: 0 bytes in 0 blocks ==6522== possibly lost: 0 bytes in 0 blocks ==6522== still reachable: 0 bytes in 0 blocks ==6522== suppressed: 0 bytes in 0 blocks ==6522== Rerun with --leak-check=full to see details of leaked memory ==6522== ==6522== For counts of detected and suppressed errors, rerun with: -v ==6522== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) ``` --- # Valgrind - Memcheck example ## .hcenter[test.c] ```C #include "stdlib.h" #include "stdio.h" #define SIZE 3 int main(int argc, char **argv) { int* table = (int*)malloc(sizeof(int) * SIZE); for (int i = 0; i < SIZE; i++) { table[i] = i; } for (int i = 0; i <= SIZE; i++) { * printf("table[%d]=%d\n", i, table[i]); } return 0; } ``` ## .hcenter[Shell session] ```shell $ gcc -g test.c -o test $ ./test table[0]=0 table[1]=1 table[2]=2 table[3]=0 ``` --- # Valgrind - Memcheck example ## .hcenter[Shell session] ```shell $ valgrind ./test =6522== Memcheck, a memory error detector ==6522== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==6522== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==6522== Command: ./test ==6522== table[0]=0 table[1]=1 table[2]=2 ==6522== Invalid read of size 4 ==6522== at 0x108756: main (test.c:17) ==6522== Address 0x520104c is 0 bytes after a block of size 12 alloc'd ==6522== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6522== by 0x108708: main (test.c:8) ==6522== table[3]=0 ==6522== ==6522== HEAP SUMMARY: ==6522== in use at exit: 12 bytes in 1 blocks ==6522== total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated ==6522== ==6522== LEAK SUMMARY: ==6522== definitely lost: 12 bytes in 1 blocks ==6522== indirectly lost: 0 bytes in 0 blocks ==6522== possibly lost: 0 bytes in 0 blocks ==6522== still reachable: 0 bytes in 0 blocks ==6522== suppressed: 0 bytes in 0 blocks *==6522== Rerun with --leak-check=full to see details of leaked memory ==6522== ==6522== For counts of detected and suppressed errors, rerun with: -v ==6522== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) ``` --- # Valgrind - Memcheck example ## .hcenter[Shell session] ```shell *$ valgrind --leak-check=full ./test ==6554== Memcheck, a memory error detector ==6554== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==6554== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==6554== Command: ./test ==6554== table[0]=0 table[1]=1 table[2]=2 ==6554== Invalid read of size 4 ==6554== at 0x108756: main (test.c:17) ==6554== Address 0x520104c is 0 bytes after a block of size 12 alloc'd ==6554== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6554== by 0x108708: main (test.c:8) ==6554== table[3]=0 ==6554== ==6554== HEAP SUMMARY: ==6554== in use at exit: 12 bytes in 1 blocks ==6554== total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated ==6554== *==6554== 12 bytes in 1 blocks are definitely lost in loss record 1 of 1 *==6554== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) *==6554== by 0x108708: main (test.c:8) ==6554== ==6554== LEAK SUMMARY: ==6554== definitely lost: 12 bytes in 1 blocks ==6554== indirectly lost: 0 bytes in 0 blocks ==6554== possibly lost: 0 bytes in 0 blocks ==6554== still reachable: 0 bytes in 0 blocks ==6554== suppressed: 0 bytes in 0 blocks ==6554== ==6554== For counts of detected and suppressed errors, rerun with: -v ==6554== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0) ``` --- # Valgrind - Memcheck example ## .hcenter[test.c] ```C #include "stdlib.h" #include "stdio.h" #define SIZE 3 int main(int argc, char **argv) { * int* table = (int*)malloc(sizeof(int) * SIZE); for (int i = 0; i < SIZE; i++) { table[i] = i; } for (int i = 0; i <= SIZE; i++) { printf("table[%d]=%d\n", i, table[i]); } return 0; } ``` ## .hcenter[Shell session] ```shell $ gcc -g test.c -o test $ ./test table[0]=0 table[1]=1 table[2]=2 table[3]=0 ``` --- # Valgrind - Memcheck example ## .hcenter[test.c] ```C #include "stdlib.h" #include "stdio.h" #define SIZE 3 int main(int argc, char **argv) { int* table = (int*)malloc(sizeof(int) * SIZE); for (int i = 0; i < SIZE; i++) { table[i] = i; } * for (int i = 0; i < SIZE; i++) { printf("table[%d]=%d\n", i, table[i]); } * free(table); return 0; } ``` ## .hcenter[Shell session] ```shell $ gcc -g test.c -o test $ ./test table[0]=0 table[1]=1 table[2]=2 ``` --- # Valgrind - Memcheck example ## .hcenter[Shell session] ```shell $ valgrind ./test ==6614== Memcheck, a memory error detector ==6614== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==6614== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==6614== Command: ./test ==6614== table[0]=0 table[1]=1 table[2]=2 ==6614== ==6614== HEAP SUMMARY: *==6614== in use at exit: 0 bytes in 0 blocks ==6614== total heap usage: 2 allocs, 2 frees, 1,036 bytes allocated ==6614== *==6614== All heap blocks were freed -- no leaks are possible ==6614== ==6614== For counts of detected and suppressed errors, rerun with: -v *==6614== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ``` ## Other detectable bugs * use of uninitialized values (with `--track-origins=yes`) * declaration of unused variables * invalid write * use of free'd resource --- # Valgrind - Conclusions :+: can easily locate some common (but specific) bugs :+: no need to recompile :+: no need to have source access :+: can be used everywhere (CLI) :warning: can not detect everything (static array overflow...) :warning: huge impact on performances :warning: slightly verbose ? :arrow_right: **My advice:** Strange bug ? Segfault ? Unexpected behavior ? **Valgrind !!!** No excuse. .hcenter.w90[] .hcenter[[source: xkcd](https://xkcd.com/722)] :arrow_right: Nothing visible with **Valgrind** ? Time to move to... **GDB** ! --- # GNU Project Debugger .hcenter.shadow.w20[] ## Key points * written by **Richard Stallman** in 1986 * official debugger of the GNU system * **best symbolic debugger ever** * learn `gdb`, use every other debugger .block[ * Repository: [https://sourceware.org/git/?p=binutils-gdb.git](https://sourceware.org/git/?p=binutils-gdb.git) * Website: [https://www.gnu.org/software/gdb](https://www.gnu.org/software/gdb) * License: `GPLv3` ] --- # GNU Project Debugger - Introduction ## What is a **good** symbolic debugger ? * a program to run and check the execution of another program * you can stop the program execution and explore its full state (memory, registers, call stack, threads, etc...) * you can **MODIFY** the program state at any time * you can **call functions** from your program scope at any time * you can **go back in time** in the execution flow * you can **see what is actually executed** ## Some problems that can be solved with a debugger * infinite loop ? see why the loop is not finishing * crash ? see what happened * a line is not executing ? see the program flow * a variable has not the expected value ? see how it is calculated * etc... --- class: top # GNU Project Debugger - Introduction .hcenter.shadow.w40[  ] ## Why not use `printf` ? -- * to avoid code pollution -- * a debugger can do **much more** than `printf` -- * you can attach `gdb` to a **running** program ! -- * you can debug post-mortem without restarting -- * you will learn from the lower layers (memory, stack, registers, etc...) -- * with `gdb`, no need to recompile and re-run --- # GNU Project Debugger - Compilation :arrow_right: Compile to binary objects with the `-g` flag. * this will add **debugging symbols** to your binary objects ## .hcenter[test.c] ```C #include "stdlib.h" #include "stdio.h" #define SIZE 3 int main(int argc, char **argv) { int* table = (int*)malloc(sizeof(int) * SIZE); for (int i = 0; i < SIZE; i++) { table[i] = i; } for (int i = 0; i < SIZE; i++) { printf("table[%d]=%d\n", i, table[i]); } free(table); return 0; } ``` --- # GNU Project Debugger - Debugging symbols .row[ .column.w48[ ## .hcenter[Shell session] ```shell $ gcc -c test.c $ nm -a test.o 0000000000000000 b .bss 0000000000000000 n .comment 0000000000000000 d .data 0000000000000000 r .eh_frame U free U _GLOBAL_OFFSET_TABLE_ 0000000000000000 T main U malloc 0000000000000000 n .note.GNU-stack U printf 0000000000000000 r .rodata 0000000000000000 a test.c 0000000000000000 t .text ``` ] .column.w48[ ## .hcenter[Shell session] ```shell $ gcc -c -g test.c $ nm -a test.o 0000000000000000 b .bss 0000000000000000 n .comment 0000000000000000 d .data *0000000000000000 N .debug_abbrev *0000000000000000 N .debug_aranges *0000000000000000 N .debug_info *0000000000000000 N .debug_line *0000000000000000 N .debug_str 0000000000000000 r .eh_frame U free U _GLOBAL_OFFSET_TABLE_ 0000000000000000 T main U malloc 0000000000000000 n .note.GNU-stack U printf 0000000000000000 r .rodata 0000000000000000 a test.c 0000000000000000 t .text ``` ] ] --- # GNU Project Debugger - Loading a binary ## .hcenter[Shell session] ```shell $ gcc -c -g test.c $ gcc test.o -o test *$ gdb test GNU gdb (Ubuntu 7.12.50.20170314-0ubuntu1.1) 7.12.50.20170314-git Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see:
. Find the GDB manual and other documentation resources online at:
. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from test...done. (gdb) ``` ## .hcenter[Shell session] ```shell $ gcc -c -g test.c $ gcc test.o -o test *$ gdb -q test Reading symbols from test...done. (gdb) ``` --- # GNU Project Debugger - Loading a binary ## .hcenter[Shell session] ```shell $ gcc -c -g test.c $ gcc test.o -o test $ gdb -q *(gdb) file test Reading symbols from test...done. (gdb) ``` ## .hcenter[Shell session] ```shell *$ gcc -c test.c $ gcc test.o -o test $ gdb -q (gdb) file test *Reading symbols from test...(no debugging symbols found)...done. (gdb) ``` :arrow_right: `gdb` can work without debugging symbols... but can You ? --- # GNU Project Debugger - Launching a program ## Run with arguments ```shell (gdb) file test Reading symbols from test...done. *(gdb) run -h --help --please --pleasehelpme Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test -h --help --please --pleasehelpme table[0]=0 table[1]=1 table[2]=2 [Inferior 1 (process 9109) exited normally] (gdb) ``` ## Run without arguments and break at `main()` ```shell (gdb) file test Reading symbols from test...done. *(gdb) start Temporary breakpoint 1 at 0x72f: file test.c, line 8. Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe548) at test.c:8 8 int* table = (int*)malloc(sizeof(int) * SIZE); (gdb) ``` ## Run with arguments and break at `main()` ```shell (gdb) file test Reading symbols from test...done. *(gdb) set args -h --help --please --pleasehelpme *(gdb) start Temporary breakpoint 1 at 0x72f: file test.c, line 8. Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test -h --help --please --pleasehelpme Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe548) at test.c:8 8 int* table = (int*)malloc(sizeof(int) * SIZE); (gdb) ``` --- # GNU Project Debugger - Running environment ## Show arguments ```shell (gdb) file test Reading symbols from test...done. (gdb) run 1 -2 --3 4 Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test 1 -2 --3 4 table[0]=0 table[1]=1 table[2]=2 [Inferior 1 (process 9109) exited normally] *(gdb) show args Argument list to give program being debugged when it is started is "1 -2 --3 4". (gdb) ``` ## Show and set environment variables ```shell (gdb) file test Reading symbols from test...done. *(gdb) show environment [...] SHELL_SESSION_ID=40fbec4f88ec4e3da8d0ae5adcdb501f COLORTERM=truecolor USERNAME=dubrayn [...] *(gdb) show environment SHELL SHELL = /bin/bash *(gdb) set environment SHELL /bin/false *(gdb) show environment SHELL SHELL = /bin/false ``` --- # GNU Project Debugger - Source ## List the current source line and its surroundings ```shell (gdb) file test Reading symbols from test...done. (gdb) start Temporary breakpoint 1 at 0x72f: file test.c, line 8. Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe548) at test.c:8 8 int* table = (int*)malloc(sizeof(int) * SIZE); *(gdb) list 3 4 #define SIZE 3 5 6 int main(int argc, char **argv) 7 { 8 int* table = (int*)malloc(sizeof(int) * SIZE); 9 10 for (int i = 0; i < SIZE; i++) 11 { 12 table[i] = i; ``` --- # GNU Project Debugger - Machine code ## Disassemble the current function and show the current instruction ```shell (gdb) file test Reading symbols from test...done. (gdb) start Temporary breakpoint 4 at 0x55555555472f: file test.c, line 8. Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test Temporary breakpoint 4, main (argc=1, argv=0x7fffffffe548) at test.c:8 8 int* table = (int*)malloc(sizeof(int) * SIZE); *(gdb) disassemble Dump of assembler code for function main: 0x0000555555554720 <+0>: push %rbp 0x0000555555554721 <+1>: mov %rsp,%rbp 0x0000555555554724 <+4>: sub $0x20,%rsp 0x0000555555554728 <+8>: mov %edi,-0x14(%rbp) 0x000055555555472b <+11>: mov %rsi,-0x20(%rbp) => 0x000055555555472f <+15>: mov $0xc,%edi 0x0000555555554734 <+20>: callq 0x5555555545e0 0x0000555555554739 <+25>: mov %rax,-0x8(%rbp) 0x000055555555473d <+29>: movl $0x0,-0x10(%rbp) 0x0000555555554744 <+36>: jmp 0x555555554763
0x0000555555554746 <+38>: mov -0x10(%rbp),%eax 0x0000555555554749 <+41>: cltq 0x000055555555474b <+43>: lea 0x0(,%rax,4),%rdx 0x0000555555554753 <+51>: mov -0x8(%rbp),%rax 0x0000555555554757 <+55>: add %rax,%rdx [...] 0x00005555555547af <+143>: callq 0x5555555545d0 0x00005555555547b4 <+148>: mov $0x0,%eax 0x00005555555547b9 <+153>: leaveq 0x00005555555547ba <+154>: retq End of assembler dump. (gdb) ``` --- # GNU Project Debugger - Main commands ## Execution flow control * `start`: start the program execution and break at `main()` * `run args`: restart the program execution with arguments `args` * `kill`: stop the program execution * `step`: execute until next line of code, go deeper if needed (and possible) * `stepi`: execute next machine instruction, go deeper if needed (and possible) * `next`: execute until next line of code, do not go deeper * `nexti`: execute next machine instruction, do not go deeper * `finish`: execute until exiting the current function * `continue`: continue the program execution ## Breakpoints * `break loc`: set a **breakpoint** at location `loc` * `clear loc`: remove a breakpoint at location `loc` * `info breakpoints`: print the list of breakpoints and watchpoints * `delete id`: delete the breakpoint or watchpoint `id` --- # GNU Project Debugger - Core dump :arrow_right: An "image" of the state of a program at a given moment in time, that allows to perform (among other things) a **post-mortem analysis** of a crash. ## Activate core dump at segfault ```shell $ ulimit -c 0 *$ ulimit -c unlimited $ ulimit -c unlimited ``` ## Dump core at segfault and analyse with `gdb` .row[ .column.w30[ ## .hcenter[`segfault.c`] ```C #include "stdlib.h" void main(void) { int *p = NULL; *p = 0; } ``` ] .column.w65[ ## .hcenter[Shell session] ```shell $ gcc -c -g segfault.c $ gcc segfault.o -o segfault $ ./segfault *Segmentation fault (core dumped) *$ gdb -q segfault core.8840 Reading symbols from segfault...done. [New LWP 9812] Core was generated by `./segfault'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00000000004005c8 in main () at segfault.c:6 6 *p = 0; (gdb) ``` ] ] --- # GNU Project Debugger - Call stack / variables
--- # GNU Project Debugger - Call stack / variables :arrow_right: The **call stack** is a stack of **frames**. When following a `call` command, a new frame is added on top of the existing ones. Each **frame** contains local variables, function arguments and the local address pointer. ## Manipulate frames * `backtrace`: print the call stack * `backtrace full`: print the full call stack (print variables, arguments...) * `frame`: select a frame * `up`: got one frame up in the call stack * `down`: got one frame down in the call stack ## Print variables / expressions * `print expr`: evaluate and print `expr` * `print/x expr`: evaluate and print `expr` in hexadecimal format * `print * addr`: print the value located at `addr` address * `display expr`: evaluate and print `expr` at each break * `undisplay nb`: cancel display number `nb` --- # GNU Project Debugger - Watchpoints :arrow_right: Stop the program when a variable, expression or part of memory changes. ## Commands to manipulate watchpoints * `watch expr`: break if `expr` change, **even if it is not defined yet** * `info watchpoints`: print the list of watchpoints * `delete id`: delete the breakpoint or watchpoint `id` --- # GNU Project Debugger - Program scope modification ## .hcenter[:warning: These commands will alter the "normal" execution of the program :warning:] ## Change program variables * `set variable v = expr`: evaluate `expr` and assign its value to the variable `v` ## Call a function * `call func(...)`: call function `func` in the program scope ## Return from a function * `return`: return to the calling function **now** --- # GNU Project Debugger - Conditional breakpoint :arrow_right: Allows to break only if a given condition is verified. ## Add a condition to a breakpoint * `condition id cond`: add `cond` condition to the breakpoint `id` ## .hcenter[Example `gdb` session] ```text (gdb) file test Reading symbols from test...done. (gdb) start Temporary breakpoint 1 at 0x4006a7: file test.c, line 8. Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test Temporary breakpoint 1, main (argc=1, argv=0x7fffffffd7c8) at test.c:8 8 int* table = (int*)malloc(sizeof(int) * SIZE); (gdb) break 12 Breakpoint 3 at 0x4006be: file test.c, line 12. *(gdb) condition 3 i > 1 (gdb) continue Continuing. Breakpoint 3, main (argc=1, argv=0x7fffffffd7c8) at test.c:12 12 table[i] = i; (gdb) print i *$1 = 2 ``` --- # GNU Project Debugger - Thread debugging :arrow_right: Threads can be easily debugged with `gdb`. ## Threads-related commands * `thread id`: switch to thread `id` * `info threads`: show existing threads * `thread apply [thread-id-list] [all] cmd`: apply `cmd` to a list of threads ## Armadillo threads ? * the Armadillo library uses the `BLAS` library, which can spawn threads :arrow_right: Use the environment variable `OPENBLAS_NUM_THREADS` to control the number of threads used by `OpenBLAS`. ## .hcenter[Shell session] ```shell $ export OPENBLAS_NUM_THREADS=1 # no thread spawning $ export OPENBLAS_NUM_THREADS=$(nproc) # full throttle !!! ``` --- # GNU Project Debugger - Conque-GDB vim addon
--- # GNU Project Debugger - Qt-Creator .hcenter.shadow.w100[] --- # GNU Project Debugger - Reverse debugging .hcenter.shadow.w40[] :arrow_right: Record the full program state and **go back and forth in time** ! ## Commands **to go back in time** * `record`: start the recording of the program state * `reverse-continue`: continue **backwards** * `reverse-finish`: continue backwards until the lower frame * `reverse-step`: go back to the previous line, go deeper if needed * `reverse-stepi`: go back to the previous machine instruction, go deeper if needed * `reverse-next`: go back to the previous line, do not go deeper * `reverse-nexti`: go back to the previous machine instruction, do not go deeper * `set exec-direction (forward/reverse)`: set direction of execution for `next`, `step`, etc... --- # GNU Project Debugger - Reverse debugging
--- # GNU Project Debugger - Conclusions ## About `gdb` :+: **best debugger ever** :warning: CLI interface **might be** hard to grasp :warning: several commands to remember :+: the debugging workflow is **almost the same** in every other debugger :+: if `gdb` is often used "below" a nice GUI, why use a GUI ? ## About debugging :+: no more `printf` !!! :+: gain some knowledge about the lower layers (stack, cache, register, etc...) :+: do not trust your compiler, check the generated code (and learn ASM) :+: "A segfault ? Yes ! Time to play with `gdb` !" .hcenter.w90[] .hcenter[[source: xkcd](https://xkcd.com/371)] --- # Scientific programming ## Scientific programming tools presented in this course * linear algebra libraries (usable from `C++`) * `Boost`, `LAPACK`, `BLAS`, `MKL`, `GSL`, `Eigen`, **`Armadillo`** :+: * scientific programming with Python * **introduction to Python** :+: * **`NumPy`** .animated.infinite.flash[:arrow_left:] * presenting results * write beautiful math: ** .latex[L
a
T
e
X] ** :+: * structure the documentation: **Markdown** :+: * write some slides: **remark.js** :+: ## Scientific programming techniques presented in this course * numerical integration * **Gaussian quadrature** :+: * optimization of sum calculation :+: --- # NumPy .hcenter.w40[] ## Key points * core library for scientific computing in Python * very close from MATLAB (but open-source :v:) * provides very efficient multi-dimensional arrays * provides several tools to integrate C/C++ and Fortran codes * provides functions for: linear algebra, RNG, Fourier transforms, etc... .block[ * Repository: [https://github.com/numpy/numpy](https://github.com/numpy/numpy) * Website: [http://www.numpy.org](http://www.numpy.org) * License: `BSD` ] --- # NumPy - Arrays :arrow_right: The main object of the `numpy` package is the **homogeneous multidimensional array**. ## Array initialization ```Python import numpy as np a = np.array([2, 3, 4]) v = np.array([2.1, 3.4, 2.7]) ``` ## Type of the stored data :arrow_right: The type is infered from the values. ```Python import numpy as np np.array([2, 3, 4 ]).dtype # dtype('int64') np.array([2, 3, 4.0]).dtype # dtype('float64') ``` ## Size of each element :arrow_right: Useful to calculate the memory footprint of an array. ```Python import numpy as np np.array([2, 3, 4 ]).itemsize # 8 np.array([2, 3, 4.0]).itemsize # 8 ``` --- # NumPy - Arrays ## Initialization from a list [of tuple [of list [...]]] ```Python import numpy as np dvec = np.array([1.0, 2.0, 3.0]) # 3 double vector dmat = np.array([[1, 2, 3], (2, 3, 4.0)]) # 2x3 double matrix icube = np.array([[[1, 2, 3], [2, 3, 4]], [[3, 4, 5], [4, 5, 6]]]) # 2x2x3 integer cube ``` ## Explicit type specification ```Python import numpy as np cmat = np.array([[1, 2], [3, 4]], dtype = complex) # 2x2 complex double matrix ``` ## Initialization by conversion ```Python import numpy as np dvec = np.array([1.0, 2.0, 3.0]) # 3 double vector cvec = np.array(dvec, dtype = complex) # 3 complex double vector ``` ## Initialization by copy ```Python import numpy as np ivec = np.array([1, 2, 3]) a = ivec a is ivec # True <- WARNING !!! b = np.array(ivec) b is ivec # False c = ivec[:] c is ivec # False d = np.copy(ivec) d is ivec # False ``` --- # NumPy - Arrays ## Rank (number of axes) ```Python import numpy as np a = np.array([1, 2, 3]) a.ndim # 1 b = np.array([[1, 2], [3, 4]]) b.ndim # 2 ``` ## Dimensions ```Python import numpy as np a = np.array([1, 2, 3]) a.shape # (3,) b = np.array([[1, 2], [3, 4]]) b.shape # (2, 2) ``` ## Number of elements ```Python import numpy as np a = np.array([1, 2, 3]) a.size # 3 b = np.array([[1, 2], [3, 4]]) b.size # 4 ``` --- # NumPy - Array creation .row[ .column.w48[ ## Ones and zeros ```Python import numpy as np print(np.eye(3, 4)) # [[ 1. 0. 0. 0.] # [ 0. 1. 0. 0.] # [ 0. 0. 1. 0.]] print(np.identity(3)) # [[ 1. 0. 0.] # [ 0. 1. 0.] # [ 0. 0. 1.]] print(np.zeros([3, 4])) # [[ 0. 0. 0. 0.] # [ 0. 0. 0. 0.] # [ 0. 0. 0. 0.]] print(np.ones([3, 4])) # [[ 1. 1. 1. 1.] # [ 1. 1. 1. 1.] # [ 1. 1. 1. 1.]] print(np.tri(3, dtype = int)) # [[1 0 0] # [1 1 0] # [1 1 1]] print(np.full([3, 4], 6)) # [[6 6 6 6] # [6 6 6 6] # [6 6 6 6]] ``` ] .column.w48[ ## Copy the shape of another array ```Python import numpy as np # uninitialized 3x4 int16 matrix: a = np.empty([3, 4], dtype = np.int16) # uninitialized 3x4 int16 matrix b = np.empty_like(a) print(np.zeros_like(a)) # [[0 0 0 0] # [0 0 0 0] # [0 0 0 0]] print(np.ones_like(a)) # [[1 1 1 1] # [1 1 1 1] # [1 1 1 1]] print(np.full_like(a, 7)) # [[7 7 7 7] # [7 7 7 7] # [7 7 7 7]] print(np.full_like(a, 7.5)) # [[7 7 7 7] # [7 7 7 7] # [7 7 7 7]] ``` ] ] --- # NumPy - Arrays .row[ .column.w48[ ## Matrix functions ```Python import numpy as np a = np.arange(12).reshape([3, 4]) print(a) # [[ 0 1 2 3] # [ 4 5 6 7] # [ 8 9 10 11]] print(np.diag(a)) # [ 0 5 10] print(a.diagonal()) # [ 0 5 10] print(a.diagonal(1)) # [ 1 6 11] print(a.diagonal(2)) # [2 7] print(np.tril(a)) # [[ 0 0 0 0] # [ 4 5 0 0] # [ 8 9 10 0]] print(np.triu(a)) # [[ 0 1 2 3] # [ 0 5 6 7] # [ 0 0 10 11]] ``` ] .column.w48[ ## Matrix multiplication ```Python import numpy as np a = np.arange(9).reshape([3, 3]) print(a) # [[0 1 2] # [3 4 5] # [6 7 8]] b = np.arange(-4, 5).reshape([3, 3]) print(b) # [[-4 -3 -2] # [-1 0 1] # [ 2 3 4]] print(a * b) # element-wise multiplication ! # [[ 0 -3 -4] # [-3 0 5] # [12 21 32]] print(a.dot(b)) # matrix multiplication # [[ 3 6 9] # [ -6 6 18] # [-15 6 27]] print(np.linalg.matrix_power(a, 4)) # [[ 2430 3132 3834] # [ 7452 9612 11772] # [12474 16092 19710]] ``` ]] --- # NumPy - Arrays ## Main linear algebra functions ```Python import numpy as np b = np.arange(-4, 5).reshape([3, 3]) print(b) # [[-4 -3 -2] # [-1 0 1] # [ 2 3 4]] print(b.transpose()) # [[-4 -1 2] # [-3 0 3] # [-2 1 4]] print(np.det(b)) # 0.0 print(np.linalg.norm(b)) # 7.745966692414834 eigval, eigvec = np.linalg.eig(b) print(eigval) # [ -4.24264069e+00 3.81639165e-17 4.24264069e+00] print(eigvec) # [[ 0.90104757 0.40824829 -0.3236973 ] # [ 0.28867513 -0.81649658 0.28867513] # [-0.3236973 0.40824829 0.90104757]] print(np.matmul(b, eigvec[:,[0]]) - eigval[0] * eigvec[:,[0]]) # [[ 4.44089210e-16] # [ 4.44089210e-16] # [ 6.66133815e-16]] ``` --- # NumPy - Arrays ## Change the rank and dimensions (not the data) ```Python import numpy as np a = np.arange(12) print(a) # [ 0 1 2 3 4 5 6 7 8 9 10 11] b = a.reshape([3, 4]) print(b) # [[ 0 1 2 3] # [ 4 5 6 7] # [ 8 9 10 11]] print(b.reshape([1, 12])) # [[ 0 1 2 3 4 5 6 7 8 9 10 11]] ``` .row[ .column.w48[ ## Access to rows ```Python import numpy as np a = np.arange(12).reshape([3, 4]) print(a) # [[ 0 1 2 3] # [ 4 5 6 7] # [ 8 9 10 11]] print(a[2, :]) # [ 8 9 10 11] print(a[[2], :]) # [[ 8 9 10 11]] a[2, :] = [9, 9, 9, 9] print(a) # [[0 1 2 3] # [4 5 6 7] # [9 9 9 9]] ``` ] .column.w48[ ## Access to columns ```Python import numpy as np a = np.arange(12).reshape([3, 4]) print(a) # [[ 0 1 2 3] # [ 4 5 6 7] # [ 8 9 10 11]] print(a[:, 3]) # [ 3 7 11] print(a[:, [3]]) # [[ 3] # [ 7] # [11]] a[:, 3] = [9, 9, 9] print(a) # [[ 0 1 2 9] # [ 4 5 6 9] # [ 8 9 10 9]] ``` ] ] --- # NumPy - Arrays ## Generate a sequence of numbers * to generate an array instead of a list * accepts **float arguments** ```Python import numpy as np np.arange(8) # array([ 0 , 1 , 2 , 3 , 4, 5 , 6 , 7 ]) np.arange(8.) # array([ 0. , 1. , 2. , 3. , 4. , 5., 6., 7.]) np.arange(8, 13) # array([ 8 , 9 , 10 , 11 , 12 ]) np.arange(8, 13.2) # array([ 8. , 9. , 10. , 11. , 12. ]) np.arange(8.2, 13) # array([ 8.2, 9.2, 10.2, 11.2, 12.2]) np.arange(8, 13, 2) # array([ 8 , 10 , 12 ]) np.arange(8, 13, 2.2) # array([ 8 , 10.2, 12.4]) np.arange(8.2, 13, 2.2) # array([ 8.2, 10.4, 12.6]) ``` ## Subdivise a range ```Python import numpy as np np.linspace(0, 1, 3) # array([ 0., 0.5, 1. ]) np.linspace(3, 7, 9) # array([ 3., 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. ]) ``` --- # NumPy - Some broadcast functions .row[ .column.w55.middle[ ## `Sin` and `Cos` functions ```Python import numpy as np import matplotlib.pyplot as plt t = np.linspace(0, 2 * np.pi, 200) a = np.sin(t) b = np.cos(t) plt.plot(t, a, 'r') plt.plot(t, b, 'g') plt.show() ``` ] .column.w40.middle[  ] ] ## List of trigonometric functions ```Python sin cos tan arcsin arccos arctan arctan2 hypot sinh cosh tanh arcsinh arccosh arctanh deg2rad rad2deg ``` --- # NumPy - Print arrays ## Long arrays :arrow_right: When an array is too large, only the corners are printed. ```Python import numpy as np np.get_printoptions()['threshold'] # 1000 print(np.arange(1002).reshape([3, 334])) # [[ 0 1 2 ..., 331 332 333] # [ 334 335 336 ..., 665 666 667] # [ 668 669 670 ..., 999 1000 1001]] np.set_printoptions(threshold = 100) print(np.arange(100).reshape([10, 10])) # [[ 0 1 2 3 4 5 6 7 8 9] # [10 11 12 13 14 15 16 17 18 19] # [20 21 22 23 24 25 26 27 28 29] # [30 31 32 33 34 35 36 37 38 39] # [40 41 42 43 44 45 46 47 48 49] # [50 51 52 53 54 55 56 57 58 59] # [60 61 62 63 64 65 66 67 68 69] # [70 71 72 73 74 75 76 77 78 79] # [80 81 82 83 84 85 86 87 88 89] # [90 91 92 93 94 95 96 97 98 99]] print(np.arange(110).reshape([11, 10])) # [[ 0 1 2 ..., 7 8 9] # [ 10 11 12 ..., 17 18 19] # [ 20 21 22 ..., 27 28 29] # ..., # [ 80 81 82 ..., 87 88 89] # [ 90 91 92 ..., 97 98 99] # [100 101 102 ..., 107 108 109]] ``` --- # NumPy - Random functions .row[ .column.w55.middle[ ## White noise ```Python import numpy as np import matplotlib.pyplot as plt *data = np.random.rand(10000) print(data.mean()) # mean value: 0.494221 print(data.var()) # variance : 0.083729 plt.hist(data, bins=100) plt.show() ``` ] .column.w40.middle[  ] ] .row[ .column.w55.middle[ ## Normal distribution ```Python import numpy as np import matplotlib.pyplot as plt *data = np.random.randn(10000) print(data.mean()) # mean value: -0.000091 print(data.var()) # variance : 1.002125 plt.hist(data, bins=100) plt.show() ``` ] .column.w40.middle[  ] ] --- # NumPy - Arrays ## Load and save arrays ```Python import numpy as np dmat0 = np.eye(5) np.save("filename", dmat0) dmat1 = np.load("filename.npy") np.linalg.norm(dmat0 - dmat1) # 0.0 ``` * save an array in the binary `.npy` format * avoid `np.fromfile()` and `x.tofile()` ## Construct an array from a function ```Python import numpy as np imat = np.fromfunction(lambda i, j: i + j, (3, 3), dtype = int) print(imat) # [[0 1 2] # [1 2 3] # [2 3 4]] ``` --- # NumPy - Arrays ## Iterate over all elements of an array ```Python import numpy as np a = np.arange(6).reshape(2,3) print(a) # [[0 1 2] # [3 4 5]] for x in np.nditer(a): # readonly access print x, # 0 1 2 3 4 5 for x in np.nditer(a, op_flags=['readwrite']): # readwrite access x[...] = 2 * x print(a) # [[ 0 2 4] # [ 6 8 10]] ``` ## Iterate over all elements of an array and get index values ```Python import numpy as np a = np.arange(6).reshape(2,3) print(a) # [[0 1 2] # [3 4 5]] it = np.nditer(a, flags=['multi_index']) while not it.finished: print "%d <%s>" % (it[0], it.multi_index), it.iternext() # 0 <(0, 0)> 1 <(0, 1)> 2 <(0, 2)> 3 <(1, 0)> 4 <(1, 1)> 5 <(1, 2)> ``` --- # NumPy - Arrays ## Mesh grid :arrow_right: Use this to fill arrays with index-dependent expressions. ```Python import numpy as np print(np.mgrid[0:3, 0:4][0]) # [[0 0 0 0] # [1 1 1 1] # [2 2 2 2]] print(np.mgrid[0:3, 0:4][1]) # [[0 1 2 3] # [0 1 2 3] # [0 1 2 3]] ``` ## Example `$$ \forall i \in [0..2], \forall j \in [0..3], A_{ij} \equiv i + 0.1 j $$` ```Python import numpy as np i = np.mgrid[0:3, 0:4][0] j = np.mgrid[0:3, 0:4][1] A = i + j * 0.1 print(A) # [[ 0. 0.1 0.2 0.3] # [ 1. 1.1 1.2 1.3] # [ 2. 2.1 2.2 2.3]] ``` --- # NumPy - Efficient code ## Example We want to calculate the **Frobenius norm** of a matrix product: `$$ N \equiv \left|A.B\right|_{\textrm{fro}} $$` with `$$ \forall i \in [0..\,n_0 - 1], \forall j \in [0..\,n_1 - 1], A_{ij} \equiv \frac{i + \frac{j}{10}}{10}, B_{ji} \equiv \frac{i + \frac{j}{10}}{10}. $$` --- class: top # NumPy - Efficient code ## Naive implementation ```Python import numpy as np import time n0 = 2000 n1 = 2000 print("===== Numpy 1a =====") time0 = time.time() A = np.empty((n0, n1)) B = np.empty((n1, n0)) for i in range(n0): for j in range(n1): A[i, j] = (i + j * 0.1) * 0.1 B[j, i] = (i + j * 0.1) * 0.1 time1 = time.time() N = np.linalg.norm(A.dot(B), ord = 'fro') time1 = time.time() print("fill. time: %fs" % (time1 - time0)) print("calc. time: %fs" % (time2 - time1)) print("total time: %fs" % (time2 - time0)) print("result : %20.14e" % (N)) ``` -- ## Result ```text ===== Numpy 1a ===== *fill. time: 2.277015s *calc. time: 0.426645s total time: 2.703660s result : 6.17895105956704e+10 ``` --- class: top # NumPy - Efficient code .row[ .column.w48[ ## Naive implementation ```Python import numpy as np import time n0 = 2000 n1 = 2000 print("===== Numpy 1a =====") time0 = time.time() A = np.empty((n0, n1)) *B = np.empty((n1, n0)) for i in range(n0): for j in range(n1): A[i, j] = (i + j * 0.1) * 0.1 * B[j, i] = (i + j * 0.1) * 0.1 time1 = time.time() N = np.linalg.norm(A.dot(B), ord = 'fro') time2 = time.time() print("fill. time: %fs" % (time1 - time0)) print("calc. time: %fs" % (time2 - time1)) print("total time: %fs" % (time2 - time0)) print("result : %20.14e" % (N)) ``` ] .column.w48[ ## Using transposition ```Python import numpy as np import time n0 = 2000 n1 = 2000 print("===== Numpy 1b =====") time0 = time.time() A = np.empty((n0, n1)) for i in range(n0): for j in range(n1): A[i, j] = (i + j * 0.1) * 0.1 *B = A.transpose() time1 = time.time() N = np.linalg.norm(A.dot(B), ord = 'fro') time2 = time.time() print("fill. time: %fs" % (time1 - time0)) print("calc. time: %fs" % (time2 - time1)) print("total time: %fs" % (time2 - time0)) print("result : %20.14e" % (N)) ``` ]] -- .row[ .column.w48[ ## Result ```text ===== Numpy 1a ===== *fill. time: 2.277015s *calc. time: 0.426645s total time: 2.703660s result : 6.17895105956704e+10 ``` ] .column.w48[ ## Result ```text ===== Numpy 1b ===== *fill. time: 1.207309s *calc. time: 0.199436s total time: 1.406745s result : 6.17895105956704e+10 ``` ]] --- class: top # NumPy - Efficient code .row[ .column.w48[ ## Using transposition ```Python import numpy as np import time n0 = 2000 n1 = 2000 print("===== Numpy 1b =====") time0 = time.time() A = np.empty((n0, n1)) *for i in range(n0): * for j in range(n1): * A[i, j] = (i + j * 0.1) * 0.1 B = A.transpose() time1 = time.time() N = np.linalg.norm(A.dot(B), ord = 'fro') time2 = time.time() print("fill. time: %fs" % (time1 - time0)) print("calc. time: %fs" % (time2 - time1)) print("total time: %fs" % (time2 - time0)) print("result : %20.14e" % (N)) ``` ] .column.w48[ ## Using iterator ```Python import numpy as np import time n0 = 2000 n1 = 2000 print("===== Numpy 2 =====") time0 = time.time() A = np.empty((n0, n1)) *it = np.nditer(A, flags = ['multi_index'], * op_flags = ['writeonly']) *while not it.finished: * it[0] = (it.multi_index[0] * + it.multi_index[1] * 0.1) * 0.1 * it.iternext() B = A.transpose() time1 = time.time() N = np.linalg.norm(A.dot(B), ord = 'fro') time2 = time.time() print("fill. time: %fs" % (time1 - time0)) print("calc. time: %fs" % (time2 - time1)) print("total time: %fs" % (time2 - time0)) print("result : %20.14e" % (N)) ``` ]] -- .row[ .column.w48[ ## Result ```text ===== Numpy 1b ===== *fill. time: 1.207309s calc. time: 0.199436s total time: 1.406745s result : 6.17895105956704e+10 ``` ] .column.w48[ ## Result ```text ===== Numpy 2 ===== *fill. time: 2.498673s calc. time: 0.199323s total time: 2.697996s result : 6.17895105956704e+10 ``` ]] --- class: top # NumPy - Efficient code .row[ .column.w48[ ## Using transposition ```Python import numpy as np import time n0 = 2000 n1 = 2000 print("===== Numpy 1b =====") time0 = time.time() *A = np.empty((n0, n1)) *for i in range(n0): * for j in range(n1): * A[i, j] = (i + j * 0.1) * 0.1 B = A.transpose() time1 = time.time() N = np.linalg.norm(A.dot(B), ord = 'fro') time2 = time.time() print("fill. time: %fs" % (time1 - time0)) print("calc. time: %fs" % (time2 - time1)) print("total time: %fs" % (time2 - time0)) print("result : %20.14e" % (N)) ``` ] .column.w48[ ## Using `np.mgrid` ```Python import numpy as np import time n0 = 2000 n1 = 2000 print("===== Numpy 3 =====") time0 = time.time() *im, jm = np.mgrid[0:n0, 0:n1] *A = (im + jm * 0.1) * 0.1 B = A.transpose() time1 = time.time() N = np.linalg.norm(A.dot(B), ord = 'fro') time2 = time.time() print("fill. time: %fs" % (time1 - time0)) print("calc. time: %fs" % (time2 - time1)) print("total time: %fs" % (time2 - time0)) print("result : %20.14e" % (N)) ``` ] ] -- .row[ .column.w48[ ## Result ```text ===== Numpy 1b ===== *fill. time: 1.207309s calc. time: 0.199436s total time: 1.406745s result : 6.17895105956704e+10 ``` ] .column.w48[ ## Result ```text ===== Numpy 3 ===== *fill. time: 0.078148s calc. time: 0.198162s total time: 0.276310s result : 6.17895105956704e+10 ``` ] ] --- # NumPy - Python accelerators :arrow_right: Some tools allow to "compile" Python / NumPy code : * **Numba**: * JIT compilation of Python code * website: [https://numba.pydata.org](https://numba.pydata.org) * **Cython**: * write and compile modules using a superset of the Python language and use them from standard Python * website: [http://cython.org](http://cython.org) * **PyPy**: * highly compatible alternative implementation of the Python language using JIT compilation * :warning: does not support NumPy :warning: * website: [https://pypy.org](https://pypy.org) * **Pythran**: * generate and compile `C++` code from annotated Python code (supports only a subset of the Python language) * website: [http://pythran.readthedocs.io/en/latest](http://pythran.readthedocs.io/en/latest) --- # NumPy - Numba ## Usage * add `@jit` decorator before the functions to be JIT compiled .row[ .column.w48[ ## Example ```Python import numpy as np import time from numba import jit print("===== Numba =====") time0 = time.time() n0 = 2000 n1 = 2000 @jit def fill(n0, n1): a = np.empty((n0, n1)) for i in range(n0): for j in range(n1): a[i, j] = (i + j * 0.1) * 0.1 b = a.transpose() return a, b @jit def calc(a, b): return np.linalg.norm(a.dot(b), ord = 'fro') time1 = time.time() A, B = fill(n0, n1) time2 = time.time() N = calc(A, B) time3 = time.time() print("comp. time: %fs" % (time1 - time0)) print("fill. time: %fs" % (time2 - time1)) print("calc. time: %fs" % (time3 - time2)) print("TOTAL time: %fs" % (time3 - time0)) print("result : %20.14e" % (N)) ``` ] .column.w48.middle[ ## Result ```text ===== Numba ===== comp. time: 0.023121s fill. time: 0.339344s calc. time: 0.277222s TOTAL time: 0.639687s result : 6.17895105956704e+10 ``` ] ] --- # NumPy - Cython ## Module compilation * write a `calc.pyx` file * write a `setup_cython.py` file * compile the module with ```shell $ python setup_cython.py build_ext --inplace ``` .row[ .column.w55[ ## .hcenter[`[calc.pyx]`] ```Python import numpy as np cimport numpy as np DTYPE = np.float64 ctypedef np.float64_t DTYPE_t cimport cython # turn off bounds-checking for entire function @cython.boundscheck(False) # turn off negative index wrapping for entire function @cython.wraparound(False) def fill(int n0, int n1): cdef int i cdef int j cdef np.ndarray A = np.empty((n0, n1), dtype = DTYPE) cdef DTYPE_t value for i in range(n0): for j in range(n1): value = (i + j * 0.1) * 0.1 A[i, j] = value B = A.transpose() return A, B def calc(np.ndarray A, np.ndarray B): return np.linalg.norm(A.dot(B), ord = 'fro') ``` ] .column.w40.middle[ ## .hcenter[`[setup_cython.py]`] ```Python from distutils.core import setup from Cython.Build import cythonize setup( ext_modules=cythonize("calc.pyx"), ) ``` ] ] --- class: top # NumPy - Cython .row[ .column.w48[ ## Using `np.mgrid` ```Python import numpy as np import time n0 = 2000 n1 = 2000 print("===== Numpy 3 =====") time0 = time.time() *im, jm = np.mgrid[0:n0, 0:n1] *A = (im + jm * 0.1) * 0.1 *B = A.transpose() time1 = time.time() *N = np.linalg.norm(A.dot(B), ord = 'fro') time2 = time.time() print("fill. time: %fs" % (time1 - time0)) print("calc. time: %fs" % (time2 - time1)) print("total time: %fs" % (time2 - time0)) print("result : %20.14e" % (N)) ``` ] .column.w48[ ## Using Cython ```Python import numpy as np import time n0 = 2000 n1 = 2000 print("===== Cython =====") time0 = time.time() *import calc time1 = time.time() *A, B = calc.fill(n0, n1) time2 = time.time() *N = calc.calc(A, B) time3 = time.time() print("init. time: %fs" % (time1 - time0)) print("fill. time: %fs" % (time2 - time1)) print("calc. time: %fs" % (time3 - time2)) print("TOTAL time: %fs" % (time3 - time0)) print("result : %20.14e" % (N)) ``` ] ] -- .row[ .column.w48[ ## Result ```text ===== Numpy 3 ===== *fill. time: 0.078148s *calc. time: 0.198162s total time: 0.276310s result : 6.17895105956704e+10 ``` ] .column.w48[ ## Result ```text ===== Cython ===== init. time: 0.000356s *fill. time: 0.789828s *calc. time: 0.429509s TOTAL time: 1.219693s result : 6.17895105956704e+10 ``` ] ] --- # NumPy - Pythran ## Module compilation * write a `matrix_multiply.py` file * use Pythran special decorators and directives * compile the module with ```shell $ pythran matrix_multiply.py ``` ## .hcenter[`[matrix_multiply.py]`] ```Python #pythran export matrix_multiply(float list list, float list list) def zero(n, m): return [[0 for row in xrange(n)] for col in xrange(m)] def matrix_multiply(m0, m1): new_matrix = zero(len(m0), len(m1[0])) for i in xrange(len(m0)): for j in xrange(len(m1[0])): r = 0 "omp parallel for reduction(+:r)" for k in xrange(len(m0)): r += m0[i][k] * m1[k][j] new_matrix[i][j] = r return new_matrix ``` --- class: top # NumPy - Pythran .row[ .column.w48[ ## Using `np.mgrid` ```Python import numpy as np import time n0 = 2000 n1 = 2000 print("===== Numpy 3 =====") time0 = time.time() *im, jm = np.mgrid[0:n0, 0:n1] *A = (im + jm * 0.1) * 0.1 *B = A.transpose() time1 = time.time() *N = np.linalg.norm(A.dot(B), ord = 'fro') time2 = time.time() print("fill. time: %fs" % (time1 - time0)) print("calc. time: %fs" % (time2 - time1)) print("total time: %fs" % (time2 - time0)) print("result : %20.14e" % (N)) ``` ] .column.w48[ ## Using Pythran ```Python import numpy as np import time n0 = 2000 n1 = 2000 print("===== Pythran =====") time0 = time.time() *import matrix_multiply time1 = time.time() *A = [[((i + j * 0.1) * 0.1) * for i in xrange(n0)] for j in xrange(n1)] *B = A.transpose() time2 = time.time() *mat = matrix_multiply.matrix_multiply(A, B) time3 = time.time() *N = np.linalg.norm(mat, ord = 'fro') time4 = time.time() print("init. time: %fs" % (time1 - time0)) print("fill. time: %fs" % (time2 - time1)) print("mult. time: %fs" % (time3 - time2)) print("norm. time: %fs" % (time4 - time3)) print("TOTAL time: %fs" % (time4 - time0)) print("result : %e" % (N)) ``` ] ] -- .row[ .column.w48[ ## Result ```text ===== Numpy 3 ===== fill. time: 0.078148s *calc. time: 0.198162s total time: 0.276310s result : 6.17895105956704e+10 ``` ] .column.w48[ ## Result ```text ===== Pythran ===== init. time: 0.000120s fill. time: 1.526504s *mult. time: 113.829122s norm. time: 0.094946s TOTAL time: 115.450692s result : 6.17895105956707e+10 ``` ] ] --- # NumPy - Conclusions .noflex[ .rightf.w50[ .hcenter[] .hcenter[[source: xkcd](https://xkcd.com/353)] ] .w40[] ## Standard NumPy :+: very efficient :+: very easy to read :+: **LOTS** of specialized functions :+: can be accelerated **if needed** :warning: loops in python / NumPy are **SLOW** ## To go further :arrow_right: very dynamic field of research :arrow_right: **stay curious** :arrow_right: **try new frameworks** :arrow_right: Python bindings of Armadillo + NumPy :+: Swig + Armadillo + NumPy ] --- # Scientific programming ## Scientific programming tools presented in this course * linear algebra libraries (usable from `C++`) * `Boost`, `LAPACK`, `BLAS`, `MKL`, `GSL`, `Eigen`, **`Armadillo`** :+: * scientific programming with Python * **introduction to Python** :+: * **`NumPy`** :+: * presenting results * write beautiful math: ** .latex[L
a
T
e
X] ** :+: * structure the documentation: **Markdown** :+: * write some slides: **remark.js** :+: ## Scientific programming techniques presented in this course * numerical integration * **Gaussian quadrature** :+: * optimization of sum calculation :+: