IPS-PROD

class: titlepage

.title[
Introduction to Scientific Programming
]

.subtitle[
`IPS-PROD` - N. Dubray - ENSIIE - 2017
]

.row[
`$$\int_{-\infty}^\infty f(x)e^{-x^2} dx \simeq \sum_{i=0}^{n-1}w_if(x_i)$$`
]

.footnote[
[:book:](../toc/index.html)
]

---

layout: true
class: animated fadeIn middle numbers

.footnote[
`IPS-PROD` - N. Dubray - ENSIIE - 2017 - [:book:](../toc/index.html)
]

---

# These slides

.hcenter.w20[![](images/github-octocat.png)]

.block.hcenter[
* You can **freely** download, modify and use these slides.
* Do not hesitate to make some **pull requests** if you want to share your modifications.
* Repository: [http://pedago-etu.ensiie.fr/noel.dubray/lectures.git](http://pedago-etu.ensiie.fr/noel.dubray/lectures.git).
]

---

# Lecturer

## N. Dubray - [`noel.dubray@gmail.com`](mailto:noel.dubray@gmail.com)
* *job:* researcher at the CEA (french alternative energies and atomic energy commission)
* *field:* theoretical nuclear physics
* *research subject:* microscopic description of the fission process
* *skills:* scientist / developer / HPC user / sysadmin

.hcenter.shadow.w50[![](images/density.png)]
.hcenter[Example of a simulated fissioning nucleus]

---

# Modules

## :arrow_right: IPS: Introduction to Scientific Programming (S3)
* `IPS.DEV`: Scientific application development
* `IPS.PROD`: Scientific data production, post-processing and visualization

## :arrow_right: PSA: Advanced Scientific Programming (S4)
* `PSA.???`: TBA
* `PSA.???`: TBA

## Evaluation
* project presentation [~10min] + questions [~10min]

## TPs timeline
.mermaid[
    graph LR
    A("Week #1<br/>Initialization") --> B
    B("Week #2<br/>Project") --> C
    C("Week #3<br/>Project") --> D
    D("Week #4<br/>Project") --> E
    E("Week #5<br/>Project") --> F("Week #6<br/>Presentation")
    style F fill:#090
]

---

# Data visualization

## 1D/2D plotting tools presented in this course
* Python: **`Matplotlib`**
* Javascript: **`Plotly.js`**
* Standalone: `gnuplot`, **`xmgrace`**
 
## 3D plotting tools presented in this course
* Renderer: **`POV-Ray`** .animated.infinite.flash[:arrow_left:]
* Modeler + renderer: **`Blender`**
* Data visualization: **`Paraview`**

---

# POV-Ray

.hcenter.w40[![](images/povray_example.png)]

## History
* very old project (~1980)
* maintained by **The POV-Team**
* name means "Persistence of Vision Ray Tracer"

## Key points
* ray-tracer
* Turing-complete scene description language
* procedural textures
* radiosity

.block[
* Repository: [https://github.com/POV-Ray/povray](https://github.com/POV-Ray/povray)
* Website: [http://www.povray.org](http://www.povray.org)
* License: `AGPLv3`
]

---

# Ray-tracing

## What is ray-tracing ?
:arrow_right: a method to generate nearly photo-realistic images by modeling light rays.

## Key points
* very good reproduction of shadows, reflections, refractions, scattering, etc...
* **very high computational cost**
* can easily use **parallelization**

## Main types of ray-tracing
* eye-based (historical)
* light-based
* bidirectional path tracing
* photon mapping (for caustics, global illumination...)
* etc...

---

# POV-Ray - General

## Workflow
1. describe the scene in a file
2. let POV-Ray interpret the file and generate an image

## Object types
* POV-Ray default objects
* user-defined objects

## Object properties
* how it deflects / reflects light
* texture / material / color
* etc...

---

# POV-Ray - Example

.row[
.column.w48[
## .hcenter[`[example_00.pov]`]

```pov
#include "colors.inc"    

background { color White }

camera {
  location <0, 2, -3>
  look_at <0, 1, 2>
}

sphere {
  <0, 1, 2>, 2
  texture {
    pigment { color Red }
  }
}

light_source { <2, 4, -3> color White}

```
]
.column.w48.middle[
## .hcenter[Shell session]

```shell
$ povray +A0.0001 -W800 -H600 +P +Q11 example_00.pov
```
]
]

.hcenter.w40.border.shadow[![](images/example_00.png)]

---

# POV-Ray - Source code

## Syntax
* **very close from `C`**
* case-sensitive
* whitespaces are ignored
* **order of objects** is ignored (but not properties)
* comments use `/* ... */` and `//` like `C`
* include other files with `#include "..."`

## Vectors `<a0, ... ,an>`
* can refer to a point in 2D-space `<x,y>`
* can refer to a point in 3D-space `<x,y,z>`
* can refer to a color `rgb <r,g,b>` with `\(0\le \{r,g,b\} \le 1\)`
* can refer to a transparent color `rgbf <r,g,b,f>` with `\(0\le \{r,g,b,f\} \le 1\)`
* can refer to a normal vector `<nx,ny,nz>`

---

# POV-Ray - Initial scene

.row[
.column.w48.middle[
```pov
camera
{
  location <1,4,-5>
  look_at <0,1,0>
  angle 45
}
```
]
.column.w48[
## Camera
:arrow_right: defines the "eye" of the observer
]
]

.row[
.column.w48.middle[
```pov
light_source
{
  <-2,7,-3>
  color rgb <1,1,1>
}
```
]
.column.w48[
## Light
:arrow_right: emits some light
]
]

.row[
.column.w48.middle[
```pov
background
{
  rgb <1,1,1>
  // rgbf <1,1,1,0>
}
```
]
.column.w48[
## Background
:arrow_right: defines the default color of rays
]
]

---

# POV-Ray - Plane

.row[
.column.w40.middle[
```pov
object
{
  plane
  {
    <0,1,0> // normal axis
    0 // where does the plane cut the axis
    pigment
    {
      checker // checker pattern
      color rgb <0.7,0.7,0.8> // light gray
      color rgb <0.5,0.5,0.5> // dark gray
      scale 2
    }
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/plane.png)]
]
]


---

# POV-Ray - Sphere

.row[
.column.w40.middle[
```pov
object
{
  sphere
  {
    <0,1,0>, 1 // center, radius
  }
  pigment
  {
    color rgb <0,1,0> // pure green
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/sphere.png)]
]
]

---

# POV-Ray - Torus

.row[
.column.w40.middle[
```pov
object
{
  torus
  {
    0.8, 0.2 // big radius, small radius
  }
  pigment
  {
    color rgb <0,0,1> // pure blue
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
  rotate <90,0,0>   // rotation
  translate <0,1,0> // translation
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/torus.png)]
]
]

---

# POV-Ray - Box

.row[
.column.w40.middle[
```pov
object
{
  box
  {
    <-1,0,-1>, <1,2,1> // corners
  }
  pigment
  {
    color rgb <1,0,0> // pure red
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/box.png)]
]
]

---

# POV-Ray - Cylinder

.row[
.column.w40.middle[
```pov
object
{
  cylinder
  {
    <0,0,0> // 1st center
    <0,2,0> // 2nd center
    1       // radius
  }
  pigment
  {
    color rgb <1,1,0> // pure yellow
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/cylinder.png)]
]
]

---

# POV-Ray - Cone

.row[
.column.w40.middle[
```pov
object
{
  cone
  {
    <0,0,0> // 1st center
    1       // 1st radius
    <0,2,0> // 2nd center
    0.1     // 2nd radius
  }
  pigment
  {
    color rgb <1,0,1> // pure magenta
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/cone.png)]
]
]

---

# POV-Ray - Lathe

.row[
.column.w40.middle[
```pov
object
{
  lathe
  {
    cubic_spline // type of spline
    6  // number of points
    <1.0,0>  // -+
    <1.0,0>  //  |
    <1.0,2>  //  | spline
    <0.3,2>  //  |
    <0.3,0>  //  |
    <0.3,0>  // -+
  }
  pigment
  {
    color rgb <0,1,1> // pure cyan
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/lathe.png)]
]
]

---

# POV-Ray - Prism

.row[
.column.w40.middle[
```pov
object
{
  prism
  {
    linear_spline // type of spline
    0, 2  // origin, height
    10  // number of points
    <-1,-1>      // -+
    <-1,1>       //  |
    <1,1>        //  | outer spline
    <1,-1>       //  |
    <-1,-1>      // -+
    <-0.8,-0.8>  // -+
    <-0.8,0.8>   //  |
    <0.8,0.8>    //  | inner spline
    <0.8,-0.8>   //  |
    <-0.8,-0.8>  // -+
  }
  pigment
  {
    color rgb <1,0,0> // pure red
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/prism.png)]
]
]

---

# POV-Ray - Height field

.row[
.column.w40.middle[
```pov
#declare fn_Hills = function
{
  pigment
  {
    bumps
    warp
    {
      turbulence 0.5
    }
    scale 0.2
  }
}

object
{
  height_field
  {
    function 16, 16
    {
      fn_Hills (x, y, z).red
    }

  }
  pigment
  {
    gradient y
    color_map
    {
      [0.0 color rgb <1,1,1>]
      [0.5 color rgb <1,1,1>]
      [1.0 color rgb <1,0,0>]
    }
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
  translate <-0.5, -0.5, -0.5>
  scale <2, 0.5, 2>
  translate <0,1,0>
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/height_field_16.png)]
]
]

---

# POV-Ray - Height field

.row[
.column.w40.middle[
```pov
#declare fn_Hills = function
{
  pigment
  {
    bumps
    warp
    {
      turbulence 0.5
    }
    scale 0.2
  }
}

object
{
  height_field
  {
*   function 100, 100
    {
      fn_Hills (x, y, z).red
    }
*   smooth
  }
  pigment
  {
    gradient y
    color_map
    {
      [0.0 color rgb <1,1,1>]
      [0.5 color rgb <1,1,1>]
      [1.0 color rgb <1,0,0>]
    }
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
  translate <-0.5, -0.5, -0.5>
  scale <2, 0.5, 2>
  translate <0,1,0>
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/height_field.png)]
]
]

---

# POV-Ray - Blob

.row[
.column.w40.middle[
```pov
object
{
  blob
  {
    threshold 0.6 // isovalue
    sphere
    {
      <.75, 0, 0>  // center
      1            // radius
      1            // strength
    }
    sphere { <-.375, .64952, 0>, 1, 1 }
    sphere { <-.375, -.64952, 0>, 1, 1 }
    rotate <90,0,0>
    translate <0,1,0>
    scale 1.2
  }
  pigment
  {
    color rgb <0,1,0> // pure green
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/blob.png)]
]
]

---

# POV-Ray - Text

.row[
.column.w40.middle[
```pov
object
{
  text
  {
    ttf "timrom.ttf" "ENSIIE" // font
    0.3  // thickness
    0    // offset
    translate <-1.6,-0.35,-0.15>
    scale 1.0
    translate <0,1,0>
  }
  pigment
  {
    agate // pattern type
    color_map {
      [0.00 color rgb <1,0,0>]
      [0.25 color rgb <1,1,0>]
      [0.50 color rgb <0,1,0>]
      [0.75 color rgb <0,1,1>]
      [1.00 color rgb <0,0,1>]
    }
    scale 2.0
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
}
}```
]
.column.w55.middle[
.hcenter.shadow[![](images/text.png)]
]
]

---

# POV-Ray - Transparency

.row[
.column.w40.middle[
```pov
object
{
  text
  {
    ttf "timrom.ttf" "ENSIIE" // font
    0.3  // thickness
    0    // offset
    translate <-1.6,-0.35,-0.15>
    scale 1.0
    translate <0,1,0>
  }
  pigment
  {
*   rgbf <1,0,0,0.9>
  }
  finish
  {
    phong 1.0
    reflection 0.2
*   ambient 0.1
*   diffuse 0.5
*   specular 0.5
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/transparent.png)]
]
]

---

# POV-Ray - Union of objects

.row[
.column.w40.middle[
```pov
object
{
  union
  {
    box
    {
      <-1,0,-1>, <1,2,1>
    }
    sphere
    {
      <0,2,0>, 1.0
    }
    scale 0.7
  }
  pigment
  {
    color rgb <0,1,0> // pure green
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/union.png)]
]
]

---

# POV-Ray - Intersection of objects

.row[
.column.w40.middle[
```pov
object
{
  intersection
  {
    box
    {
      <-1,0,-1>, <1,2,1>
    }
    sphere
    {
      <0,2,0>, 1.0
    }
    scale 0.7
  }
  pigment
  {
    color rgb <0,0,1> // pure blue
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/intersection.png)]
]
]

---

# POV-Ray - Difference of objects

.row[
.column.w40.middle[
```pov
object
{
  difference
  {
    box
    {
      <-1,0,-1>, <1,2,1>
    }
    sphere
    {
      <0,2,0>, 1.0
    }
    scale 0.7
  }
  pigment
  {
    color rgb <1,0,0> // pure red
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/difference.png)]
]
]

---

# POV-Ray - Merge objects

.row[
.column.w40.middle[
```pov
object
{
  merge
  {
    box
    {
      <-1,0,-1>, <1,2,1>
    }
    sphere
    {
      <0,2,0>, 1.0
    }
    scale 0.7
  }
  pigment
  {
    color rgb <1,1,0> // pure yellow
  }
  finish
  {
    phong 1.0
    reflection 0.2
  }
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/merge.png)]
]
]

---

# POV-Ray - Union / Merge comparison

.row[
.column.w48.middle[
## .hcenter[Union]
.hcenter.shadow[![](images/transparent_union.png)]
]
.column.w48.middle[
## .hcenter[Merge]
.hcenter.shadow[![](images/transparent_merge.png)]
]
]

---

# POV-Ray - Density

.row[
.column.w40.middle[
```pov
object
{
  box
  {
    <-1,-1,-1>, <1,1,1>
*   hollow
  }
  pigment
  {
    color rgbf <1,1,1,1>
  }
  interior
  {
    media
    {
      emission 0.75
      scattering {1, 0.5}
*     density
      {
        spherical
        color_map
        {
          [0.0 rgb <0,0,0.5>]
          [0.5 rgb <0.8, 0.8, 0.4>]
          [1.0 rgb <1,1,1>]
        }
      }
    }
  }
  translate <0,1,0>
* translate <0,0.001,0>
}
```
]
.column.w55.middle[
.hcenter.shadow[![](images/density_example.png)]
]
]

---

# POV-Ray - Nuclear Local Density

.hcenter.shadow.w80[![](images/density.png)]

---

# POV-Ray - `visu.pov`

:arrow_right: file available [here](../tools/povray/visu.pov). This scene uses a density file called `example.df3`.

```pov
[...]
box
{
 <0,0,0><1,1,1>
 hollow
 pigment
 {
  rgbf 1
 }
 interior
 {
  media
  {
   emission 0.7
   absorption 1.0
   intervals 10
   density
   {
*   density_file df3 "example.df3"

    scale<NBX/(NBX-1),NBY/(NBY-1),NBZ/(NBZ-1)>
    interpolate INTERPOLATE

    color_map
    {
     [0.0   rgb <0.0,0.0,0.0>]
     [0.1   rgb <0.0,0.0,2.0>]
     [1.0   rgb <1.0,1.0,1.0>]
    } 
   }
  }
 }
 scale <10,10,20>
 translate <-5,-5,-10>
 rotate y*90
}
[...]
```

---

# POV-Ray - `df3` format

## Reference
[POV-Ray df3 format](http://povray.org/documentation/3.7.0/r3_4.html#r3_4_7_1_8_1).

.block[
## Header
The df3 format consists of a 6 byte header of three 16-bit integers with high order byte first.
These three values give the x,y,z size of the data in pixels (or more appropriately called voxels).

## Data
The header is followed by x*y*z unsigned integer bytes of data with a resolution of 8, 16 or 32 bit.
The data are written with high order byte first (big-endian). The resolution of the data is determined
by the size of the df3-file. That is, if the file is twice (minus header, of course) as long as an 8 bit
file then it is assumed to contain 16 bit ints and if it is four times as long 32 bit ints. 
]

---

# POV-Ray - `df3` writer

```C++
std::string cubeToDf3(const arma::cube &m)
{
  std::stringstream ss(std::stringstream::out | std::stringstream::binary);
  int nx = m.n_rows;
  int ny = m.n_cols;
  int nz = m.n_slices;
  ss.put(nx >> 8);
  ss.put(nx & 0xff);
  ss.put(ny >> 8);
  ss.put(ny & 0xff);
  ss.put(nz >> 8);
  ss.put(nz & 0xff);
  double theMin = 0.0;
  double theMax = m.max();

  for (uint k = 0; k < m.n_slices; k++)
  {
    for (uint j = 0; j < m.n_cols; j++)
    {
      for (uint i = 0; i < m.n_rows; i++)
      {
        uint v = 255 * (fabs(m(i, j, k)) - theMin) / (theMax - theMin);
        ss.put(v);
      }
    }
  }

  return ss.str();
}
```
where `m` is an `arma::cube` representing the density to be plotted.
**Mandatory dimensions for `m`**:

.hcenter[
| Axis |    x   |    y   |    z   |
| :--: | :----: | :----: | :----: |
| min  | -10 fm | -10 fm | -20 fm |
| max  |  10 fm |  10 fm |  20 fm |
| nbp  |  32    |  32    |  64    |
]

---

# POV-Ray - Conclusions

.hcenter.shadow.w60[![](images/landscape.png)]

:+: rather powerful ray-tracer  
:+:  very easy to generate scenes  
:+: very easy to automate mass-production of renderings  
:+: good introduction to ray-tracers  
:warning: not a modeler  
:warning: may seem old-school at some points  

:arrow_right: **My advice:** use it as your first ray-tracer, then switch to **Blender**, especially if you need a modeler.

---

# Data visualization

## 1D/2D plotting tools presented in this course
* Python: **`Matplotlib`**
* Javascript: **`Plotly.js`**
* Standalone: `gnuplot`, **`xmgrace`**
 
## 3D plotting tools presented in this course
* Renderer: **`POV-Ray`** :+:
* Modeler + renderer: **`Blender`** .animated.infinite.flash[:arrow_left:]
* Data visualization: **`Paraview`**

---

# Blender - `raw` writer

```C++
std::string cubeToRaw(const arma::cube &m)
{
  std::stringstream ss(std::stringstream::out | std::stringstream::binary);
  double theMin = 0.0;
  double theMax = m.max();

  for (uint k = 0; k < m.n_slices; k++)
  {
    for (uint j = 0; j < m.n_cols; j++)
    {
      for (uint i = 0; i < m.n_rows; i++)
      {
        uint v = 255 * (fabs(m(i, j, k)) - theMin) / (theMax - theMin);
        ss.put(v);
      }
    }
  }

  return ss.str();
}
```
where `m` is an `arma::cube` representing the density to be plotted.
**Mandatory dimensions for `m`**:

.hcenter[
| Axis |    x   |    y   |    z   |
| :--: | :----: | :----: | :----: |
| min  | -10 fm | -10 fm | -20 fm |
| max  |  10 fm |  10 fm |  20 fm |
| nbp  |  32    |  32    |  64    |
]

---

# Scientific programming

## Scientific programming tools presented in this course
 * linear algebra libraries (usable from `C++`)
  * `Boost`, `LAPACK`, `BLAS`, `MKL`, `GSL`, `Eigen`, **`Armadillo`** :+:
 * scientific programming with Python
  * **introduction to Python** :+:
  * **`NumPy`**
 * presenting results
  * write beautiful math: ** .latex[L<sup>a</sup>T<sub>e</sub>X] ** :+:
  * structure the documentation: **Markdown** :+:
  * write some slides: **remark.js** :+:

## Scientific programming techniques presented in this course
 * numerical integration
  * **Gaussian quadrature** :+:
 * optimization of sum calculation .animated.infinite.flash[:arrow_left:]

---

# Optimization of sum calculation - Sum types

## 1. Simple sum
`$$
V \equiv \sum_{i_1}^{I_1}\ldots \sum_{i_N}^{I_N} \prod_{j=1}^{M} F_j[i_1,\ldots,i_N]
$$`

:arrow_right: By convention, the `\(F_j[\ldots]\)` values are pre-calculated and stored:

`$$
\forall j \in [1..\,M], F_j[i_1,\ldots,i_N] \equiv F(x_{1,j}[i_1], \ldots, x_{N,j}[i_N])
$$`
and
`$$
\forall i \in [1..\,N], \forall j \in [1..\,M], x_{i,j} \equiv \{x_{i,j}[1],\ldots,x_{i,j}[I_i]\}.
$$`

## Example
`$$
V \equiv \sum_{i=0}^{100}\sum_{j=0}^{100} \frac{x[i]+y[j]}{\sqrt{1 + x[i] + y[j]}}
$$`

---

# Optimization of sum calculation - Sum types

## 2. "Multiple" sum
`$$
\forall k_1 \in [1..\,K_1],\ldots, \forall k_L \in [1..\,K_L], V[k_1,\ldots,k_L] \equiv \sum_{i_1}^{I_1}\ldots \sum_{i_N}^{I_N} \prod_{j=1}^{M} F_j[i_1,\ldots,i_N, k_1,\ldots, k_L].
$$`
Using the notation
`$$
\coprod_{k}^{K} \equiv \left(\forall k \in [1..\,K]\right),
$$`
one can re-write the previous expression as
`$$
\coprod_{k_1}^{K_1}\ldots \coprod_{k_L}^{K_L} V[k_1,\ldots,k_L] \equiv \sum_{i_1}^{I_1}\ldots \sum_{i_N}^{I_N} \prod_{j=1}^{M} F_j[i_1,\ldots,i_N, k_1,\ldots, k_L].
$$`

## Example
`$$
\coprod_i^{10}\coprod_j^{10} V[i,j] \equiv \sum_{k=0}^{100}\sum_{l=0}^{100} \frac{x[i]+y[j]}{\sqrt{1 + x[k] + y[l]}}
$$`

---

# Optimization of sum calculation - Sum types

## 3. "Multiple", "variable range" sum

`$$
\coprod_{k_1}^{K_1}\ldots \coprod_{k_L}^{K_L[k_1,\ldots,k_{L-1}]} V[k_1,\ldots,k_L] \equiv \sum_{i_1}^{I_1[k_1,\ldots,k_L]}\ldots \sum_{i_N}^{I_N[k_1,\ldots,k_L,i_1,\ldots,i_{N-1}]} \prod_{j=1}^{M} F_j[i_1,\ldots,i_N, k_1,\ldots, k_L].
$$`

## Example
`$$
\coprod_i^{10}\coprod_j^{i + 10} V[i,j] \equiv \sum_{k=0}^{i + 1}\sum_{l=0}^{j + k + 1} \frac{x[i]+y[j]}{\sqrt{1 + x[k] + y[l]}}
$$`

---

# Optimization of sum calculation - Sum types

## 4. "Multiple", "variable-range", "block-structured" sum

`$$
\coprod_{k_1}^{K_1}\ldots \coprod_{k_L}^{K_L[k_1,\ldots,k_{L-1}]} V[k_1,\ldots,k_L] \equiv \sum_{i_1}^{I_1[k_1,\ldots,k_L]}\ldots \sum_{i_N}^{I_N[k_1,\ldots,k_L,i_1,\ldots,i_{N-1}]} \prod_{j=1}^{M} F_j[i_1,\ldots,i_N, k_1,\ldots, k_L]
$$`
with
`$$
\prod_{j=1}^M F_j[i_1,\ldots,i_N, k_1,\ldots, k_L] = G[i_1,\ldots,i_N, k_1,\ldots, k_L] \left(\prod_{l=1}^{L} \delta_l(i_1,\ldots,i_N, k_1,\ldots, k_L)\right).
$$`

## Example
`$$
\begin{eqnarray}
\coprod_i^{10}\coprod_j^{i + 10} V[i,j] &\equiv& \sum_{k=0}^{i + 1}\sum_{l=0}^{j + k + 1} \frac{(1-\delta_{lk})\left(x[i]+y[j]\right)}{\sqrt{1 + x[k] + y[l]}}\\
                                        &=&      \sum_{k=0}^{i + 1}\sum_{\substack{l=0\\l\ne k}}^{j + k + 1} \frac{x[i]+y[j]}{\sqrt{1 + x[k] + y[l]}}
                                        \end{eqnarray}
$$`

---

# Optimization of sum calculation - Notations

* we sum up all `\(\sum\)` and `\(\coprod\)` dependencies in the upper symbol list
* we do the same for all functions

## Example

`$$
\begin{eqnarray}
\sum_{i=F(a)}^{G(b,c)} &\rightarrow& \sum_i^{(abc)}\\
\coprod_{j=H(d)}^{J(d,f)} &\rightarrow& \coprod_j^{(df)}\\
F[a,c,d] &\rightarrow& F^{(acd)}
\end{eqnarray}
$$`

---

# Optimization of sum calculation - Example

`$$
\coprod_a^A\coprod_b^B V[a,b]\equiv \sum_c^C\sum_d^{D(c)}\sum_e^E F[a,b] G[d,e] \delta_{ae}
$$`

## Raw implementation

* `\(\sum\)` and `\(\coprod\)` are replaced by `for` loops
* `\(\delta_{xy}\)` are replaced by `if` tests

```C++
//=== Version 0 ===

for (uint a = 0; a < A; a++)
{
  for (uint b = 0; b < B; b++)
  {
    double sum = 0.0;
    for (uint c = 0; c < C; c++)
    {
      for (uint d = 0; d < D[c]; d++)
      {
        for (uint e = 0; e < E; e++)
        {
          if (a != e) continue; 
          sum += F(a, b) * G(d, e);
        } // e
      } // d
    } // c
    V(a, b) = sum;
  } // b
} // a
```

---

# Optimization of sum calculation - Remove deltas

## Use `\(\delta_{xy}\)` to remove loops

`$$
\coprod_a^A\coprod_b^B V[a,b]\equiv \sum_c^C\sum_d^{D(c)}\color{magenta}{\sum_e^E} F[a,b] G[d,e] \color{magenta}{\delta_{ae}}
$$`
`$$
\iff \coprod_a\coprod_b V^{(ab)}= \sum_c\sum_d^{(c)} F^{(ab)} G^{(ad)}
$$`

.row[
.column.w48.middle[
```C++
//=== Version 0 ===

for (uint a = 0; a < A; a++)
{
  for (uint b = 0; b < B; b++)
  {
    double sum = 0.0;
    for (uint c = 0; c < C; c++)
    {
      for (uint d = 0; d < D[c]; d++)
      {
*       for (uint e = 0; e < E; e++)
        {
*         if (a != e) continue; 
          sum += F(a, b) * G(d, e);
        } // e
      } // d
    } // c
    V(a, b) = sum;
  } // b
} // a
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
//=== Version 1 ===

for (uint a = 0; a < A; a++)
{
  for (uint b = 0; b < B; b++)
  {
    double sum = 0.0;
    for (uint c = 0; c < C; c++)
    {
      for (uint d = 0; d < D(c); d++)
      {



        sum += F(a, b) * G(d, a);

      } // d
    } // c
    V(a, b) = sum;
  } // b
} // a
```
]
]

---

# Optimization of sum calculation - Loop order

:arrow_right: Move loops **as deep (right) as possible** (except in this case, see later)
`$$
\coprod_a\coprod_b  V^{(ab)} = \color{magenta}{\sum_c}\sum_d^{(c)} F^{(ab)} G^{(ad)}
$$`
`$$
\iff \color{magenta}{V=0}, \color{magenta}{\coprod_c}\coprod_a\coprod_b V^{(ab)} \color{magenta}{\mathrel{+}=} \sum_d^{(c)} F^{(ab)} G^{(ad)}
$$`

.row[
.column.w48.middle[
```C++
//=== Version 1 ===

for (uint a = 0; a < A; a++)
{
  for (uint b = 0; b < B; b++)
  {
    double sum = 0.0;

*   for (uint c = 0; c < C; c++)
    {
      for (uint d = 0; d < D(c); d++)
      {
        sum += F(a, b) * G(d, a);
      } // d
    } // c

    V(a, b) = sum;
  } // b
} // a
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
//=== Version 2 ===

*for (uint a = 0; a < A; a++)
* for (uint b = 0; b < B; b++)
*   V(a, b) = 0.0;

*for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
    for (uint b = 0; b < B; b++)
    {
      for (uint d = 0; d < D(c); d++)
      {
*       V(a, b) += F(a, b) * G(d, a);
      } // d
    } // b
  } // a
} // c
```
]
]

---

# Optimization of sum calculation - Function order

:arrow_right: Move functions **as high (left) as possible**
`$$
V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= \sum_d^{(c)} \color{magenta}{F^{(ab)}} G^{(ad)}
$$`

`$$
\iff V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= \color{magenta}{F^{(ab)}} \sum_d^{(c)}  G^{(ad)}
$$`

.row[
.column.w48.middle[
```C++
//=== Version 2 ===

for (uint a = 0; a < A; a++)
  for (uint b = 0; b < B; b++)
    V(a, b) = 0.0;

for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
    for (uint b = 0; b < B; b++)
    {
*     for (uint d = 0; d < D(c); d++)
*     {
*       V(a, b) += F(a, b) * G(d, a);
*     } // d
    } // b
  } // a
} // c
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
//=== Version 3 ===

for (uint a = 0; a < A; a++)
 for (uint b = 0; b < B; b++)
   V(a, b) = 0.0;

for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
    for (uint b = 0; b < B; b++)
    {
*     double sum = 0.0;
*     for (uint d = 0; d < D(c); d++)
*       sum += G(d, a);
*     V(a, b) += sum * F(a, b);
    } // b
  } // a
} // c
```
]
]

---

# Optimization of sum calculation - Pre-sum

:arrow_right: Identify pre-summable quantities

`$$
V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} \underbrace{\color{magenta}{\sum_d^{(c)}  G^{(ad)}}}_{H^{(ac)}}
$$`
`$$
\iff V=0, \coprod_c\coprod_a\left[\color{magenta}{H^{(ac)}=\sum_d^{(c)}  G^{(ad)}}\right]\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} \color{magenta}{H^{(ac)}}
$$`

.row[
.column.w48.middle[
```C++
//=== Version 3 ===

for (uint a = 0; a < A; a++)
  for (uint b = 0; b < B; b++)
    V(a, b) = 0.0;

for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
*   for (uint b = 0; b < B; b++)
*   {
*     double sum = 0.0;
*     for (uint d = 0; d < D(c); d++)
*       sum += G(d, a);
*     V(a, b) += sum * F(a, b);
*   } // b
  } // a
} // c
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
//=== Version 4 ===

for (uint a = 0; a < A; a++)
  for (uint b = 0; b < B; b++)
    V(a, b) = 0.0;
    
for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
*   double Hac = 0.0;

*   for (uint d = 0; d < D[c]; d++)
*     Hac += G(d, a);

*   for (uint b = 0; b < B; b++)
*     V(a, b) += F(a, b) * Hac;
  } // a
} // c
```
]
]

---

# Optimization of sum calculation - Vectorization

:arrow_right: Identify vector, matrix and cube operations

`$$
\color{magenta}{V=0}, \coprod_c\coprod_a\left[\color{magenta}{H^{(ac)}=\sum_d^{(c)}  G^{(ad)}}\right]\color{magenta}{\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} H^{(ac)}}
$$`

.row[
 .column.w48.middle[
```C++
//=== Version 4 ===

*for (uint a = 0; a < A; a++)
* for (uint b = 0; b < B; b++)
*   V(a, b) = 0.0;
    
for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
*   double Hac = 0.0;
*
*   for (uint d = 0; d < D[c]; d++)
*     Hac += G(d, a);

*   for (uint b = 0; b < B; b++)
*     V(a, b) += F(a, b) * Hac;
  } // a
} // c
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
//=== Version 5 ===

*V = arma::zeros(A, B);



for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
*   double Hac =
*     arma::sum(G.col(a).rows(0, D[c] - 1));



*   V.row(a) += F.row(a) * Hac; 

  } // a
} // c
```
]]

---

# Optimization of sum calculation - Speed-up

`$$
\coprod_a^A\coprod_b^B V[a,b]\equiv \sum_c^C\sum_d^{D(c)}\sum_e^E F[a,b] G[d,e] \delta_{ae}
$$`

## Initialization
```C++
//=== Common initialization ===

uint A = 109;
uint B = 119;
uint C = 129;
arma::ivec D(C);

for (uint c = 0; c < C; c++)
  D(c) = c + 1;

uint E = A;

arma::mat F;
F.randu(A, B);

arma::mat G;
G.randu(D.max(), E);

arma::mat V;
```

---

# Optimization of sum calculation - Initial execution time

`$$
\coprod_a^A\coprod_b^B V[a,b]\equiv \sum_c^C\sum_d^{D(c)}\sum_e^E F[a,b] G[d,e] \delta_{ae}
$$`

## Raw implementation

* `\(\sum\)` and `\(\coprod\)` are replaced by `for` loops
* `\(\delta_{xy}\)` are replaced by `if` tests

```C++
//=== Version 0 ===

for (uint a = 0; a < A; a++)
{
  for (uint b = 0; b < B; b++)
  {
    double sum = 0.0;
    for (uint c = 0; c < C; c++)
    {
      for (uint d = 0; d < D[c]; d++)
      {
        for (uint e = 0; e < E; e++)
        {
          if (a != e) continue; 
          sum += F(a, b) * G(d, e);
        } // e
      } // d
    } // c
    V(a, b) = sum;
  } // b
} // a
```
.alert.hcenter[**4.7808s**]

---

# Optimization of sum calculation - Remove deltas

## Use `\(\delta_{xy}\)` to remove loops

`$$
\coprod_a^A\coprod_b^B V[a,b]\equiv \sum_c^C\sum_d^{D(c)}\color{magenta}{\sum_e^E} F[a,b] G[d,e] \color{magenta}{\delta_{ae}}
$$`
`$$
\iff \coprod_a\coprod_b V^{(ab)}= \sum_c\sum_d^{(c)} F^{(ab)} G^{(ad)}
$$`

.row[
.column.w48.middle[
```C++
//=== Version 0 ===

for (uint a = 0; a < A; a++)
{
  for (uint b = 0; b < B; b++)
  {
    double sum = 0.0;
    for (uint c = 0; c < C; c++)
    {
      for (uint d = 0; d < D[c]; d++)
      {
*       for (uint e = 0; e < E; e++)
        {
*         if (a != e) continue; 
          sum += F(a, b) * G(d, e);
        } // e
      } // d
    } // c
    V(a, b) = sum;
  } // b
} // a
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
//=== Version 1 ===

for (uint a = 0; a < A; a++)
{
  for (uint b = 0; b < B; b++)
  {
    double sum = 0.0;
    for (uint c = 0; c < C; c++)
    {
      for (uint d = 0; d < D(c); d++)
      {



        sum += F(a, b) * G(d, a);

      } // d
    } // c
    V(a, b) = sum;
  } // b
} // a
```
]
]
.row[
.column.w48.middle[
.alert.hcenter[**4.7808s**]
]
.column.w48.middle[
.alert.hcenter[**0.6951s** (speed-up 6.88)]
]
]

---

# Optimization of sum calculation - Loop and function order

:arrow_right: Move loops **as deep (right) as possible** (except in this case, see later)
`$$
\coprod_a\coprod_b  V^{(ab)} = \color{magenta}{\sum_c}\sum_d^{(c)} F^{(ab)} G^{(ad)}
$$`
`$$
\iff \color{magenta}{V=0}, \color{magenta}{\coprod_c}\coprod_a\coprod_b V^{(ab)} \color{magenta}{\mathrel{+}=} \sum_d^{(c)} F^{(ab)} G^{(ad)}
$$`

.row[
.column.w48.middle[
```C++
//=== Version 1 ===

for (uint a = 0; a < A; a++)
{
  for (uint b = 0; b < B; b++)
  {
    double sum = 0.0;

*   for (uint c = 0; c < C; c++)
    {
      for (uint d = 0; d < D(c); d++)
      {
        sum += F(a, b) * G(d, a);
      } // d
    } // c

    V(a, b) = sum;
  } // b
} // a
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
//=== Version 2 ===

*for (uint a = 0; a < A; a++)
* for (uint b = 0; b < B; b++)
*   V(a, b) = 0.0;

*for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
    for (uint b = 0; b < B; b++)
    {
      for (uint d = 0; d < D(c); d++)
      {
*       V(a, b) += F(a, b) * G(d, a);
      } // d
    } // b
  } // a
} // c
```
]
]
.row[
.column.w48.middle[
.alert.hcenter[**0.6951s**]
]
.column.w48.middle[
.alert.hcenter[**0.4261s** (speed-up 1.63)]
]
]

---

# Optimization of sum calculation - Loop and function order

:arrow_right: Move functions **as high (left) as possible**
`$$
V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= \sum_d^{(c)} \color{magenta}{F^{(ab)}} G^{(ad)}
$$`

`$$
\iff V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= \color{magenta}{F^{(ab)}} \sum_d^{(c)}  G^{(ad)}
$$`

.row[
.column.w48.middle[
```C++
//=== Version 2 ===

for (uint a = 0; a < A; a++)
  for (uint b = 0; b < B; b++)
    V(a, b) = 0.0;

for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
    for (uint b = 0; b < B; b++)
    {
*     for (uint d = 0; d < D(c); d++)
*     {
*       V(a, b) += F(a, b) * G(d, a);
*     } // d
    } // b
  } // a
} // c
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
//=== Version 3 ===

for (uint a = 0; a < A; a++)
 for (uint b = 0; b < B; b++)
   V(a, b) = 0.0;

for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
    for (uint b = 0; b < B; b++)
    {
*     double sum = 0.0;
*     for (uint d = 0; d < D(c); d++)
*       sum += G(d, a);
*     V(a, b) += sum * F(a, b);
    } // b
  } // a
} // c
```
]
]
.row[
.column.w48.middle[
.alert.hcenter[**0.4261s**]
]
.column.w48.middle[
.alert.hcenter[**0.1447s** (speed-up 2.94)]
]
]

---

# Optimization of sum calculation - Pre-sum

:arrow_right: Identify pre-summable quantities

`$$
V=0, \coprod_c\coprod_a\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} \underbrace{\color{magenta}{\sum_d^{(c)}  G^{(ad)}}}_{H^{(ac)}}
$$`
`$$
\iff V=0, \coprod_c\coprod_a\left[\color{magenta}{H^{(ac)}=\sum_d^{(c)}  G^{(ad)}}\right]\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} \color{magenta}{H^{(ac)}}
$$`

.row[
.column.w48.middle[
```C++
//=== Version 3 ===

for (uint a = 0; a < A; a++)
  for (uint b = 0; b < B; b++)
    V(a, b) = 0.0;

for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
*   for (uint b = 0; b < B; b++)
*   {
*     double sum = 0.0;
*     for (uint d = 0; d < D(c); d++)
*       sum += G(d, a);
*     V(a, b) += sum * F(a, b);
*   } // b
  } // a
} // c
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
//=== Version 4 ===

for (uint a = 0; a < A; a++)
  for (uint b = 0; b < B; b++)
    V(a, b) = 0.0;
    
for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
*   double Hac = 0.0;

*   for (uint d = 0; d < D[c]; d++)
*     Hac += G(d, a);

*   for (uint b = 0; b < B; b++)
*     V(a, b) += F(a, b) * Hac;
  } // a
} // c
```
]
]
.row[
.column.w48.middle[
.alert.hcenter[**0.1447s**]
]
.column.w48.middle[
.alert.hcenter[**0.0027s** (speed-up 53.59)]
]
]

---

# Optimization of sum calculation - Vectorization

:arrow_right: Identify vector, matrix and cube operations

`$$
\color{magenta}{V=0}, \coprod_c\coprod_a\left[\color{magenta}{H^{(ac)}=\sum_d^{(c)}  G^{(ad)}}\right]\color{magenta}{\coprod_b V^{(ab)} \mathrel{+}= F^{(ab)} H^{(ac)}}
$$`

.row[
 .column.w48.middle[
```C++
//=== Version 4 ===

*for (uint a = 0; a < A; a++)
* for (uint b = 0; b < B; b++)
*   V(a, b) = 0.0;
    
for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
*   double Hac = 0.0;
*
*   for (uint d = 0; d < D[c]; d++)
*     Hac += G(d, a);

*   for (uint b = 0; b < B; b++)
*     V(a, b) += F(a, b) * Hac;
  } // a
} // c
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
//=== Version 5 ===

*V = arma::zeros(A, B);

for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
*   double Hac =
*     arma::sum(G.col(a).rows(0, D[c] - 1));

*   V.row(a) += F.row(a) * Hac; 
  } // a
} // c
```
]]
.row[
.column.w48.middle[
.alert.hcenter[**0.0027s**]
]
.column.w48.middle[
.alert.hcenter[**0.0019s** (speed-up 1.42)]
]
]

---

# Optimization of sum calculation - Overall speed-up

:arrow_right: Best speed-ups are achieved when **reducing the height of the loop stack**.  
:arrow_right: There is an important **readability / performance trade-off**.

.row[
.column.w48.middle[
```C++
//=== Version 0 ===

for (uint a = 0; a < A; a++)
{
  for (uint b = 0; b < B; b++)
  {
    double sum = 0.0;
    for (uint c = 0; c < C; c++)
    {
      for (uint d = 0; d < D[c]; d++)
      {
        for (uint e = 0; e < E; e++)
        {
          if (a != e) continue; 
          sum += F(a, b) * G(d, e);
        } // e
      } // d
    } // c
    V(a, b) = sum;
  } // b
} // a
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
//=== Version 5 ===

V = arma::zeros(A, B);

for (uint c = 0; c < C; c++)
{
  for (uint a = 0; a < A; a++)
  {
    double Hac =
      arma::sum(G.col(a).rows(0, D[c] - 1));

    V.row(a) += F.row(a) * Hac; 
  } // a
} // c
```
]
]

.row[
.column.w48.middle[
.alert.hcenter[**4.7808s**]
]
.column.w48.middle[
.alert.hcenter[**0.0019s** (speed-up 2516.21)]
]
]

---

# Optimization of sum calculation - Other techniques

## Linearization

:arrow_right: Transform multiple simple loops into a single loop.  
:arrow_right: Use a **simple formula** to calculate the indices.

.row[
.column.w48.middle[
```C++
double sum = 0.0;

for (uint a = 0; a < A; a++)
for (uint b = 0; b < B; b++)
{
  sum += F(a, b); 
}
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
double sum = 0.0;

for (uint i = 0; i < A * B; i++)
{
  uint a = i / B;
  uint b = i - a * B;
  sum += F(a, b); 
}
```
]
]

---

# Optimization of sum calculation - Other techniques

## Two-pass linearization

.vspace[]

.block.hcenter[
.hcenter[When a simple formula to calculate the indices does not exist,]
.hcenter[:arrow_right: **store the indices**.]
]

.vspace[]

.row[
.column.w48.middle[
```C++
double sum = 0.0;
for (uint a = A0;          a < A1;          a++)
for (uint b = B0(a);       b < B1(a);       b++)
for (uint c = C0(a, b);    c < C1(a, b);    c++)
for (uint d = D0(a, b, c); d < D1(a, b, c); d++)
{
  sum += F(a, b, c) * G(b, c, d); 
}
```
]
.column.middle[
:arrow_right:
]
.column.w48.middle[
```C++
arma::imat ind(4,0);
for (uint a = A0;          a < A1;          a++)
for (uint b = B0(a);       b < B1(a);       b++)
for (uint c = C0(a, b);    c < C1(a, b);    c++)
for (uint d = D0(a, b, c); d < D1(a, b, c); d++)
{
  arma::ivec v = {a, b, c, d};
  ind.insert_cols(ind.n_cols, v);
}

double sum = 0.0;
for (uint i = 0; i < ind.n_cols; i++)
  sum += F(ind(0, i), ind(1, i), ind(2, i))
       * G(ind(1, i), ind(2, i), ind(3, i)); 
```
]
]

## Three-pass

:arrow_right: The same with a fixed size for `ind`, to avoid the costly `.insert_cols`.

## :warning: Caution with multi-pass techniques
:warning: This can be useful for the **small**, **deep** loops only. No need to linearize all loops.  
:arrow_right: To be efficient, it must **factorize some calculations**.

---

# Optimization of sum calculation - Conclusions

.noflex[
## Remember these points:
* push the heaviest loops as deep as possible
* factorize functions when possible
* **pre-sum** everything ! (if possible)
* **vectorize** ! (Armadillo, special `asm` instructions, etc...)
* be smart with the indices (pick-up table, linearization, two-pass, etc...)

:arrow_right: One can do **much better** than the `-O3` flag of the compiler.

.rightf.shadow.w35[
 ![](images/cpu_cache.png)
]

## To go further...

* check the data structure alignment
* explore hardware cache levels optimization
* look up "cache invalidation"
* use multi-threading
* use multi-processing
* use GPUs
* use `asm`
* ...
]

---

# Scientific programming

## Scientific programming tools presented in this course
 * linear algebra libraries (usable from `C++`)
  * `Boost`, `LAPACK`, `BLAS`, `MKL`, `GSL`, `Eigen`, **`Armadillo`** :+:
 * scientific programming with Python
  * **introduction to Python** :+:
  * **`NumPy`**
 * presenting results
  * write beautiful math: ** .latex[L<sup>a</sup>T<sub>e</sub>X] ** :+:
  * structure the documentation: **Markdown** :+:
  * write some slides: **remark.js** :+:

## Scientific programming techniques presented in this course
 * numerical integration
  * **Gaussian quadrature** :+:
 * optimization of sum calculation  :+:

---

# Valgrind

## Key points

* `Valgrind` is a **virtual machine**
* it uses JIT compilation techniques
* it allows to easily instrument binary code (with or without debug symbols)
* there is no need to source access nor recompilation
* it provides different tools for binary code profiling
* it is language-agnostic
* it has a **huge impact** on execution times (x20)

## Tools provided
* `Memcheck`: default tool, checks validity, adressability, memory leaks...
* `None`: get the traceback of a segfault
* `Helrind`: detects race conditions in multithreaded code
* `Callgrind`: logs and profiles the function call tree
* `Cachegrind`: profiles cache use
* `exp-sgcheck`, `exp-dhat`, `exp-bbv`, etc...

.block[
* Repository: [https://sourceware.org/git/?p=valgrind.git](https://sourceware.org/git/?p=valgrind.git)
* Website: [http://valgrind.org](http://valgrind.org)
* License: `GPLv2`
]

---

# Valgrind - Memcheck example

## .hcenter[test.c]
```C
#include "stdlib.h"
#include "stdio.h"

#define SIZE 3

int main(int argc, char **argv)
{
  int* table = (int*)malloc(sizeof(int) * SIZE);

  for (int i = 0; i < SIZE; i++)
  {
    table[i] = i;
  }

  for (int i = 0; i <= SIZE; i++)
  {
    printf("table[%d]=%d\n", i, table[i]);
  }

  return 0;
}
```

## .hcenter[Shell session]
```shell
$ gcc test.c -o test
$ ./test
table[0]=0
table[1]=1
table[2]=2
table[3]=0
```

---

# Valgrind - Memcheck example

## .hcenter[Shell session]
```shell
$ valgrind ./test
==6434== Memcheck, a memory error detector
==6434== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6434== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6434== Command: ./test
==6434== 
*table[0]=0
*table[1]=1
*table[2]=2
==6434== Invalid read of size 4
==6434==    at 0x108756: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test)
==6434==  Address 0x520104c is 0 bytes after a block of size 12 alloc'd
==6434==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6434==    by 0x108708: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test)
==6434== 
*table[3]=0
==6434== 
==6434== HEAP SUMMARY:
==6434==     in use at exit: 12 bytes in 1 blocks
==6434==   total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated
==6434== 
==6434== LEAK SUMMARY:
==6434==    definitely lost: 12 bytes in 1 blocks
==6434==    indirectly lost: 0 bytes in 0 blocks
==6434==      possibly lost: 0 bytes in 0 blocks
==6434==    still reachable: 0 bytes in 0 blocks
==6434==         suppressed: 0 bytes in 0 blocks
==6434== Rerun with --leak-check=full to see details of leaked memory
==6434== 
==6434== For counts of detected and suppressed errors, rerun with: -v
==6434== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```

---

# Valgrind - Memcheck example

## .hcenter[Shell session]
```shell
$ valgrind ./test
==6434== Memcheck, a memory error detector
==6434== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6434== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6434== Command: ./test
==6434== 
table[0]=0
table[1]=1
table[2]=2
==6434== Invalid read of size 4
==6434==    at 0x108756: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test)
==6434==  Address 0x520104c is 0 bytes after a block of size 12 alloc'd
==6434==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6434==    by 0x108708: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test)
==6434== 
table[3]=0
==6434== 
==6434== HEAP SUMMARY:
==6434==     in use at exit: 12 bytes in 1 blocks
==6434==   total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated
==6434== 
==6434== LEAK SUMMARY:
==6434==    definitely lost: 12 bytes in 1 blocks
==6434==    indirectly lost: 0 bytes in 0 blocks
==6434==      possibly lost: 0 bytes in 0 blocks
==6434==    still reachable: 0 bytes in 0 blocks
==6434==         suppressed: 0 bytes in 0 blocks
==6434== Rerun with --leak-check=full to see details of leaked memory
==6434== 
==6434== For counts of detected and suppressed errors, rerun with: -v
*==6434== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```

---

# Valgrind - Memcheck example

## .hcenter[Shell session]
```shell
$ valgrind ./test
==6434== Memcheck, a memory error detector
==6434== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6434== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6434== Command: ./test
==6434== 
table[0]=0
table[1]=1
table[2]=2
*==6434== Invalid read of size 4
*==6434==    at 0x108756: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test)
*==6434==  Address 0x520104c is 0 bytes after a block of size 12 alloc'd
*==6434==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
*==6434==    by 0x108708: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test)
==6434== 
table[3]=0
==6434== 
==6434== HEAP SUMMARY:
==6434==     in use at exit: 12 bytes in 1 blocks
==6434==   total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated
==6434== 
==6434== LEAK SUMMARY:
==6434==    definitely lost: 12 bytes in 1 blocks
==6434==    indirectly lost: 0 bytes in 0 blocks
==6434==      possibly lost: 0 bytes in 0 blocks
==6434==    still reachable: 0 bytes in 0 blocks
==6434==         suppressed: 0 bytes in 0 blocks
==6434== Rerun with --leak-check=full to see details of leaked memory
==6434== 
==6434== For counts of detected and suppressed errors, rerun with: -v
==6434== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```

---

# Valgrind - Memcheck example

## .hcenter[Shell session]
```shell
$ valgrind ./test
==6434== Memcheck, a memory error detector
==6434== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6434== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6434== Command: ./test
==6434== 
table[0]=0
table[1]=1
table[2]=2
==6434== Invalid read of size 4
==6434==    at 0x108756: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test)
==6434==  Address 0x520104c is 0 bytes after a block of size 12 alloc'd
==6434==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6434==    by 0x108708: main (in /home/dubrayn/temp/ensiie/lectures/tools/gdb/test)
==6434== 
table[3]=0
==6434== 
==6434== HEAP SUMMARY:
*==6434==     in use at exit: 12 bytes in 1 blocks
==6434==   total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated
==6434== 
==6434== LEAK SUMMARY:
*==6434==    definitely lost: 12 bytes in 1 blocks
==6434==    indirectly lost: 0 bytes in 0 blocks
==6434==      possibly lost: 0 bytes in 0 blocks
==6434==    still reachable: 0 bytes in 0 blocks
==6434==         suppressed: 0 bytes in 0 blocks
==6434== Rerun with --leak-check=full to see details of leaked memory
==6434== 
==6434== For counts of detected and suppressed errors, rerun with: -v
==6434== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```

---

# Valgrind - Memcheck example

## .hcenter[test.c]
```C
#include "stdlib.h"
#include "stdio.h"

#define SIZE 3

int main(int argc, char **argv)
{
  int* table = (int*)malloc(sizeof(int) * SIZE);

  for (int i = 0; i < SIZE; i++)
  {
    table[i] = i;
  }

  for (int i = 0; i <= SIZE; i++)
  {
    printf("table[%d]=%d\n", i, table[i]);
  }

  return 0;
}
```

## .hcenter[Shell session]
```shell
*$ gcc -g test.c -o test
$ ./test
table[0]=0
table[1]=1
table[2]=2
table[3]=0
```

---

# Valgrind - Memcheck example

## .hcenter[Shell session]

```shell
$ valgrind ./test
=6522== Memcheck, a memory error detector
==6522== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6522== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6522== Command: ./test
==6522== 
table[0]=0
table[1]=1
table[2]=2
==6522== Invalid read of size 4
*==6522==    at 0x108756: main (test.c:17)
==6522==  Address 0x520104c is 0 bytes after a block of size 12 alloc'd
==6522==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6522==    by 0x108708: main (test.c:8)
==6522== 
table[3]=0
==6522== 
==6522== HEAP SUMMARY:
==6522==     in use at exit: 12 bytes in 1 blocks
==6522==   total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated
==6522== 
==6522== LEAK SUMMARY:
==6522==    definitely lost: 12 bytes in 1 blocks
==6522==    indirectly lost: 0 bytes in 0 blocks
==6522==      possibly lost: 0 bytes in 0 blocks
==6522==    still reachable: 0 bytes in 0 blocks
==6522==         suppressed: 0 bytes in 0 blocks
==6522== Rerun with --leak-check=full to see details of leaked memory
==6522== 
==6522== For counts of detected and suppressed errors, rerun with: -v
==6522== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```

---

# Valgrind - Memcheck example

## .hcenter[test.c]
```C
#include "stdlib.h"
#include "stdio.h"

#define SIZE 3

int main(int argc, char **argv)
{
  int* table = (int*)malloc(sizeof(int) * SIZE);

  for (int i = 0; i < SIZE; i++)
  {
    table[i] = i;
  }

  for (int i = 0; i <= SIZE; i++)
  {
*   printf("table[%d]=%d\n", i, table[i]);
  }

  return 0;
}
```

## .hcenter[Shell session]
```shell
$ gcc -g test.c -o test
$ ./test
table[0]=0
table[1]=1
table[2]=2
table[3]=0
```

---

# Valgrind - Memcheck example

## .hcenter[Shell session]

```shell
$ valgrind ./test
=6522== Memcheck, a memory error detector
==6522== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6522== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6522== Command: ./test
==6522== 
table[0]=0
table[1]=1
table[2]=2
==6522== Invalid read of size 4
==6522==    at 0x108756: main (test.c:17)
==6522==  Address 0x520104c is 0 bytes after a block of size 12 alloc'd
==6522==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6522==    by 0x108708: main (test.c:8)
==6522== 
table[3]=0
==6522== 
==6522== HEAP SUMMARY:
==6522==     in use at exit: 12 bytes in 1 blocks
==6522==   total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated
==6522== 
==6522== LEAK SUMMARY:
==6522==    definitely lost: 12 bytes in 1 blocks
==6522==    indirectly lost: 0 bytes in 0 blocks
==6522==      possibly lost: 0 bytes in 0 blocks
==6522==    still reachable: 0 bytes in 0 blocks
==6522==         suppressed: 0 bytes in 0 blocks
*==6522== Rerun with --leak-check=full to see details of leaked memory
==6522== 
==6522== For counts of detected and suppressed errors, rerun with: -v
==6522== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```

---

# Valgrind - Memcheck example

## .hcenter[Shell session]

```shell
*$ valgrind --leak-check=full ./test
==6554== Memcheck, a memory error detector
==6554== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6554== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6554== Command: ./test
==6554== 
table[0]=0
table[1]=1
table[2]=2
==6554== Invalid read of size 4
==6554==    at 0x108756: main (test.c:17)
==6554==  Address 0x520104c is 0 bytes after a block of size 12 alloc'd
==6554==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6554==    by 0x108708: main (test.c:8)
==6554== 
table[3]=0
==6554== 
==6554== HEAP SUMMARY:
==6554==     in use at exit: 12 bytes in 1 blocks
==6554==   total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated
==6554== 
*==6554== 12 bytes in 1 blocks are definitely lost in loss record 1 of 1
*==6554==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
*==6554==    by 0x108708: main (test.c:8)
==6554== 
==6554== LEAK SUMMARY:
==6554==    definitely lost: 12 bytes in 1 blocks
==6554==    indirectly lost: 0 bytes in 0 blocks
==6554==      possibly lost: 0 bytes in 0 blocks
==6554==    still reachable: 0 bytes in 0 blocks
==6554==         suppressed: 0 bytes in 0 blocks
==6554== 
==6554== For counts of detected and suppressed errors, rerun with: -v
==6554== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
```

---

# Valgrind - Memcheck example

## .hcenter[test.c]
```C
#include "stdlib.h"
#include "stdio.h"

#define SIZE 3

int main(int argc, char **argv)
{
* int* table = (int*)malloc(sizeof(int) * SIZE);

  for (int i = 0; i < SIZE; i++)
  {
    table[i] = i;
  }

  for (int i = 0; i <= SIZE; i++)
  {
    printf("table[%d]=%d\n", i, table[i]);
  }

  return 0;
}
```

## .hcenter[Shell session]
```shell
$ gcc -g test.c -o test
$ ./test
table[0]=0
table[1]=1
table[2]=2
table[3]=0
```

---

# Valgrind - Memcheck example

## .hcenter[test.c]
```C
#include "stdlib.h"
#include "stdio.h"

#define SIZE 3

int main(int argc, char **argv)
{
  int* table = (int*)malloc(sizeof(int) * SIZE);

  for (int i = 0; i < SIZE; i++)
  {
    table[i] = i;
  }

* for (int i = 0; i < SIZE; i++)
  {
    printf("table[%d]=%d\n", i, table[i]);
  }

* free(table);

  return 0;
}
```

## .hcenter[Shell session]
```shell
$ gcc -g test.c -o test
$ ./test
table[0]=0
table[1]=1
table[2]=2
```

---

# Valgrind - Memcheck example

## .hcenter[Shell session]

```shell
$ valgrind ./test
==6614== Memcheck, a memory error detector
==6614== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6614== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6614== Command: ./test
==6614== 
table[0]=0
table[1]=1
table[2]=2
==6614== 
==6614== HEAP SUMMARY:
*==6614==     in use at exit: 0 bytes in 0 blocks
==6614==   total heap usage: 2 allocs, 2 frees, 1,036 bytes allocated
==6614== 
*==6614== All heap blocks were freed -- no leaks are possible
==6614== 
==6614== For counts of detected and suppressed errors, rerun with: -v
*==6614== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
```

## Other detectable bugs
* use of uninitialized values (with `--track-origins=yes`)
* declaration of unused variables
* invalid write
* use of free'd resource

---

# Valgrind - Conclusions

:+: can easily locate some common (but specific) bugs  
:+: no need to recompile  
:+: no need to have source access  
:+: can be used everywhere (CLI)  
:warning: can not detect everything (static array overflow...)  
:warning: huge impact on performances  
:warning: slightly verbose ?  

:arrow_right: **My advice:** Strange bug ? Segfault ? Unexpected behavior ? **Valgrind !!!** No excuse.  

.hcenter.w90[![](images/xkcd_722.png)]
.hcenter[[source: xkcd](https://xkcd.com/722)]

:arrow_right: Nothing visible with **Valgrind** ? Time to move to... **GDB** !

---

# GNU Project Debugger

.hcenter.shadow.w20[![](images/stallman.png)]

## Key points

* written by **Richard Stallman** in 1986
* official debugger of the GNU system
* **best symbolic debugger ever**
* learn `gdb`, use every other debugger

.block[
* Repository: [https://sourceware.org/git/?p=binutils-gdb.git](https://sourceware.org/git/?p=binutils-gdb.git)
* Website: [https://www.gnu.org/software/gdb](https://www.gnu.org/software/gdb)
* License: `GPLv3`
]

---

# GNU Project Debugger - Introduction

## What is a **good** symbolic debugger ?
* a program to run and check the execution of another program
* you can stop the program execution and explore its full state (memory, registers, call stack, threads, etc...)
* you can **MODIFY** the program state at any time
* you can **call functions** from your program scope at any time
* you can **go back in time** in the execution flow
* you can **see what is actually executed**
  
## Some problems that can be solved with a debugger
* infinite loop ? see why the loop is not finishing
* crash ? see what happened
* a line is not executing ? see the program flow
* a variable has not the expected value ? see how it is calculated
* etc...

---

class: top

# GNU Project Debugger - Introduction

.hcenter.shadow.w40[
![](images/dog.png)
]

## Why not use `printf` ?

--
 * to avoid code pollution

--
 * a debugger can do **much more** than `printf`

--
 * you can attach `gdb` to a **running** program !

--
 * you can debug post-mortem without restarting

--
 * you will learn from the lower layers (memory, stack, registers, etc...)

--
 * with `gdb`, no need to recompile and re-run

---

# GNU Project Debugger - Compilation

:arrow_right: Compile to binary objects with the `-g` flag.
* this will add **debugging symbols** to your binary objects

## .hcenter[test.c]
```C
#include "stdlib.h"
#include "stdio.h"

#define SIZE 3

int main(int argc, char **argv)
{
  int* table = (int*)malloc(sizeof(int) * SIZE);

  for (int i = 0; i < SIZE; i++)
  {
    table[i] = i;
  }

  for (int i = 0; i < SIZE; i++)
  {
    printf("table[%d]=%d\n", i, table[i]);
  }

  free(table);

  return 0;
}
```

---

# GNU Project Debugger - Debugging symbols

.row[
.column.w48[

## .hcenter[Shell session]
```shell
$ gcc -c test.c
$ nm -a test.o
0000000000000000 b .bss
0000000000000000 n .comment
0000000000000000 d .data
0000000000000000 r .eh_frame
                 U free
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 T main
                 U malloc
0000000000000000 n .note.GNU-stack
                 U printf
0000000000000000 r .rodata
0000000000000000 a test.c
0000000000000000 t .text
```
]

.column.w48[

## .hcenter[Shell session]
```shell
$ gcc -c -g test.c
$ nm -a test.o
0000000000000000 b .bss
0000000000000000 n .comment
0000000000000000 d .data
*0000000000000000 N .debug_abbrev
*0000000000000000 N .debug_aranges
*0000000000000000 N .debug_info
*0000000000000000 N .debug_line
*0000000000000000 N .debug_str
0000000000000000 r .eh_frame
                 U free
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 T main
                 U malloc
0000000000000000 n .note.GNU-stack
                 U printf
0000000000000000 r .rodata
0000000000000000 a test.c
0000000000000000 t .text
```
]
]

---

# GNU Project Debugger - Loading a binary

## .hcenter[Shell session]
```shell
$ gcc -c -g test.c
$ gcc test.o -o test
*$ gdb test
GNU gdb (Ubuntu 7.12.50.20170314-0ubuntu1.1) 7.12.50.20170314-git
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test...done.
(gdb) 
```

## .hcenter[Shell session]
```shell
$ gcc -c -g test.c
$ gcc test.o -o test
*$ gdb -q test
Reading symbols from test...done.
(gdb) 
```

---

# GNU Project Debugger - Loading a binary

## .hcenter[Shell session]
```shell
$ gcc -c -g test.c
$ gcc test.o -o test
$ gdb -q
*(gdb) file test
Reading symbols from test...done.
(gdb) 
```

## .hcenter[Shell session]
```shell
*$ gcc -c test.c
$ gcc test.o -o test
$ gdb -q
(gdb) file test
*Reading symbols from test...(no debugging symbols found)...done.
(gdb) 
```

:arrow_right: `gdb` can work without debugging symbols... but can You ? 

---

# GNU Project Debugger - Launching a program

## Run with arguments
```shell
(gdb) file test
Reading symbols from test...done.
*(gdb) run -h --help --please --pleasehelpme 
Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test -h --help --please --pleasehelpme
table[0]=0
table[1]=1
table[2]=2
[Inferior 1 (process 9109) exited normally]
(gdb)
```

## Run without arguments and break at `main()`
```shell
(gdb) file test
Reading symbols from test...done.
*(gdb) start
Temporary breakpoint 1 at 0x72f: file test.c, line 8.
Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test 

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe548) at test.c:8
8         int* table = (int*)malloc(sizeof(int) * SIZE);
(gdb)
```

## Run with arguments and break at `main()`
```shell
(gdb) file test
Reading symbols from test...done.
*(gdb) set args -h --help --please --pleasehelpme
*(gdb) start
Temporary breakpoint 1 at 0x72f: file test.c, line 8.
Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test -h --help --please --pleasehelpme

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe548) at test.c:8
8         int* table = (int*)malloc(sizeof(int) * SIZE);
(gdb)
```

---

# GNU Project Debugger - Running environment

## Show arguments
```shell
(gdb) file test
Reading symbols from test...done.
(gdb) run 1 -2 --3 4
Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test 1 -2 --3 4
table[0]=0
table[1]=1
table[2]=2
[Inferior 1 (process 9109) exited normally]
*(gdb) show args
Argument list to give program being debugged when it is started is "1 -2 --3 4".
(gdb)
```

## Show and set environment variables
```shell
(gdb) file test
Reading symbols from test...done.
*(gdb) show environment
[...]
SHELL_SESSION_ID=40fbec4f88ec4e3da8d0ae5adcdb501f
COLORTERM=truecolor
USERNAME=dubrayn
[...]
*(gdb) show environment SHELL
SHELL = /bin/bash
*(gdb) set environment SHELL /bin/false
*(gdb) show environment SHELL
SHELL = /bin/false
```

---

# GNU Project Debugger - Source

## List the current source line and its surroundings
```shell
(gdb) file test
Reading symbols from test...done.
(gdb) start
Temporary breakpoint 1 at 0x72f: file test.c, line 8.
Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test 

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe548) at test.c:8
8         int* table = (int*)malloc(sizeof(int) * SIZE);
*(gdb) list
3
4       #define SIZE 3
5
6       int main(int argc, char **argv)
7       {
8         int* table = (int*)malloc(sizeof(int) * SIZE);
9
10        for (int i = 0; i < SIZE; i++)
11        {
12          table[i] = i;
```

---

# GNU Project Debugger - Machine code

## Disassemble the current function and show the current instruction
```shell
(gdb) file test
Reading symbols from test...done.
(gdb) start
Temporary breakpoint 4 at 0x55555555472f: file test.c, line 8.
Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test 

Temporary breakpoint 4, main (argc=1, argv=0x7fffffffe548) at test.c:8
8         int* table = (int*)malloc(sizeof(int) * SIZE);
*(gdb) disassemble
Dump of assembler code for function main:
   0x0000555555554720 <+0>:     push   %rbp
   0x0000555555554721 <+1>:     mov    %rsp,%rbp
   0x0000555555554724 <+4>:     sub    $0x20,%rsp
   0x0000555555554728 <+8>:     mov    %edi,-0x14(%rbp)
   0x000055555555472b <+11>:    mov    %rsi,-0x20(%rbp)
=> 0x000055555555472f <+15>:    mov    $0xc,%edi
   0x0000555555554734 <+20>:    callq  0x5555555545e0
   0x0000555555554739 <+25>:    mov    %rax,-0x8(%rbp)
   0x000055555555473d <+29>:    movl   $0x0,-0x10(%rbp)
   0x0000555555554744 <+36>:    jmp    0x555555554763 <main+67>
   0x0000555555554746 <+38>:    mov    -0x10(%rbp),%eax
   0x0000555555554749 <+41>:    cltq   
   0x000055555555474b <+43>:    lea    0x0(,%rax,4),%rdx
   0x0000555555554753 <+51>:    mov    -0x8(%rbp),%rax
   0x0000555555554757 <+55>:    add    %rax,%rdx

[...]

   0x00005555555547af <+143>:   callq  0x5555555545d0
   0x00005555555547b4 <+148>:   mov    $0x0,%eax
   0x00005555555547b9 <+153>:   leaveq 
   0x00005555555547ba <+154>:   retq   
End of assembler dump.
(gdb)
```

---

# GNU Project Debugger - Main commands

## Execution flow control
* `start`: start the program execution and break at `main()`
* `run args`: restart the program execution with arguments `args`
* `kill`: stop the program execution
* `step`: execute until next line of code, go deeper if needed (and possible)
* `stepi`: execute next machine instruction, go deeper if needed (and possible)
* `next`: execute until next line of code, do not go deeper
* `nexti`: execute next machine instruction, do not go deeper
* `finish`: execute until exiting the current function
* `continue`: continue the program execution

## Breakpoints
* `break loc`: set a **breakpoint** at location `loc`
* `clear loc`: remove a breakpoint at location `loc`
* `info breakpoints`: print the list of breakpoints and watchpoints
* `delete id`: delete the breakpoint or watchpoint `id`

---

# GNU Project Debugger - Core dump

:arrow_right: An "image" of the state of a program at a given moment in time,
that allows to perform (among other things) a **post-mortem analysis** of a crash.

## Activate core dump at segfault

```shell
$ ulimit -c
0
*$ ulimit -c unlimited
$ ulimit -c
unlimited
```

## Dump core at segfault and analyse with `gdb`

.row[
.column.w30[
## .hcenter[`segfault.c`]
```C
#include "stdlib.h"

void main(void)
{
  int *p = NULL;
  *p = 0;
}
```
]

.column.w65[
## .hcenter[Shell session]
```shell
$ gcc -c -g segfault.c
$ gcc segfault.o -o segfault
$ ./segfault
*Segmentation fault (core dumped)
*$ gdb -q segfault core.8840
Reading symbols from segfault...done.
[New LWP 9812]
Core was generated by `./segfault'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000004005c8 in main () at segfault.c:6
6         *p = 0;
(gdb)
```
]
]

---

# GNU Project Debugger - Call stack / variables

<div data-dir='scripts/toto2' class='cinescript'></div>

---

# GNU Project Debugger - Call stack / variables

:arrow_right: The **call stack** is a stack of **frames**. When following a `call` command,
a new frame is added on top of the existing ones. Each **frame** contains local variables,
function arguments and the local address pointer.  

## Manipulate frames
* `backtrace`: print the call stack
* `backtrace full`: print the full call stack (print variables, arguments...)
* `frame`: select a frame
* `up`: got one frame up in the call stack
* `down`: got one frame down in the call stack

## Print variables / expressions
* `print expr`: evaluate and print `expr`
* `print/x expr`: evaluate and print `expr` in hexadecimal format
* `print * addr`: print the value located at `addr` address
* `display expr`: evaluate and print `expr` at each break
* `undisplay nb`: cancel display number `nb`

---

# GNU Project Debugger - Watchpoints

:arrow_right: Stop the program when a variable, expression or part of memory changes.

## Commands to manipulate watchpoints
* `watch expr`: break if `expr` change, **even if it is not defined yet**
* `info watchpoints`: print the list of watchpoints
* `delete id`: delete the breakpoint or watchpoint `id`

---

# GNU Project Debugger - Program scope modification

## .hcenter[:warning: These commands will alter the "normal" execution of the program :warning:]

## Change program variables
* `set variable v = expr`: evaluate `expr` and assign its value to the variable `v`

## Call a function 
* `call func(...)`: call function `func` in the program scope

## Return from a function
* `return`: return to the calling function **now**

---

# GNU Project Debugger - Conditional breakpoint

:arrow_right: Allows to break only if a given condition is verified.

## Add a condition to a breakpoint
* `condition id cond`: add `cond` condition to the breakpoint `id`

## .hcenter[Example `gdb` session]
```text
(gdb) file test
Reading symbols from test...done.
(gdb) start
Temporary breakpoint 1 at 0x4006a7: file test.c, line 8.
Starting program: /home/dubrayn/temp/ensiie/lectures/tools/gdb/test

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffd7c8) at test.c:8
8         int* table = (int*)malloc(sizeof(int) * SIZE);
(gdb) break 12
Breakpoint 3 at 0x4006be: file test.c, line 12.
*(gdb) condition 3 i > 1
(gdb) continue
Continuing.

Breakpoint 3, main (argc=1, argv=0x7fffffffd7c8) at test.c:12
12          table[i] = i;
(gdb) print i
*$1 = 2
```

---

# GNU Project Debugger - Thread debugging

:arrow_right: Threads can be easily debugged with `gdb`.

## Threads-related commands
* `thread id`: switch to thread `id`
* `info threads`: show existing threads
* `thread apply [thread-id-list] [all] cmd`: apply `cmd` to a list of threads

## Armadillo threads ?
* the Armadillo library uses the `BLAS` library, which can spawn threads

:arrow_right: Use the environment variable `OPENBLAS_NUM_THREADS` to control the number of threads used by `OpenBLAS`.

## .hcenter[Shell session]
```shell
$ export OPENBLAS_NUM_THREADS=1 # no thread spawning
$ export OPENBLAS_NUM_THREADS=$(nproc) # full throttle !!!
```

---

# GNU Project Debugger - Conque-GDB vim addon

<div data-dir='scripts/toto' class='cinescript'></div>

---

# GNU Project Debugger - Qt-Creator

.hcenter.shadow.w100[![](images/qt-creator.png)]

---

# GNU Project Debugger - Reverse debugging

.hcenter.shadow.w40[![](images/future.png)]

:arrow_right: Record the full program state and **go back and forth in time** !

## Commands **to go back in time**
* `record`: start the recording of the program state
* `reverse-continue`: continue **backwards**
* `reverse-finish`: continue backwards until the lower frame
* `reverse-step`: go back to the previous line, go deeper if needed
* `reverse-stepi`: go back to the previous machine instruction, go deeper if needed
* `reverse-next`: go back to the previous line, do not go deeper
* `reverse-nexti`: go back to the previous machine instruction, do not go deeper
* `set exec-direction (forward/reverse)`: set direction of execution for `next`, `step`, etc...

---

# GNU Project Debugger - Reverse debugging

<div data-dir='scripts/toto3' class='cinescript'></div>

---

# GNU Project Debugger - Conclusions

## About `gdb`

:+: **best debugger ever**  
:warning: CLI interface **might be** hard to grasp  
:warning: several commands to remember  
:+: the debugging workflow is **almost the same** in every other debugger  
:+: if `gdb` is often used "below" a nice GUI, why use a GUI ?  

## About debugging
:+: no more `printf` !!!  
:+: gain some knowledge about the lower layers (stack, cache, register, etc...)  
:+: do not trust your compiler, check the generated code (and learn ASM)  
:+: "A segfault ? Yes ! Time to play with `gdb` !"  

.hcenter.w90[![](images/xkcd_371.png)]
.hcenter[[source: xkcd](https://xkcd.com/371)]

---

# Scientific programming

## Scientific programming tools presented in this course
 * linear algebra libraries (usable from `C++`)
  * `Boost`, `LAPACK`, `BLAS`, `MKL`, `GSL`, `Eigen`, **`Armadillo`** :+:
 * scientific programming with Python
  * **introduction to Python** :+:
  * **`NumPy`** .animated.infinite.flash[:arrow_left:]
 * presenting results
  * write beautiful math: ** .latex[L<sup>a</sup>T<sub>e</sub>X] ** :+:
  * structure the documentation: **Markdown** :+:
  * write some slides: **remark.js** :+:

## Scientific programming techniques presented in this course
 * numerical integration
  * **Gaussian quadrature** :+:
 * optimization of sum calculation  :+:

---

# NumPy

.hcenter.w40[![](images/numpy_logo.png)]

## Key points

* core library for scientific computing in Python
* very close from MATLAB (but open-source :v:)
* provides very efficient multi-dimensional arrays
* provides several tools to integrate C/C++ and Fortran codes
* provides functions for: linear algebra, RNG, Fourier transforms, etc...

.block[
* Repository: [https://github.com/numpy/numpy](https://github.com/numpy/numpy)
* Website: [http://www.numpy.org](http://www.numpy.org)
* License: `BSD`
]

---

# NumPy - Arrays

:arrow_right: The main object of the `numpy` package is the **homogeneous multidimensional array**.
## Array initialization
```Python
import numpy as np

a = np.array([2, 3, 4])
v = np.array([2.1, 3.4, 2.7])
```

## Type of the stored data
:arrow_right: The type is infered from the values.  

```Python
import numpy as np

np.array([2, 3, 4  ]).dtype # dtype('int64')
np.array([2, 3, 4.0]).dtype # dtype('float64')
```

## Size of each element
:arrow_right: Useful to calculate the memory footprint of an array.  

```Python
import numpy as np

np.array([2, 3, 4  ]).itemsize # 8
np.array([2, 3, 4.0]).itemsize # 8
```

---

# NumPy - Arrays

## Initialization from a list [of tuple [of list [...]]]
```Python
import numpy as np

dvec  = np.array([1.0, 2.0, 3.0])                                  #     3 double vector
dmat  = np.array([[1, 2, 3], (2, 3, 4.0)])                         #   2x3 double matrix
icube = np.array([[[1, 2, 3], [2, 3, 4]], [[3, 4, 5], [4, 5, 6]]]) # 2x2x3 integer cube
```

## Explicit type specification
```Python
import numpy as np

cmat  = np.array([[1, 2], [3, 4]], dtype = complex) # 2x2 complex double matrix
```

## Initialization by conversion
```Python
import numpy as np

dvec  = np.array([1.0, 2.0, 3.0])       # 3 double vector
cvec  = np.array(dvec, dtype = complex) # 3 complex double vector
```

## Initialization by copy
```Python
import numpy as np

ivec  = np.array([1, 2, 3])
a = ivec
a is ivec # True <- WARNING !!!
b = np.array(ivec)
b is ivec # False
c = ivec[:]
c is ivec # False
d = np.copy(ivec)
d is ivec # False
```

---

# NumPy - Arrays

## Rank (number of axes)
```Python
import numpy as np

a = np.array([1, 2, 3])
a.ndim # 1

b = np.array([[1, 2], [3, 4]])
b.ndim # 2
```

## Dimensions
```Python
import numpy as np

a = np.array([1, 2, 3])
a.shape # (3,)

b = np.array([[1, 2], [3, 4]])
b.shape # (2, 2)
```

## Number of elements
```Python
import numpy as np

a = np.array([1, 2, 3])
a.size # 3

b = np.array([[1, 2], [3, 4]])
b.size # 4
```

---

# NumPy - Array creation

.row[
 .column.w48[
## Ones and zeros
```Python
import numpy as np

print(np.eye(3, 4))
# [[ 1.  0.  0.  0.]
#  [ 0.  1.  0.  0.]
#  [ 0.  0.  1.  0.]]

print(np.identity(3))
# [[ 1.  0.  0.]
#  [ 0.  1.  0.]
#  [ 0.  0.  1.]]

print(np.zeros([3, 4]))
# [[ 0.  0.  0.  0.]
#  [ 0.  0.  0.  0.]
#  [ 0.  0.  0.  0.]]

print(np.ones([3, 4]))
# [[ 1.  1.  1.  1.]
#  [ 1.  1.  1.  1.]
#  [ 1.  1.  1.  1.]]

print(np.tri(3, dtype = int))
# [[1 0 0]
#  [1 1 0]
#  [1 1 1]]

print(np.full([3, 4], 6))
# [[6 6 6 6]
#  [6 6 6 6]
#  [6 6 6 6]]
```
]

.column.w48[
## Copy the shape of another array
```Python
import numpy as np

# uninitialized 3x4 int16 matrix:
a = np.empty([3, 4], dtype = np.int16)
# uninitialized 3x4 int16 matrix
b = np.empty_like(a)

print(np.zeros_like(a))
# [[0 0 0 0]
#  [0 0 0 0]
#  [0 0 0 0]]

print(np.ones_like(a))
# [[1 1 1 1]
#  [1 1 1 1]
#  [1 1 1 1]]

print(np.full_like(a, 7))
# [[7 7 7 7]
#  [7 7 7 7]
#  [7 7 7 7]]

print(np.full_like(a, 7.5))
# [[7 7 7 7]
#  [7 7 7 7]
#  [7 7 7 7]]

```
]
]

---

# NumPy - Arrays

.row[
 .column.w48[
## Matrix functions
```Python
import numpy as np

a = np.arange(12).reshape([3, 4])
print(a)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(np.diag(a))
# [ 0  5 10]

print(a.diagonal())
# [ 0  5 10]

print(a.diagonal(1))
# [ 1  6 11]

print(a.diagonal(2))
# [2  7]

print(np.tril(a))
# [[ 0  0  0  0]
#  [ 4  5  0  0]
#  [ 8  9 10  0]]

print(np.triu(a))
# [[ 0  1  2  3]
#  [ 0  5  6  7]
#  [ 0  0 10 11]]
```
]
.column.w48[
## Matrix multiplication
```Python
import numpy as np

a = np.arange(9).reshape([3, 3])
print(a)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

b = np.arange(-4, 5).reshape([3, 3])
print(b)
# [[-4 -3 -2]
#  [-1  0  1]
#  [ 2  3  4]]

print(a * b) # element-wise multiplication !
# [[ 0 -3 -4]
#  [-3  0  5]
#  [12 21 32]]

print(a.dot(b)) # matrix multiplication
# [[  3   6   9]
#  [ -6   6  18]
#  [-15   6  27]]

print(np.linalg.matrix_power(a, 4))
# [[ 2430  3132  3834]
#  [ 7452  9612 11772]
#  [12474 16092 19710]]
```
]]

---

# NumPy - Arrays

## Main linear algebra functions
```Python
import numpy as np

b = np.arange(-4, 5).reshape([3, 3])
print(b)
# [[-4 -3 -2]
#  [-1  0  1]
#  [ 2  3  4]]

print(b.transpose())
# [[-4 -1  2]
#  [-3  0  3]
#  [-2  1  4]]

print(np.det(b))
# 0.0

print(np.linalg.norm(b))
# 7.745966692414834

eigval, eigvec = np.linalg.eig(b)
print(eigval) 
# [ -4.24264069e+00   3.81639165e-17   4.24264069e+00]

print(eigvec)
# [[ 0.90104757  0.40824829 -0.3236973 ]
#  [ 0.28867513 -0.81649658  0.28867513]
#  [-0.3236973   0.40824829  0.90104757]]

print(np.matmul(b, eigvec[:,[0]]) - eigval[0] * eigvec[:,[0]])
# [[  4.44089210e-16]
#  [  4.44089210e-16]
#  [  6.66133815e-16]]


```


---

# NumPy - Arrays

## Change the rank and dimensions (not the data)
```Python
import numpy as np

a = np.arange(12)
print(a)
# [ 0 1 2 3 4 5 6 7 8 9 10 11]

b = a.reshape([3, 4])
print(b)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(b.reshape([1, 12]))
# [[ 0  1  2  3  4  5  6  7  8  9 10 11]]
```

.row[
 .column.w48[
## Access to rows
```Python
import numpy as np

a = np.arange(12).reshape([3, 4])
print(a)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(a[2, :])
# [ 8  9 10 11]

print(a[[2], :])
# [[ 8  9 10 11]]

a[2, :] = [9, 9, 9, 9]
print(a)
# [[0 1 2 3]
#  [4 5 6 7]
#  [9 9 9 9]]
```
]

.column.w48[
## Access to columns
```Python
import numpy as np

a = np.arange(12).reshape([3, 4])
print(a)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(a[:, 3])
# [ 3  7 11]

print(a[:, [3]])
# [[ 3]
#  [ 7]
#  [11]]

a[:, 3] = [9, 9, 9]
print(a)
# [[ 0  1  2  9]
#  [ 4  5  6  9]
#  [ 8  9 10  9]]

```
]
]

---

# NumPy - Arrays

## Generate a sequence of numbers

* to generate an array instead of a list
* accepts **float arguments**

```Python
import numpy as np

np.arange(8)            # array([ 0  ,  1  ,  2  ,  3  ,  4,   5 , 6 , 7 ])
np.arange(8.)           # array([ 0. ,  1. ,  2. ,  3. ,  4. , 5., 6., 7.])

np.arange(8, 13)        # array([ 8  ,  9  , 10  , 11  , 12  ])
np.arange(8, 13.2)      # array([ 8. ,  9. , 10. , 11. , 12. ])
np.arange(8.2, 13)      # array([ 8.2,  9.2, 10.2, 11.2, 12.2])

np.arange(8, 13, 2)     # array([ 8  , 10  , 12  ])
np.arange(8, 13, 2.2)   # array([ 8  , 10.2, 12.4])
np.arange(8.2, 13, 2.2) # array([ 8.2, 10.4, 12.6])
```

## Subdivise a range
```Python
import numpy as np

np.linspace(0, 1, 3) # array([ 0., 0.5, 1. ])
np.linspace(3, 7, 9) # array([ 3., 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. ])
```

---

# NumPy - Some broadcast functions

.row[
.column.w55.middle[
## `Sin` and `Cos` functions
```Python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 2 * np.pi, 200)
a = np.sin(t)
b = np.cos(t)
plt.plot(t, a, 'r')
plt.plot(t, b, 'g')
plt.show()
```
]
.column.w40.middle[
![](images/sin_cos.png)
]
]

## List of trigonometric functions

```Python
sin cos tan arcsin arccos arctan arctan2 hypot sinh cosh tanh arcsinh arccosh arctanh deg2rad rad2deg
```


---

# NumPy - Print arrays

## Long arrays

:arrow_right: When an array is too large, only the corners are printed.

```Python
import numpy as np

np.get_printoptions()['threshold'] # 1000

print(np.arange(1002).reshape([3, 334]))
# [[   0    1    2 ...,  331  332  333]
#  [ 334  335  336 ...,  665  666  667]
#  [ 668  669  670 ...,  999 1000 1001]]

np.set_printoptions(threshold = 100)

print(np.arange(100).reshape([10, 10]))
# [[ 0  1  2  3  4  5  6  7  8  9]
#  [10 11 12 13 14 15 16 17 18 19]
#  [20 21 22 23 24 25 26 27 28 29]
#  [30 31 32 33 34 35 36 37 38 39]
#  [40 41 42 43 44 45 46 47 48 49]
#  [50 51 52 53 54 55 56 57 58 59]
#  [60 61 62 63 64 65 66 67 68 69]
#  [70 71 72 73 74 75 76 77 78 79]
#  [80 81 82 83 84 85 86 87 88 89]
#  [90 91 92 93 94 95 96 97 98 99]]

print(np.arange(110).reshape([11, 10]))
# [[  0   1   2 ...,   7   8   9]
#  [ 10  11  12 ...,  17  18  19]
#  [ 20  21  22 ...,  27  28  29]
#  ..., 
#  [ 80  81  82 ...,  87  88  89]
#  [ 90  91  92 ...,  97  98  99]
#  [100 101 102 ..., 107 108 109]]
```

---

# NumPy - Random functions

.row[
.column.w55.middle[
## White noise
```Python
import numpy as np
import matplotlib.pyplot as plt

*data = np.random.rand(10000)

print(data.mean()) # mean value:  0.494221
print(data.var())  # variance  :  0.083729

plt.hist(data, bins=100)
plt.show()
```
]
.column.w40.middle[
![](images/hist_rand.png)
]
]

.row[
.column.w55.middle[
## Normal distribution
```Python
import numpy as np
import matplotlib.pyplot as plt

*data = np.random.randn(10000)

print(data.mean()) # mean value: -0.000091
print(data.var())  # variance  :  1.002125

plt.hist(data, bins=100)
plt.show()
```
]
.column.w40.middle[
![](images/hist_randn.png)
]
]

---

# NumPy - Arrays

## Load and save arrays
```Python
import numpy as np

dmat0 = np.eye(5)
np.save("filename", dmat0)
dmat1 = np.load("filename.npy")
np.linalg.norm(dmat0 - dmat1) # 0.0
```

* save an array in the binary `.npy` format
* avoid `np.fromfile()` and `x.tofile()`

## Construct an array from a function

```Python
import numpy as np

imat = np.fromfunction(lambda i, j: i + j, (3, 3), dtype = int)

print(imat)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
```

---
# NumPy - Arrays

## Iterate over all elements of an array
```Python
import numpy as np

a = np.arange(6).reshape(2,3)
print(a)
# [[0 1 2]
#  [3 4 5]]

for x in np.nditer(a):                         # readonly access
  print x,
# 0 1 2 3 4 5

for x in np.nditer(a, op_flags=['readwrite']): # readwrite access
  x[...] = 2 * x
print(a)
# [[ 0  2  4]
#  [ 6  8 10]]
```

## Iterate over all elements of an array and get index values
```Python
import numpy as np

a = np.arange(6).reshape(2,3)
print(a)
# [[0 1 2]
#  [3 4 5]]

it = np.nditer(a, flags=['multi_index'])
while not it.finished:
  print "%d <%s>" % (it[0], it.multi_index),
  it.iternext()
# 0 <(0, 0)> 1 <(0, 1)> 2 <(0, 2)> 3 <(1, 0)> 4 <(1, 1)> 5 <(1, 2)>
```
---

# NumPy - Arrays

## Mesh grid
:arrow_right: Use this to fill arrays with index-dependent expressions.
```Python
import numpy as np

print(np.mgrid[0:3, 0:4][0])
# [[0 0 0 0]
#  [1 1 1 1]
#  [2 2 2 2]]

print(np.mgrid[0:3, 0:4][1])
# [[0 1 2 3]
#  [0 1 2 3]
#  [0 1 2 3]]
```

## Example
`$$
\forall i \in [0..2],
\forall j \in [0..3],
A_{ij} \equiv i + 0.1 j
$$`

```Python
import numpy as np

i = np.mgrid[0:3, 0:4][0]
j = np.mgrid[0:3, 0:4][1]

A = i + j * 0.1

print(A)
# [[ 0.   0.1  0.2  0.3]
#  [ 1.   1.1  1.2  1.3]
#  [ 2.   2.1  2.2  2.3]]
```
---

# NumPy - Efficient code

## Example
We want to calculate the **Frobenius norm** of a matrix product:

`$$
N \equiv \left|A.B\right|_{\textrm{fro}}
$$`

with

`$$
\forall i \in [0..\,n_0 - 1],
\forall j \in [0..\,n_1 - 1],
A_{ij} \equiv \frac{i + \frac{j}{10}}{10}, 
B_{ji} \equiv \frac{i + \frac{j}{10}}{10}.
$$`

---

class: top

# NumPy - Efficient code

## Naive implementation
```Python
import numpy as np
import time

n0 = 2000
n1 = 2000

print("===== Numpy 1a =====")
time0 = time.time()
A = np.empty((n0, n1))
B = np.empty((n1, n0))
for i in range(n0):
  for j in range(n1):
    A[i, j] = (i + j * 0.1) * 0.1
    B[j, i] = (i + j * 0.1) * 0.1

time1 = time.time()
N = np.linalg.norm(A.dot(B), ord = 'fro')
time1 = time.time()

print("fill. time: %fs" % (time1 - time0))
print("calc. time: %fs" % (time2 - time1))
print("total time: %fs" % (time2 - time0))
print("result    : %20.14e" % (N))
```

--

## Result
```text
===== Numpy 1a =====
*fill. time: 2.277015s
*calc. time: 0.426645s
total time: 2.703660s
result    : 6.17895105956704e+10
```

---

class: top

# NumPy - Efficient code

.row[
.column.w48[
## Naive implementation
```Python
import numpy as np
import time

n0 = 2000
n1 = 2000

print("===== Numpy 1a =====")
time0 = time.time()
A = np.empty((n0, n1))
*B = np.empty((n1, n0))
for i in range(n0):
  for j in range(n1):
    A[i, j] = (i + j * 0.1) * 0.1
*   B[j, i] = (i + j * 0.1) * 0.1

time1 = time.time()
N = np.linalg.norm(A.dot(B), ord = 'fro')
time2 = time.time()

print("fill. time: %fs" % (time1 - time0))
print("calc. time: %fs" % (time2 - time1))
print("total time: %fs" % (time2 - time0))
print("result    : %20.14e" % (N))
```
]

.column.w48[
## Using transposition
```Python
import numpy as np
import time

n0 = 2000
n1 = 2000

print("===== Numpy 1b =====")
time0 = time.time()
A = np.empty((n0, n1))

for i in range(n0):
  for j in range(n1):
    A[i, j] = (i + j * 0.1) * 0.1
*B = A.transpose()

time1 = time.time()
N = np.linalg.norm(A.dot(B), ord = 'fro')
time2 = time.time()

print("fill. time: %fs" % (time1 - time0))
print("calc. time: %fs" % (time2 - time1))
print("total time: %fs" % (time2 - time0))
print("result    : %20.14e" % (N))
```
]]

--

.row[
.column.w48[
## Result
```text
===== Numpy 1a =====
*fill. time: 2.277015s
*calc. time: 0.426645s
total time: 2.703660s
result    : 6.17895105956704e+10
```
]
.column.w48[
## Result
```text
===== Numpy 1b =====
*fill. time: 1.207309s
*calc. time: 0.199436s
total time: 1.406745s
result    : 6.17895105956704e+10
```
]]

---

class: top

# NumPy - Efficient code

.row[
.column.w48[
## Using transposition
```Python
import numpy as np
import time

n0 = 2000
n1 = 2000

print("===== Numpy 1b =====")
time0 = time.time()
A = np.empty((n0, n1))

*for i in range(n0):
* for j in range(n1):
*   A[i, j] = (i + j * 0.1) * 0.1


B = A.transpose()

time1 = time.time()
N = np.linalg.norm(A.dot(B), ord = 'fro')
time2 = time.time()

print("fill. time: %fs" % (time1 - time0))
print("calc. time: %fs" % (time2 - time1))
print("total time: %fs" % (time2 - time0))
print("result    : %20.14e" % (N))
```
]
.column.w48[
## Using iterator
```Python
import numpy as np
import time

n0 = 2000
n1 = 2000

print("===== Numpy 2 =====")
time0 = time.time()
A = np.empty((n0, n1))
*it = np.nditer(A, flags = ['multi_index'],
*              op_flags = ['writeonly'])
*while not it.finished:
* it[0] = (it.multi_index[0]
*       + it.multi_index[1] * 0.1) * 0.1
* it.iternext()
B = A.transpose()  

time1 = time.time()
N = np.linalg.norm(A.dot(B), ord = 'fro')
time2 = time.time()

print("fill. time: %fs" % (time1 - time0))
print("calc. time: %fs" % (time2 - time1))
print("total time: %fs" % (time2 - time0))
print("result    : %20.14e" % (N))
```
]]

--

.row[
.column.w48[
## Result
```text
===== Numpy 1b =====
*fill. time: 1.207309s
calc. time: 0.199436s
total time: 1.406745s
result    : 6.17895105956704e+10
```
]

.column.w48[
## Result
```text
===== Numpy 2 =====
*fill. time: 2.498673s
calc. time: 0.199323s
total time: 2.697996s
result    : 6.17895105956704e+10
```
]]

---

class: top

# NumPy - Efficient code

.row[
.column.w48[
## Using transposition
```Python
import numpy as np
import time

n0 = 2000
n1 = 2000

print("===== Numpy 1b =====")
time0 = time.time()
*A = np.empty((n0, n1))
*for i in range(n0):
*  for j in range(n1):
*    A[i, j] = (i + j * 0.1) * 0.1
B = A.transpose()

time1 = time.time()
N = np.linalg.norm(A.dot(B), ord = 'fro')
time2 = time.time()

print("fill. time: %fs" % (time1 - time0))
print("calc. time: %fs" % (time2 - time1))
print("total time: %fs" % (time2 - time0))
print("result    : %20.14e" % (N))
```
]

.column.w48[
## Using `np.mgrid`
```Python
import numpy as np
import time

n0 = 2000
n1 = 2000

print("===== Numpy 3 =====")
time0 = time.time()

*im, jm = np.mgrid[0:n0, 0:n1]
*A = (im + jm * 0.1) * 0.1

B = A.transpose()

time1 = time.time()
N = np.linalg.norm(A.dot(B), ord = 'fro')
time2 = time.time()

print("fill. time: %fs" % (time1 - time0))
print("calc. time: %fs" % (time2 - time1))
print("total time: %fs" % (time2 - time0))
print("result    : %20.14e" % (N))

```
]
]

--

.row[
.column.w48[
## Result
```text
===== Numpy 1b =====
*fill. time: 1.207309s
calc. time: 0.199436s
total time: 1.406745s
result    : 6.17895105956704e+10
```
]

.column.w48[
## Result
```text
===== Numpy 3 =====
*fill. time: 0.078148s
calc. time: 0.198162s
total time: 0.276310s
result    : 6.17895105956704e+10
```
]
]

---

# NumPy - Python accelerators

:arrow_right: Some tools allow to "compile" Python / NumPy code :
* **Numba**:  
 * JIT compilation of Python code  
 * website: [https://numba.pydata.org](https://numba.pydata.org)
* **Cython**:  
 * write and compile modules using a superset of the Python language and use them from standard Python
 * website: [http://cython.org](http://cython.org)
* **PyPy**:
 * highly compatible alternative implementation of the Python language using JIT compilation
 * :warning: does not support NumPy :warning:  
 * website: [https://pypy.org](https://pypy.org)
* **Pythran**:  
 * generate and compile `C++` code from annotated Python code (supports only a subset of the Python language)  
 * website: [http://pythran.readthedocs.io/en/latest](http://pythran.readthedocs.io/en/latest)

---

# NumPy - Numba

## Usage
* add `@jit` decorator before the functions to be JIT compiled

.row[
.column.w48[

## Example

```Python
import numpy as np
import time
from numba import jit

print("===== Numba =====")
time0 = time.time()
n0 = 2000
n1 = 2000
@jit
def fill(n0, n1):
  a = np.empty((n0, n1))
  for i in range(n0):
    for j in range(n1):
      a[i, j] = (i + j * 0.1) * 0.1
  b = a.transpose()
  return a, b

@jit
def calc(a, b):
  return np.linalg.norm(a.dot(b), ord = 'fro')

time1 = time.time()
A, B = fill(n0, n1) 
time2 = time.time()
N = calc(A, B)
time3 = time.time()
print("comp. time: %fs" % (time1 - time0))
print("fill. time: %fs" % (time2 - time1))
print("calc. time: %fs" % (time3 - time2))
print("TOTAL time: %fs" % (time3 - time0))
print("result    : %20.14e" % (N))


```
]
.column.w48.middle[

## Result
```text
===== Numba ===== 
comp. time: 0.023121s
fill. time: 0.339344s
calc. time: 0.277222s
TOTAL time: 0.639687s
result    : 6.17895105956704e+10
```
]
]

---

# NumPy - Cython

## Module compilation
* write a `calc.pyx` file
* write a `setup_cython.py` file
* compile the module with

```shell
$ python setup_cython.py build_ext --inplace
```

.row[
.column.w55[

## .hcenter[`[calc.pyx]`]
```Python
import numpy as np
cimport numpy as np
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t

cimport cython
# turn off bounds-checking for entire function
@cython.boundscheck(False)
# turn off negative index wrapping for entire function
@cython.wraparound(False)
def fill(int n0, int n1):
  cdef int i
  cdef int j
  cdef np.ndarray A = np.empty((n0, n1), dtype = DTYPE)
  cdef DTYPE_t value
  for i in range(n0):
    for j in range(n1):
      value = (i + j * 0.1) * 0.1
      A[i, j] = value
  B = A.transpose()
  return A, B

def calc(np.ndarray A, np.ndarray B):
  return np.linalg.norm(A.dot(B), ord = 'fro')
```
]

.column.w40.middle[
## .hcenter[`[setup_cython.py]`]
```Python
from distutils.core import setup
from Cython.Build import cythonize

setup(
  ext_modules=cythonize("calc.pyx"),
)
```
]
]

---

class: top

# NumPy - Cython

.row[
.column.w48[
## Using `np.mgrid`
```Python
import numpy as np
import time

n0 = 2000
n1 = 2000

print("===== Numpy 3 =====")
time0 = time.time()

*im, jm = np.mgrid[0:n0, 0:n1]
*A = (im + jm * 0.1) * 0.1
*B = A.transpose()

time1 = time.time()

*N = np.linalg.norm(A.dot(B), ord = 'fro')

time2 = time.time()

print("fill. time: %fs" % (time1 - time0))
print("calc. time: %fs" % (time2 - time1))
print("total time: %fs" % (time2 - time0))
print("result    : %20.14e" % (N))

```
]

.column.w48[
## Using Cython
```Python
import numpy as np
import time

n0 = 2000
n1 = 2000

print("===== Cython =====")
time0 = time.time()

*import calc
time1 = time.time()

*A, B = calc.fill(n0, n1)
time2 = time.time()

*N = calc.calc(A, B)
time3 = time.time()

print("init. time: %fs" % (time1 - time0))
print("fill. time: %fs" % (time2 - time1))
print("calc. time: %fs" % (time3 - time2))
print("TOTAL time: %fs" % (time3 - time0))
print("result    : %20.14e" % (N))
```
]
]

--

.row[
.column.w48[
## Result
```text
===== Numpy 3 =====
*fill. time: 0.078148s
*calc. time: 0.198162s
total time: 0.276310s
result    : 6.17895105956704e+10
```
]

.column.w48[
## Result
```text
===== Cython =====
init. time: 0.000356s
*fill. time: 0.789828s
*calc. time: 0.429509s
TOTAL time: 1.219693s
result    : 6.17895105956704e+10
```
]
]


---

# NumPy - Pythran

## Module compilation
* write a `matrix_multiply.py` file
* use Pythran special decorators and directives
* compile the module with

```shell
$ pythran matrix_multiply.py
```

## .hcenter[`[matrix_multiply.py]`]
```Python
#pythran export matrix_multiply(float list list, float list list)
def zero(n, m):
  return [[0 for row in xrange(n)] for col in xrange(m)]

def matrix_multiply(m0, m1):
  new_matrix = zero(len(m0), len(m1[0]))
  for i in xrange(len(m0)):
    for j in xrange(len(m1[0])):
      r = 0
      "omp parallel for reduction(+:r)"
      for k in xrange(len(m0)):
        r += m0[i][k] * m1[k][j]
      new_matrix[i][j] = r
  return new_matrix
```

---

class: top

# NumPy - Pythran

.row[
.column.w48[
## Using `np.mgrid`
```Python
import numpy as np
import time

n0 = 2000
n1 = 2000

print("===== Numpy 3 =====")
time0 = time.time()

*im, jm = np.mgrid[0:n0, 0:n1]
*A = (im + jm * 0.1) * 0.1
*B = A.transpose()

time1 = time.time()

*N = np.linalg.norm(A.dot(B), ord = 'fro')

time2 = time.time()

print("fill. time: %fs" % (time1 - time0))
print("calc. time: %fs" % (time2 - time1))
print("total time: %fs" % (time2 - time0))
print("result    : %20.14e" % (N))

```
]

.column.w48[
## Using Pythran
```Python
import numpy as np
import time

n0 = 2000
n1 = 2000

print("===== Pythran =====")
time0 = time.time()
*import matrix_multiply
time1 = time.time()
*A = [[((i + j * 0.1) * 0.1)
*     for i in xrange(n0)] for j in xrange(n1)]
*B = A.transpose()
time2 = time.time()
*mat = matrix_multiply.matrix_multiply(A, B)
time3 = time.time()
*N = np.linalg.norm(mat, ord = 'fro')
time4 = time.time()

print("init. time: %fs" % (time1 - time0))
print("fill. time: %fs" % (time2 - time1))
print("mult. time: %fs" % (time3 - time2))
print("norm. time: %fs" % (time4 - time3))
print("TOTAL time: %fs" % (time4 - time0))
print("result    : %e" % (N))
```
]
]

--

.row[
.column.w48[
## Result
```text
===== Numpy 3 =====
fill. time: 0.078148s
*calc. time: 0.198162s
total time: 0.276310s
result    : 6.17895105956704e+10
```
]

.column.w48[
## Result
```text
===== Pythran =====
init. time: 0.000120s
fill. time: 1.526504s
*mult. time: 113.829122s
norm. time: 0.094946s
TOTAL time: 115.450692s
result    : 6.17895105956707e+10
```
]
]

---

# NumPy - Conclusions

.noflex[

.rightf.w50[
.hcenter[![](images/xkcd_353.png)]
.hcenter[[source: xkcd](https://xkcd.com/353)]
]

.w40[![](images/numpy_logo.png)]

## Standard NumPy

:+: very efficient  
:+: very easy to read  
:+: **LOTS** of specialized functions  
:+: can be accelerated **if needed**  
:warning: loops in python / NumPy are **SLOW**

## To go further
:arrow_right: very dynamic field of research  
:arrow_right: **stay curious**  
:arrow_right: **try new frameworks**  
:arrow_right: Python bindings of Armadillo + NumPy  
:+: Swig + Armadillo + NumPy  
]

---

# Scientific programming

## Scientific programming tools presented in this course
 * linear algebra libraries (usable from `C++`)
  * `Boost`, `LAPACK`, `BLAS`, `MKL`, `GSL`, `Eigen`, **`Armadillo`** :+:
 * scientific programming with Python
  * **introduction to Python** :+:
  * **`NumPy`** :+:
 * presenting results
  * write beautiful math: ** .latex[L<sup>a</sup>T<sub>e</sub>X] ** :+:
  * structure the documentation: **Markdown** :+:
  * write some slides: **remark.js** :+:

## Scientific programming techniques presented in this course
 * numerical integration
  * **Gaussian quadrature** :+:
 * optimization of sum calculation  :+: