Commit 7e7bc69f authored by Nitin Shukla's avatar Nitin Shukla

Parallel part 2

parent 1f0d3d42
%% Cell type:markdown id: tags:
## 2. Multi-threading
###### Julia uses a task-based model: a fixed number of threads, onto which defined pieces of work (tasks) are scheduled
1. Every process has multiple threads which share a single heap
2. When multiple threads execute simultaneously, we have multithreaded parallelism.
#### It has some costs:
1. Synchronisation issues: when multiple threads access the same variable
2. All threads must live on the same physical machine
### What will NOT be covered
MPI. See the example from M. Creel: https://github.com/mcreel/JuliaMPIMonteCarlo.jl
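%% Cell type:markdown id: tags:
To see the synchronisation issue concretely, here is a minimal sketch (the counter variables are illustrative, not from the material above): several threads increment a shared counter. The unsynchronised version can lose updates, while `Threads.Atomic` makes the increment safe.
%% Cell type:code id: tags:
``` julia
using Base.Threads

# Unsafe: `+=` on a shared variable is a read-modify-write, not atomic,
# so concurrent threads can overwrite each other's updates.
unsafe = Ref(0)
@threads for i in 1:100_000
    unsafe[] += 1          # data race: the final count may be < 100_000
end

# Safe: atomic_add! serialises the increments.
safe = Atomic{Int}(0)
@threads for i in 1:100_000
    atomic_add!(safe, 1)
end
println(safe[])            # always 100_000
```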
%% Cell type:code id: tags:
``` julia
using BenchmarkTools, Test
using Base.Threads: @threads, @spawn
```
%% Cell type:markdown id: tags:
### Please run these checks from a terminal
###### Launching threads
1. `julia -t 16`
2. `export JULIA_NUM_THREADS=16`
3. Julia code can be run via a Slurm batch script by invoking the `julia` command with the file name of the code
%% Cell type:code id: tags:
``` julia
# Shows all cores; the operating system counts hyper-threading (logical) cores separately
versioninfo(verbose=true)
```
%% Cell type:code id: tags:
``` julia
Threads.nthreads()
```
%%%% Output: execute_result
1
%% Cell type:code id: tags:
``` julia
Threads.threadid()
```
%%%% Output: execute_result
1
%% Cell type:code id: tags:
``` julia
# The number of threads Julia can run on is fixed at startup.
# Set it from a shell (not inside Julia):
#   julia -t 16
#   export JULIA_NUM_THREADS=16
```
%% Cell type:code id: tags:
``` julia
# sequential
for i = 1:8
    println("iteration $i Hello Julia from thread ", Threads.threadid())
end
```
%% Cell type:code id: tags:
``` julia
;env JULIA_NUM_THREADS=4 julia -E 'using .Threads; nthreads()'
```
%% Cell type:code id: tags:
``` julia
# threaded
@threads for i = 1:8
    println("iteration $i Hello Julia from thread ", Threads.threadid())
end
```
%% Cell type:code id: tags:
``` julia
a = zeros(16);
```
%% Cell type:code id: tags:
``` julia
for i = 1:Threads.nthreads()
    a[i] = Threads.threadid()
end
```
%% Cell type:code id: tags:
``` julia
# no collision: each thread writes its own slot
@threads for i = 1:Threads.nthreads()
    a[i] = Threads.threadid()
end
a
```
%%%% Output: execute_result
10-element Vector{Float64}:
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
%% Cell type:code id: tags:
``` julia
## @spawn
# Launches a task to evaluate the expression on some available thread
@spawn println("Hello Julia from thread ", Threads.threadid())
```
%%%% Output: execute_result
Task (done) @0x0000200010bc8340
%% Cell type:code id: tags:
``` julia
@sync begin
    @spawn println("Hello Julia from thread ", Threads.threadid())
    @spawn println("Hello Julia from thread ", Threads.threadid())
    @spawn println("Hello Julia from thread ", Threads.threadid())
end
```
%%%% Output: execute_result
Task (done) @0x0000200010bc8010
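%% Cell type:markdown id: tags:
`@spawn` returns a `Task`; a task can also compute a value, which `fetch` retrieves (blocking until the task finishes). A small sketch:
%% Cell type:code id: tags:
``` julia
using Base.Threads: @spawn

t1 = @spawn sum(1:1_000)        # runs on some available thread
t2 = @spawn sum(1_001:2_000)
total = fetch(t1) + fetch(t2)   # fetch blocks until each task is done
println(total)                  # sum(1:2_000) == 2_001_000
```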
%% Cell type:markdown id: tags:
## Static Scheduling
%% Cell type:markdown id: tags:
The `@threads` macro performs static scheduling of for-loops:
1. If we can easily compute the exact workload of the subtasks before execution,
2. we can use static scheduling, which divides the workload evenly among the available threads.
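%% Cell type:markdown id: tags:
Under the hood, static scheduling amounts to slicing the index range into `nthreads()` roughly equal chunks and giving each thread one chunk. A hand-rolled sketch of that idea (the chunking below is illustrative, not `@threads`' exact implementation):
%% Cell type:code id: tags:
``` julia
using Base.Threads

n = 100
chunks = collect(Iterators.partition(1:n, cld(n, nthreads())))
partial = zeros(Int, length(chunks))
@sync for (c, chunk) in enumerate(chunks)
    @spawn partial[c] = sum(chunk)   # each task owns one slot: no collision
end
println(sum(partial))                # == sum(1:100) == 5050
```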
%% Cell type:code id: tags:
``` julia
using .Threads
```
%% Cell type:code id: tags:
``` julia
a = rand(10000) # Create array of random numbers
p = zeros(nthreads()) # Allocate a partial sum for each thread
# @threads splits the iterations of `a` evenly among threads; the :static
# schedule keeps each chunk on one thread, so indexing `p` by threadid() is safe
@threads :static for x in a
    p[threadid()] += x # accumulate a partial sum on this thread
end
s = sum(p) # Compute the total sum
```
%%%% Output: execute_result
5004.699671085471
%% Cell type:code id: tags:
``` julia
println(round.(p/s, digits=3))
```
%% Cell type:markdown id: tags:
### Dynamic Scheduling
1. We cannot always compute the exact workload of subtasks before execution
2. In this case, we want to create many subtasks and assign them to threads using dynamic scheduling.
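%% Cell type:markdown id: tags:
A race-free pattern for dynamic scheduling is to let each spawned task return its result and collect the results with `fetch`, instead of having every task write into a shared array. A sketch with a hypothetical `work` function:
%% Cell type:code id: tags:
``` julia
using Base.Threads: @spawn

work(i) = i^2                       # stand-in for a variable-cost computation
tasks = [@spawn work(i) for i in 1:100]
total = sum(fetch.(tasks))          # no shared mutable state between tasks
println(total)                      # sum of squares 1..100 == 338_350
```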
%% Cell type:code id: tags:
``` julia
using Base.Threads: @spawn
```
%% Cell type:markdown id: tags:
Next, we spawn n tasks that are dynamically scheduled onto the available threads, and we accumulate the total workload per thread.
%% Cell type:code id: tags:
``` julia
function task()
    x = rand(0.001:0.001:0.05) # generate a variable workload
    return x                   # return the workload
end
```
%%%% Output: execute_result
task (generic function with 1 method)
%% Cell type:code id: tags:
``` julia
n = 1000 # Number of tasks
p = zeros(Threads.nthreads()) # Total workload per thread
@sync for i in 1:n
    # Note: `+=` into a shared array from spawned tasks can race if a task
    # migrates between threads; kept here to mirror the original demo
    @spawn p[Threads.threadid()] += task()
end
```
%%%% Output: error
LoadError: UndefVarError: @spawn not defined
in expression starting at In[1]:4
Stacktrace:
[1] top-level scope
@ :0
[2] eval
@ ./boot.jl:360 [inlined]
[3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1094
%% Cell type:markdown id: tags:
We can compute how well the workloads are balanced by computing the ratio of workload per thread to the total workload.
%% Cell type:code id: tags:
``` julia
s = sum(p) # Total workload
```
%%%% Output: execute_result
49980.771127760534
%% Cell type:code id: tags:
``` julia
println(round.(p/s, digits=3))
```
%% Cell type:markdown id: tags:
### Let's look at our problem
%% Cell type:code id: tags:
``` julia
len = Int64(1e9);
a, b = fill(1.0f0, len), fill(2.0f0, len);
```
%% Cell type:code id: tags:
``` julia
function multi_vectors_serial(a::Array, b::Array)
    c = similar(a)
    for i in eachindex(a, b)
        @inbounds c[i] = a[i] * b[i]
    end
    return c
end
serial_t = @belapsed multi_vectors_serial($a, $b); # interpolate globals for BenchmarkTools
```
%% Cell type:markdown id: tags:
## Multithreading
%% Cell type:code id: tags:
``` julia
function multi_vectors_serial_thread!(a::Array, b::Array)
    c = similar(a)
    Threads.@threads for i in eachindex(a, b)
        @inbounds c[i] = a[i] * b[i]
    end
    return c
end
thread_t = @belapsed multi_vectors_serial_thread!($a, $b);
```
%% Cell type:code id: tags:
``` julia
@test multi_vectors_serial(a, b) ≈ multi_vectors_serial_thread!(a, b)
```
%%%% Output: execute_result
Test Passed
%% Cell type:markdown id: tags:
###### Speedup: what's the gain?
%% Cell type:code id: tags:
``` julia
times = [serial_t, thread_t]
speedup = maximum(times) ./ times
```
%%%% Output: execute_result
2-element Vector{Float64}:
1.0039923026004114
1.0
%% Cell type:markdown id: tags:
### Exercise-1:
1. Write a function to estimate π with n samples.
2. `using BenchmarkTools, Test, InteractiveUtils`
```julia
function estimatepi(n)
    area_circle = 0
    @inbounds for i = 1:n
        x, y = rand(), rand()
        r = x^2 + y^2
        if r < 1.0
            area_circle += 1
        end
    end
    return 4 * area_circle / n
end

timestart = time()
estpi = estimatepi(10_000_000_000)
elapsed = time() - timestart
println("The estimate for Pi : $estpi")
println("The elapsed time : $elapsed seconds")
```
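One possible threaded sketch for this exercise (the function name and structure are illustrative): give each thread its own counter so no state is shared, then reduce at the end.
```julia
using Base.Threads

function estimatepi_threaded(n)
    counts = zeros(Int, nthreads())  # one counter per thread
    @threads :static for i = 1:n     # :static pins iterations to threads
        x, y = rand(), rand()
        if x^2 + y^2 < 1.0
            counts[threadid()] += 1  # safe: each thread writes its own slot
        end
    end
    return 4 * sum(counts) / n
end

println(estimatepi_threaded(10_000_000))
```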
### Exercise-2 :
```julia
function sum_vectors_serial!(x, y, z)
    n = length(x)
    @inbounds for i = 1:n
        x[i] = y[i] + z[i]
    end
end

a = zeros(Float32, 1_000_000);
b = rand(Float32, 1_000_000);
c = rand(Float32, 1_000_000);
@btime sum_vectors_serial!($a, $b, $c)
```
### Exercise-3 :
Compute the sum of a vector
```julia
function threaded_sum1(A)
    r = zero(eltype(A))
    for i in eachindex(A)
        @inbounds r += A[i]
    end
    return r
end

A = rand(10_000_000)
threaded_sum1(A)       # warm up (compile) before timing
@time threaded_sum1(A)
```
### Exercise-4 :
Let's use BLAS calls (like matrix multiplication)
```julia
using BenchmarkTools
A = rand(2000, 2000);
B = rand(2000, 2000);
@btime $A*$B;
using LinearAlgebra
BLAS.set_num_threads(1)
@btime $A*$B
BLAS.set_num_threads(4)
@btime $A*$B
```
- Parallelize the code using `Base.Threads`.
%% Cell type:markdown id: tags:
-----------------------
-----------------------
# See you after break
-----------------------
-----------------------