






These are the Lecture Slides of Program Optimization for Multi-Core Architectures, which include Triangular Lower Limits, Multiple Loop Limits, Dependence System Solvers, Single Equation, Simple Test, Extreme Value Test, etc. Key points covered here: Clauses and Routines, Lastprivate, Reduction, Loop Work-Sharing, Schedule Clause, Run-Time Library Routines, Environment Variables.
void useless() {
    int tmp = 0;
    #pragma omp parallel for firstprivate(tmp) lastprivate(tmp)
    for (int j = 0; j < 1000; ++j)
        tmp += j;
    printf("%d", tmp);
}
Each thread gets its own copy with initial value 0. Is there still something wrong with the code? tmp is defined as its value at the "last sequential" iteration (i.e., for j = 999).
Purpose: The shared clause declares the variables in its list to be shared among all the threads in the team. Format: shared (list)
A shared variable exists in only one memory location, and all threads can read or write that address. It is the programmer's responsibility to ensure that multiple threads properly access shared variables (e.g., via critical sections).
Purpose: The default clause allows the user to specify a default scope for all variables in the parallel region. Format: default (shared | none)
Using none as the default requires the programmer to explicitly scope all variables. The C/C++ OpenMP API does not include private or firstprivate as a possible default; only the Fortran API supports default(private). Note that the default storage attribute is default(shared), so there is no need to specify it.
double ave = 0.0, A[MAX];
int i;
#pragma omp parallel for reduction(+:ave)
for (i = 0; i < MAX; i++) {
    ave += A[i];
}
ave = ave / MAX;
Purpose: The copyin clause provides a means for assigning the same value to threadprivate variables for all threads in the team. Format: copyin (list)
The master thread variable is used as the copy source. The team threads are initialized with its value upon entry into the parallel construct.
schedule (static | dynamic | guided [, chunk])
schedule (runtime)
The schedule clause affects how loop iterations are mapped onto threads.
Loop iterations are divided into pieces of size chunk and then statically assigned to threads. In the absence of a chunk size, the iterations are divided evenly (if possible) and contiguously among the threads. The assignment is pre-determined and predictable by the programmer. Least work at runtime: scheduling is done at compile time.
Fixed-size portions of work; the size is controlled by the value of chunk. When a thread finishes one chunk, it is dynamically assigned another. The default chunk size is 1. Most work at runtime: complex scheduling logic is used at run time.
A special case of dynamic scheduling that reduces scheduling overhead. The size of each block starts large and shrinks down to size chunk as the calculation proceeds. The default chunk size is 1.
The iteration scheduling scheme is set at runtime through the environment variable OMP_SCHEDULE or the runtime library.
Given a loop of length 16 with 4 threads: how will the iterations be assigned with a static schedule with no chunk, and with chunk = 2? What changes in the case of dynamic scheduling?
OMP_SET_NUM_THREADS: Sets the number of threads that will be used in the next parallel region. Must be a positive integer.
void omp_set_num_threads(int num_threads)
This routine can only be called from the serial portion of the code. This call takes precedence over the OMP_NUM_THREADS environment variable.
OMP_IN_PARALLEL: May be called to determine whether the section of code that is executing is parallel or not.
int omp_in_parallel(void)
For Fortran, this function returns .TRUE. if it is called from the dynamic extent of a region executing in parallel, and .FALSE. otherwise. For C/C++, it returns a non-zero integer if parallel and zero otherwise.
OMP_SET_DYNAMIC: Enables or disables dynamic adjustment (by the run-time system) of the number of threads available for the execution of parallel regions.
void omp_set_dynamic(int dynamic_threads)
Must be called from a serial section of the program. If dynamic_threads evaluates to non-zero, the mechanism is enabled; otherwise it is disabled.
OMP_GET_DYNAMIC: Used to determine whether dynamic thread adjustment is enabled or not.
int omp_get_dynamic(void)
For C/C++, non-zero is returned if dynamic thread adjustment is enabled, and zero otherwise. For Fortran, this function returns .TRUE. if dynamic thread adjustment is enabled and .FALSE. otherwise.
OMP_SET_NESTED: Used to enable or disable nested parallelism.
void omp_set_nested(int nested)
The default is for nested parallelism to be disabled. For C/C++, if nested evaluates to non-zero, nested parallelism is enabled; otherwise it is disabled.
OMP_GET_NESTED: Used to determine whether nested parallelism is enabled or not.
int omp_get_nested(void)
The default is for nested parallelism to be disabled. For C/C++, a non-zero return value means nested parallelism is enabled; otherwise it is disabled.
OMP_INIT_LOCK: This subroutine initializes a lock associated with the lock variable.
void omp_init_lock(omp_lock_t *lock)
void omp_init_nest_lock(omp_nest_lock_t *lock)
The initial state is unlocked
OMP_DESTROY_LOCK: This subroutine disassociates the given lock variable from any locks.
void omp_destroy_lock(omp_lock_t *lock)
void omp_destroy_nest_lock(omp_nest_lock_t *lock)
It is illegal to call this routine with a lock variable that is not initialized
OMP_GET_WTIME: Provides a portable wall-clock timing routine; returns a double-precision floating-point value equal to the number of elapsed seconds since some point in the past. Usually used in "pairs", with the value of the first call subtracted from the value of the second call to obtain the elapsed time for a block of code (per-thread times).
double omp_get_wtime(void)
OMP_GET_WTICK: Returns a double-precision floating-point value equal to the number of seconds between successive clock ticks.
double omp_get_wtick(void)
OpenMP provides the following environment variables for controlling the execution of parallel code. All environment variable names are uppercase; the values assigned to them are not case sensitive.
setenv OMP_SCHEDULE "guided, 4"
setenv OMP_NUM_THREADS 8
References:
OpenMP Home (link)
Using OpenMP: Portable Shared Memory Parallel Programming, Barbara Chapman, Gabriele Jost and Ruud van der Pas (link)
OpenMP Tutorial by Blaise Barney, Lawrence Livermore National Laboratory (link)
An Overview of OpenMP, Ruud van der Pas, Sun Microsystems (link)
Hands-On Introduction to OpenMP, Mattson and Meadows, from SC08 (Austin) (link)
OpenMP Wikipedia page
More resources can be found at (link)