High-Performance Scientific Computing
What | Value |
---|---|
Class Time/Location | Wednesday 5:10-7pm, Room 101 Warren Weaver Hall (NYU) |
Instructor | Andreas Kloeckner, Marsha Berger |
Email | kloeckner@cims.nyu.edu, berger@cims.nyu.edu |
Office | Courant Institute, Warren Weaver Hall, Rooms 1105A, 1121 |
Office Hours | Andreas: Wednesdays, 2-4pm WWH 1105A |
Class Webpage | http://bit.ly/hpc12 |
Email Listserv | Info page |
Lecture Material
Lecture # | Date | Topics | Slides | Video | Code | Extra Info |
---|---|---|---|---|---|---|
1 | Sep 5 | Intro, why parallel, Vector-add in seq, OpenMP, CL | Slides | Video | Code | |
2 | Sep 12 | Vector-add in MPI, Intro to OpenMP | Slides | Video | Code | |
3 | Sep 19 | HW2, OMP subtleties | Slides | Video | ||
4 | Sep 26 | Make, Intro to OpenCL | Slides | Video | Code | |
5 | Oct 3 | Git, OpenCL sync/local, Intro to MPI | Slides | Video | Code | |
6 | Oct 10 | Gdb, MPI point-to-point | Slides | Video | Code | |
7 | Oct 17 | Valgrind, MPI collectives, Intro perf. | Slides | Video | Code | |
8 | Oct 24 | Software installation, tmux, single-thread perf. | Slides | Video | Code | |
| | Oct 31 | NYU closed, no class because of Hurricane Sandy aftermath | | | | |
9 | Nov 7 | Shell scripting, single/multi-thread perf. | Slides | Video | Code | |
10 | Nov 14 | Profilers, parallel perf. | Slides | Video | Code | |
11 | Nov 21 | Advanced git, GPU perf. | Slides | Video | Code | |
12 | Nov 28 | GPU perf., patterns | Slides | Video | Code | GPU mem access patterns |
12 | Dec 5 | Parallel patterns, 3D vis. | Slides | Video | Code | |
| | Dec 12 | No class, NYU legislative day. Runs on a Monday schedule. | | | | |
13 | Dec 18 | Project Presentations (part 1) | /Projects | Video | /Projects | |
14 | Dec 19 | Project Presentations (part 2) | | Video | | |
You'll need an up-to-date version of Google Chrome to play the videos. You'll also need decent internet bandwidth for streaming (2 MBit/s should be sufficient). If your internet access is too slow, you can always right-click and download the video.
Updates
Apr 26, 2013 : I just gave a talk on my HPC-related work in the Graduate Student and Postdoc Seminar at Courant. I recorded slides and audio. (should play in Firefox and Chrome)
Dec 13, 2012 : I've posted the /ProjectPresentationsSchedule, now in its final form. See you next week!
Sep 9, 2012 : We're moving to a bigger room! We'll be meeting in room 101 of Warren Weaver Hall from September 12 onward.
Aug 21, 2012 : If you're from outside of Courant, you may encounter some difficulty registering for the class. We're fighting with the NYU administration to make this better. In the meantime, please get in touch with us. This class is most definitely open to students from other departments; NYU Albert apparently just hasn't gotten the memo.
Aug 9, 2012 : Less than a month to go! Class starts on September 5, 2012, from 5-7pm. We've also been assigned a room. We will be meeting in Warren Weaver Hall, room 512 (but check back here just in case there are changes in the meantime). See you then!
Grading/Evaluation
If you will be taking the class for credit, there will be:
- Weekly homework (60% of your grade)
- A more ambitious final project, which may be inspired by your own research needs (40% of your grade) (also see /ProjectSubmissionGuidelines)

If you're planning on auditing or just sitting in, you are more than welcome.
Homework
- Homework 1 due September 12
- Homework 2 due September 23 (originally September 19)
- Homework 3 due October 3 (updated 9/30 for sign bug in formula)
- Homework 4 due October 10
- Homework 5 due October 17
- Homework 6 due November 7 because of protracted power outage (originally November 1, then November 4 because of the storm)
- Final project presentations in class around Dec 17--19
Material
Books
- Parallel Programming: for Multicore and Cluster Systems (available for free in PDF form online from within the NYU network, also from off-campus via this EZProxy link)

For OpenCL and GPU programming, we will also be referring to the following sources:
Book | Where |
---|---|
OpenCL in Action: How to Accelerate Graphics and Computation | from NYU net |
OpenCL Programming Guide | from NYU net |
Heterogeneous Computing with OpenCL | from NYU net |
Update 9/12: Fixed EZProxy links.
Primary source material
These are the technical standards on which this class will be based. While sometimes a bit dense, these documents define whether the programs you write are correct or not, or perhaps result in undefined behavior:
- C99 specification
- OpenMP 3.1 specification (tutorial)
- MPI 3.0 specification (tutorial; warning: not up to date, teaches functions removed from MPI 3)
- OpenCL 1.2 specification
Secondary Sources
- Is Parallel Programming Hard, And, If So, What Can You Do About It?, edited by Paul McKenney
- Parallel Programming Lecture slides (and book) by Karypis et al. More theory-heavy (less practical? :) ) than this class
- CS267 Spring '12 Lecture by Jim Demmel et al., Berkeley
- DGEMM optimization
- Parallel Computer Architecture and Programming
- Learn C the Hard Way by Zed Shaw
- Memory Cache Optimizations from the libtorrent authors
- Introduction to High-Performance Scientific Computing by Victor Eijkhout
- Parallel Computing for Science and Engineering by Victor Eijkhout
Collected Wisdom
Virtual Machine Images
This information has moved to ComputeVirtualMachineImages.
Installing MPE into the virtual machine
MPE and Jumpshot for visualization of MPI execution were demonstrated during lecture 7. If you'd like to have those in your own virtual machine, download the following script:
Change into the directory where the script resides and start the installation:
sudo bash install-mpe.sh
If the script says
*** SUCCESSFULLY INSTALLED MPE
then you should be able to use mpecc and jumpshot from now on. Note that jumpshot seems to have a habit of creating its main window with the title bar underneath the top panel, so that you can't move it. A workaround is to right-click the task bar entry for Jumpshot and click "Move".
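If you'd like to double-check that both tools are reachable after the installation, something along these lines may help. This is only a minimal sketch: it assumes the install script puts mpecc and jumpshot on your PATH (this page does not say where they end up), and check_tool is a hypothetical helper name, not part of MPE.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MPE): report whether a command
# can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumption: the install script makes these two commands available on PATH.
for tool in mpecc jumpshot; do
    check_tool "$tool" || true
done
```

If either tool is reported as missing, opening a fresh terminal before re-checking may be worth a try, in case the PATH was only updated for new shells.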
Notes:
- You need to be online for the entire run time of the script.
- Depending on your machine and internet connection, the script may take around half an hour to finish. (10 minutes just now on my laptop, but that's a fairly fast machine on a fast connection)
- It is best to leave the computer alone while it is processing the script.
- This script is only intended for the class virtual machine, and even then you are using it at your own risk. I highly recommend that you use VirtualBox to create a snapshot (a system restore point) before you attempt the installation, in case something goes awry. Do not attempt to use this on a Mac or another Linux machine.