LIMS-III Tutorials:

  1. What is the meaning of PMG? (or how to process multiple datasets in parallel):
    To accommodate multi-wavelength data, we switched LIMS to a composite job model at the end of 2014. Under that model, datasets submitted together go to the cluster as a single job and can be processed in parallel. This was necessary because a multi-wavelength submission can contain several hundred triples (each channel may be measured at 300 or more wavelengths). Submitting that many individual jobs to any of the XSEDE clusters would have exceeded our allocation limit (50 jobs max on some resources). So we decided to submit them all as a single job and process the individual datasets under parallel master groups (PMGs).
  2. How many PMGs should I use?
    With PMGs you can have up to 32 groups processing the datasets in your submission simultaneously. All your submitted datasets are subdivided among the PMGs. When you select parallel processing you can select more than one PMG, but you don't have to. If you don't, you are assigned a single PMG, and all datasets are processed one after another inside that PMG. If you have 4 datasets to process, you could pick 4 PMGs and have them all processed in parallel. Of course there is a limit, since each additional PMG requires 32 or 36 more CPUs from the cluster, so we capped the number at 32 PMGs. If you have more datasets than PMGs (always the case with more than 32 datasets), the extra datasets are serialized within the PMGs. Here is the flip side: if you request more CPUs than a single job needs, your jobs may complete more quickly, but the larger block of CPUs may put you at a disadvantage in terms of when your job starts if the cluster is busy. So scale your request to what you think will give you the best possible return. After all, what good is a job that completes in 2 minutes if you have to wait a day before a block of 1100+ CPUs can be allocated? It may be better to ask for only 128 processors, get an earlier start time, and have the total job take an hour or so.
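    The arithmetic behind this trade-off can be sketched in a few lines of Python. This is only an illustration, not LIMS code: the function name plan_submission is hypothetical, the round-robin split of datasets among PMGs is an assumption (the text above only says that datasets are subdivided among the PMGs), and whether a PMG uses 32 or 36 CPUs depends on the cluster.

```python
import math

MAX_PMGS = 32  # upper limit on PMGs stated in this tutorial


def plan_submission(n_datasets, n_pmgs=1, cpus_per_pmg=32):
    """Estimate the CPU request and serialization for one composite job.

    Hypothetical helper for illustration only; it does not call LIMS.
    """
    if n_datasets < 1:
        raise ValueError("need at least one dataset")
    if not 1 <= n_pmgs <= MAX_PMGS:
        raise ValueError("n_pmgs must be between 1 and %d" % MAX_PMGS)

    # Assumption: datasets are dealt out round-robin, one per PMG in turn.
    groups = [list(range(g, n_datasets, n_pmgs)) for g in range(n_pmgs)]

    return {
        # Every requested PMG costs one block of CPUs on the cluster.
        "total_cpus": n_pmgs * cpus_per_pmg,
        # Datasets in the same PMG run one after another, so the busiest
        # PMG determines how many serial rounds the job needs.
        "serial_rounds": math.ceil(n_datasets / n_pmgs),
        "groups": groups,
    }


if __name__ == "__main__":
    # 4 datasets on 4 PMGs: everything runs in parallel on 128 CPUs.
    print(plan_submission(4, n_pmgs=4)["total_cpus"])
    # 300 datasets on 32 PMGs of 36 CPUs: a 1152-CPU request, 10 rounds.
    big = plan_submission(300, n_pmgs=32, cpus_per_pmg=36)
    print(big["total_cpus"], big["serial_rounds"])
```

    Under these assumptions, the same 300 datasets on 4 PMGs of 32 CPUs would need only 128 CPUs but 75 serial rounds, which is the slower-but-earlier-start option described above.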
  3. What PMG setting minimizes my runtime?
    Unfortunately, the outcome of this kind of gaming the system is difficult to predict, so your experience will vary depending on how many other people are using the cluster. In my experience, jobs submitted early on Sunday mornings almost always start processing immediately, and things also move surprisingly quickly during the Super Bowl :-)