SS01—Specifications for random sampling and randomization
Category: Statistical method
Specification: SS01 (rev. 1)
Effective date: 20081110
Supersedes: SS01
Table of contents
 1.0 Scope
 2.0 Authority
 3.0 Terms, definitions, symbols, and abbreviations
 4.0 Pseudoindependent random sampling computer algorithms
 5.0 Random sampling methods
 6.0 Revision
 Appendix A—(informative)
 Appendix B—(informative)
 Appendix C—(informative)
1.0 Scope
1.1 These specifications define algorithms for random sampling and randomization and are applicable whenever regulations or other specifications reference them for the purposes of random sampling or randomization.
1.2 These specifications are applicable to such situations as:
 acceptance sampling of discrete units presented for inspection in lots;
 sampling for survey purposes;
 auditing of quality management system results; and,
 selecting experimental units, allocating treatments to them, and determining evaluation order in the conduct of designed experiments.
1.3 These specifications also include information to facilitate auditing or other external review of random sampling or randomization results where this is required by Measurement Canada or quality management personnel in accredited organizations.
1.4 No normative references are applicable to these specifications. For informative references, refer to the Bibliography in the appendix.
2.0 Authority
These specifications are issued under the authority of section 19 of the Electricity and Gas Inspection Regulations.
3.0 Terms, definitions, symbols, and abbreviations
3.1 Terms and definitions
3.1.1 Lot
Definite part of a population (3.1.2) constituted under essentially the same conditions as the population with respect to the sampling (3.1.8) purpose.
Note: The sampling purpose may, for example, be to determine lot acceptability, or to estimate the mean value of a particular characteristic.
3.1.2 Population
Totality of items under consideration.
3.1.3 Pseudoindependent random sampling
Sampling (3.1.8) where a sample (3.1.7) of n sampling units (3.1.9) is taken from a population (3.1.2) in accordance with a table of random numbers or a computer algorithm designed such that each of the possible combinations of n sampling units has a particular probability of being taken.
3.1.4 Random sample
Sample (3.1.7) selected by random sampling (3.1.5).
3.1.5 Random sampling
Sampling (3.1.8) where a sample (3.1.7) of n sampling units (3.1.9) is taken from a population (3.1.2) in such a way that each of the possible combinations of n sampling units has a particular probability of being taken.
3.1.6 Randomization
Process by which a set of items are set into a random order.
Note: If, from a population (3.1.2) consisting of the natural numbers 1 to n, numbers are drawn at random (i.e. in such a way that all numbers have the same chance of being drawn), one by one, successively, without replacement, until the population is exhausted, the numbers are said to be drawn "in random order".
If these n numbers have been associated in advance with n distinct units or n distinct treatments that are then rearranged in the order in which the numbers are drawn, the order of the units or treatments is said to be randomized.
3.1.7 Sample
Subset of a population (3.1.2) made up of one or more sampling units (3.1.9).
3.1.8 Sampling
Act of drawing or constituting a sample (3.1.7).
3.1.9 Sampling unit
One of the individual parts into which a population (3.1.2) is divided.
3.1.10 Sampling without replacement
Sampling (3.1.8) in which each sampling unit (3.1.9) is taken from the population (3.1.2) once only without being returned to the population.
3.1.11 Seed
Numerical value or set of values used to initialize a pseudoindependent random sampling (3.1.3) algorithm or to establish a starting point in a table of random numbers.
3.1.12 Simple random sample
Sample (3.1.7) selected by simple random sampling (3.1.13).
3.1.13 Simple random sampling
Sampling (3.1.8) where a sample (3.1.7) of n sampling units (3.1.9) is taken from a population (3.1.2) in such a way that all possible combinations of n sampling units have the same probability of being taken.
3.2 Symbols and abbreviations
The key symbols and abbreviations used in these specifications are as follows:
 mod
 modulo operator (a mod b = a − b ⌊a / b⌋)
 N
 lot size
 n
 sample size
 n_{i}
 size of the i^{th} sample
 U
 uniformlydistributed random real variable on the open range (0, 1)
 x_{i}
 the i^{th} value of the variable x
 ⌊z⌋
 floor function of z (returns the integer portion of real value z)
4.0 Pseudoindependent random sampling computer algorithms
4.1 Overview
4.1.1 These specifications adopt a specific system of algorithms developed in bibliographic references [1, 5, and 8]. The algorithms have been designed to possess the mathematical and statistical properties required for random sampling, as well as to be portable with respect to implementation in different programming languages on different computer platforms and to facilitate verification and auditing of the selected sample values, which might be required for regulatory purposes.
4.1.2 The system of algorithms involves two major subsystems:
 an optional initialization algorithm that automatically generates a quasirandom seed integer based on elapsed time from a reference date; and,
 a random number generator.
4.1.3 For verification or auditing purposes, the optional initialization algorithm mentioned in 4.1.2 a) and described in 4.2 would be bypassed with a manuallyentered seed value. This value needs to be within the integer range from 1 and 2 147 483 398 inclusive. A copy of this input value is saved for records purposes when required. However, in general usage for quality control and designed experiment applications, there should be infrequent need to bypass the option of automatic random seed generation, which should be the default option in practice.
Note: The presentations of the steps of the algorithms in this clause have been kept in a more mathematical format to aid in programming.
4.2 Initialization algorithm
4.2.1 The initialization algorithm consists of:
 an elapsed time computation algorithm, referenced to a fixed past date and time; and,
 a random number generation algorithm based on the uniform distribution, called a random number of times based on the output of item a) above, to obtain a random seed based on the timebased input.
4.2.2 The following algorithm determines the number of seconds that has elapsed since 20000101 00:00:00 to the current date and time:
 Capture the computer system's date and time to a string variable, save a copy of the variable for records purposes, and then parse the string into its time components (i.e. year, month, day, hour, minute, and second).

Compute the number of fully elapsed days d_{e} since the reference time point, using the current date's full fourdigit year y, month m_{1}, and day d numerical values processed as follows:
If m_{1} < 3 then let m_{1} = m_{1} + 12 and let y = y − 1
d_{e} = d + ⌊(153 m_{1} − 457) ÷ 5⌋ + 365 y + ⌊y ÷ 4⌋ − ⌊y ÷ 100⌋ + ⌊y ÷ 400⌋ − 730 426
Note: The equation for d_{e} may be slightly simplified for calendar years up to and including 2099 by replacing the terms following ⌊y ÷ 4⌋ by "− 730 441".

Compute the total number of seconds s_{e} elapsed since the reference date using the quantity obtained in step b) and the time of day (in 24hour "hh:mm:ss" format) captured in the string variable in step a) in accordance with the following equation:
s_{e} = 86400 d_{e} + 3600 h + 60 m_{2} + s
where h, m_{2}, and s are the hours, minutes, and seconds respectively.
Note: Some programming languages have builtin functions to perform the calculation of s_{e} directly. Such intrinsic functions need to be validated before use, to ensure the effects of leap years and daylight saving time are properly handled.
 The value resulting from step (c) is the initializing seed for the random seed generator and is used to obtain the final seed. A copy of this value is saved to a separate variable for records purposes when required.

The number of times j that the subsequent random number generator is to be called is a random integer between 1 and 100 inclusive, based on the two least significant digits of the value obtained in step (c) increased by 1, which may be expressed as follows:
j = s_{e} − 100 ⌊s_{e} ÷ 100⌋ + 1
4.2.3 The random number generator for the automatic seed generation (initialization function) algorithm takes the form of the linear congruential recurrence relation:
x_{i}_{ + 1} = 40 692 x_{i} mod 2 147 483 399
which can be implemented on computers capable of handling 32bit integers via the following steps:
 k = ⌊x_{i} ÷ 52 774⌋
 x_{i}_{ + 1} = 40 692 (x_{i} − 52 774 k) − 3 791 k
 If x_{i}_{ + 1} < 0 then let x_{i}_{ + 1} = x_{i}_{ + 1} + 2 147 483 399
4.2.4 Generate the seed to the random sampling algorithm by assigning the result from 4.2.2 (c) to x_{i} and then calling the formula in 4.2.3 j times per step 4.2.2 (e), replacing x_{i} with x_{i}_{ + 1} each time until the required number of calls are made.
4.2.5 The final value of x_{i}_{ + 1} resulting from step 4.2.4 is a random integer between 1 and 2 147 483 398 inclusive and serves as the initial seed to the random sampling algorithm described in 4.3 [in particular, the value y_{i} in step 4.3.6 (b)]. A copy of this value is saved to a separate variable for records purposes when required.
4.3 Random number generation algorithm
4.3.1 The random number generation algorithm consists of:
 a shuffling array that is populated by a uniformdistribution random number generation algorithm; and,
 a combination, uniformdistribution random number generation algorithm.
4.3.2 Create a 32element array A to serve as a means of shuffling the output of the random sampling algorithm.
4.3.3 The following random number generator is used to populate the shuffling array:
x_{i}_{ + 1} = 40 014 x_{i} mod 2 147 483 563
which can be implemented on 32bit computers via the following steps:
 k = ⌊x_{i} ÷ 53 668⌋
 x_{i}_{ + 1} = 40 014 (x_{i} − 53 668 k) − 12 211 k
 If x_{i}_{ + 1} < 0 then let x_{i}_{ + 1} = x_{i}_{ + 1} + 2 147 483 563
4.3.4 Initialize the array A by assigning the result from 4.1.3 or 4.2.5 to x_{i} and then calling the generator given in 4.3.3 (a) 40 times, replacing x_{i} with x_{i}_{ + 1} on each call, discarding the first 8 values, and then assigning each of the remaining 32 output values of x_{i}_{ + 1} to the array in reverse order (i.e. from element 32 down to element 1).
4.3.5 Set element 1 of array A (i.e. A[1]) as the initializing value k to the combination random number generation algorithm.
4.3.6 The combination random number generator for random sample generation takes the form of the following combination of linear congruential recurrence relations and array index determination steps:
 x_{i}_{ + 1} = 40 014 x_{i} mod 2 147 483 563
 y_{i}_{ + 1} = 40 692 y_{i} mod 2 147 483 399
 J = ⌊32 k ÷ 2 147 483 563⌋ + 1
 k = AJ − y_{i}_{ + 1}
 AJ = x_{i}_{ + 1}
 If k < 1 then let k = k + 2 147 483 562
Note: The two random number generators above are those described in 4.2.3 and 4.3.3 (refer to those clauses if 32bit equivalent implementations are required).
4.3.7 The algorithm in 4.3.6 is initialized by setting x_{i} to the final value of x_{i}_{ + 1} from 4.3.4 and setting y_{i} to the value referenced in 4.2.5. The values x_{i}_{ + 1} and y_{i}_{ + 1} serve as the subsequent values of x_{i} and y_{i} for all subsequent calls to the algorithm. A random index J to the shuffling array A is calculated using the value of k (from 4.3.5 initially), and the difference between AJ and y_{i}_{ + 1} is assigned to k, while AJ is updated with x_{i}_{ + 1}. Finally, the value of k is altered if necessary to produce a positive value.
4.3.8 The output of the random sampling algorithm is the value k, which is a random number between 1 and 2 147 483 562 inclusive, scaled as a standard uniformlydistributed real variable U over the range from 0 to 1, exclusive of the endpoint values of this range, as follows: U = k ÷ 2 147 483 563.
4.3.9 The output from 4.3.8 may be scaled as a uniformlydistributed integer variable L over the range from 1 to N, inclusive, as follows: L = ⌊N U⌋ + 1.
4.3.10 To generate a random sample, steps 4.3.6 to 4.3.9 are repeated until the desired number of random values is obtained.
4.4 Audit records
When records are required to be maintained for audit purposes by Measurement Canada or a responsible authority, record the operator identifier, lot identifier, lot size, sample size(s), type of sampling employed, and lists of the units in the lot and in the sample(s).
In addition, with respect to the algorithms, record the manually entered seed per 4.1.3, or if the random seed generator is used then record the:
 computer system's date and time used to compute this initial seed;
 initial seed's value per 4.2.2 (d); and,
 final seed's value per 4.2.5.
5.0 Random sampling methods
5.1 General
5.1.1 This clause provides algorithms for random sampling strategies commonly used in legal metrology work.
5.1.2 Throughout this clause, U is defined as a random real variable, uniformlydistributed in the range from 0 to 1, exclusive of the endpoint values of the range, such as provided by the algorithm in 4.
5.2 Single sampling
A single random sample of n distinct units from a lot of N units is generated without replacement by the following method:
 Generate a random real value U.
 Set L equal to ⌊N U⌋ + 1.
 Verify that the value of L has not been previously generated; if it is distinct, store the value, otherwise discard it.
 Repeat steps (a) to (c) until n different values of L are obtained.
 Optionally, sort the values in ascending order.
Note: If the resulting values of a single sample are not sorted, that sample may be used for sequential sampling inspection by inspecting each unit in the order selected.
5.3 Multiple sampling
Multiple random samples of n_{i} distinct units from a lot of N units are generated without replacement by the following method:
 Generate a single sample of n_{t} distinct units from a lot of N units without replacement, where n_{t} is the total of the individual sample sizes n_{i}, leaving the values in original output order (i.e. unsorted).
 Take the first n_{1} resulting values as the first sample, the next n_{2} resulting values as the second sample, and so forth.
 Optionally, sort the values of each component sample in ascending order.
6.0 Revision
The purpose of Revision 1 is to update the presentation of these specifications in a manner consistent with the ISO international standard that Canada is developing on the subject. This specification does not introduce any substantive changes to the Agency’s web application that has been in use for several years.
Alan E. Johnston
President
Measurement Canada
Appendix A—(informative)
A. Tests of algorithm implementation
A.1 General
This appendix provides information to assist software developers with testing the correctness of their implementations of the random sampling algorithms in the specification.
A.2 Seed calculation tests
For the manually input date and time in the first column of the table below, the seed values in the second and third columns should result at the points in the algorithm indicated by the clause references.
Date and time  Seed 1 4.2.2 (c)  Seed 2 4.2.5 

20090115 16:16:16  285 351 376  1 774 249 844 
20090715 08:08:08  300 960 488  150 009 464 
20100115 16:16:16  316 887 376  1 593 377 912 
20100715 08:08:08  332 496 488  1 451 476 477 
A.3 Tests of component random number generation algorithms
Using initializing seeds of x_{0} = 1and y_{0} = 1, as applicable, for each of the random number generators and calling each generator 10 000 times, produces the following output:
 for x_{i}_{ + 1} = 40 014 x_{i} mod 2 147 483 563 (4.3.3), x_{10 000} = 1 919 456 777;
 for y_{i}_{ + 1} = 40 692 y_{i} mod 2 147 483 399 (4.2.3), y_{10 000} = 2 006 618 587; and,
 for the combined generator with shuffle array (4.3.6), AJ _{J = 10 000} = 1 701 364 455.
A.4 Stepbystep overall test of implementation
Using an initializing date and time of 20090115 16:16:16, the intermediate and final outputs of the algorithms are as follows:
 per step 4.2.2 (b), d_{e} = 3 302
 per step 4.2.2 (c), s_{e} = 285 351 376
 per step 4.2.2 (e), j = 77
 per step 4.2.4 and 4.2.5, result = 1 774 249 844;

per step 4.3.4, the 32 values in array A are:
Table 2 J AJ 1 1 773 883 525 2 1 376 260 681 3 324 244 626 4 616 012 910 5 1 753 573 598 6 238 867 782 7 591 860 039 8 64 148 416 9 12 989 333 10 1 236 571 744 11 150 838 841 12 1 379 547 554 13 1 594 841 833 14 363 535 288 15 643 814 074 16 1 662 338 174 17 1 843 118 480 18 1 301 824 472 19 2 024 723 015 20 1 640 100 338 21 1 715 924 041 22 1 979 383 646 23 1 293 133 612 24 504 407 049 25 925 629 865 26 879 056 303 27 257 361 492 28 1 402 037 236 29 1 031 539 864 30 981 619 081 31 81 117 341 32 2 036 123 857  per step 4.3.5, k = 1 773 883 525;
 per step 4.3.6 (a), x_{i}_{ + 1} = 1 548 645 074;
 per step 4.3.6 (b), y_{i}_{ + 1} = 1 530 261 067;
 per step 4.3.6 (c), J = 27;
 per step 4.3.6 (d), k = −1 272 899 575;
 per step 4.3.6 (e), AJ = 1 548 645 074;
 per step 4.3.6 (f), k = 874 583 987.
Appendix B—(informative)
B. Internet random sampling application
B.1 General
B.1.1 Measurement Canada has developed an online internet application that implements the algorithms defined in this specification. The application is designed to generate one or more pseudoindependent random samples without replacement from a finite lot.
B.1.2 Subject to the provisions of the disclaimer associated with the application, its output may be used to satisfy legal requirements for sample selection and auditability under legislation enforced by Measurement Canada.
B.1.3 The application will also be of assistance to software developers as it can be used to supplement the tests in Appendix A to verify the correctness of user implementations.
B.2 Accessing the application
B.2.1 The application can be accessed from Measurement Canada's Web site.
Appendix C—(informative)
C. Bibliography
 [1] Bays, C. and Durham, S.D. (1976). Improving a Poor Random Number Generator. ACM Transactions on Mathematical Software, Vol. 2, No. 1 (March), pp. 5964.
 [2] ISO 35341:2006, Statistics — Vocabulary and symbols — Part 1: General statistical terms and terms used in probability.
 [3] ISO 35342:2006, Statistics — Vocabulary and symbols — Part 2: Applied statistics.
 [4] ISO 35343:1999, Statistics — Vocabulary and symbols — Part 3: Design of experiments.
 [5] L'Ecuyer, P. (1988). An Efficient and Portable Combined Random Number Generator. Communications of the ACM, Vol. 31, No. 6 (June), pp. 742749, 774.
 [6] Marsaglia, G. (2003). Random Number Generators. Journal of Modern Applied Statistical Methods, Vol. 2, No. 1 (May), pp. 213.
 [7] Park, S.K. and Miller, K.W. (1988). Random Number Generators: Good Ones are Hard to Find. Communications of the ACM, Vol. 31, No. 10 (October), pp. 11921201.
 [8] Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P. (1992, 2001). Numerical Recipes in Fortran 77: The Art of Scientific Computing, Second Edition (Volume 1 of Fortran Numerical Recipes), Cambridge University Press, Cambridge, UK.
 Date modified: