SORT Statement

ACUCOBOL-GT includes a sorting utility called AcuSort. See Debugger and Utilities in the ACUCOBOL-GT User's Guide for details on this utility. There is also a runtime configuration variable that instructs the runtime to use the system's Quicksort algorithm (if present) instead of the built-in algorithm specified by the SORT statement. See USE_SYSTEM_QSORT in Appendix H for more information.

Note: This manual entry includes code examples and highlights for first-time users following the General Rules section. In the highlights list, item four discusses ways to improve SORT performance.

General Format

SORT sort-file 

  { KEY AREA IS key-table }
  { ON {ASCENDING } KEY {key-name} } ...
       {DESCENDING}

  [ WITH DUPLICATES IN ORDER ]

  [ COLLATING SEQUENCE IS alpha-name ]

  { INPUT PROCEDURE IS proc-name  }
  { USING {in-file} ...           }

  { OUTPUT PROCEDURE IS proc-name } 
  { GIVING {out-file} ...         }

Note that proc-name has the following format:

  start-proc [ {THRU   } end-proc ]
               {THROUGH}

Syntax Rules

sort-file names a sort file described by an SD entry in the Data Division.
key-table must name a data item that is not located in the record for sort-file. key-table may not be subordinate to an OCCURS clause, nor may it be reference modified.

key-table must reference a data item whose size is an even multiple of 7. key-table is processed as if it had the following structure:

01 KEY-TABLE.
   03 SORT-KEY OCCURS N TIMES.
      05 KEY-ASCENDING PIC X COMP-X.
      05 KEY-TYPE PIC X COMP-X.
      05 KEY-OFFSET PIC XX COMP-X.
      05 KEY-SIZE PIC XX COMP-X.
      05 KEY-DIGITS PIC X COMP-X.

Typically, programs will declare key-table with a similar format.

key-name is a data item in the record description associated with sort-file. It may not be subordinate to an OCCURS clause, nor may it be a group item containing variable occurrence data items. It may not be reference modified. The maximum number of keys allowed is 23.
alpha-name is an alphabet-name defined in the SPECIAL-NAMES paragraph of the Environment Division.
in-file and out-file are files described by FD entries in the Data Division. They may not be sort files. The maximum number of input and ouput files is 25.
start-proc and end-proc are paragraph or section names in the Procedure Division.
A SORT statement may not appear in Declaratives or in the input or output procedure of a SORT or MERGE statement.
If sort-file contains variable length records, in-file records must not be smaller than the smallest record in sort-file nor larger than the largest. If sort-file contains fixed length records, in-file records may not be larger than the size of sort-file’s records.
If out-file contains variable length records, sort-file records must not be smaller than the smallest record in out-file nor larger than the largest. If out-file contains fixed length records, sort-file records may not be larger than the size of out-file's records.
If sort-file contains more than one record description, key-name need appear in only one of them. The character positions referenced by key-name are used as the key for all the file’s records.
If out-file is an indexed file, the first key-name must be ASCENDING and must specify the same character positions in its record as the primary record key for out-file.
THRU is an abbreviation for THROUGH.

General Rules

The SORT statement sorts records received from the INPUT PROCEDURE or found in the in-files. It then either makes these sorted records available to the OUTPUT PROCEDURE or writes them to each out-file.
Sort records must be at least six bytes in size.
If sort-file contains fixed length records, any shorter in-file records are space-filled on the right to match the record size.
If out-file contains fixed length records, any shorter sort-file records are space-filled on the right to match the record size.
The first key-name is the major key, and the next key-name is the next most significant key. This pattern continues for each key-name specified.
The ASCENDING phrase specifies that key values are to be ordered from lowest to highest. The DESCENDING phrase specifies the reverse ordering. Once ASCENDING or DESCENDING is specified, it applies to each key-name until another ASCENDING or DESCENDING adjective is encountered.
Use the KEY AREA option when you do not know the specifics of the sort key until the program is run. You can use this to allow users to enter sort key specifications, typically in conjunction with some form of data dictionary.
Your program must fill in a table of information that describes the sort keys. This table, key-table, should have the format described by Syntax Rule 3 above. The number of sort keys is determined by the number of occurrences in the table. The keys are listed in order of precedence: table entry 1 describes the highest precedence key, table entry 2 the second highest, and so on. If you need to process a variable number of keys, use a variable-size table (by using OCCURS DEPENDING ON).

For each key, you must specify the following information:

KEY-ASCENDING:	This should be 0 or 1. Enter 1 to have an ascending sort sequence, 0 for descending.
KEY-TYPE:	Describes the underlying data format. The allowed values are listed in the next rule.
KEY-OFFSET:	Describes the distance (in standard character positions) from the beginning of the sort record to the beginning of the key field. The first field in a sort record is at offset 0.
KEY-SIZE:	Describes the size of the key field in standard character positions.
KEY-DIGITS:	This is used only for numeric keys. It describes the number of digits contained in the key (counting digits on both sides of the decimal point).

The KEY-TYPE field uses a code to describe the type and internal storage format of the data item. Select from the following values:

0	Numeric edited
1	Unsigned numeric (DISPLAY)
2	Signed numeric (DISPLAY, trailing separate)
3	Signed numeric (DISPLAY, trailing combined)
4	Signed numeric (DISPLAY, leading separate)
5	Signed numeric (DISPLAY, leading combined)
6	Signed COMP-2
7	Unsigned COMP-2
8	Unsigned COMP-3
9	Signed COMP-3
10	COMP-6
11	Signed binary (COMP-1, COMP-4, COMP-X)
12	Unsigned binary (COMP-1, COMP-4, COMP-X)
13	Signed native (COMP-5, COMP-N)
14	Unsigned native (COMP-5, COMP-N)
15	Floating point (FLOAT, DOUBLE)
16	Alphanumeric
17	Alphanumeric (justified)
18	Alphabetic
19	Alphabetic (justified)
20	Alphanumeric edited
21	Not used
22	Group

This coding is the same one used by the C interface, and is also used by Acu4GL to interface to relational DBMSs. When specifying the key type, you may safely use "alphanumeric" for all nonnumeric keys. (The sort rules are the same for each of these types). For numeric data, however, you must specify the correct type or you may get sorting errors.

The results are undefined if you provide invalid data in the key-table. If you fail to specify any keys (by specifying a table whose size is zero), you receive a file error on sort-file. Under the default file status codes, this is file error 94 with a secondary status of 63.
For nonnumeric keys, the COLLATING SEQUENCE phrase establishes the ordering. If this phrase is omitted, the NATIVE collating sequence is used. For numeric keys, the ordering is specified by the algebraic value of the key.
The DUPLICATES phrase affects the return order for records whose key-name values are equal.
1. When there is a USING phrase, the return order is the same as the order of appearance of in-file names in the SORT statement. Within a given in-file, the order is that in which the records are accessed from that file.
2. When there is an INPUT PROCEDURE, the return order is the same as the order in which records were released. If the DUPLICATES phrase is not used, the return order for records with equal key values is unpredictable.
The execution of a SORT statement consists of three distinct phases. These are:
1. Records are made available to the sort-file. This is achieved either by executing RELEASE statements in the input procedure or by implicit execution of READ statements for each in-file. When this phase starts, in-file must not be open. When it finishes, in-file will not be open.
2. The sort-file is sequenced according to the KEY phrase and the DUPLICATES clause. No processing of in-files or out-files takes place during this phase.
3. The records in sort-file are made available in sorted order. The sorted records are either written to the out-files or are made available to an output routine through execution of a RETURN statement. When this phase starts, out-file must not be open. When it finishes, out-file will be closed.
If the INPUT PROCEDURE phrase is used, the named procedure is executed by the SORT statement according to the rules for the PERFORM verb. This procedure must make records available to the input phase of the sort operation by executing RELEASE statements. When this procedure returns, the sort operation proceeds to the sequencing phase. The range of the input procedure may not cause the execution of a MERGE, RETURN, or SORT statement.
If the USING phrase is specified, all records in each in-file are transferred to sort-file. For each in-file, the following actions occur:
1. The file is opened as if it were the object of an OPEN INPUT statement with no options.
2. The records are obtained and released to the sort operation. Each record is obtained as if a READ statement with the NEXT and AT END phrases had been executed. For relative files, the RELATIVE KEY data item is undefined at the end of this phase.
3. The file is closed as if it were the object of a CLOSE statement with no options. This occurs prior to the sequencing of sort-file.
  These implicit functions are performed such that any associated USE procedures are executed. These USE procedures must not access in-file or its record area.
If an output procedure is specified, control passes to it after the sort-file has been sequenced. Control passes to the output procedure according to the rules of the PERFORM statement. The output procedure must execute RETURN statements to retrieve the sorted records. When the output procedure returns, the SORT statement terminates and control passes to the next executable statement. The range of the output procedure must not execute any MERGE, RELEASE, or SORT statements.
If the GIVING phrase is used, all the sorted records are written to each out-file. For each of these files, the following steps occur:
1. out-file is opened as if it were the object of an OPEN OUTPUT statement with no options.
2. The sorted records are returned and written to the file. The records are written as if a WRITE statement without any options had been executed. For a relative file, the value of the RELATIVE KEY data item is updated to reflect the record number written.
3. The file is closed as if it were the object of a CLOSE statement without any options.
These implicit functions are performed such that any associated USE procedures are executed. Such a USE procedure may not refer to out-file or its record area. On the first attempt to write beyond the externally defined boundaries of the file, any applicable USE procedure is executed. If control is returned from that USE procedure, or no USE procedure is applicable, the processing of that out-file is terminated.
If the SORT statement is in a fixed segment, the range of any input and output procedures must be contained completely in the fixed segments and no more than one independent segment. If the MERGE statement is in an independent segment, the range must be completely contained in the fixed segments and the same independent segment.
The SORT statement updates the value of the sort-file's FILE STATUS data item.
Only one SORT may be active at a time. See also CANCEL SORT.
If a SORT statement is executed in a wrong context, the runtime displays the error Illegal SORT. This error belongs to the class of intermediate runtime errors that, upon occurrence, call installed error procedures. See CBL_ERROR_PROC for details.
For compatibility with other COBOLs, ACUCOBOL-GT includes special registers known as SORT-RETURN and SORT-MESSAGE. SORT-RETURN can be used for two purposes.
1. To determine the status of a SORT that's just finished. You can determine the success or failure of a SORT by examining this variable after the SORT returns. A value of 0 indicates success, and a non-zero value indicates failure.
2. To interrupt a SORT that is currently running. By setting this variable in an input or output procedure, you stop SORT processing immediately after the next RELEASE or RETURN statement is performed. By setting this variable in a DECLARATIVES paragraph (if you are not using input or output procedures), you stop SORT processing immediately after the next implicit RELEASE or RETURN is performed.
The special register SORT-RETURN is of type SIGNED-INT. Please note that this register is primarily for compatibility purposes, and there are better ways to perform these functions in ACUCOBOL-GT. For instance, to get status on a SORT, use the FILE STATUS variable of the SORT file. This gives more information than just success or failure. And if you are using input procedures, you can halt a SORT more simply by returning from the procedure as if you had reached the end of file.

When compiling for IBM compatibility (-Cv), the SORT-MESSAGE behaves just like it is declared in every program, for example:
```
01  SORT-MESSAGE PIC X(8) EXTERNAL.
```
This variable is used in mainframe environments to help control the SORT operation. In ACUCOBOL-GT, the variable has no particular effect.

The SORT statement can be used to sort elements of a working-storage table. The syntax is:

SORT data-name-2 [ ON ASCENDING/DESCENDING KEY data-name-1 ... ]
   [ WITH DUPLICATES IN ORDER ]
   [ COLLATING SEQUENCE IS alphabet-name ]

Code Examples

SORT PRODUCT-SORT-FILE             |temporary SD file
   ON ASCENDING KEY MODEL-TYPE,    |major sort key
                    MODEL-NUMBER   |minor sort key
   USING ATHLETIC-SHOES-LIST,      |input data file
         DRESS-SHOES-LIST          |input data file
   GIVING PRODUCT-LIST.            |permanent output data file

(An extended version of this example appears after the Highlights for First-Time Users section.):

SORT PRODUCT-SORT-FILE                 |temporary SD file
*duplicates sorted in the order acquired
   ON ASCENDING  KEY MODEL-TYPE        |major sort key
   ON DESCENDING KEY MODEL-NUMBER      |minor sort key
   WITH DUPLICATES IN ORDER
   INPUT  PROCEDURE IS WEED-PRODUCT-LIST
   OUTPUT PROCEDURE IS UPDATE-PRODUCT-LIST.

Highlights for first-time users

SORT is used to order records according to a set of key fields (sort keys). Records may be stored in sequential, relative, or indexed files, or records may be acquired by use of an INPUT PROCEDURE. Once ordered, the record set may be further processed by use of an OUTPUT PROCEDURE, or the records may be written directly to the named output file(s).
SORT is most often used to order records stored in disk files. However, by using an INPUT PROCEDURE you can acquire records from other input sources such as output from batch processes, internal application data structures, or screen input.
SORT creates a special temporary disk file (the sort file) as a work space for collecting, sorting, and holding ordered records. The sort file is defined by an SD entry in the DATA DIVISION. The sort file record definition must immediately follow the SD entry and must include definitions for each sort key used, except when the KEY AREA phrase is used. You can place temporary files used by the SORT verb in a specified directory. The sort file is removed when the SORT statement completes. See SORT_DIR for more information.
Runtime performance: Most SORT procedures involve the reading, sorting, and writing of records stored in disk files. These disk I/O processes can be relatively slow and, therefore, the SORT process can take a lot of time. However, you can tune performance. To get the best runtime performance, give the process as much memory as possible, without adversely affecting other processes running on the system. Ideally, you should allocate enough memory, using the SORT_MEMORY configuration variable, so that all records fit into the allocated memory; this way, the records are sorted and moved to the output phase without requiring the use of temporary files. Determining the optimal value depends on the number and size of the records being sorted, the amount of available memory, and the needs of other processes on the system. Some experimentation may be necessary.

For larger SORT operations, you will more than likely not be able to allocate enough memory in which all records can fit, and so the SORT operation will use temporary sort files. Use the A_SORT_REGIONS, A_SORT_REGIONS_FINAL and A_SORT_FILE_MEMORY configuration variables to fine-tune the use of temporary files.

By default, the SORT routine uses a built-in sort function. Alternatively, if your system has a qsort() function, you can specify its use by setting the runtime configuration variable called USE_SYSTEM_QSORT to the value of 1. Some systems have qsort() functions that perform better than the built-in function. Consider experimenting with this variable to determine if this option yields better performance on your system. Pay particular attention to the number of comparisons done during the sort, which can be seen in the runtime trace output.
The three basic steps of the SORT procedure are:
1. Acquiring and placing the records to be sorted in the sort file:
  When the USING phrase is used, SORT opens each named input file, reads the data records, one at a time, into the sort file and closes the input file. Input files must not be open before the SORT statement begins.
  
  When an INPUT PROCEDURE is used the RELEASE verb is used to pass records to the sort file. If records are acquired from disk files, it is the responsibility of the input procedure to open, read, process, RELEASE each individual record, and close the files. For more information, see the RELEASE StatementRELEASE Statement .
2. Sorting the records:
  Using the sort file, and a set of temporary files, records are sorted according to the key phrase, the DUPLICATES clause, and the COLLATING clause. See SORT_DIR and SORT_MEMORY for more information.
3. Disposition of the sorted records:
  When the GIVING phrase is used, the sorted records are written to the named permanent output file(s).
  
  When an OUTPUT PROCEDURE is used, the sorted records are made available to the output procedure for processing and writing to a permanent file(s). The output procedure uses the verb RETURN to acquire the ordered records from the sort file. It is the responsibility of the output procedure to open, write, and close the output file(s). For more information see the description of the RETURN Statement.
A SORT statement may not appear in a DECLARATIVES section or in an INPUT or OUTPUT PROCEDURE that is part of a SORT statement (nesting of SORT statements is not permitted).
The KEY AREA phrase is a means for defining the sort keys at runtime, as the application is running. When you use KEY AREA, it is not required that the sort file record descriptor contain entries for potential sort keys. Definition of the sort key(s) in the sort file is handled internally by the SORT routine using the key table. See syntax rules 2 and 3 and general rules 6 through 10.
If the KEY AREA phrase is not used, the sort keys must be defined in the record description of the sort file.
Use of INPUT PROCEDURE or OUTPUT PROCEDURE requires that all file I/O operations and record disposition be handled by the input or output procedure. This means that the input and output procedures must explicitly perform the OPEN, READ, RELEASE (input), RETURN (output), WRITE, and CLOSE actions. As with any I/O management, the procedure should consider and account for the handling of all I/O related errors.
Use the DUPLICATES phrase when you want duplicate records to be sequenced in the same order that they are read in or RELEASEd. Duplicate records are those that have identical key values. In the absence of the DUPLICATES phrase sequencing of duplicate records is not predictable (see General Rule 12).
Use the COLLATING SEQUENCE phrase to alter the ordering of nonnumeric keys. The named collating sequence must be defined in the SPECIAL-NAMES paragraph of the ENVIRONMENT DIVISION. In the SPECIAL-NAMES paragraph the user may define a unique character order, or the user may select one of the four predefined character sequences: STANDARD-1, STANDARD-2, NATIVE, and EBCDIC. See Special-Names Paragraph for more information.
If no COLLATING SEQUENCE phrase is used, the default collating sequence is used. The default collating sequence is whatever is native to the operating system (usually the same as the predefined type NATIVE).
Use the STATUS variable to hold the execution status of the SORT operation. The status variable is named in the SELECT/ASSIGN phrase of the FILE-CONTROL paragraph of the INPUT-OUTPUT SECTION. See Special-Names Paragraph for more information.
For a complete list and description of file status codes, see Appendix E, in ACUCOBOL-GT Appendices.
To specify the disk directory in which SORT will place any temporary files, set the SORT_DIR runtime configuration variable, located in the runtime configuration file.

For simplicity, only one input file will be used.

IDENTIFICATION DIVISION.
PROGRAM-ID.  SAMPLE-FILE-SORT.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
*SORT input file
SELECT ATHLETIC-SHOES-LIST
      ASSIGN TO ....
*SORT output file
   SELECT PRODUCT-LIST
      ASSIGN TO ....
*sort file (SD)
   SELECT PRODUCT-SORT-FILE
      ASSIGN TO ....
DATA DIVISION.
FILE SECTION.
FD  ATHLETIC-SHOES-LIST.
01  A-SHOE-RECORD            PIC X(38).
FD  PRODUCT-LIST.
01  B-SHOE-RECORD            PIC X(38).
SD  PRODUCT-SORT-FILE.
01  SORT-DATA.
    05  MODEL-NAME         PIC X(10).
    05  MODEL-TYPE         PIC X(3).
    05  MODEL-NUMBER       PIC X(3).
    05  STOCK-NUMBER       PIC X(7).
    05  DESIGN-YEAR        PIC 99.
    05  UNIT-COST          PIC 999V99.
    05  UNIT-PRICE         PIC 999V99.
    05  FACTORY-NUM        PIC 999.
WORKING-STORAGE SECTION.
01   FLAGS.
     05   SHOE-LIST-EMPTY  PIC X  VALUE "N".
          88 NO-MORE-SHOE-RECORDS VALUE "Y".
     05   SORT-FILE-EMPTY  PIC X  VALUE "N".
          88 NO-MORE-SORT-RECORDS VALUE "Y".
01   HONG-KONG-NUMBER      PIC 99.
01   TAIWAN-NUMBER         PIC 99.
...
PROCEDURE DIVISION.
PRODUCT-LIST-SORT.
*temporary SD file used by sort
   SORT PRODUCT-SORT-FILE
  *major sort key
     ON ASCENDING  KEY MODEL-TYPE
*minor sort key
     ON DESCENDING KEY MODEL-NUMBER
 *duplicates sorted in the order acquired
      WITH DUPLICATES IN ORDER
         INPUT  PROCEDURE IS WEED-PRODUCT-LIST
         OUTPUT PROCEDURE IS UPDATE-PRODUCT-LIST.

WEED-PRODUCT-LIST SECTION.
OPEN-LIST-FILE.
   OPEN INPUT ATHLETIC-SHOES-LIST.
   PERFORM WEED-LIST 
      UNTIL NO-MORE-SHOE-RECORDS.
   CLOSE ATHLETIC-SHOES-LIST.
   GO TO EXIT-WEED-PRODUCT-LIST.
   
WEED-LIST.
   READ ATHLETIC-SHOES-LIST NEXT
      AT END MOVE "Y" TO SHOE-LIST-EMPTY
      NOT AT END
*stock numbers beginning with "X" are obsolete
*do not RELEASE
        IF STOCK-NUMBER(1:1) = "X" THEN 
           NEXT SENTENCE
        ELSE
*otherwise release the record to SORT
           RELEASE SORT-DATA
        END-IF.

EXIT-WEED-PRODUCT-LIST.
   EXIT.

UPDATE-PRODUCT-LIST SECTION.
CREATE-PRODUCT-LIST.
   OPEN OUTPUT PRODUCT-LIST.
   PERFORM UPDATE-RECORD 
      UNTIL NO-MORE-SORT-RECORDS.
   CLOSE PRODUCT-LIST.
   GO TO EXIT-UPDATE-PRODUCT-LIST.
   
UPDATE-RECORD.
   RETURN PRODUCT-SORT-FILE INTO SORT-DATA
      AT END MOVE "Y" TO SORT-FILE-EMPTY
      NOT AT END
         IF FACTORY-NUM = HONG-KONG-NUMBER THEN
             MOVE TAIWAN-NUMBER TO FACTORY-NUM
         END-IF
         WRITE B-SHOE-RECORD FROM SORT-DATA.

EXIT-UPDATE-PRODUCT-LIST.
   EXIT.