China Data Retrieval: A Method for Computer-Assisted Indexing of Translated Mainland Chinese Material

This report has yet to be scanned by Contrails staff

Report Number: RM-6332-PR
Author(s): Robinson, Thomas W.
Corporate Author(s): The Rand Corporation
Date of Publication: 1970-12
Contract: F44620-67-C-0045
DoD Task:
Identifier: AD0718403

Abstract:
An acute need exists for a detailed index to the heavy and continuing flow of materials currently translated from Mainland China sources. As with many foreign sources, these materials come in many small fragments, and their timeliness makes it important to arrange and index them quickly so that they are of maximum use to the analyst. Only in the last few years have computer hardware and programs been brought together for the preparation of such a social science index at less than prohibitive cost. This Memorandum reports on a pilot project to apply these computer techniques to the construction of such an index and to estimate costs of production on a regular, large-scale basis.

A useful scheme for indexing translation series must have the following attributes: high user acceptability; low cost in each phase of production; an easily learned and efficient procedure; a sufficiently sophisticated program; and cost-effective equipment. Before the pilot project began, a number of operational choices were made. (1) A predefined set of index categories was rejected in favor of an open-ended subject list drawn directly from the text. (2) To maximize the efficiency of off-line input and to reduce costs, the IBM MT/ST (Magnetic Tape/Selectric Typewriter) was judged the best available input device. (3) An internal Rand program, "Quester," was chosen and modified to handle the data. (4) The IBM 360/65 computer was selected because it was immediately available at Rand and is a widely available, large-capacity, high-speed machine. (5) A computer-driven phototypesetting capability was built into the system.

The project was divided into two parts. The first introduced enough data into the system to provide material for initial checking of all operations.
It was necessary to convert the Quester program to the desired format, make it compatible with the 360/65 computer, write various peripheral programs, and gain experience with the MT/ST input equipment and the RCA Spectra 70 phototypesetting output program. The second part was devoted to a trial run of one month of data from the translation series Foreign Broadcast Information Service -- Communist China, Daily Report (FBIS-CC) (the 21 issues for January 1969) to ascertain costs, provide an adequate presentation, and work out standard procedures for more than one indexer. The results demonstrated the feasibility and desirability of the integrated scheme, and the costs for large-volume daily output were estimated to fall within the desired range.

The original 1064 pages of raw material were indexed in 173 pages of printout, including all input data and an alphabetical index of subject categories. Thus, one page of the index covers about six pages of original translation. These figures demonstrate that an index of either the FBIS-CC or a larger set of translations can be produced in a volume acceptable to users.

Conclusions, based on production experience, include the following. (1) An indexer who has previous experience working with translated Chinese Communist materials can perform well after about one week's practice. Similarly, within a week, a trained MT/ST typist reduced keyboarding errors to a satisfactory level. (2) A small number of general groupings emerged as potential subject-category divisions for the index. Modifying the format to incorporate these general headings might make the index more convenient to use. (3) The success of an expanded version of the pilot program will depend on the sources and level of funding. Several months of lead time would probably be required before the program could be fully operational, i.e., produce an index on a daily basis. (4) Once this point has been reached, the scope of service could be broadened.
Quester can do Boolean searches, can accept extracts and abstracts, and, with additional modifications, can perform logical operations. These capabilities, if exploited, could make the system an active and powerful research tool in its own right. The more distant future might bring a remote-access retrieval and display capability.

Yearly costs are estimated for three alternative sets of publications: FBIS-CC alone ($79,000); all six FBIS Daily Reports ($588,000); and all English-language translation series from Mainland China except the New China Agency's daily English output ($288,000). In all three cases, figures derived from the pilot project were extrapolated linearly to estimate the yearly costs. On the basis of a 1000-copy print run, the annual cost of a single subscription would thus be roughly $79, $588, and $288, respectively, for the three alternatives. This probably makes the index a library item. A variable subscription rate is recommended, shifting some of the cost from individual to institutional subscribers, to government subsidy, or to other outside financial support.

Four possible institutional locations of the indexing service are compared: U.S. government agency, university, not-for-profit research corporation, and profit-making research corporation. Each has built-in advantages and drawbacks. However, for economic reasons, the government locus might be considered first.

While this experiment demonstrated the general validity of computer indexing of Chinese materials translated into English, the results should not be taken to suggest that the IBM 360/65 or the Quester program should necessarily be used in a full-scale implementation. Computer systems and related programs are advancing so rapidly that more advanced techniques should be examined before any system is chosen for implementation.
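The per-subscription and coverage figures quoted in the abstract are simple ratios. As a quick check, the sketch below (in Python, chosen here only for illustration; the report itself is not known to contain code) divides each yearly cost estimate by the 1000-copy print run and divides the 1064 raw pages by the 173 index pages. It assumes, as the abstract's arithmetic implies, that the entire yearly cost is spread evenly over the print run with no other funding; the function and constant names are hypothetical.

```python
# Figures quoted in the abstract of RM-6332-PR.
YEARLY_COST_USD = {
    "FBIS-CC alone": 79_000,
    "All six FBIS Daily Reports": 588_000,
    "All English-language Mainland China series": 288_000,
}
PRINT_RUN = 1000  # copies per year, as assumed in the abstract

RAW_PAGES = 1064   # FBIS-CC, 21 issues for January 1969
INDEX_PAGES = 173  # pages of index printout from the pilot run


def cost_per_subscription(yearly_cost: float, copies: int) -> float:
    """Spread the full yearly cost evenly over the print run (hypothetical helper)."""
    return yearly_cost / copies


def compression_ratio(raw_pages: int, index_pages: int) -> float:
    """Pages of original translation covered by one page of index (hypothetical helper)."""
    return raw_pages / index_pages


if __name__ == "__main__":
    for name, cost in YEARLY_COST_USD.items():
        print(f"{name}: ${cost_per_subscription(cost, PRINT_RUN):,.0f} per copy")
    ratio = compression_ratio(RAW_PAGES, INDEX_PAGES)
    print(f"Coverage: {ratio:.1f} pages of translation per index page")
```

Run as a script, it prints $79, $588, and $288 per copy and about 6.2 pages of translation per index page, consistent with the abstract's "about six pages."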

Other options for obtaining this report:

Via the Defense Technical Information Center (DTIC):
A record for this report, and possibly a pdf download of the report, exists at DTIC

Via National Technical Report Library:
The NTRL Order Number for this report is: AD718403
A record for this report, and possibly a pdf download of the report, exists at NTRL

Indications of Public Availability
No digital image of an index entry indicating public availability is currently available
There has been no verification of an indication of public availability from an inside cover statement