U.S. Bureau of the Census Economic Census Staff Washington, D.C. 20233 INTRODUCTION TO EXTRACT (EXTutor 1) Selecting items and records, extracting to a file November 7, 1991 ============================================================================ CONTACT: Bob Marske or (301) 763-1792 Paul Zeisset ABSTRACT: The following was designed as the script for a training presentation. It may also be used as a self-directed tutorial in EXTRACT usage. This tutorial covers item and record selection, extracting data to a .prn file, and loading the results into Lotus 1-2-3. It requires Economic Census CD-ROM 1C, 1D or 1E and EXTRACT 1.3 or later. TIME REQUIRED: 1 hour ============================================================================ The following text assumes that you have already installed EXTRACT. If you have not done so, see the section on installation at the end of this tutorial. This demo also assumes that you have created a subdirectory for temporary files, e.g. \WORK, and that you have made the directory containing EXTRACT the default. If you can type DIR EXTRACT.PRM and a file is listed, i.e., someone has previously saved parameters, type ERASE EXTRACT.PRM. The latter part of the outline, optional, also assumes that you have Lotus 1-2-3 available, and that it is set up to look for data files in a subdirectory C:\123. Instructions for keyboard entry are shown in italics, with the actual keyboard entry shown in bold. Special characters are shown as for the escape key, for the return or enter key, for the page- up key, and for the page-down key. 1 STARTING THE PROGRAM 1. {Type:} EXTRACT {, then press} {again after viewing the opening screen.} 2. DRIVE SELECTION {[Once parameters have been previously saved, this screen will not appear.]} The system prompts you for the drive designation for your CD-ROM, and for a location where it can store temporary files. {Enter the appropriate drive letter for your CD-ROM, e.g., type:} L: {, or whatever is appropriate for your system.} {Type:} C:\WORK {for work space--assuming c:\work already exists.} We have the option to save these parameters for use next time. {Type:} S. The system asks you to give it a name for these parameters. {Type:} EXECON The system will now save a file named EXECON.bat, a name selected to suggest the economic census CD-ROMs. To run EXTRACT with these parameters, type EXECON instead of EXTRACT. When we work with CD-ROMs that do not have built-in auxiliary files, we will use different parameters, and save them under names specific to each disc. 3. SELECT A CATALOG The system then presents you with a menu of all of the types of files--or "catalogs" you have to choose from on this disc. {Page down to see the full list, then page back up to highlight the RC87A3__ catalog, third from the top, then press} . 4. HELP SCREEN The help screen associated with this catalog of files appears. It has two parts: a. A general description of the file, tables it corresponds to, etc. b. A brief description of how key options on the main menu that follows apply to this particular file. You may pull up this help screen at any time from the main menu by pressing elp or . {Press} {to continue.} 2 5. SELECT A FILE Now you get a second menu, listing all of the files of the type you selected. {Move the highlighted bar with the down arrow key to pick a state and press} . The program needs a few moments to read in the file's data dictionary and get itself set up, but then it delivers us the main menu. 6. MAIN MENU The main menu is the control center from which you will operate the rest of the program. At the lower right is the name of the file you are working with. The main menu lists a series of options, which you generally specify in order, but let's jump ahead to Display the file to the screen, as is. {Type:} 6 7. DISPLAY TO SCREEN You are now looking at a large "spreadsheet". You can scroll up and down or side to side to see more data. Features: At the top are cryptic variable names which you may be able to figure out. At the bottom is a more complete description of the one item that the cursor is highlighting. You can scroll the cursor from side to side to see different descriptions. {Press right arrow key repeatedly.} But this is more information than you want to deal with all at once, so let's create a more selective display. {Press:} {to return to the main menu.} 8. SELECT ITEMS {Type:} 1 In order to select items, the program lists variable names, the ones we saw as column headings, along with the more complete descriptors. This information comes from a separate file called the "data dictionary". We can select the variables we want -- with an X. 3 {Cursor down to MSA, type: X, and repeat to mark GEOTEXT, RECTYPE, ESTAB, SALESF, and SALES, and leave the cursor positioned on SALES. (You may also wish to add ESTAB54 and SALES54 if you want to look at food stores. To find these items, you must scroll down through the list with repeatedly or .)} 9. DISPLAYING DEFINITIONS If we aren't sure exactly what a variable means, we can refer to its definition, by pressing D. {Type:} D Let's say that we weren't sure whether sales of restaurants counted tips of waiters and waitresses. Hitting D brings up the same definition as appears in the back of the printed retail trade report. We can scan the text for the appropriate details. {Press} {to return to the Select Items screen.} The dot next to the name of many of the items indicates that a definition is available. {Press} {to return to the main menu.} 10. Redisplay the data. {Type:} 6 This display gives you only the variables you selected. Note that you may also display item definitions from this screen. {Cursor right to the SALES column, and press:} D. You may be able to tell that this screen has a mixture of counties and places within counties. {Cursor over to RECTYPE.} The county records have a rectype of 06, the places have a rectype of 07. Let's say that we want to just select the state and county records. {Press} {to return to the main menu.} 11. SELECT RECORDS {Type:} 2 The record selection screen looks a lot like the variable selection screen, but here you mark variables with an "S" and are indicating the variable you want to restrict--which is RECTYPE (record type). {Cursor down to RECTYPE, and mark it with an S, then press:} {to go on to the next step.} 4 {On the second screen, mark with X both 02 (state) and 06 (county).} {Press} {to return to the main menu.} 12. {To redisplay the data, type:} 6 Note that all you have now are the state and county records. We have boiled down a data set that was too large into a data set we can handle. You could do other kinds of selection, for example, all of the records for a particular MSA or county, but let's move on to some of the other things you can do with the EXTRACT program. {Press} 13. FORMAT OPTIONS {Type:} 5 The program gives us a few format options. {[For a discussion of rowwise and columnar formats, see tutorial #3.]} One of the options is to specify your own heading. The program defaults start us off with a heading taken from the data dictionary, but by the time we have gone through item and record selection, a much more specific heading may be appropriate. {Type:} 3 {(heading) then enter an appropriate heading, for example,} Retail sales by county in [state], November 30, 1991. It is frequently handy to add today's date or your initials to the heading, particularly if you are going to save your output for future use. {Press} {to return to the main menu.} 12. EXTRACTING THE DATA TO A FILE At this point, we could print out our results with option 7. Instead, let's prepare for some real work. Let's say we want to extract the data to a file which Lotus 1-2-3 can read in, so that we can arrange the counties in rank order of sales. {Type:} 8 5 EXTRACT allows us to copy our extracted data set into any of three formats: 1 - dbf We can create another dbase file, just smaller than the original. 2 - prn A comma-delimited file is the kind you want for importing to Lotus 1-2-3. 3 - sdf A fixed-format file looks more like a columnar report. Let's extract the state and county data we have already selected into a file for import into Lotus 1-2-3. {Type:} 2 Let's let the system add headings for us. {Type:} Y The system prompts us to specify a name for our output file and a drive and directory location on our hard disk or floppy. {If you have Lotus 1-2-3, specify a filename in the subdirectory where that program normally looks for data files. If you do not have 1-2-3, specify any valid filename. For example, type:} c:\123\demo1 The system then examines the records in the data base for ones that meet the selection criteria we specified, then copies off the data fields that we selected. This selection was very simple, but some data extractions can take quite a long time. {When the "completed" message appears, press} {to return to the main menu.} At this point, we could look at the file again, perhaps changing the items displayed, selecting a different set of records, or modifying formats. Instead, let's exit the program and see what we accomplished with the file we just created. {Type:} Q {to quit.} 6 LOOKING AT AN EXTRACTED FILE {[This step is optional if you have Lotus 1-2-3 and can view the file in Lotus. If you have available the shareware program LIST or a text editor, you may prefer to use that program rather than TYPE in the following command.]} {Type:} TYPE C:\123\DEMO1.PRN The data have scrolled by quickly, but what is left on the screen shows us what we mean by "comma-delimited format": Alphabetic variables and codes are enclosed in quotes. Numbers have no leading zeros or spaces. Commas have been inserted between each field. Because we told EXTRACT to give us headers and footers, the file includes a complete list and description of each of the variables (columns) at the bottom of the file. We can see how useful that format is by loading the data into Lotus 1-2-3. 7 LOAD EXTRACTED DATA INTO LOTUS 1-2-3 1. Start Lotus 1-2-3 in the way you normally do to bring up a blank spreadsheet, for example, type: 123 2. {To import a comma-delimited file, type: /F(ile) I(mport) N(umbers). Then we select our file from the menu presented.} {Cursor to highlight DEMO1, and press} . {If DEMO1 is not one of the options on the menu, change the default directory with /F(ile) D(irectory) to the one where DEMO1 is located.} This easy procedure loads our extracted data into spreadsheet form. Note that this procedure has preserved not only column headings, but also the master heading we specified in EXTRACT. 3. You may want to adjust the column width for the county name by cursoring to that column and entering: /W(orksheet) W(idth) B [for column B] 15. 4. At the bottom of the display, ({press} ), more complete descriptions are associated with the crytic column headings. 5. Let's say that we want to rank order these data on sales volume. {Press} {to return to the top, then cursor down to cell A4.} {Type:} /D(ata) S(ort) D(ata-Range) .(period) {Type:} P(rimary sort key) F1 (for Sales) D(escending sort) {Type:} G(o) Now we have sorted counties in decreasing order of sales volume. 6. Of course, you will have lots of other things you want to do inside a spreadsheet program, like-- a. calculate percentages b. add up subtotals c. do bar charts or graphs 7. {To leave the program, type:} /Q(uit) Y(es) 8 INSTALLING EXTRACT Installation takes two paths depending on whether the program is built into your CD-ROM, or whether you obtained the program on diskette or by way of a bulletin board. 1. Installation from CD-ROM [Economic Census disc 1d includes the latest version of the program as of November 1991.] a. First determine the drive letter of the CD-ROM (in this example, we use L: b. Next decide where you want the EXTRACT software (we assume C:\EXTRACT c. Then run the installation program on the CD by typing: L:\EXTRACT\INSTALL L: C:\EXTRACT 2. Installation from floppy disk a. Create a new directory for the EXTRACT software and make it the default drive MD \EXTRACT CD \EXTRACT b. Uncompress the program, files, and documentation onto your hard disk (assuming floppy with EXTRACT is is A: drive) A:EXTRCT13 If someone has installed an earlier version of the program in this directory, you may be asked if it is "OK" to over-write EXTRACT.EXE and other files. Continue by typing Y(es) to each prompt. c. If you are using one of the CD-ROMs that does not have auxiliary files built in, you will need to follow instructions on a file including the letters READ.ME on your diskette (e.g., FTD_READ.ME). 3. You may print out the complete documentation by typing PRINT EXTRACT.DOC [or PRINT L:\EXTRACT\EXTRACT.DOC if the program is on your CD-ROM and its designation is L:]. You may be able to make your way through the EXTRACT program without ever referring to the documentation, but even experienced users will find that the documentation supplies lots of useful information, particularly about capabilities you may not have used yet, and about search strategies that help you get the most out of particular CD-ROMs. 9