U.S. Bureau of the Census
Economic Census Staff
Washington, D.C. 20233

EXTRACT TUTORIAL #2 (EXTutor 2)
Adding labels, selecting records, refining the screen display
November 7, 1991
=====================================================================
CONTACT:  Bob Marske or Paul Zeisset, (301) 763-1792

ABSTRACT: {The following was designed as the script for a training
presentation. It may also be used as a self-directed tutorial in
EXTRACT usage.}

{This tutorial covers adding labels, selecting records in more
sophisticated ways, and refining the screen display. It requires
Economic Census CD-ROM 1C, 1D or 1E and EXTRACT 1.3 or later.}

TIME REQUIRED: 30 minutes to 1 hour
=====================================================================

[Note: while the following is set up to use a SC87A1 geographic area
series file on 1987 Economic Censuses CD-ROM Volume 1, it is easily
adaptable for use with the Volume 2 ZIP code disk, for example, with
a SC87Z1 file.]

In Tutorial #1, we did a very straightforward selection of items and
records in a simple file. Let's see what we can do with files that
present more complex options.

STARTING THE PROGRAM

1. We will start EXTRACT using the parameters we saved in the last
   session.

   {Type:} EXECON , {then press} {again after viewing the opening
   screen.}

2. {If you did not save drive designations during Tutorial #1, the
   system will prompt you for them here.}

3. {On the Select a Catalog screen,} {and cursor to the SC87A1__
   catalog, and select it by pressing} .

4. The help screen now describes this as a different kind of file,
   one in which each kind of business is on a separate record--like
   lines going down a table.

5. {On the Select a File menu, select a state and press} .

6. {From the main menu, type:} 1 {to Select Items.}

   {Put an X next to SVC87KB, ESTAB, RCPT, and EMPLOYE.}

   {Press} .

7.
   {At the main menu, type:} 6 {to Display to Screen.}

   On this display each line is identified by a kind-of-business
   code, which is not very descriptive. Therefore we are going to
   have to go back to the main menu and add the labels from another
   file.

   {Press} .

ADDING LABELS FOR KIND-OF-BUSINESS

1. {At the main menu, type:} 3 {to Add Labels.}

   This menu shows each of the code variables that have alphabetic
   labels associated with them. We have already selected a
   particular county, so we can skip by the geographic labels and
   select a label for the kind of business.

   {Cursor to SVC87KB, and press} .

2. Some variables, like this one, give us more than one label to
   choose from. Longer labels are more descriptive, while short ones
   leave space for more data. Let's pick the 40-character label.

   {Cursor to TEXT40 and press} .

3. {At the main menu, type:} 6 {to redisplay the data.}

   This looks more like the kind of result we had in mind, more or
   less like a printed table.

SELECTING RECORDS FOR A COUNTY

1. {Type:} 2 {to Select Records.}

   Not all selection variables are equally efficient in finding
   data. Let's say that we want to find data for [city], and we see
   that there are two kinds of place code available--a census place
   code PLACE, and a FIPS place code PLACEFIP. Notice, however, that
   PLACE has an asterisk by it, while PLACEFIP does not. The "*"
   denotes those variables for which indexes are available, and that
   is directly related to how fast we can accomplish the searching
   process.

   Let's pick data for one county, using the COUNTY variable.

   {Cursor down to COUNTY, type} S, {and press} {to go to the next
   screen.}

   [If you are using Economic Census disc 1C, do not select on PLACE
   unless you have installed CD1CSUP2.]

2. {In Select Records Screen 2, put an X next to the particular
   county of interest.}

3. SPEED-UP SCREEN

   Here is something you may not have expected. The system tells us
   that it can speed up retrieval if we pick one of the categories
   of RECTYPE.
   We select data for a county.

   {Cursor to 06 County, and type:} X.

   You may be puzzled as to why the system asked that extra
   question, particularly when the answer seems obvious. In fact,
   county codes are attached to place records as well as county
   records, so your answer does make a difference.

   EXTRACT makes use of existing "indexes" on the CD-ROM whenever it
   can to speed the process of finding specific data. In this case
   it found that in order to use a particular index, it had to know
   both the county and the record type, so it prompted us for the
   missing piece of information.

   If you happen to select a RECTYPE that is not valid in
   combination with the rest of your selection criteria (e.g., if
   you specified a RECTYPE for State, when you had already specified
   a particular county code), then the program will not find any of
   the data you want.

   EXTRACT uses the information supplied "to speed up retrieval"
   only for the purpose of finding the first eligible record. Thus,
   the index is "turned off" as soon as the first eligible record is
   found, and data are presented in the original sequence of the
   file.

   {Press} .

   If you have picked a small county, the system may display "No
   records found" because the SC87A1 files include only counties
   with 350 or more service establishments. In this case you will
   need to repeat the record-selection procedure with a larger
   county.

4. {At the main menu, type} 6 {to redisplay the data.}

   This shows us the data for the county selected. Note that its
   name has been introduced as a subheading.

5. You remember that we looked at definitions before in the Select
   Items screen. Definitions are also accessible here for many, but
   not all, files.

   {Cursor to the EMPLOYE column and press} D.

   {After viewing the definition, press} .

   Definitions are also associated with some labels, as in the case
   of this kind-of-business definition.

   {Cursor to the left-hand text column and highlight a particular
   kind-of-business category you want defined and press} D.
   {If a message at the bottom says no definition available, move to
   a higher level item and try again.}

   {Press} to return to the data display, and again to return to the
   main menu.

RESELECTING RECORDS TO LOOK AT A SINGLE KIND OF BUSINESS ACROSS
COUNTIES

So far we have been dealing with information much like we would find
it in a printed report. But let's look at the data another way.
Let's look at the distribution of computer-related services [or
substitute another kind of business] among all areas in the state.

1. {Press} 2 {to Select Records.}

2. To look at records for one kind of business, we need to select on
   SVC87KB.

   {Cursor to SVC87KB, type:} S, {then press} .

3. The system now presents us a list of all possible kind-of-business
   codes, from which we want to select computer-related services.
   The code [for SVC87KB] we are interested in does not display on
   the screen, so we could page down until we find it, as before. At
   the same time, EXTRACT offers some shortcuts to help you find
   codes faster. One way we can find out about these shortcuts is to
   press the help key or .

   {Press} .

4. HELP SCREEN

   EXTRACT's help system is context sensitive, that is, it gives you
   help specific to where you are in the program. At the bottom of
   this help screen are listed several "special options".

   D is for Definition, but we don't need that right now.

   J is to Jump to a specific code, if we knew it in advance, but we
   don't, so that doesn't apply.

   L is for Locate, and it finds a particular character string at
   the beginning of the label. Since these labels feature SIC codes
   at the beginning, we could use this feature to Locate a
   particular SIC code. This feature would have been quite handy
   earlier when we were looking for a particular county. We could
   simply have typed in L and then the name of the county.

   R is for Range, which we generally use either when we want to
   specify a range of codes or, perhaps, numeric values (e.g.,
   records where sales exceed $10 million).
   W is for Word search, and this should be just what we need.

   {Press} {to return to the select-records menu.}

5. {Type:} W, {and enter} computer {in the word-search window at the
   bottom of the screen.}

   It takes a minute or so for the system to search through all of
   the labels for the word "computer", but it eventually comes up
   with a list of all of the SICs in the 737 group.

   {Put an X next to code 185.}

   Entries containing the word "computer" all happened to be
   together. Just for fun, let's also do a word search on "rental".

   {Type:} W {then} rental {and press} {until the last two
   characters of "computer" are erased, then} .

   As you can see, this Word Search feature can be quite powerful.
   Still, its success depends on how items are worded. If we had
   entered "leasing", you can see that Word Search would have missed
   code 174 (where leasing is abbreviated "leas"), and several other
   codes where the word "leasing" was not used at all (175, 176,
   178, and 195) even though it is implied.

   As long as we don't check anything here, our previous selection
   of "Computer and data processing services" remains intact.

   {Press} {to return to the main menu.}

6. {Type:} 6 {to display to the screen.}

   Displaying the data to the screen is not very useful at this
   point. The kind-of-business labels we have added no longer
   distinguish what data we are dealing with. We need names or codes
   for geographic areas instead.

   {Press} .

7. {Type:} 1 {to select items again.}

   {Add X's by MSA, COUNTY, and RECTYPE, then press} .

8. {Type:} 3 {to add labels.}

   The system now warns you that unless you press S for Save you
   will no longer have the SVC87KB labels. That is fine, since we
   are restricting ourselves to one kind of business, so we do not
   type S.

   {Cursor to COUNTY and press} .

9. At the bottom of the second labelling screen, we can see that
   there is an option to show additional fields, and another to
   allow multiple labels. Let's exercise both.
   {Type:} A {to show all fields, then} M {to allow multiple
   labels.}

   With the longer list, we now have to search farther to find the
   20-character version of the county name.

   {Cursor down to TEXT20. Press} .

   Now a "greater-than" sign > appears at the left of TEXT20, and
   the system lets us pick another label. We can cursor down to
   POP87, and add 1987 population figures to our display of service
   industry data.

   {Cursor down to POP87, and press} .

   {Press} {to return to the main menu.}

10. {Type:} 6 {to display the data.}

   This gives us the distribution of computer data processing
   services within the state--at least in those areas for which
   detailed kinds of business are shown. Note that there are lines
   without a county name that have a RECTYPE of 04--these are
   records for metropolitan areas. (In fact, the text may say
   "United States" since that is the label associated with zeroes
   for state and county code.) We have to go to the end of that list
   before the counties start. (In a state without many large
   counties, you may also see records with RECTYPE of 07 for
   places.)

   Note that 1987 population has been added to the display by way of
   the labelling feature. There are a few data items in addition to
   labels in some of the label files. To see them we used the All
   fields option on the second Add Labels screen.

11. ADDING A SECOND SET OF LABELS [Optional]

   [At this point you could also add MSA labels, TEXT10 suggested,
   to the display. You have to type S first to save the county
   labels before proceeding. After selecting 6 at the main menu, you
   will see that data display is MUCH slower when two sets of labels
   are used at the same time.]

EXTRACTING DATA TO A .DBF FILE

This set of records might just be a group we want to refer back to,
so let's save that to a file that EXTRACT can continue to use.

1. {Type:} 8 {to Extract the data to a file.}

2. {Type:} 1 {to select the DBF file option.}

   {Enter a file name, including drive and directory, but not any
   extension, e.g.,} c:\work\compsvcs.
   The system prompts us for a description for our "My_Files"
   catalog. You can take the one listed, you can edit it by
   cursoring to something you want to change, or you can retype the
   line.

   {Type:} Computer data processing services in [state] [date] .

   The system will work for a while, first extracting the
   appropriate records, and then in a second pass adding the
   appropriate labels. If you have added two sets of labels, this
   step can take quite a while.

3. When the system is finished, it offers you the choice of
   returning to the original file or of going directly to the newly
   created file. Normally we would take this shortcut to the new
   file. But first let's see another technique for retrieving the
   file we have just created.

   {Type} 1 {to return to the main menu.}

4. We can locate the .DBF files we create through the file selection
   menu.

   {Type:} 9 {to Return to the File Selection Menu.}

5. {Press} {until you see the MY_FILES catalog, and press} {to
   select it.}

   Note that the description we put in is available to help us
   select among a series of files.

   {Cursor to the desired file, if it is not already highlighted,
   and press} .

6. {Type:} 6 {to display to the screen.}

   At this point the display looks just like the one we had before,
   except that this time there was no waiting while it built up the
   screen. That is because it does not have to slow down to filter
   out records that do not qualify, and because the system is
   working off the hard disc (a fast device) rather than off a
   CD-ROM (a slow device).

   {Press} .

7. {Type:} 1 {to bring up the Select Items screen.}

   The system has created a customized data dictionary for our new
   database file. Here we can see that there are fewer items to
   choose from.

   {Press} {to leave the item list unchanged and return to the main
   menu.}

8. If we want to add another label to the file, we could do so with
   option 3. If we wanted to cut off some of the records, for
   instance eliminating all but the county records, we could do so
   with option 2.
   But if we decided at this point that we wanted more data rather
   than less, such as one of the flags we saw in the original file,
   we would have to reselect the original file, and repeat most of
   our original steps.

9. {If you are finished, type:} Q {to quit.}

PRACTICE PROBLEMS

1. What are the three most important merchandise lines, other than
   groceries, sold by food stores (SIC 54) in the
   Albany-Schenectady-Troy, NY MSA?

   Hints: use the RC87L___ catalog and the RC87L1MM file; select
   items to include RTL87ML, ESTAB, SALES, and SALPALL; add labels
   for RTL87ML; and select records on MSA and RTL87KB. You will need
   to know that the Albany metropolitan area does not include any
   primary metropolitan statistical areas.

   If you want to carry this further, compare your list of top
   selling merchandise lines in this metro area in 1987 to
   corresponding figures for 1982 and 1977 (see data under a
   separate catalog) or to nationwide figures in RC87L1XS.

2. What kinds of retail business are present in your own ZIP code,
   or one familiar to you?

   Hints: Using a CD-ROM in the Economic Censuses Volume 2 series,
   specify the RC87Z1__ catalog and your state; select items to
   include ZIP, ESTAB, ESALE1, ESALE100, ESALE250, ESALE500, and
   ESALE1M; add labels for RTL87KB; and select records on ZIP,
   specifying your ZIP code as both the minimum and maximum value of
   the range.
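
SUPPLEMENTARY SKETCH: LOCATE, WORD SEARCH, AND RANGE

The Locate, Word search, and Range options described on the help
screen, and the range hint in practice problem 2, can be pictured
with a small Python sketch. This is only an illustration and not
EXTRACT's actual code. The function names and the sample labels are
hypothetical, and case-insensitive matching is an assumption. It
shows why a search on "leasing" misses a label abbreviated "leas",
and why giving the same value as both minimum and maximum selects
exactly one code.

```python
# Hypothetical (code, label) pairs for illustration only; the labels
# in the actual EXTRACT label files will differ.
LABELS = [
    (174, "7350 Equipment rental and leas"),
    (175, "7352 Medical equipment rental"),
    (185, "7370 Computer and data processing services"),
    (186, "7371 Computer programming services"),
]


def locate(labels, prefix):
    """Locate-style lookup: first code whose label BEGINS with prefix.

    Returning only the first match is an assumption of this sketch.
    """
    p = prefix.lower()
    for code, text in labels:
        if text.lower().startswith(p):
            return code
    return None


def word_search(labels, word):
    """Word-search-style lookup: every code whose label contains word."""
    w = word.lower()
    return [code for code, text in labels if w in text.lower()]


def in_range(values, lo, hi):
    """Range-style selection with inclusive bounds; lo == hi picks one value."""
    return [v for v in values if lo <= v <= hi]


if __name__ == "__main__":
    print(word_search(LABELS, "computer"))
    print(word_search(LABELS, "leasing"))   # misses the abbreviated "leas"
    print(locate(LABELS, "737"))
    print(in_range(["12180", "12203", "12205"], "12203", "12203"))
```

Because word matching depends entirely on how a label is worded, a
shorter search string such as "leas" would catch both the abbreviated
and the spelled-out forms, at the cost of possibly matching unrelated
words.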