# NSRL Magic File
# Based on magic data for file(1) command.
# Format is described in magic(5).
#
# JT, 2004-03-17: Removed all untagged lines to speed processing and changed
# tag format to exclude whitespace (e.g. "[NSRL|TAR]").
# JT, 2004-03-10: Added tags for NSRL recognition of supported archive types
# (e.g. "[NSRL|TAR]").
#------------------------------------------------------------------------------
# Magic
# Magic data for file(1) command.
# Machine-generated from src/cmd/file/magdir/*; edit there only!
# Format is described in magic(files), where:
# files is 5 on V7 and BSD, 4 on SV, and ?? in the SVID.
#------------------------------------------------------------------------------
# Localstuff: file(1) magic for locally observed files
#
# $Id: Localstuff,v 1.3 1995/01/21 21:09:00 christos Exp $
# Add any locally observed files here. Remember:
# text if readable, executable if runnable binary, data if unreadable.
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# archive: file(1) magic for archive formats (see also "msdos" for self-
# extracting compressed archives)
#
# cpio, ar, arc, arj, hpack, lha/lharc, rar, squish, uc2, zip, zoo, etc.
# pre-POSIX "tar" archives are handled in the C code.
# CDROM Filesystems
32769 string CD001  ISO 9660 CD-ROM filesystem data [NSRL|ISO]
# "application id" which appears to be used as a volume label
>32808 string >\0  '%s'
>34816 string \000CD001\001EL\ TORITO\ SPECIFICATION  (bootable)
37633 string CD001  ISO 9660 CD-ROM filesystem data (raw 2352 byte sectors) [NSRL|ISO]
32776 string CDROM  High Sierra CD-ROM filesystem data [NSRL|ISO]
# POSIX tar archives
257 string ustar\0  POSIX tar archive [NSRL|TAR]
257 string ustar\040\040\0  GNU tar archive [NSRL|TAR]
# cpio archives
#
# Yes, the top two "cpio archive" formats *are* supposed to just be "short".
# The idea is to indicate archives produced on machines with the same
# byte order as the machine running "file" with "cpio archive", and
# to indicate archives produced on machines with the opposite byte order
# from the machine running "file" with "byte-swapped cpio archive".
#
# The SVR4 "cpio(4)" hints that there are additional formats, but they
# are defined as "short"s; I think all the new formats are
# character-header formats and thus are strings, not numbers.
0 short 070707  cpio archive [NSRL|CPIO]
0 short 0143561  byte-swapped cpio archive [NSRL|CPIO]
0 string 070707  ASCII cpio archive (pre-SVR4 or odc) [NSRL|CPIO]
0 string 070701  ASCII cpio archive (SVR4 with no CRC) [NSRL|CPIO]
0 string 070702  ASCII cpio archive (SVR4 with CRC) [NSRL|CPIO]
# Debian package (needs to go before regular portable archives)
#
0 string !<arch>\ndebian  Debian package [NSRL|DEB]
# other archives
0 long 0177555  very old archive [NSRL|UNK]
0 short 0177555  very old PDP-11 archive [NSRL|UNK]
0 long 0177545  old archive [NSRL|UNK]
0 short 0177545  old PDP-11 archive [NSRL|UNK]
0 long 0100554  apl workspace archive [NSRL|UNK]
0 string =<ar>  archive [NSRL|UNK]
# MIPS archive (needs to go before regular portable archives)
#
0 string !<arch>\n__________E  MIPS archive [NSRL|MIPS]
0 string -h-  Software Tools format archive text [NSRL|SFT]
#
# XXX - did Aegis really store shared libraries, breakpointed modules,
# and absolute code program modules in the same format as new-style
# "ar" archives?
#
0 string !<arch>  current ar archive [NSRL|AR]
0 string \<ar>  System V Release 1 ar archive [NSRL|AR]
0 string =<ar>  archive [NSRL|AR]
#
# XXX - from "vax", which appears to collect a bunch of byte-swapped
# thingies, to help you recognize VAX files on big-endian machines;
# with "leshort", "lelong", and "string", that's no longer necessary....
#
0 belong 0x65ff0000  VAX 3.0 archive [NSRL|VAX3]
0 belong 0x3c61723e  VAX 5.0 archive [NSRL|VAX5]
#
0 long 0x213c6172  archive file [NSRL|UNK]
0 lelong 0177555  very old VAX archive [NSRL|UNK]
0 leshort 0177555  very old PDP-11 archive [NSRL|UNK]
#
# XXX - "pdp" claims that 0177545 can have an __.SYMDEF member and thus
# be a random library (it said 0xff65 rather than 0177545).
#
0 lelong 0177545  old VAX archive [NSRL|UNK]
0 leshort 0177545  old PDP-11 archive [NSRL|UNK]
#
# From "pdp" (but why a 4-byte quantity?)
#
0 lelong 0x39bed  PDP-11 old archive [NSRL|UNK]
0 lelong 0x39bee  PDP-11 4.0 archive [NSRL|UNK]
# ARC archiver, from Daniel Quinlan (quinlan@yggdrasil.com)
#
# The first byte is the magic (0x1a), byte 2 is the compression type for
# the first file (0x01 through 0x09), and bytes 3 to 15 are the MS-DOS
# filename of the first file (null terminated). Since some types collide
# we only test some types on basis of frequency: 0x08 (83%), 0x09 (5%),
# 0x02 (5%), 0x03 (3%), 0x04 (2%), 0x06 (2%). 0x01 collides with terminfo.
0 lelong&0x8080ffff 0x0000081a  ARC archive data, dynamic LZW [NSRL|ARC-LZW]
0 lelong&0x8080ffff 0x0000091a  ARC archive data, squashed [NSRL|ARC-SQAS]
0 lelong&0x8080ffff 0x0000021a  ARC archive data, uncompressed [NSRL|ARC]
0 lelong&0x8080ffff 0x0000031a  ARC archive data, packed [NSRL|ARC-PACK]
0 lelong&0x8080ffff 0x0000041a  ARC archive data, squeezed [NSRL|ARC-SQZ]
0 lelong&0x8080ffff 0x0000061a  ARC archive data, crunched [NSRL|ARC-CRNC]
# Acorn archive formats (Disaster prone simpleton, m91dps@ecs.ox.ac.uk)
# I can't create either SPARK or ArcFS archives so I have not tested this stuff
# [GRR: the original entries collide with ARC, above; replaced with combined
# version (not tested)]
#0 byte 0x1a RISC OS archive
#>1 string archive (ArcFS format)
0 string \032archive  RISC OS archive (ArcFS format) [NSRL|ROS]
# ARJ archiver (jason@jarthur.Claremont.EDU)
0 leshort 0xea60  ARJ archive data [NSRL|ARJ]
# HPACK archiver (Peter Gutmann, pgut1@cs.aukuni.ac.nz)
0 string HPAK  HPACK archive data [NSRL|HPAK]
# JAM Archive volume format, by Dmitry.Kohmanyuk@UA.net
0 string \351,\001JAM\   JAM archive, [NSRL|JAM]
# LHARC/LHA archiver (Greg Roelofs, newt@uchicago.edu)
2 string -lh0-  LHarc 1.x archive data [lh0] [NSRL|LHA1]
2 string -lh1-  LHarc 1.x archive data [lh1] [NSRL|LHA1]
2 string -lz4-  LHarc 1.x archive data [lz4] [NSRL|LHA1]
2 string -lz5-  LHarc 1.x archive data [lz5] [NSRL|LHA1]
# [never seen any but the last; -lh4- reported in comp.compression:]
2 string -lzs-  LHa 2.x? archive data [lzs] [NSRL|LHA2]
2 string -lh\40-  LHa 2.x? archive data [lh ] [NSRL|LHA2]
2 string -lhd-  LHa 2.x? archive data [lhd] [NSRL|LHA2]
2 string -lh2-  LHa 2.x? archive data [lh2] [NSRL|LHA2]
2 string -lh3-  LHa 2.x? archive data [lh3] [NSRL|LHA2]
2 string -lh4-  LHa (2.x) archive data [lh4] [NSRL|LHA2]
2 string -lh5-  LHa (2.x) archive data [lh5] [NSRL|LHA2]
2 string -lh6-  LHa (2.x) archive data [lh6] [NSRL|LHA2]
2 string -lh7-  LHa (2.x) archive data [lh7] [NSRL|LHA2]
# RAR archiver (Greg Roelofs, newt@uchicago.edu)
0 string Rar!  RAR archive data [NSRL|RAR]
# SQUISH archiver (Greg Roelofs, newt@uchicago.edu)
0 string SQSH  squished archive data (Acorn RISCOS) [NSRL|SQSH]
# UC2 archiver (Greg Roelofs, newt@uchicago.edu)
# I can't figure out the self-extracting form of these buggers...
0 string UC2\x1a  UC2 archive data [NSRL|UC2]
# ZIP archives (Greg Roelofs, c/o zip-bugs@wkuvx1.wku.edu)
0 string PK\003\004  Zip archive data [NSRL|ZIP]
# JT, 2004-03-17: Is it really necess. to include this version-specific info?
>4 byte 0x09  \b, at least v0.9 to extract
>4 byte 0x0a  \b, at least v1.0 to extract
>4 byte 0x0b  \b, at least v1.1 to extract
>4 byte 0x14  \b, at least v2.0 to extract
# Alternate ZIP string (amc@arwen.cs.berkeley.edu)
0 string PK00PK\003\004  Zip archive data [NSRL|ZIP]
# Zoo archiver
20 lelong 0xfdc4a7dc  Zoo archive data [NSRL|ZOO]
# Shell archives
# JT, 2004-03-17: The "#" in the middle of the line makes it look like the
# rest of the line is commented out - I guess it's OK? Beware if parsing!
10 string #\ This\ is\ a\ shell\ archive  shell archive text [NSRL|SH]
#
# LBR. NB: May conflict with the questionable
# "binary Computer Graphics Metafile" format.
#
0 string \0\ \ \ \ \ \ \ \ \ \ \ \0\0  LBR archive data [NSRL|LBR]
#
# PMA (CP/M derivative of LHA)
#
2 string -pm0-  PMarc archive data [pm0] [NSRL|PMA]
2 string -pm1-  PMarc archive data [pm1] [NSRL|PMA]
2 string -pm2-  PMarc archive data [pm2] [NSRL|PMA]
2 string -pms-  PMarc SFX archive (CP/M, DOS) [NSRL|PMA-SFX]
5 string -pc1-  PopCom compressed executable archive (CP/M) [NSRL|PPC-SFX]
# From rafael@icp.inpg.fr (Rafael Laboissiere)
# The Project Revision Control System (see
# http://www.XCF.Berkeley.EDU/~jmacd/prcs.html) generates a packaged project
# file which is recognized by the following entry:
0 leshort 0xeb81  PRCS packaged project archive [NSRL|PRCS]
# Microsoft cabinets
# by David Necas (Yeti)
0 string MSCF\0\0\0\0  Microsoft cabinet file data archive, [NSRL|CAB]
# GTKtalog catalogs
# by David Necas (Yeti)
4 string gtktalog\   GTKtalog catalog data archive, [NSRL|GTK]
>>14 beshort 0x677a  (gzipped)

#------------------------------------------------------------------------------
# compress: file(1) magic for pure-compression formats (no archives)
#
# compress, gzip, pack, compact, huf, squeeze, crunch, freeze, yabba, etc.
#
# Formats for various forms of compressed data
# Formats for "compress" proper have been moved into "compress.c",
# because it tries to uncompress it to figure out what's inside.
# standard unix compress
0 string \037\235  compress'd data archive [NSRL|Z]
# gzip (GNU zip, not to be confused with Info-ZIP or PKWARE zip archiver)
0 string \037\213  gzip compressed data archive [NSRL|GZ]
>3 byte &0x20  encrypted,
# packed data, Huffman (minimum redundancy) codes on a byte-by-byte basis
0 string \037\036  packed data archive [NSRL|PACK]
#
# This magic number is byte-order-independent. XXX - Does that mean this
# is big-endian, little-endian, either, or that you can't tell?
# this short is valid for SunOS
0 short 017437  old packed data archive [NSRL|PACK]
# XXX - why *two* entries for "compacted data", one of which is
# byte-order independent, and one of which is byte-order dependent?
#
0 short 0x1fff  compacted data archive [NSRL|CPCT]
# This string is valid for SunOS (BE) and a matching "short" is listed
# in the Ultrix (LE) magic file.
0 string \377\037  compacted data archive [NSRL|CPCT]
0 short 0145405  huf output archive [NSRL|CPCT]
# bzip2
0 string BZh  bzip2 compressed data archive [NSRL|BZ2]
# squeeze and crunch
# Michael Haardt
0 beshort 0x76FF  squeezed data archive, [NSRL|SQZ]
0 beshort 0x76FE  crunched data archive, [NSRL|CRNC]
0 beshort 0x76FD  LZH compressed data archive, [NSRL|LZH]
# Freeze
0 string \037\237  frozen file 2.1 archive [NSRL|FRZ]
0 string \037\236  frozen file 1.0 archive (or gzip 0.5) [NSRL|FRZ?GZ]
# SCO compress -H (LZH)
0 string \037\240  SCO compress -H (LZH) data archive [NSRL|SCO]
# bzip a block-sorting file compressor
# by Julian Seward and others
#
0 string BZ  bzip compressed data archive [NSRL|BZ]
# lzop from
0 string \x89\x4c\x5a\x4f\x00\x0d\x0a\x1a\x0a  lzop compressed data archive [NSRL|LZOP]
# PHIGS
0 string ARF_BEGARF  PHIGS clear text archive [NSRL|PHG]
0 string @(#)SunPHIGS  SunPHIGS [NSRL|SPHG]

#------------------------------------------------------------------------------
# macintosh description
#
# BinHex is the Macintosh ASCII-encoded file format (see also "apple")
# Daniel Quinlan, quinlan@yggdrasil.com
11 string must\ be\ converted\ with\ BinHex  BinHex binary text [NSRL|BINH]
# Stuffit archives are the de facto standard of compression for Macintosh
# files obtained from most archives. (franklsm@tuns.ca)
0 string SIT!  StuffIt Archive (data) [NSRL|SIT]
0 string SITD  StuffIt Deluxe archive (data) [NSRL|SITD]
0 string Seg  StuffIt Deluxe Segment archive (data) [NSRL|SITD]
# Additional Macintosh Files (franklsm@tuns.ca)
0 string PACT  Macintosh Compact Pro Archive (data) [NSRL|CPA]
# MacBinary format (Eric Fischer, enf@pobox.com)
#
# Unfortunately MacBinary doesn't really have a magic number prior
# to the MacBinary III format. The checksum is really the way to
# do it, but the magic file format isn't up to the challenge.
#
# 0 byte 0
# 1 byte # filename length
# 2 string # filename
# 65 string # file type
# 69 string # file creator
# 73 byte # Finder flags
# 74 byte 0
# 75 beshort # vertical posn in window
# 77 beshort # horiz posn in window
# 79 beshort # window or folder ID
# 81 byte # protected?
# 82 byte 0
# 83 belong # length of data segment
# 87 belong # length of resource segment
# 91 belong # file creation date
# 95 belong # file modification date
# 99 beshort # length of comment after resource
# 101 byte # new Finder flags
# 102 string mBIN # (only in MacBinary III)
# 106 byte # char. code of file name
# 107 byte # still more Finder flags
# 116 belong # total file length
# 120 beshort # length of add'l header
# 122 byte 129 # for MacBinary II
# 122 byte 130 # for MacBinary III
# 123 byte 129 # minimum version that can read fmt
# 124 beshort # checksum
#
# This attempts to use the version numbers as a magic number, requiring
# that the first one be 0x80, 0x81, 0x82, or 0x83, and that the second
# be 0x81. This works for the files I have, but maybe not for everyone's.
122 beshort&0xFCFF 0x8081  Macintosh MacBinary data [NSRL|MBIN]
# MacBinary I doesn't have the version number field at all, but MacBinary II
# has been in use since 1987 so I hope there aren't many really old files
# floating around that this will miss. The original spec calls for using
# the nulls in 0, 74, and 82 as the magic number.
#
# Another possibility, that would also work for MacBinary I, is to use
# the assumption that 65-72 will all be ASCII (0x20-0x7F), that 73 will
# have bits 1 (changed), 2 (busy), 3 (bozo), and 6 (invisible) unset,
# and that 74 will be 0. So something like
#
# 71 belong&0x80804EFF 0x00000000 Macintosh MacBinary data
#
# >73 byte&0x01 0x01 \b, inited
# >73 byte&0x02 0x02 \b, changed
# >73 byte&0x04 0x04 \b, busy
# >73 byte&0x08 0x08 \b, bozo
# >73 byte&0x10 0x10 \b, system
# >73 byte&0x20 0x20 \b, bundle
# >73 byte&0x40 0x40 \b, invisible
# >73 byte&0x80 0x80 \b, locked
>65 string Gzip  (GNU gzip) [NSRL|GZ]
>65 string PACT  (Compact Pro archive) [NSRL|CPA]
>65 string SITD  (StuffIt Deluxe archive) [NSRL|SITD]
>65 string Seg\   (StuffIt segment archive) [NSRL|SIT]
>65 string TARF  (Unix tar archive) [NSRL|TAR]
>65 string ZIVM  (compress (.Z) archive) [NSRL|Z]
# Somewhere, Apple has a repository of registered Creator IDs. These are
# just the ones that I happened to have files from and was able to identify.
>69 string EXTR  (self-extracting archive) [NSRL|MAC-SFX]
>69 string Gzip  (GNU gzip archive) [NSRL|GZ]
>69 string LZIV  (compress archive) [NSRL|Z]
>69 string SIT!  (StuffIt archive) [NSRL|SIT]
# Just in case...
102 string mBIN  MacBinary III data with surprising version number [NSRL|MBIN]

#------------------------------------------------------------------------------
# msdos: file(1) magic for MS-DOS files
#
# XXX - according to Microsoft's spec, at an offset of 0x3c in a
# PE-format executable is the offset in the file of the PE header;
# unfortunately, that's a little-endian offset, and there's no way
# to specify an indirect offset with a specified byte order.
# So, for now, we assume the standard MS-DOS stub, which puts the
# PE header at 0x80 = 128.
#
# Required OS version and subsystem version were 4.0 on some NT 3.51
# executables built with Visual C++ 4.0, so it's not clear that
# they're interesting. The user version was 0.0, but there's
# probably some linker directive to set it. The linker version was
# 3.0, except for one ".exe" which had it as 4.20 (same damn linker!).
#
# .EXE formats (Greg Roelofs, newt@uchicago.edu)
#
0 string MZ  MS-DOS executable (EXE)
>>0xe7 string LH/2\ Self-Extract  \b, %s [NSRL|WIN-SFX]
>>0xe9 string PKSFX2  \b, %s [NSRL|WIN-SFX]
>>122 string Windows\ self-extracting\ ZIP  \b, %s [NSRL|WIN-SFX]
>0x1c string RJSX\xff\xff  \b, ARJ SFX archive [NSRL|WIN-SFX]
>0x1c string diet\xf9\x9c  \b, diet compressed archive [NSRL|WIN-SFX]
>0x1e string Copyright\ 1989-1990\ PKWARE\ Inc.  \b, PKSFX archive [NSRL|WIN-SFX]
# JM: 0x1e "PKLITE Copr. 1990-92 PKWARE Inc. All Rights Reserved\7\0\0\0"
>0x1e string PKLITE\ Copr.  \b, %.6s compressed archive [NSRL|WIN-SFX]
>0x24 string LHa's\ SFX  \b, %.15s archive [NSRL|WIN-SFX]
>0x24 string LHA's\ SFX  \b, %.15s archive [NSRL|WIN-SFX]
>608 string _winzip_  \b, sfx winzip archive [NSRL|WIN-SFX]
>1638 string -lh5-  \b, LHa SFX archive v2.13S [NSRL|WIN-SFX]
>7195 string Rar!  \b, RAR self-extracting archive [NSRL|WIN-SFX]
#
# [GRR 950118: file 3.15 has a buffer-size limitation; offsets bigger than
# 8161 bytes are ignored. To make the following entries work, increase
# HOWMANY in file.h to 32K at least, and maybe to 70K or more for OS/2,
# NT/Win32 and VMS.]
# [GRR: some company sells a self-extractor/displayer for image data(!)]
#
>11696 string PK\003\004  \b, PKZIP SFX archive v1.1 [NSRL|WIN-SFX]
>13297 string PK\003\004  \b, PKZIP SFX archive v1.93a [NSRL|WIN-SFX]
>15588 string PK\003\004  \b, PKZIP2 SFX archive v1.09 [NSRL|WIN-SFX]
>15770 string PK\003\004  \b, PKZIP SFX archive v2.04g [NSRL|WIN-SFX]
>28374 string PK\003\004  \b, PKZIP2 SFX archive v1.02 [NSRL|WIN-SFX]
#
# Info-ZIP self-extractors
# these are the DOS versions:
>25115 string PK\003\004  \b, Info-ZIP SFX archive v5.12 [NSRL|WIN-SFX]
>26331 string PK\003\004  \b, Info-ZIP SFX archive v5.12 w/decryption [NSRL|WIN-SFXD]
# these are the OS/2 versions (OS/2 is flagged above):
>47031 string PK\003\004  \b, Info-ZIP SFX archive v5.12 [NSRL|OS2-SFX]
>49845 string PK\003\004  \b, Info-ZIP SFX archive v5.12 w/decryption [NSRL|OS2-SFXD]
# this is the NT/Win32 version:
>69120 string PK\003\004  \b, Info-ZIP NT SFX archive v5.12 w/decryption [NSRL|WIN-SFXD]
#
# TELVOX Teleinformatica CODEC self-extractor for OS/2:
>49801 string \x79\xff\x80\xff\x76\xff  \b, CODEC archive v3.21 [NSRL|OS2-SFX]
# Microsoft CAB distribution format Dale Worley
0 string MSCF\000\000\000\000  Microsoft CAB file archive [NSRL|CAB]
# windows zips files .dmf
0 string MDIF\032\000\010\000\000\000\372\046\100\175\001\000\001\036\001\000  Ms-windows special zipped file archive [NSRL|DMF]

#------------------------------------------------------------------------------
#
# RPM: file(1) magic for Red Hat Packages Erik Troan (ewt@redhat.com)
#
0 beshort 0xedab
>2 beshort 0xeedb  RPM [NSRL|RPM]

#------------------------------------------------------------------------------
# sccs: file(1) magic for SCCS archives
#
# SCCS archive structure:
# \001h01207
# \001s 00276/00000/00000
# \001d D 1.1 87/09/23 08:09:20 ian 1 0
# \001c date and time created 87/09/23 08:09:20 by ian
# \001e
# \001u
# \001U
# ... etc.
# Now '\001h' happens to be the same as the 3B20's a.out magic number (0550).
# *Sigh*. And these both came from various parts of the USG.
# Maybe we should just switch everybody from SCCS to RCS!
# Further, you can't just say '\001h0', because the five-digit number
# is a checksum that could (presumably) have any leading digit,
# and we don't have regular expression matching yet.
# Hence the following official kludge:
8 string \001s\   SCCS archive data [NSRL|SCCS]

#------------------------------------------------------------------------------
# uuencode: file(1) magic for ASCII-encoded files
#
# GRR: the first line of xxencoded files is identical to that in uuencoded
# files, but the first character in most subsequent lines is 'h' instead of
# 'M'. (xxencoding uses lowercase letters in place of most of uuencode's
# punctuation and survives BITNET gateways better.) If regular expressions
# were supported, this entry could possibly be split into two with
# "begin\040\.\*\012M" or "begin\040\.\*\012h" (where \. and \* are REs).
0 string begin\040  uuencoded or xxencoded text archive [NSRL|UU]
# btoa(1) is an alternative to uuencode that requires less space.
0 string xbtoa\ Begin  btoa'd text archive [NSRL|BTOA]
# ship(1) is another, much cooler alternative to uuencode.
# Greg Roelofs, newt@uchicago.edu
0 string $\012ship  ship'd binary text archive [NSRL|SHIP]
# bencode(8) is used to encode compressed news batches (Bnews/Cnews only?)
# Greg Roelofs, newt@uchicago.edu
0 string Decode\ the\ following\ with\ bdeco  bencoded News text archive [NSRL|BENC]
# BinHex is the Macintosh ASCII-encoded file format (see also "apple")
# Daniel Quinlan, quinlan@yggdrasil.com
11 string must\ be\ converted\ with\ BinHex  BinHex binary text archive [NSRL|BINH]
# -----------------------------------------------------------
# VMware specific files (deduced from version 1.1 and log file entries)
# Anthon van der Neut (anthon@mnt.org)
0 belong 0x4d52564e  VMware nvram
0 belong 0x434f5744
>8 byte 3  VMware virtual disk [NSRL|VMDK]
# InstallShield Cabinet files
0 string ISc(  InstallShield Cabinet file [NSRL|INST]
>5 byte&0xf0 =0x60  version 6,
>5 byte&0xf0 !0x60  version 4/5,
>(12.l+40) lelong x  %u files