Formats

A Macintosh file can store data in two different ways. One, known as 'Resources', holds descriptions of windows, icons, etc.; the other, 'Data', holds application data (text, graphics, etc.). So every Macintosh file consists of a Data Fork and an (optional) Resource Fork. From the computer's point of view, it is as if every file were actually two files.

If we also count the Finder Info (Finder label, status flags, etc.), every file has three parts. Other operating systems don't support this (one could say their files consist only of a Data Fork), so every time you transfer a Macintosh file to a different OS you lose the Finder Info and the Resource Fork of that file.

If you want to actually use the data on that OS, all the useful information must be in the Data Fork. For example, a Macintosh JPEG image file can have Resources (for instance the preview and the custom Finder icon), but these are irrelevant on a Unix machine, since you access the file in a different way.

This is what we call 'Binary transfer': after the transfer you get a file with exactly the contents of the Data Fork, byte for byte. (Since gzip is a Unix compressor and the gzip format holds compressed Unix data, gzipping a Macintosh file is analogous to transferring it to a Unix machine.)

This can be a problem in certain situations. Earlier we said text is (usually) in the Data Fork, so you may think it's enough to transfer it in binary mode. This is not true, for a simple reason: for binary data, the formats are well defined and the software on every OS is adapted to read them. But for text, every OS defines what text is and how to use it, so different OSes use different conventions to mark the end of a line (not to mention character sets...).

There are two bytes which are used for this task: CR (Carriage Return, octal 015) and LF (Line Feed, octal 012).

		DOS uses <CR><LF>
		Macintosh uses <CR>
		Unix (and others) uses <LF>

So, Unix will see a Macintosh text file as a single (and very long) line with some strange characters inserted periodically.
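To make the three conventions concrete, here is a minimal sketch in C that guesses which one a buffer of text uses by counting terminators. The names 'eol_style' and 'guess_eol' are made up for this example (they are not part of MacGzip). The byte values are written in hex rather than as '\r' and '\n', because some compilers do not map those escapes to the same bytes.

```c
#include <stddef.h>

#define CR 0x0D   /* Carriage Return */
#define LF 0x0A   /* Line Feed */

/* Hypothetical names, for illustration only. */
enum eol_style { EOL_NONE, EOL_UNIX, EOL_MAC, EOL_DOS, EOL_MIXED };

/* Classify a buffer by the line terminators it contains. */
enum eol_style guess_eol(const unsigned char *buf, size_t len)
{
    size_t cr = 0, lf = 0, crlf = 0, i;

    for (i = 0; i < len; i++) {
        if (buf[i] == CR) {
            if (i + 1 < len && buf[i + 1] == LF) {
                crlf++;
                i++;        /* skip the LF that belongs to this pair */
            } else {
                cr++;       /* a lone CR: Macintosh convention */
            }
        } else if (buf[i] == LF) {
            lf++;           /* a lone LF: Unix convention */
        }
    }

    if (cr + lf + crlf == 0) return EOL_NONE;
    if (crlf && !cr && !lf)  return EOL_DOS;
    if (cr && !lf && !crlf)  return EOL_MAC;
    if (lf && !cr && !crlf)  return EOL_UNIX;
    return EOL_MIXED;
}
```

A result of EOL_MIXED is a hint that the data is probably not plain text at all, or that it has already been through a bad conversion.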

The 'ASCII' transfer mode replaces every <CR> with <LF> (and the reverse when the target is a Macintosh text file).

CR and LF aren't reserved bytes, so any binary file can contain them (and they do); if you transfer a binary file in ASCII mode, chances are you will get garbage...

On the other hand, you can always convert a text file transmitted in binary mode to the local conventions after the transfer, but it's faster simply to transfer it correctly in the first place.

On most Unix machines there are utilities to translate text files (unix2dos, dos2unix, fromdos, todos). For Macintosh text files you can use "tr '\015' '\012'" as 'frommac' and "tr '\012' '\015'" as 'tomac'. For DOS and Unix you can also use a public domain utility named 'charconv'.
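If you would rather do the 'frommac' / 'tomac' translation inside a program, the same single-byte substitution can be sketched in C as below. The function name 'swap_eol' is an assumption made for this example. It only handles the Macintosh/Unix case, where one byte replaces one byte; DOS text needs a different routine, since <CR><LF> is two bytes.

```c
#include <stddef.h>

#define CR 0x0D   /* octal 015 */
#define LF 0x0A   /* octal 012 */

/*
 * Replace every 'from' byte with 'to', in place.
 * Returns the number of bytes that were replaced.
 * (Hypothetical helper, for illustration only.)
 */
size_t swap_eol(unsigned char *buf, size_t len,
                unsigned char from, unsigned char to)
{
    size_t i, count = 0;

    for (i = 0; i < len; i++) {
        if (buf[i] == from) {
            buf[i] = to;
            count++;
        }
    }
    return count;
}
```

Calling swap_eol(buf, len, CR, LF) behaves like 'frommac', and swap_eol(buf, len, LF, CR) like 'tomac'. As with ASCII transfer mode, only use it on data you know is text.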

What if you want to store a Macintosh file on a different machine without losing any of the info (for instance, an application)?

Then you can use a format which joins the two forks and the Finder Info into a new file with only a Data Fork. There are several formats that do this (BinHex, MacBinary, etc.). Since BinHex uses only 7-bit codes (as opposed to MacBinary, which uses all 8 bits), its output is larger, so it is not a good choice for a compressor.

If, once you have the MacBinary file on the non-MacOS machine, you decide you want to use the Data Fork, there are several ways to do it.

You can get a MacBinary translator. Usually this will give you two files (foo.data and foo.rsrc). On Unix you can install the Columbia AppleTalk Package and use 'capit', which will restore even the Finder Info.

You can also use a program similar to this one:

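/* Quick & (very) Dirty program to extract Data Fork from MacBinary */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int id, od;
    long i;
    ssize_t n;
    unsigned char buf[128];    /* unsigned: header bytes may be >= 0x80 */

    if (argc != 2) {
        fprintf(stderr, "usage: %s file.bin\n", argv[0]);
        return 1;
    }

    /* The MacBinary header is the first 128 bytes of the file */
    id = open(argv[1], O_RDONLY);
    if (id < 0 || read(id, buf, 128) != 128) {
        perror(argv[1]);
        return 1;
    }

    /* Byte 1 is the name length (at most 63); the name starts at byte 2 */
    buf[2 + (buf[1] > 63 ? 63 : buf[1])] = 0x00;

    /* Data Fork length. Avoid byte sex problems */
    i = buf[86] + 256L * (buf[85] + 256L * (buf[84] + 256L * buf[83]));

    od = open((char *)&buf[2], O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (od < 0) {
        perror((char *)&buf[2]);
        return 1;
    }

    /* Copy the Data Fork, which follows the header */
    for ( ; i > 0; i -= n) {
        n = read(id, buf, i < 128 ? (size_t)i : 128);
        if (n <= 0 || write(od, buf, n) != n)
            break;
    }

    close(od);
    close(id);
    return 0;
}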
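
If you also want to know where the Resource Fork sits, the header parsing can be pulled out into a small function, sketched below. This is only an illustration: 'mb_info', 'mb_parse' and 'be32' are names invented here, and the Resource Fork length at bytes 87-90 and the zero byte at offset 0 are taken from the MacBinary layout as I understand it, not from anything MacGzip guarantees. The Resource Fork offset relies on the Data Fork being padded to a multiple of 128 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure, for illustration only. */
struct mb_info {
    char name[64];            /* original file name, NUL terminated */
    unsigned long data_len;   /* Data Fork length in bytes */
    unsigned long rsrc_len;   /* Resource Fork length in bytes */
    unsigned long data_off;   /* file offset of the Data Fork */
    unsigned long rsrc_off;   /* file offset of the Resource Fork */
};

/* Read a 32-bit big-endian value, whatever the host byte order is. */
static unsigned long be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Returns 0 on success, -1 if the header does not look like MacBinary. */
int mb_parse(const unsigned char hdr[128], struct mb_info *out)
{
    size_t n = hdr[1];                    /* name length */

    if (hdr[0] != 0 || n < 1 || n > 63)
        return -1;

    memcpy(out->name, hdr + 2, n);
    out->name[n] = '\0';

    out->data_len = be32(hdr + 83);
    out->rsrc_len = be32(hdr + 87);       /* assumed offset, see above */

    out->data_off = 128;                  /* right after the header */
    /* The Data Fork is padded up to the next multiple of 128 bytes */
    out->rsrc_off = 128 + ((out->data_len + 127) / 128) * 128;
    return 0;
}
```

For a 300-byte Data Fork, for example, the fork occupies three 128-byte chunks, so the Resource Fork would start at offset 128 + 384 = 512.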

MacGzip's MacBinary mode will give you a file with a .gz extension (just like binary and ASCII mode). It would be more correct to use .bin.gz, but this would make the suffix too long. Anyway, when the -N (save/restore original name) option is used, the saved name has the .bin suffix, so you know it is a MacBinary file (if you expand it using a program other than MacGzip, make sure you expand it in binary mode, and use -N for clarity).

MacGzip can select the correct mode automatically for you, but you need to tell it how; the best way is to use Internet Config. You can read more about IC and Macintosh files in 'Suffix Mapping'.

BTW, gzip files are binary; ASCII transfer mode will corrupt them (usually you get a CRC error when you try to expand), and MacBinary will give you a file with 128 bytes of garbage at the start (you can skip them using 'dd' on Unix) and more garbage at the end (MacBinary pads sizes to 128-byte chunks). Most of the time, if you skip the 128-byte header at the start, gzip will be able to decompress the file.
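
A quick way to tell whether a .gz file was wrapped in MacBinary by mistake is to look for the gzip magic bytes (0x1F 0x8B) at offset 0 and at offset 128. The sketch below does just that; 'find_gzip_start' is a name invented for this example, and the check is a heuristic, not a full validation of either format.

```c
#include <stddef.h>

/*
 * Return the offset where the gzip stream seems to start:
 *   0    the buffer begins with the gzip magic bytes
 *   128  the magic bytes follow a 128-byte (MacBinary-sized) header
 *  -1    no gzip magic found at either place
 * (Hypothetical helper, for illustration only.)
 */
long find_gzip_start(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == 0x1F && buf[1] == 0x8B)
        return 0;
    if (len >= 130 && buf[128] == 0x1F && buf[129] == 0x8B)
        return 128;
    return -1;
}
```

If it returns 128, something like "dd if=foo.gz bs=128 skip=1 | gzip -d > foo" should recover the data on Unix (foo.gz stands for your file); gzip may still complain about the padding left at the end.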