Few GEDCOM applications can handle embedded objects, and even fewer handle them correctly. Why?
This short discussion tries to answer that question.
Converting a binary file into a set of printable characters is widespread, e.g. in email (MIME encoding) or XML (http://en.wikipedia.org/wiki/Base64).
The GEDCOM specification belongs to this family of encodings, which are based on a radix-64 representation.
Here is the transformation scheme: 3 bytes (24 bits) are converted to 4 bytes. Each of these 4 bytes always has its two most significant bits set to 0, and the remaining 6 bits come from the source bytes (4 x 6 = 24).
A byte that carries 6 bits can take any value between 0 and 0x3f. The goal is to embed a binary file as text, so the next step is to translate each value from 0 - 0x3f to some printable character.
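A minimal sketch of the 3-to-4 split, in Python. The bit order used here (most significant bits first, as in base64) is an assumption for illustration; the function names are hypothetical and not taken from any specification.

```python
def split_3_to_4(b0, b1, b2):
    """Split three 8-bit values into four 6-bit values (two top bits are 0)."""
    n = (b0 << 16) | (b1 << 8) | b2          # the 24-bit group
    return [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]


def join_4_to_3(sextets):
    """Inverse operation: four 6-bit values back into three 8-bit values."""
    n = 0
    for s in sextets:
        n = (n << 6) | (s & 0x3F)
    return [(n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF]


# The bytes of the ASCII text "Man" are 0x4D, 0x61, 0x6E.
sextets = split_3_to_4(0x4D, 0x61, 0x6E)
print(sextets)               # [19, 22, 5, 46]
print(join_4_to_3(sextets))  # [77, 97, 110]
```

Every value in the result is below 0x40, so each one can be used as an index into a 64-entry table of printable characters.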
MIME (base64) and GEDCOM use different translation tables.
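To illustrate the difference, the sketch below maps the same four 6-bit values through both tables. The MIME table is the standard base64 alphabet. The GEDCOM table is written from my reading of the GEDCOM 5.5 standard ('.', '/', then digits, upper case, lower case) and should be treated as an assumption to verify against the specification.

```python
import string

# Standard base64 (MIME) alphabet.
MIME_TABLE = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# ASSUMPTION: GEDCOM 5.5 table as I understand it; verify before relying on it.
GEDCOM_TABLE = "./" + string.digits + string.ascii_uppercase + string.ascii_lowercase

values = [19, 22, 5, 46]   # four 6-bit values, each in the range 0 - 0x3f

mime_text = "".join(MIME_TABLE[v] for v in values)
gedcom_text = "".join(GEDCOM_TABLE[v] for v in values)

print(mime_text)     # TWFu
print(gedcom_text)   # HK3i
```

The same 6-bit values give different text, so a stock base64 decoder cannot be pointed at GEDCOM data without swapping the table.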
The basic requirement is that this encoding process be reversible, i.e.
file == decode( encode( file ) );
for all files.
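The requirement can be checked mechanically. The sketch below uses Python's standard base64 module as a stand-in for encode and decode; a GEDCOM codec would be substituted in the same place. The helper name roundtrips is hypothetical.

```python
import base64
import os


def roundtrips(data):
    """True if decode(encode(data)) gives back exactly the original bytes."""
    return base64.b64decode(base64.b64encode(data)) == data


# Try every length from 0 to 199 so that all three (size modulo 3) cases occur.
ok = all(roundtrips(os.urandom(n)) for n in range(200))
print(ok)
```

Testing a run of consecutive lengths matters here, because the errors discussed in this text only show up when the size is not divisible by 3.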
The number of bytes to encode may or may not be divisible by 3; size modulo 3 can be 0, 1 or 2. When (size modulo 3) is 1, we need 2 bytes to encode the last byte; when (size modulo 3) is 2, three bytes are needed. Logically, no extra information is needed: the decoder can distinguish all the cases, because the encoded size modulo 4 is then 0, 2 or 3.
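The size arithmetic for the unpadded case can be written down directly. This is only a sketch of the counting argument; the function names are hypothetical.

```python
def encoded_size(n):
    """Characters needed to encode n bytes without padding."""
    full, rest = divmod(n, 3)
    return 4 * full + (0 if rest == 0 else rest + 1)


def decoded_size(m):
    """Bytes represented by m encoded characters without padding."""
    full, rest = divmod(m, 4)
    if rest == 1:
        # One 6-bit character cannot carry a whole byte.
        raise ValueError("an encoded size of 1 modulo 4 is impossible")
    return 3 * full + (0 if rest == 0 else rest - 1)


for n in (3, 4, 5, 6):
    print(n, encoded_size(n), encoded_size(n) % 4)
```

Because the mapping between sizes is one-to-one, the decoder can recover the original length from the encoded length alone.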
Some implementations use padding, i.e. they add one or two extra characters to make the encoded size divisible by four. It is essential that these padding bytes differ from the regular character set used in the translation table of the given implementation! The decoder must know whether the last bytes are regular encoded bytes or just padding bytes.
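A small demonstration of why the padding character must lie outside the table, using Python's standard base64 module. Base64 pads with '=', which is not in its alphabet. Replacing it with 'A' (the table entry for the value 0) is a deliberately wrong choice made here only to show the ambiguity.

```python
import base64

one_byte = base64.b64encode(b"\x01")             # b"AQ=="
three_bytes = base64.b64encode(b"\x01\x00\x00")  # b"AQAA"

# Deliberately bad padding: use a character that belongs to the table.
bad_padding = one_byte.replace(b"=", b"A")

print(one_byte, three_bytes, bad_padding)
print(bad_padding == three_bytes)   # True: the two inputs are indistinguishable
```

With '=' the decoder sees that only one byte is present; with 'A' it has no way to tell a one-byte file from a three-byte file ending in two zero bytes.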
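One case where padding characters are required is when multiple encoded files are concatenated and the result must still decode correctly, i.e. decode(encode(file1) + encode(file2)) = file1 + file2, where + means concatenation. Without padding, the decoder cannot tell where the first encoded file ends.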
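The concatenation case can be sketched with the standard base64 module. The helpers enc and dec_stream are hypothetical names; dec_stream simply decodes the text four characters at a time, which works because every padded group is exactly four characters long.

```python
import base64


def enc(data, pad=True):
    """Base64-encode data, optionally stripping the '=' padding."""
    out = base64.b64encode(data)
    return out if pad else out.rstrip(b"=")


def dec_stream(text):
    """Decode concatenated padded encodings, one 4-character group at a time."""
    groups = (text[i:i + 4] for i in range(0, len(text), 4))
    return b"".join(base64.b64decode(g) for g in groups)


padded = enc(b"A") + enc(b"B")                  # b"QQ==Qg=="
unpadded = enc(b"A", False) + enc(b"B", False)  # b"QQQg"

print(dec_stream(padded))           # b'AB'
print(base64.b64decode(unpadded))   # three bytes, not b'AB'
```

In the unpadded case the two 2-character encodings merge into what looks like one regular 4-character group, so the boundary between the files is lost.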
Here a conceptual mix-up can occur: when the size of the file to encode is not divisible by 3, then purely for regularity (for an algorithm that always takes three bytes at a time) the instructions may say: pad the last byte with 2 other bytes ... then encode the last three bytes. This is of course not the same "padding" as the one discussed first.
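The sketch below shows this second kind of "padding" used purely as an internal convenience: the input is filled up with zero bytes so that the loop can always take three bytes, and the characters that carry only filler bits are dropped again before output. It uses the base64 table and bit order as an assumption for illustration; it is not the GEDCOM algorithm, and the names are hypothetical.

```python
import string

TABLE = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode(data):
    fill = (3 - len(data) % 3) % 3           # filler bytes added to the input
    padded = data + b"\x00" * fill
    out = []
    for i in range(0, len(padded), 3):
        n = int.from_bytes(padded[i:i + 3], "big")
        out.extend(TABLE[(n >> s) & 0x3F] for s in (18, 12, 6, 0))
    return "".join(out[:len(out) - fill])    # drop characters made of filler only


def decode(text):
    fill = (4 - len(text) % 4) % 4           # filler characters added to the text
    if fill == 3:
        raise ValueError("encoded size of 1 modulo 4 is impossible")
    padded = text + TABLE[0] * fill
    out = bytearray()
    for i in range(0, len(padded), 4):
        n = 0
        for ch in padded[i:i + 4]:
            n = (n << 6) | TABLE.index(ch)
        out += n.to_bytes(3, "big")
    return bytes(out[:len(out) - fill])      # drop bytes that came from filler


print(encode(b"\x01"))           # AQ
print(decode(encode(b"\x01")))   # b'\x01'
```

The filler never reaches the output in either direction. If the trimming step is forgotten, the decoded file grows by one or two bytes.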
The encoding routine converts a binary multimedia file segment of from 1 to 54 bytes in length into an encoded GEDCOM line value of 2 to 64 bytes in length. This encoded value becomes the <ENCODED_MULTIMEDIA_LINE> used in the MULTIMEDIA_RECORD (see page 26.)
The algorithm accomplishes its goal using the following steps:
Decoding:
The Decoding routine converts the encoded line value back into the original binary character multimedia file segment.
The decoding algorithm can be accomplished in the following steps:
The quotation ends here.
Doubts:
Conclusions
I can find only one way to interpret the GEDCOM standard that works (as in p. 1 above), but the instructions as a whole are very ambiguous. Some genealogical applications decided to decode without padding. I know of only one that gives correct results: "The Complete Genealogy Builder" (in fact there were zero such apps until a few days ago; The Complete Genealogy Builder had a small bug, which is now corrected).
Images in "jpg" format seem to be insensitive to errors in the last few bytes, which may be why developers do not notice or do not care about such errors. But this is not the right way: other formats may have checksums or other integrity controls that make the decoded file unusable.