



A file format is a particular way to encode information for storage in a computer file.

Since a disk drive, or indeed any computer storage, can store only bits, the computer must have some way of converting information to 0s and 1s and vice versa. There are different kinds of formats for different kinds of information. However, within any format type, e.g. word processor documents, there will typically be several different, and sometimes competing, formats.
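As a minimal sketch of this conversion (Python is used here purely for illustration, and the helper name to_bits is hypothetical), the same two-character text yields different bit patterns depending on the encoding chosen:

```python
def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


text = "Hi"
ascii_bytes = text.encode("ascii")      # one byte per character
utf16_bytes = text.encode("utf-16-le")  # two bytes per character, little-endian

print(to_bits(ascii_bytes))  # 01001000 01101001
print(to_bits(utf16_bytes))  # 01001000 00000000 01101001 00000000

# Decoding with the matching encoding recovers the original text.
assert ascii_bytes.decode("ascii") == text
assert utf16_bytes.decode("utf-16-le") == text
```

Both byte strings represent the same information; only the agreed-upon rule for mapping it to bits differs.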

1 Generality

Some file formats are designed to store very particular sorts of data: the JPEG format, for example, is designed only to store static images. Other file formats, however, are designed for storage of several different types of data: the GIF format supports storage of both still images and simple animations, and the AVI format can act as a container for many different types of multimedia. A text file is simply one that stores any text, in a format such as ASCII or Unicode, with few if any control characters. Some file formats, such as HTML, or the source code of some particular programming language, are in fact also text files, but adhere to more specific rules which allow them to be used for specific purposes.

It is sometimes possible to cause a program to read a file encoded in one format as if it were encoded in another format. For example, either by making minor modifications to a Microsoft Word document or by using a music-playing program that deals in "headerless" audio files, one can play a Microsoft Word document as if it were a song. The result does not sound very musical, however. This is so because a sensible arrangement of bits in one format is almost always nonsensical in another.
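The reinterpretation described above can be sketched with Python's standard wave module. This is an illustration under stated assumptions, not a recipe from any particular player: the function name bytes_as_wav, the 8000 Hz sample rate, and the choice of 8-bit mono samples are all hypothetical. Every byte of the input is simply treated as one audio sample.

```python
import io
import wave


def bytes_as_wav(raw: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap arbitrary bytes in a WAV container as 8-bit mono PCM samples."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(1)          # one byte per sample
        wav.setframerate(sample_rate)
        wav.writeframes(raw)         # the bytes are copied through unchanged
    return buffer.getvalue()


# Any bytes will do; here a short text stands in for a document file.
document = b"not a real song!"
wav_bytes = bytes_as_wav(document)

# Reading the result back as audio returns exactly the original bytes.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    samples = wav.readframes(wav.getnframes())
assert samples == document
```

Nothing in the data changes; only the header tells the player to treat it as sound, which is why the output is noise rather than music.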

It should be noted that it is very difficult to make a principled distinction between a file format and a programming language, or between a "normal program" and a programming language interpreter. A programming language can be seen as a file format for storing algorithms, while even a simple image file viewer can be seen as an "interpreter" for, say, the GIF "language".
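To make the "interpreter" view concrete, here is a rough sketch of the first step a GIF viewer might take: reading the 6-byte signature and the width and height that follow it as little-endian 16-bit integers. The function name read_gif_header is hypothetical, and the sample bytes are hand-built for illustration rather than taken from a complete GIF file.

```python
import struct


def read_gif_header(data: bytes) -> dict:
    """Parse the signature and logical screen size at the start of a GIF stream."""
    if len(data) < 10:
        raise ValueError("too short to contain a GIF header")
    signature = data[:6]
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF stream")
    # Width and height are stored as unsigned 16-bit little-endian integers.
    width, height = struct.unpack("<HH", data[6:10])
    return {
        "version": signature[3:].decode("ascii"),
        "width": width,
        "height": height,
    }


# Hand-built header: signature, width 3, height 2, then three further header bytes.
sample = b"GIF89a" + struct.pack("<HH", 3, 2) + b"\x00\x00\x00"
print(read_gif_header(sample))
```

The function accepts "sentences" that follow the format's grammar and rejects everything else, much as a language interpreter rejects a program with a syntax error.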

2 Specifications

Many file formats, including some of the most well-known file formats, have a published specification document (often with a reference implementation) that describes exactly how the data is to be encoded, and which can be used to determine whether or not a particular program treats a particular file format correctly. There are, however, two reasons why this is not always the case. First, some file format developers view their specification documents as trade secrets, and therefore do not release them to the public. A prominent example of this exists in several formats used by the Microsoft Office suite of applications. Second, some file format developers never spend time writing a separate specification document; rather, the format is defined only implicitly, through the program(s) that manipulate data in the format.

Note that using file formats without a publicly available specification can be costly. Learning how the format works will require either reverse-engineering it from a reference implementation or acquiring the specification document for a fee from the format developers. This second approach is possible only when there is a specification document, and typically requires the signing of a non-disclosure agreement. Both strategies require significant time, money, or both. Therefore, as a general rule, file formats with publicly available specifications are supported by a large number of programs, while non-public formats are supported by only a few programs.

The most useful part of intellectual property law for protecting ownership of a file format appears to be patent law. Although patents for file formats are not directly permitted under US law, some formats require the encoding of data with patented algorithms. For example, the GIF file format requires the use of a patented algorithm, and although initially the patent owner did not enforce it, they later began collecting fees for use of the algorithm. This has resulted in a significant decrease in the use of GIFs, and is partly responsible for the development of the alternative PNG format. However, the patent expired in the US in mid-2003 and worldwide in mid-2004; algorithms themselves are not currently patentable under European law.




