AEP: 13
Title: File kinds, Tags and Archival Policies
Version: $Revision: 1.0 $
Last-Modified: $Date: $
Author: Federico Di Gregorio
Status: Draft
Type: Standard Track
Content-Type: text/plain
Created: 2-Oct-2002
Post-History:


Abstract

    This document explains how file kinds (from inventories), tags and
    archival policies work together to decide what goes inside an
    archive during an import or a commit.

    At first sight this document changes a lot of things compared to
    protocol 1, but the underlying idea does not change: it is just
    made explicit, and some mechanisms (like making name tags
    explicit) are added to make the whole more coherent.


Rationale

    No rationale: this document describes the core of the Arch
    protocol.


Definitions

    FIXME: should this be moved into its own AEP?

    We define a "linting operation" as a set of actions that check the
    consistency of the project tree, either as their main task or as a
    side effect (usually a linting operation does not proceed if the
    tree is in an inconsistent state.)


File kinds

    The concept of "file kind" is broader in scope than the process of
    deciding what goes (or does not go) into an archive.  A file kind
    is assigned by matching the file's name against a set of regular
    expressions and can be one of the following:

        S (Source)
            Source files for the project.

        C (Control)
            Control files are mainly treated as source files, but are
            managed by arch itself and not by the user directly.
            Usually they reside in the "{arch}" subdirectory, located
            in the root of the project tree.

        P (Precious)
            Precious files are, well, precious.  They should be
            preserved at all costs and are never patched.  If archived
            they are always saved as pristine copies.

        B (Backup)
            Backup files, automatically generated by various tools or
            directly by the user.  Backup files can be removed after a
            little bit of thinking, but arch will never directly
            fiddle with them.

        G (Generated)
            Generated files, like ".o", ".py" or even ".c".  Those
            files can be easily generated if the right tools are
            available.
            Arch should not touch generated files, but the user can
            remove them without even thinking about it.

        J (Junk)
            This category is for temporary files, arch ",,"
            directories and even cache-like files or directories.
            They can be silently deleted without problems.

        ? (Unrecognized)
            Unrecognized files are files that did not match any file
            kind regular expression.  So they are really unrecognized
            and will make arch complain loudly on lint checks.

    Directories are special, because they can enforce a file kind on
    their contents.  Usually directories are of kind Source or Control
    and let the contained files have their own kind, but Precious,
    Backup, Junk and Generated directories force their contents to
    their own kind (and the contents disappear: they are not even
    considered for linting operations, patches, etc.)  Unrecognized
    directories always make arch operations abort with an error.

    Directories are identified by prepending a 'D' to their kind, so,
    for example, a precious directory is of kind "DP".

    "File kind" regular expressions (RE) can be located in two
    different places: the "=tagging-method" file in the "{arch}"
    directory, or a "=tags" file located anywhere in the project tree.
    RE defined in a "=tags" file are valid for the directory it is in
    and for any subdirectory, but can be overridden by "=tags" files
    located deeper in the project tree.

    The default RE are (this example also shows the format used by
    RE):

        source ^([_=a-zA-Z0-9].*|\.arch-ids|\{arch\}|
            \.project-tree-version|\.arch-project-tree)$
        exclude ^(.arch-ids|\{arch\}|=tags)$
        precious ^(\+.*|\.gdbinit|=build\.*|=install\.*|
            CVS|CVS\.adm|RCS|RCSLOG|SCCS|TAGS)$
        backup ^.*(~|\.~[0-9]+~|\.bak|\.orig|\.rej|
            \.original|\.modified|\.reject)$
        generated ^(.*\.(o|a|so|core|pyc|pio)|core)$
        junk ^(,.*)$

    Note that the "source" RE should match both Source and Control
    files.  Control files are then identified by the "exclude" RE (in
    accordance with the original arch format for the "=tagging-method"
    file.)
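    The way a name is mapped to a file kind with the default RE can
    be sketched in Python as follows.  This is only an illustration,
    not part of the protocol: the names DEFAULT_RES, ORDER and
    file_kind are hypothetical, and the matching order (junk, backup,
    precious, generated, then source/exclude) is an assumption, since
    this document does not define a precedence between RE.  The
    patterns are the defaults listed above joined onto single lines,
    with "CVS" in the precious RE and "=tags" as a separate
    alternative in the exclude RE.

```python
import re

# Default RE from the listing above, each joined onto a single line.
# Assumption: precious uses "CVS" and exclude treats "=tags" as its
# own alternative.
DEFAULT_RES = {
    "source": r"^([_=a-zA-Z0-9].*|\.arch-ids|\{arch\}"
              r"|\.project-tree-version|\.arch-project-tree)$",
    "exclude": r"^(.arch-ids|\{arch\}|=tags)$",
    "precious": r"^(\+.*|\.gdbinit|=build\.*|=install\.*"
                r"|CVS|CVS\.adm|RCS|RCSLOG|SCCS|TAGS)$",
    "backup": r"^.*(~|\.~[0-9]+~|\.bak|\.orig|\.rej"
              r"|\.original|\.modified|\.reject)$",
    "generated": r"^(.*\.(o|a|so|core|pyc|pio)|core)$",
    "junk": r"^(,.*)$",
}

# Hypothetical precedence: the AEP does not state one.  "source" is
# tried last because its first alternative is very broad.
ORDER = [("junk", "J"), ("backup", "B"),
         ("precious", "P"), ("generated", "G")]


def file_kind(name, is_dir=False):
    """Return the kind letter for a file name, 'D'-prefixed for dirs."""
    kind = "?"  # Unrecognized unless some RE matches
    for key, letter in ORDER:
        if re.match(DEFAULT_RES[key], name):
            kind = letter
            break
    else:
        # "source" matches both Source and Control files; the
        # "exclude" RE then singles out the Control ones.
        if re.match(DEFAULT_RES["source"], name):
            is_control = re.match(DEFAULT_RES["exclude"], name)
            kind = "C" if is_control else "S"
    return "D" + kind if is_dir else kind
```

    Under these assumptions, file_kind("main.c") gives "S",
    file_kind("{arch}", is_dir=True) gives "DC" and
    file_kind(".hidden") gives "?".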
    Each RE should be on a single line, with the name separated from
    the regular expression by white space (1 or more spaces, no tabs.)
    The defaults above were split over multiple lines only for ease of
    reading.

    File kinds can also be explicitly set by tagging, see below.


Tags

    Tags are useful markers to identify files across renames and
    patchset operations.  There are 3 different kinds of tags:

        x (explicit)
            Explicit tags are assigned (or removed) by the user.  Arch
            never changes an explicit tag without user intervention.
            Directories and symbolic links can be explicitly tagged.

        i (implicit)
            Implicit tags are built from information gathered from the
            file itself (see below for the implicit tag format).
            Directories and symbolic links can't (obviously) have
            implicit tags.

        n (name or kind tags)
            This kind of tag is assigned by arch while doing its
            internal computations (tree linting, making or applying
            patches, etc.) and is used to store the file md5sum for
            integrity purposes.  Note that this kind of tag can be
            automatically regenerated at every archive operation, and
            arch implementations should be able to deal with missing
            name tags.

    A name-tagged file is sometimes referred to as "untagged" in this
    document.


Tag format

    Tags are composed of two parts, separated by the '/' (UNIX path
    separator) character.

        kind/id

    "kind" is the tag kind, a two letter combination obtained by
    appending the tag type (x, i, n) to the file/directory kind.  The
    id is an identifier, unique within the project tree, made of any
    printable characters except '/'.


Id format

    FIXME: we still need to define it.


The =tags files

    Tags are saved in =tags files, one per line, after regular
    expressions.  The format is simple:

        tag

    Explicit tags are saved immediately upon definition and never have
    their "id" regenerated.  Implicit and name tags are regenerated
    during inventory operations and can be saved into the =tags file
    at any time on user request.
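    The kind/id layout described under "Tag format" can be checked
    with a few lines of Python.  This is a hedged sketch, not a
    reference parser: the names Tag and parse_tag are hypothetical,
    the id format is still a FIXME in this document (so only the
    "printable, no '/'" rule is enforced), and accepting a leading
    'D' for directory kinds is an assumption based on the "DP"
    notation used for directories.

```python
from typing import NamedTuple

FILE_KINDS = set("SCPBGJ")  # file kind letters defined in this AEP
TAG_TYPES = set("xin")      # explicit, implicit, name


class Tag(NamedTuple):
    is_dir: bool    # True when the kind carries a leading 'D'
    file_kind: str  # one of S, C, P, B, G, J
    tag_type: str   # one of x, i, n
    ident: str      # the id part, unique within the project tree


def parse_tag(text):
    """Split a 'kind/id' tag and validate both parts."""
    kind, sep, ident = text.partition("/")
    if not sep:
        raise ValueError("missing '/' separator: %r" % text)
    # Assumption: directory tags are written as 'D' + kind + type.
    is_dir = len(kind) == 3 and kind[0] == "D"
    core = kind[1:] if is_dir else kind
    if (len(core) != 2 or core[0] not in FILE_KINDS
            or core[1] not in TAG_TYPES):
        raise ValueError("bad tag kind: %r" % kind)
    # The id may use any printable character except '/'.
    if not ident or "/" in ident or not ident.isprintable():
        raise ValueError("bad tag id: %r" % ident)
    return Tag(is_dir, core[0], core[1], ident)
```

    For example, parse_tag("Sx/main-1234") yields an explicit tag on
    a Source file, while parse_tag("Sx/a/b") raises ValueError
    because the id contains '/'.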
    If the user then makes any change to the archive (changing an
    implicit tag, moving a name-tagged file), linting operations will
    report the changes as missing files (the tag exists but the file
    can't be found.)  This may or may not be useful to the user, so
    tools to save and delete implicit and name tags from the =tags
    files should be provided.  Arch implementations should not depend
    on such tags being available before patch generation.

    However, explicit, implicit and name tags of Source and Control
    files and directories _are_ saved to the =tags files on patch
    generation operations.  Explicit tags have their "md5sum" part
    updated, and implicit and name tags are regenerated and saved.
    This builds a "manifest" that will be used both during the mkpatch
    operation and as an integrity check (md5sums) for all the files
    when they are checked out in a pristine tree.

    The md5sums of all the =tags files are saved in a separate file,
    called the "+signature" of the patch.  The "+signature" file
    contains one line per =tags file, with the full, relative path to
    the file and its md5sum:

        path/to/=tags

    The "+signature" file is located in the "{arch}" directory and can
    optionally be PGP signed, with a detached signature in
    "+signature.sig".

    (Note that the current =manifest operations can be defined in
    terms of the new "manifest", defined as a collection of =tags
    files.  We can also have functionality to merge =tags files into a
    real =manifest file, if useful to the user.)


Archival Policies

    Archival policies dictate what goes in an archive and how
    (pristine copy of the file, patch, etc.)  They are based on file
    kinds and tags, as described below.
    All the archival policies (names, implicit, explicit) require a
    "clean" project tree, where "clean" means the following:

      * no Unrecognized files and/or directories are present

      * there are no duplicated tags (of any type)

    With the "names" archival policy, files are archived depending
    mainly on their kind, and tags are ignored:

      * Precious, Backup, Junk and Generated files are not archived

      * Source files are renamed if they move without changing, and
        patched if they change without moving.  Otherwise the old file
        is deleted and a new one created (i.e., names does not allow a
        rename-and-patch operation)

      * Control files are copied verbatim if changed and/or moved (who
        wants to patch a =tags file?)

      * Tags are ignored

    The "implicit" policy adds support for tags:

      * Precious, Backup, Junk and Generated files are not archived

      * rename-and-patch operations are possible on tagged files
        (either implicit or explicit tags); untagged files are treated
        as in "names"

      * Control files are copied verbatim if changed and/or moved

    The "explicit" policy is much stricter:

      * Precious, Backup, Junk and Generated files are not archived
        (but note that it is possible to explicitly tag such a file to
        be Source!)

      * rename-and-patch operations are possible on tagged files
        (either implicit or explicit tags)

      * untagged Source files are *not archived*

      * Control files are copied verbatim if changed and/or moved


References

    No references.


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 72
End: