Introduction

GEDCOM was developed by the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS Church) to provide a flexible, uniform format for exchanging computerized genealogical data. GEDCOM is an acronym for GEnealogical Data Communication. Its purpose is to foster the sharing of genealogical information and the development of a wide range of interoperable software products to assist genealogists, historians, and other researchers.

Purpose and Content of The GEDCOM Standard

The GEDCOM Standard is a technical document written for computer programmers, system developers, and technically sophisticated users. It covers the following topics:

GEDCOM Data Representation Grammar (see Chapter 1)

Lineage-Linked Grammar (see Chapter 2)

Lineage-Linked GEDCOM Tags (see Chapter 2 and Appendix A)

Cross Reference of Structures and Primitives (see Appendix B)

The Church of Jesus Christ of Latter-day Saints' temple codes (see Appendix C)

ANSEL Character Codes (see Chapter 3 and Appendix D)

This document describes GEDCOM at two different levels. Chapter 1 describes the lower level, known as the GEDCOM data format. This is a general-purpose data representation language for representing any kind of structured information in a sequential medium. It discusses the syntax and identification of structured information in general, but it does not deal with the semantic content of any particular kind of data. It is, therefore, also useful to people using GEDCOM for storing other types of data, not just genealogical data.
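To make the lower level concrete, the following Python sketch reads GEDCOM-style lines into a tree, using only the level numbers to decide nesting. It assumes each line has the shape level [@xref@] tag [value] with single-space delimiters, and it ignores details covered in Chapter 1 such as line continuation and character sets. Node and parse_lines are hypothetical names, not defined by the standard.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Assumed line shape: level, optional @xref@, tag, optional value (single spaces).
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\S+)(?: (.*))?$")


@dataclass
class Node:
    level: int
    tag: str
    xref: Optional[str] = None
    value: Optional[str] = None
    children: list["Node"] = field(default_factory=list)


def parse_lines(lines):
    """Build a tree from GEDCOM-style lines and return the level-0 nodes."""
    roots = []
    stack = []  # stack[i] holds the most recent node seen at level i
    for raw in lines:
        m = LINE_RE.match(raw.rstrip("\r\n"))
        if m is None:
            raise ValueError(f"malformed line: {raw!r}")
        level = int(m.group(1))
        if level > len(stack):
            # A line may be at most one level deeper than the line before it.
            raise ValueError(f"level jumps too far: {raw!r}")
        node = Node(level, m.group(3), m.group(2), m.group(4))
        del stack[level:]  # close any deeper structures
        if level == 0:
            roots.append(node)
        else:
            stack[-1].children.append(node)
        stack.append(node)
    return roots
```

Because nesting comes only from the level numbers, nothing in this parser depends on what the tags mean, which is the separation between the data format and a particular GEDCOM form.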

Chapter 2 of this document describes the higher level, known as a GEDCOM form. Each type of data that uses the GEDCOM data format has a specific GEDCOM form. This document discusses only one GEDCOM form: the Lineage-Linked GEDCOM Form. This is the form commercial software developers use to create genealogical software systems that can exchange compiled information about individuals with accompanying family, source, submitter, and note records with the Family History Department's FamilySearch Systems and with each other if desired.

This document is available on the internet at:

ftp://gedcom.org/pub/genealogy

Purposes for Version 5.x

Earlier versions of The GEDCOM Standard were released in October 1987 (3.0) and August 1989 (4.0). Versions 1 and 2 were drafts for public discussion and were not established as a standard.

The 5.x series of drafts includes both the first standard definition of the Lineage-Linked GEDCOM Form and also the first major expansion of the Lineage-Linked Form since its initial use in GEDCOM 3.0. The GEDCOM-compatible products registered as 4.0 systems should still be able to exchange all of the data that was previously handled by their product with GEDCOM 5.x systems. See "Compatibility with Previous GEDCOM Releases" for compatibility specifics.

The following are the expanded purposes of Lineage-Linked GEDCOM:

Simplify the description of the GEDCOM data representation grammar (rules) for ease of understanding. (See Chapter 1.)

Standardize the valid contexts in which tags, values, and pointers appear in the Lineage-Linked GEDCOM Form. (See Chapter 2.) The Lineage-Linked GEDCOM Form should not be confused with other GEDCOM forms, which apply the basic GEDCOM data format but use different tag, value, and pointer combinations for other purposes.

Define new data representations for supporting information such as sources, source citations, repositories, submitter records, submission records, and notes. (See Chapter 2 for the GEDCOM representation of these support structures as used by the lineage-linked grammar.)

Define a generic event structure.

Define a way of associating individuals one to another. This is accomplished through a pointer which points from one individual record to another with a user-defined relationship description placed subordinate to this pointer. This feature is not a substitute for handling direct family relationships. Direct family relationships are represented by the FAMC and FAMS pointers.

Add a product version number and a GEDCOM form and version number to the HEADer record structure.

Define DATE modifiers (FROM, TO, ABT, BEF, AFT, BET) and more rigorously define the regular date format.

Define an integration of multimedia within GEDCOM.
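As a rough illustration of the DATE modifiers listed above, the sketch below classifies a date value by its leading keyword. It assumes that BET is paired with AND and that FROM may be followed by TO, and it does not validate the dates themselves; the exact date grammar is given in Chapter 2, and classify_date is a hypothetical helper.

```python
import re

# Assumed pairings: BET ... AND ..., and FROM ... with an optional TO ...
RANGE_RE = re.compile(r"^BET (.+) AND (.+)$")
PERIOD_RE = re.compile(r"^FROM (.+?)(?: TO (.+))?$")
SINGLE_RE = re.compile(r"^(ABT|BEF|AFT|TO) (.+)$")


def classify_date(value: str):
    """Return (kind, parts): kind is the leading modifier, or 'EXACT' if none."""
    v = value.strip().upper()
    if m := RANGE_RE.match(v):
        return "BET", (m.group(1), m.group(2))
    if m := PERIOD_RE.match(v):
        # Keep only the endpoints actually present (TO is optional after FROM).
        return "FROM", tuple(p for p in m.groups() if p)
    if m := SINGLE_RE.match(v):
        return m.group(1), (m.group(2),)
    return "EXACT", (v,)
```

For example, classify_date("BET 1850 AND 1860") yields ("BET", ("1850", "1860")), which a receiving system could then map onto its own range representation.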

Modifications in Version 5.5 as a result of the 5.4 (draft) review

Added tags for storing detailed address pieces under the address structure.

Added nickname and surname prefix name pieces to the personal name structure.

Added subordinate source citation to the note structure.

Changed the encoding rules and the structure for including embedded multimedia objects.

Added a RIN tag to the record structures. The RIN tag is a record identification assigned to the record by the source software. Its intended use is to allow for automated access to that record upon receipt of return transactions or other reconciliation processes.

The meaning of a GEDCOM tag without a value on its line depends on its subordinate context for any assertions intended by the researcher. For example, in an event structure, a subordinate DATE and/or PLACe value implies that the event happened. However, a subordinate NOTE or SOURce context by itself does not imply that the event took place. For a researcher to indicate that an event took place without knowing a date or a place, a Y(es) value must be added to the event tag line. This convention protects the data from GEDCOM processors that may remove (prune) lines that have neither a value nor subordinate lines. An N(o) value must not be used on an event tag line to assert that the event never happened; such an assertion requires the definition of a different tag.

Reinstated the calendar escape sequence to support alternate calendars.

The definition of the date value was refined to include many of the ways in which a person may state an imprecise date in a free-form text field. Systems that guide users through entering a date should not produce such an elaborate statement of an imprecise date. For example, software that estimates a marriage date with an algorithm based on the birth date of the couple's first child hardly needs to record it as "EST ABT 1881".
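The Y(es) convention described above can be sketched in Python as two checks: whether an event line asserts that the event happened, and a pruning pass of the kind the text warns about. The Line class is a hypothetical in-memory form of a GEDCOM line, and pruning recursively (so that a line whose subordinate lines were all pruned is also dropped) is a choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Line:  # hypothetical in-memory GEDCOM line
    tag: str
    value: Optional[str] = None
    children: list["Line"] = field(default_factory=list)


def event_is_asserted(event: Line) -> bool:
    """True if the event line asserts that the event took place."""
    if event.value == "Y":
        return True
    # A subordinate DATE or PLAC implies the event happened; NOTE or SOUR alone does not.
    return any(c.tag in ("DATE", "PLAC") for c in event.children)


def prune(line: Line) -> Optional[Line]:
    """Drop lines that have neither a value nor any surviving subordinate lines."""
    kept = [p for c in line.children if (p := prune(c)) is not None]
    if not line.value and not kept:
        return None
    return Line(line.tag, line.value, kept)
```

A bare "1 DEAT" line would be pruned and the fact lost, while "1 DEAT Y" survives pruning and still reads as an assertion that the death occurred.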

The following tags were added:

ADR1, ADR2, CITY, NICK, POST, SPFX

Changes Introduced or Modified in Draft Version 5.4

Some changes introduced in GEDCOM draft version 5.4 are not compatible with earlier 5.x draft forms. Some concepts have been removed with the intent to address them in a future release of GEDCOM. The following features are either new or different:

The use of the SCHEMA has been eliminated. Although the schema concept is valid and essential to the growth of GEDCOM, it is too complex and premature to be implemented successfully in current products. Implementing it too early could cause developers to spend a great deal of resources programming something that would be outdated very quickly. Object definition languages are likely to contribute to meeting these needs.

The EVENT_RECORD context has been deleted. This context was intended to support the evidence record concept in the Lineage-Linked GEDCOM Form, which ended up being more complicated than first supposed. Understanding the difference between the role of a source record and the role of a so-called evidence record requires further study.

Non-standard tags (see <NEW_TAG>) can be used within a GEDCOM transmission, provided that the first character is an underscore (for example, _NUTAG). Non-standard tags should be used only when structured information cannot be represented using an existing context. Using a Note field is a more universal way of transmitting genealogical data that does not fit into the standard GEDCOM structure.

The SOURCE_RECORD structure was simplified into five basic sections: data or classification, author, title, publication facts, and repository. The data or classification section contains facts about the data represented by this source and is used to analyze the collection of sources that the researcher used. The author, title, publication facts, and repository sections provide free-form text blocks that inform subsequent researchers how to access the source data that the original researcher used.

The <<SOURCE_CITATION>> structure is placed subordinate to the fact being cited. It is generally best if the source citation contains only information specific to the fact being cited and then points to the more general description of the source, defined in a SOURCE_RECORD. This reduces redundancy, provides a way of controlling the GEDCOM record size, and more closely represents the normalized data model.

Systems that represent sources using the AUTHor, TITLe, PUBLication, and REPOsitory descriptions can and should always pass this information in GEDCOM using the SOURce record pointed to by the <<SOURCE_CITATION>>. Systems that do not represent source information in these categories should provide the following information as unstructured text using the tags TITL, AUTH, PUBL, and REPO, respectively, within the text:

A descriptive title of the source

Who created the work

When and where it was created

Where it can be obtained or viewed

Some attributes of individuals such as their EDUCation, OCCUpation, RESIdence, or nobility TITLe need to be described using a date and place. Therefore, the structure to describe the attributes was formatted to be the same as for describing events. That is, these attributes are further defined using a date, place, and other values used to describe events. (See <<INDIVIDUAL_EVENT_STRUCTURE>>.)

The LDS ordinance structure was extended to include the place of a living LDS ordinance. The TYPE tag line was changed to a STATus tag line. This allows statements such as BIC, canceled, Infant, and so forth to be removed from the date line and placed under the STATus tag instead. (See <LDS_(ordinance)_DATE_STATUS>, where (ordinance) represents any of the following: BAPTISM, ENDOWMENT, CHILD_SEALING, or SPOUSE_SEALING.)

Previous GEDCOM 5.x versions overloaded the FAMC pointer structure with subordinate events which connected individual events and an associated family. An adoption event, for example, was shown subordinate to the FAMC pointer to indicate which was the adoptive family. The sealing of child to parent event (SLGC) was also shown in this manner. GEDCOM 5.4 recognizes that these are events and should be at the same level as the other individual events. To show the associated family, a FAMC pointer is placed subordinate to the appropriate event. (See <<INDIVIDUAL_EVENT_STRUCTURE>> and LDS_INDIVIDUAL_ORDINANCE.)

The date modifier (int) was added to the date format to indicate that the associated date phrase has been interpreted and the interpretation follows the int prefix in the date field. The date phrase is also included in the date value enclosed in parentheses. (See <DATE_APPROXIMATED>.)

The <AGE_AT_EVENT> primitive definition now includes the key words STILLBORN, INFANT, and CHILD. These words should be interpreted as being an approximate age at an event. (See <AGE_AT_EVENT>.)

The family event context in the FAMily record now allows the ages of both the husband and wife at the time of the event to be shown. (See FAM_RECORD.)

The <<PERSONAL_NAME_STRUCTURE>> structure now allows name pieces to be specifically identified as subordinate parts of the name line. Most products will not use subordinate name pieces. A nickname can now be included on the name line by enclosing it in double quotation marks. Note: Systems using the subordinate name parts must still provide the name structure formed in the same way specified for <NAME_PERSONAL>.

A submission record was added to GEDCOM to enable the sending system to transmit information which will enable the receiving system to more appropriately process the GEDCOM data. The format currently designed for the submission record was created specifically for the TempleReady™ system and for GEDCOM files being downloaded from Ancestral File™. (See SUBMISSION_RECORD.)

A RESTRICTION (RESN) tag and a <RESTRICTION_NOTICE> primitive were added to the INDIVIDUAL_RECORD context. This allows some records in Ancestral File to be marked for privacy (indicating some personal information is not included) and some records to be marked as locked (indicating that Ancestral File will not make changes to the record without authorization from an assigned record steward).
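The name-line convention in the <<PERSONAL_NAME_STRUCTURE>> item above can be sketched as a small parser. It assumes, as in <NAME_PERSONAL>, that the surname is enclosed in slashes, and it takes the nickname from the first double-quoted span; all other text (including any suffix) is returned together in one field. NameParts and parse_name_line are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class NameParts(NamedTuple):
    other: str                # text outside the surname slashes and nickname quotes
    surname: Optional[str]    # text between the slashes, if present
    nickname: Optional[str]   # text inside the first pair of double quotes, if present


def parse_name_line(value: str) -> NameParts:
    """Split a personal name line into its remaining text, surname, and nickname."""
    nick = re.search(r'"([^"]*)"', value)
    nickname = nick.group(1) if nick else None
    rest = re.sub(r'\s*"[^"]*"', "", value)      # drop the quoted nickname
    sur = re.search(r"/([^/]*)/", rest)
    surname = sur.group(1) if sur else None
    other = re.sub(r"/[^/]*/", " ", rest)         # drop the slash-delimited surname
    return NameParts(" ".join(other.split()), surname, nickname)
```

For example, parse_name_line('William "Bill" /Smith/') gives NameParts(other='William', surname='Smith', nickname='Bill'). A system that also emits subordinate name pieces would still write the full name line in this form, as the note above requires.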

The following tags are no longer used in the Lineage-Linked Form:

ARVL, BROT, BUYR, CEME, CNTC, CPLR, DEFM, DPRT, EDTR, FIDE, FILM, GODP, HDOH, HEIR, HFAT, HMOT, INFT, INDX, INTV, ISA, ISSU, ITEM, LABL, LCCN, LGTE, MBR, NAMS, NAMR, OFFI, ORIG, OWNR, PERI, PORT, PWIF, PUBR, RECO, SELR, SEQU, SERS, SIBL, SIGN, SIST, SITE, TXPY, XLTR, WFAT, WITN, WMOT, AUDIO, IMAGE, PHOTO, SCHEMA, VIDEO

The following tags were added:

BLOB, CTRY, CREM, FCOM, GIVN, NPFX, NSFX, OBJE, PEDI, RELA, RESI, RESN, SUBN, SURN, STAT
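Relating to the underscore rule for non-standard tags described earlier in this list, a receiving system might sort incoming tags as below. The sketch assumes a non-standard tag is an underscore followed by letters, digits, or underscores, and it takes the set of standard tags as an argument because this section does not list them all; partition_tags is a hypothetical helper.

```python
import re

# Assumed shape of a non-standard tag: leading underscore, then letters, digits, or underscores.
USER_TAG_RE = re.compile(r"_[A-Za-z0-9_]+")


def partition_tags(tags, standard):
    """Split tags into standard, user-defined (underscore-prefixed), and unrecognized."""
    known, user, unknown = [], [], []
    for tag in tags:
        if tag in standard:
            known.append(tag)
        elif USER_TAG_RE.fullmatch(tag):
            user.append(tag)
        else:
            unknown.append(tag)  # neither standard nor marked as non-standard
    return known, user, unknown
```

Keeping the unrecognized group separate lets a receiver distinguish a deliberate extension such as _NUTAG from a tag that is simply invalid in the Lineage-Linked Form.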

Changes Introduced in Draft Version 5.3

Version 5.3 introduced the following changes to the GEDCOM standard:

An address structure was defined.

A new tag for marital status (MSTA) at the time of an event was added to the event structure. (This was removed in version 5.4.)

A mechanism for creating user-defined tags was added. These were defined in a SCHEMA definition in the header record of 5.3. (SCHEMA was removed in version 5.4.)

The Unicode standard (ISO 10646) was introduced as an additional character set. (This was reduced to a potential character set in version 5.4. See Chapter 3.)

A <<MULTIMEDIA_LINK>> structure was introduced to provide linking and embedding of digitized photo, video, and sound files. (This was modified in version 5.4. See MULTIMEDIA_LINK and MULTIMEDIA_RECORD.)

The source structure NAME tag, meaning the name of the source in the <<SOURCE_STRUCTURE>>, was changed back to the TITLe tag and is used to show the title of a book, article, or descriptive title of non-titled sources.

The <<SOURCE_STRUCTURE>> was changed. Usage of CPLR, XLTR, and INFT tags in source substructures was discontinued.

The FORM {FORMAT} tag was added subordinate to the PLACe and the GEDCom tags in the HEADER record and also subordinate to the PLACe tag in the <<PLACE_STRUCTURE>>. The PLAC.FORM line in the header record indicates that all of the locality names are specified in a consistent hierarchical sequence as specified by the value of the FORM. For example: 2 FORM City, County, State. GEDCOM 5.2 used the TYPE tag, subordinate to the PLAC tag, instead of the FORM tag for this purpose. This provision is for products which have overly structured the place value.
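A sketch of how a receiver might use the PLAC.FORM value: each comma-separated part of a place is paired with the jurisdiction at the same position in FORM. It assumes both lists run in the same order and that an empty part between commas stands for an unknown level; label_place is a hypothetical helper.

```python
def label_place(form: str, place: str) -> dict:
    """Pair each comma-separated place part with the jurisdiction named in FORM."""
    levels = [p.strip() for p in form.split(",")]
    parts = [p.strip() for p in place.split(",")]
    if len(parts) != len(levels):
        raise ValueError("place does not have one part per FORM level")
    # Assumed: an empty part (",,") marks an unknown level and is left out.
    return {level: part for level, part in zip(levels, parts) if part}
```

With the header line 2 FORM City, County, State, the place "Provo,,Utah" would map to {"City": "Provo", "State": "Utah"}.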

Copyright © 1987, 1989, 1992, 1993, 1995 by The Church of Jesus Christ of Latter-day Saints. This document may be copied for purposes of review or programming of genealogical software, provided this notice is included. All other rights reserved.

Disclaimer: This HTML version of the GEDCOM 5.5 specification should be equivalent to the LDS WordPerfect original. In the conversion process I have tried not to break anything; however, the LDS original should always be considered the definitive version.
Clive Stubbings, October 2000