Compilers' Toolbox™ - Value documentation

Value Documentation

	Updated 2019-04-01

Values and their metadata

To compare and use data from different food composition tables, and to avoid ambiguous or easily misunderstood data, values must be clearly described: not only which components they concern, but also where the data come from, how they were sampled, analysed or derived, how many samples were used to produce each value, etc.
In older food composition tables, this information was given in the preface as an explanation of the content of the tables. Today, such information is not enough: every single datum must be described with a range of metadata for value documentation.
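
As a minimal sketch of what describing a single datum with metadata could look like, the Python record below bundles a value with the kinds of documentation listed above. All field names and the sample datum are hypothetical and chosen for illustration only; they are not taken from any particular standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DocumentedValue:
    """One food composition value with its documentation (hypothetical field names)."""
    food: str        # which food the value concerns
    component: str   # which component the value concerns
    value: float
    unit: str        # e.g. "mg"
    basis: str       # e.g. "per 100 g edible portion"
    source: str      # where the datum comes from
    sampling: str    # how the food was sampled
    method: str      # how the value was analysed or derived
    n_samples: Optional[int] = None  # number of samples behind the value

    def missing_documentation(self) -> List[str]:
        """Return the names of documentation fields that were left empty."""
        text_fields = ("source", "sampling", "method")
        missing = [name for name in text_fields if not getattr(self, name).strip()]
        if self.n_samples is None:
            missing.append("n_samples")
        return missing


# Invented datum for illustration: not a real measurement.
example = DocumentedValue(
    food="apple, raw",
    component="vitamin C",
    value=5.0,
    unit="mg",
    basis="per 100 g edible portion",
    source="laboratory report (hypothetical)",
    sampling="",
    method="HPLC (hypothetical)",
)
print(example.missing_documentation())
```

A check of this kind makes gaps in the documentation visible per datum rather than leaving them to a general remark in a preface.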

The first guidelines

Although food composition tables have been produced regularly for more than 125 years, it was only about 50 years ago that it became apparent that there were big differences between food composition tables, and that data were so badly documented that guidelines were needed.

These challenges led a working party, organised in 1970 by the Group of European Nutritionists, to describe the problems of food sampling, the analytical methods and their limits, and their documentation. The result is presented clearly and concisely by David A. T. Southgate in 'Guide Lines for the Preparation of Tables of Food Composition' from 1974.

In the following years it became more and more obvious that common guidelines for developing food composition tables and databases were necessary. Regional and international frameworks on food composition, like NORFOODS, INFOODS, and Eurofoods, were initiated.

During the 1980s and 1990s the INFOODS initiative published several publications on data documentation and data interchange, such as the INFOODS Food Composition Data Interchange Handbook and Identification of Food Components for INFOODS Data Interchange. In 1992 the first edition of Greenfield and Southgate's 'Food Composition Data - Production, Management and Use' was published with support from the FLAIR Eurofoods-Enfant project, INFOODS and CEC Agro-Industrial Research.

This work was followed up by the COST Action 99 - Eurofoods publication 'Recommendations for food composition database management and data interchange', published in 2000, and further enhanced in the EuroFIR project (2005-2010) with the 'Proposal for structure and detail of a EuroFIR standard on food composition data', which lays down the structure and format of food composition data in detail.

The 2nd edition of Greenfield and Southgate's 'Food Composition Data - Production, Management and Use' was published by FAO/INFOODS in 2003. It is now the current 'Bible' in food composition.

The EuroFIR project also published standardised thesauri to support value documentation, and further developed the EuroFIR Web Services and an XML template for the EuroFIR Food Data Transport Package to be used in data interchange.
For more details on data interchange, see Data Interchange.

Data Management Terminology

Terminology in multidisciplinary environments is sometimes confusing because terminology from the different working environments is mixed together. Sometimes different words are used for the same thing or, worse, the same word is used for different things. Food composition data management is unfortunately no different: the terminology depends on the persons defining it.

Examples of food data management terms from Greenfield and Southgate, EuroFIR and the USDA Nutrient Databank (see References below) are shown in the following table:

Environment	Greenfield and Southgate	EuroFIR Standard	USDA Nutrient Databank
Paper/electronic file	Data sources These are the published research papers and unpublished laboratory and other reports containing analytical data, together with their bibliographic references. Normally, the data sources are part of the reference database.	Data sources Original published and unpublished research papers, laboratory reports containing analytical data and data from manufacturers, including nutritional labelling data.
Data management	Archival data/records These records (written or computerized) hold all data in the units in which they were originally published or recorded, and are scrutinized only for consistency as would be normal in the refereeing of scientific papers prior to publication. Foods should be coded or annotated to assist in identification, and values should be annotated to indicate unit, calculation, mode of sampling, numbers of food samples analysed, the analytical methods used and any quality assurance procedures in place. Any bibliographic references relevant to the data source are noted. At this stage it is possible to make a preliminary assessment of the data quality (see G&S, Chapter 8). Such records should make it unnecessary to refer back to the original data sources whenever a query arises. Normally, the archival data are used in the preparation of the reference database.	Initial data/Initial database Initial databases contain original data from each data source, optionally converted into standard units and/or standard coding or naming schemes. Data for individual analyses and food samples are held separately, possibly resulting in more than one value for a food/component combination. The documentation holds details of origin and number of food samples, food and analytical sample handling, edible portion, waste, analytical methods and quality-control methods.	Initial food item Detailed nutrient/food component, weight, and physical component (i.e., part of plant or animal determined by dissection, including flesh, peels, bones, etc.) values are entered and food item description and methodology information are documented.
Data management	Reference database The reference database is the complete pool of rigorously scrutinized data in which all values have been converted into standard units and nutrients are expressed uniformly, but in which data for individual analyses are held separately. This database should include all foods and nutrients for which data are available, and provides links to sampling procedures and analytical methods, laboratory of origin, date of insertion and other relevant information, including bibliographic references to the data sources. The data will usually be expressed according to the conventions, units and bases adopted for the user databases (see G&S, Chapter 9). The reference database will usually be part of a computer database management system, with computer programs or written protocols developed to calculate, edit, query, combine, average and weight values for any given food. It is from this database and its programs that the user databases can be prepared. The database will be linked to records on analytical methods and records for other constituents, for example non-nutrient constituents such as biologically active constituents, additives and contaminants. Records of physical characteristics such as pH, density, non-edible portion or viscosity that are often collected in food technology papers should also be linked to the reference database. Conversion factors, calculations and recipes should also be stored.	Aggregated data/Aggregated database/Compiled database Aggregated and compiled databases contain one single value for each food/component combination, aggregated where appropriate from multiple values in the level 2 database. Includes all foods and components needed to produce the published database. A further level of Compiled data may also be defined, with values created within the system to complete missing data (e.g. logical zeros, calculated values).	Aggregated food item Individual nutrient/food component, weight, and physical component data for similar food items can be aggregated.
Data management	User database, printed and computerized tables In general, the user database is a subset of the reference database, and the printed form often contains less information than the computerized form. Many professional users of food composition data would require the information recorded in the reference database, but most require only a database containing evaluated food composition data that, in some cases, have been weighted or averaged to ensure that the values are representative of the foods in terms of the use intended. Moreover, values for nutrients in each food may, if appropriate, be amalgamated (e.g. total sugars, ratios of the different classes of fatty acids) rather than shown as individual constituents. These databases may contain indications of data quality based on assessment of the sampling and analytical procedures. These databases should include as many foods and nutrients as possible, with preference being given to complete data sets. Methods, sampling procedures and literature sources should be coded at nutrient level so the user can perform an independent evaluation or comparison with other databases. The data, of course, must be expressed in uniform, standard units (see G&S, Chapter 9). The defining feature of a user database may be considered as a database that gives one series of data per food item.	Published databases and tables The public resources which hold evaluated food composition data. This level may reveal only subsets or derivations of the aggregated or compiled database, and may be specially designed to meet the needs of different user groups. They may include data that have been weighted or averaged to ensure that the values are representative of the foods in terms of the use intended.	Compiled food item Missing nutrient/food component values are imputed using standardized procedures, including recipes and/or formulations, and the food item profile is finalized for dissemination.
Data management		User database Adaptations of the published databases made by or for users such as food consumption surveys or software providers. Remaining missing values might be estimated; data might be derived for foods to more closely correspond to the consumed foods.

The most striking differences are between the definitions of Greenfield and Southgate on one side and those of the EuroFIR Standard and USDA Nutrient Databank on the other. For a person not familiar with Greenfield and Southgate's terminology, e.g. a computer scientist, the terms Archival data/records and Reference database may very well be misunderstood. Archival data/records may be taken to be the place where all backups of the data management system, as well as previous system mirrors, are held. The term Reference database is misleading because some take it to be the place where the bibliographic references of the Data sources are held.

The examples above are only the "tip of the iceberg".
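
The step from an initial database to an aggregated database in the EuroFIR column of the table (several values per food/component combination collapsed into one single value) can be sketched as follows. This is an illustration only: the record layout, the invented numbers and the use of an unweighted mean are assumptions, and a real compilation may weight, select or otherwise combine values differently.

```python
from collections import defaultdict
from statistics import mean

# Initial-level records: (food, component, value).
# Invented numbers; all values are assumed to be in the same standard unit.
initial_values = [
    ("apple, raw", "vitamin C", 4.0),
    ("apple, raw", "vitamin C", 6.0),
    ("apple, raw", "water", 85.0),
]


def aggregate(records):
    """Collapse multiple values per food/component pair into a single value.

    An unweighted mean is used purely for illustration. The number of
    contributing values is kept so the aggregated value stays documented.
    """
    grouped = defaultdict(list)
    for food, component, value in records:
        grouped[(food, component)].append(value)
    return {
        key: {"value": mean(values), "n": len(values)}
        for key, values in grouped.items()
    }


aggregated = aggregate(initial_values)
print(aggregated[("apple, raw", "vitamin C")])
```

Keeping the count of contributing values alongside the aggregated value is one simple way to preserve part of the value documentation when moving between levels.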

EuroFIR Value Documentation Thesauri

To document food composition data thoroughly and unambiguously across boundaries, it is important to use the same terminology in the data documentation. One of the tools to use, as for food description, is a set of controlled terms: a standard vocabulary or thesaurus.

A set of thesauri to be used in value documentation was defined within the COST Action 99/EURO-FOODS project in the Recommendations for data interchange and management and further amended in the EPIC data interchange project. Each thesaurus consists of a set of concepts that may be arranged within a hierarchy. A concept is represented by a main descriptor (a term representing the concept) and is generally further described with a scope note, additional information, synonyms and related terms. The thesauri were further refined in the EuroFIR project.
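
The concept structure described above can be sketched as a small data structure. The class, its field names and the example terms are hypothetical and for illustration only; they are not taken from the EuroFIR thesauri or from any thesaurus standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """A thesaurus concept (hypothetical field names)."""
    descriptor: str                      # main descriptor: the term representing the concept
    scope_note: str = ""
    additional_info: str = ""
    synonyms: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)
    broader: Optional["Concept"] = None  # parent concept in the hierarchy, if any

    def path(self) -> List[str]:
        """Descriptors from the top of the hierarchy down to this concept."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.descriptor)
            node = node.broader
        return list(reversed(chain))


# Invented example terms.
analytical = Concept("analytical result", scope_note="Value obtained by analysis.")
in_house = Concept(
    "in-house analytical result",
    synonyms=["own laboratory result"],
    broader=analytical,
)
print(in_house.path())
```

Storing only the link to the broader concept is enough to reconstruct the full hierarchy path for any descriptor.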

The draft EuroFIR standard on food composition data makes use of this series of thesauri (controlled vocabularies) in the description of foods, components, method types, analytical and calculation methods, units, etc. The thesauri follow international standards for multilingual thesauri, which supports international development and use of the thesauri.
All thesauri are available on the EuroFIR website and are updated regularly.

The first collection of EuroFIR thesauri was published in 2008. The collection comprises 8 thesauri of controlled and well-defined descriptors used to explain different properties of values and their bibliographic references. In 2016 a further EuroFIR thesaurus, the EuroFIR Food Classification, was scheduled for publication in the EuroFIR eThesaurus together with the 8 value documentation thesauri.

The EuroFIR thesauri currently held in the EuroFIR eThesaurus, an online thesaurus manager, are the 8 value description thesauri and the EuroFIR food classification thesaurus (from LanguaL). The thesauri are available to the user in several formats from the online EuroFIR Thesaurus Manager.

Evaluation of Data Quality

It has always been a task of the food composition data compiler to provide data of good quality. Often this has been done using the compiler's best judgement. In more recent years, however, there have been several attempts to create more systematic procedures for the evaluation of food composition data. USDA (Holden et al., see References below) started in the late 1980s with a series of articles on the critical evaluation of published analytical data.
In EuroFIR, the USDA approach was found to be very labour-intensive and not feasible to carry out in a European context. Based on the experiences from USDA, however, a EuroFIR task group defined a set of simplified rules to be used in the evaluation of data from scientific literature and laboratory reports.

References

David A. T. Southgate:
Guide Lines for the Preparation of Tables of Food Composition.
Karger AG, Basel, 1974.
J. Périssé:
The Heterogeneity of Food Composition Tables.
In J. G. A. T. Hautvast and W. Klaver, eds., The Diet Factor in Epidemiological Research, EURO-NUT Report, No. 1, pp. 100-105.
Ponsen and Looyen, Wageningen, 1982.
Klensin J.C.:
INFOODS Food Composition Data Interchange Handbook.
United Nations University, Tokyo 1992.
Klensin J.C., Feskanich, D., Lin, V., Truswell, A.S. & Southgate, D.A.T.:
Identification of Food Components for INFOODS Data Interchange.
United Nations University, Tokyo 1989.
Greenfield H. & Southgate D.A.T:
Food Composition Data - Production, Management and Use.
Elsevier Science Publishers, 1992
Greenfield H. & Southgate D.A.T:
Food Composition Data: Production, Management and Use, 2nd Edition
FAO Rome, 2003
Schlotke F., Becker W., Ireland J., Møller A., Ovaskainen M.L., Monspart J., Unwin I. (Eds.):
COST Action 99 - Eurofoods recommendations for food composition database management and data interchange.
Report No. EUR 19538, Luxembourg: Office for Official Publications of the European Communities, 2000 (79 pp.), ISBN 92-828-9757-5.
Becker W., Unwin I., Ireland J., Møller A.:
Proposal for structure and detail of a EuroFIR standard on food composition data.
I: Description of the standard. EuroFIR Technical Report - 2007-07-13.
Becker W., Møller A., Ireland J., Roe M., Unwin I., Pakkala H.:
Proposal for structure and detail of a EuroFIR Standard on food composition data.
II. Technical Annex - Version 2008.
EuroFIR Technical Report D1.8.19.
Danish Food Information 2008. ISBN 978-87-92125-10-1.
Møller A., Unwin I.D., Ireland J., Roe M.A, Becker W., Colombani P.:
The EuroFIR Thesauri 2008.
EuroFIR Technical Report D1.8.22.
Danish Food Information 2008.
ISBN 978-87-92125-09-5.
Møller, A., Christensen T.:
EuroFIR Web Services - Food Data Transport Package, Version 1.3.
EuroFIR Technical Report D1.8.20.
Danish Food Information 2008.
ISBN 978-87-92125-08-8.
Pakkala H., Christensen T., Gunnarsson Í., Kadvan A., Keshet B., Korhonen T., Martínez de Victoria I, Møller A., Presser K., Colombani P., Nørby E.:
EuroFIR Web Services - Specification of request-response message exchange patterns - Version 1.0.
EuroFIR Technical Report D1.8.29.
Danish Food Information 2008.
ISBN 978-87-92125-12-5.
Schubert, A., Holden, J. M., Wolf, W. R.:
Selenium content of a core group of foods based on a critical evaluation of published analytical data.
Journal of the American Dietetic Association (1987), 87 (3), pp. 285-299.
Holden, J. M., Bhagwat, S. A., Patterson, K. Y.:
Development of a Multi-nutrient Data Quality Evaluation System.
Journal of Food Composition and Analysis (2002) 15, pp. 339–348.
DOI: 10.1006/jfca.2002.1082
Bhagwat, S., Patterson, K., Holden, J. M.:
Validation study of the USDA's Data Quality Evaluation System.
Journal of Food Composition and Analysis (2009), 22 (5), pp. 366-372.
DOI: 10.1016/j.jfca.2008.06.009
Holden, J., Bhagwat, S., Haytowitz, D., Gebhardt, S., Dwyer, J., Peterson, J., Beecher, G., Eldridge, A., Balentine, D.:
Development of a database of critically evaluated flavonoids data: Application of USDA's data quality evaluation system.
Journal of Food Composition and Analysis (2005), 18 (8), pp. 829-844.
DOI: 10.1016/j.jfca.2004.07.002
Haytowitz, D.B., Lemar, L.E., Pehrsson, P.R.:
USDA’s Nutrient Databank System – A tool for handling data from diverse sources.
Journal of Food Composition and Analysis (2009), 22, pp. 433–441
DOI: 10.1016/j.jfca.2009.01.003
Mangels, A., Holden, J., Beecher, G., Forman, M.R., Lanza, E.:
Carotenoid content of fruits and vegetables: An evaluation of analytic data.
Journal of the American Dietetic Association (1993), 93, pp. 284-296
DOI: 10.1016/0002-8223(93)91553-3
Mangels, A., Holden, J., Beecher, G., Forman, M.R., Lanza, E.:
Erratum: Carotenoid content of fruits and vegetables: An evaluation of analytic data.
Journal of the American Dietetic Association (1993), 93, pp. 284-296.
DOI: 10.1016/0002-8223(93)91808-4
Lurie, D., Holden, J., Schubert, A., Wolf, W., Miller-Ihli, N.:
The copper content of foods based on a critical evaluation of published analytical data.
Journal of Food Composition and Analysis (1989), 2 (4), pp. 298-316.
DOI: 10.1016/0889-1575(89)90002-1
Bigwood, D., Heller, S., Wolf, W.:
Selex: An expert system for evaluating published data on selenium in foods
Analytica Chimica Acta (1987), 200, pp. 411-419.
DOI: 10.1016/S0003-2670(00)83787-3
Oseredczuk, M., Salvini, S., Møller, A., Roe, M., Castanheira, I., Colombani, P., Holden, J., Ireland, J., Unwin, I., Vasquez, A.-L., Westenbrink, S., Ollilainen, V., Finglas, P.:
Guidelines for Quality Index Attribution to Original Data from Scientific Literature or Reports for EuroFIR Data Interchange.
EuroFIR Workpackage 1.3, Task Group 4, Proposed Update, March 2013.


	© 2025 Anders Møller, Danish Food Informatics