The following information on data and file format standards of the
Mamala Bay Database Study Project was faxed to all PI's on 10/26/94.
If you have not read the information, please take some time to read it
on-line, or download this file (fstandrd.txt, ASCII text file) from
File Area 9: Utilities and Misc. Area.

Text of data and file format standards faxed to all PI's on 10/25/94
=======================================================================

We have received several data files from various investigators in this
program. As we privately expected, few of these contain enough
information to allow a database user to figure out what they contain.
We enclose requirements for information to be included in submitted
data files so that these files will be useful for current and future
investigators on Mamala Bay.

We are not doing this just to make your life difficult. There are
several reasons why we think you should be concerned about making your
data available and accessible to all:

   * If you are like me, you will eventually forget exactly what you
     did, and you will be glad your files are well labeled.

   * Data in poorly labeled and identified files are subject to misuse
     by others.

   * Part of the reason why this project came into being is that people
     could not agree on questions of fact. If all of the data are
     available with sufficient detail that anybody can see and
     interpret the data, fewer questions will arise about what you did
     and what you found.

   * You owe it to your co-investigators on the Mamala Bay project, and
     to the taxpayers footing the bill, to provide a very clear set of
     data and results.

If you have questions about these requirements, or would like to see
additional data included in the files, please let me know.

For those who have already sent in files, please resubmit these files
with the appropriate header information at your earliest convenience.
After you have received this memo, we will no longer accept data
without the information described in the attachment.

STANDARD FILE FORMATS FOR MAMALA BAY DATA BASE

We have received a number of data files in various formats and with
different degrees of documentation. In most cases the documentation
included in the files is either missing or insufficient for users of
the data to determine what the data are or where they came from.
Therefore, we would like to impose standards for data format and
requirements for informative header information on all data files
uploaded or sent to us.

Originally, in an attempt to create a standard that would accommodate
all users, we planned to base the standards on the data formats being
used by the various investigators. However, we have received only a
few data files so far. Therefore we will base the standards on these
files, and if necessary amend the standards when we have a better idea
of the range of data types and files we will receive.

The objective of this exercise is to provide users of the database
with usable data, both during and after the completion of the Mamala
Bay program. To that end, all files must be usable with minimum
requirements for specialized software, and all must be annotated so
that a user can determine what the data are, where they came from, how
they were collected, and who collected them.

All data files residing in the Mamala Bay BBS should have the
following three common components which should allow a user to easily
identify and use the file:

1. The data file must be in one of the file formats in the table
   below. These formats are now the "standard" file formats for the
   Mamala Bay Database Project (MB-2) and nearly all off-the-shelf
   software packages include an option to save or export files into
   one of these formats. If you have trouble or are unable to save
   data in one of these formats, please contact us.
                           Data File Formats
+-------------+-------------+-----------+--------------------------------+
| Type of     | File        | File      | Description                    |
| File        | Format      | Extension |                                |
+=============+=============+===========+================================+
| Spreadsheet | Lotus 1-2-3 | WK1       | "Save As" Wk1 file option is   |
|             |             |           | available in nearly all        |
|             |             |           | spreadsheet applications. Use  |
|             |             |           | standard Wk1 format without    |
|             |             |           | Impress or Allways page layout |
|             |             |           | settings.                      |
+-------------+-------------+-----------+--------------------------------+
| Database    | dBase IV    | DBF       | "Save As" or "Export" to dbf   |
|             |             |           | Format option is available in  |
|             |             |           | nearly all database and        |
|             |             |           | spreadsheet applications.      |
+-------------+-------------+-----------+--------------------------------+
| Formatted & | WordPerfect | WP5       | "Save As" WP 5.0/5.1 or ASCII  |
| non-        | 5.0/5.1     |           | text format option is          |
| formatted   | or ASCII    |           | available in nearly all word   |
| Text        | text        |           | processing applications.       |
+-------------+-------------+-----------+--------------------------------+
| ASCII       | ASCII       | ASC       | All ASCII text data must       |
| text        |             | or        | either be delimited by commas  |
| (data)      |             | TXT       | and quotes or have fixed field |
|             |             |           | length (with field definition  |
|             |             |           | file attached).                |
+-------------+-------------+-----------+--------------------------------+

2. All data files must include header information which explains and
   summarizes the data. Lotus and ASCII files should contain this
   information internally as headers. For database files, include with
   the data file an additional ASCII text file containing the header
   information. This provides you, the investigator, the choice of
   information used to describe the dataset on the BBS. It will also
   reduce or eliminate future questions about the data. Thus, although
   this will cost you some effort now, it will probably save hassles
   later.
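As an illustration of the comma-and-quote delimited ASCII layout named
in the table above, the following Python sketch writes a few rows with
every text field quoted and then reads them back. Python is not part of
these standards; it is used here only for illustration. The column
names, station identifiers, and values are invented for the example.

```python
import csv
import io

# Hypothetical sample rows: column names and values are invented.
rows = [
    ["station", "date", "depth_m", "taxon", "abundance"],
    ["A1", "10/25/94", 5.0, "Taxon one, unidentified", 12],
    ["B2", "10/25/94", 10.0, "Taxon two", 3],
]


def to_delimited_text(rows):
    """Return rows as comma-delimited text with every text field quoted."""
    buffer = io.StringIO()
    # QUOTE_NONNUMERIC quotes all non-numeric fields and leaves numbers bare.
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(rows)
    return buffer.getvalue()


def from_delimited_text(text):
    """Parse the text back; unquoted (numeric) fields come back as floats."""
    reader = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC)
    return list(reader)


text = to_delimited_text(rows)
parsed = from_delimited_text(text)
print(text)
```

Quoting the text fields is what lets a value such as a taxon name
contain a comma without being split into two fields when the file is
read back.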
   If you send in multiple files containing similar data, you will
   need to let us know whether the new files contain updates (i.e.
   older versions can be deleted) or additional data to that which we
   already have.

   If there are multiple files containing similar information (e.g.
   from successive sampling dates), you may be tempted to put the
   header information in only one file. However, since it is easy to
   put the information in additional files and this sort of
   information does not take up much space, it would be better to
   include it in all files.

   At minimum, the header information should include the following
   (where applicable).

   * The file name with extension

   * Mamala Bay Project number and name, e.g., MB-10, Environmental
     Impacts of Receptors and Resources

   * Person(s) and organization collecting the data or sample

   * Person(s) entering, converting, or translating the data into the
     current data file

   * Contact person's name, affiliation, and phone number in case
     there are questions about the data file (normally the PI of this
     study)

   * A detailed file description. This will be copied into the BBS and
     used as the "detailed file description" for users to view while
     on-line using the [I]nfo option. Thus, this description should
     include enough information to allow a user to decide whether to
     download the data. It should be concise, in sentences or phrases,
     and fully descriptive of the content of the file.

   * Structure of the data file. This should indicate what the rows
     and columns of data contain.
     Data structures include:

     - Flat file (each row containing all information about a
       particular datum, such as date, time, station, depth, taxon,
       abundance)

     - Table, in which rows and columns represent two dimensions of a
       matrix (e.g., rows are taxa and columns are stations)

     - Multi-table, in which a third (or even fourth) dimension of a
       data matrix is represented by multiple tables having the same
       structure (e.g., rows are taxa, columns are stations, and each
       table represents a different sampling date)

   * Time period of data or sample collection or measurement. This
     should include the date or range of dates or times depending on
     content. This should enable a reader to distinguish among
     multiple files containing similar information.

   * Locations of data collection, including station identification
     and latitude and longitude. Station identifiers may be used for
     the data tables, and latitude and longitude may be given in a key
     in the header.

   * A list of variables included in the file, in the same order as
     they appear in the data set. This should include the variable
     name as it appears in the table and a description of what was
     measured.

   * For each variable measured, the methods of data collection,
     measurement, and analysis. This should be as complete as
     possible, but also concise; ordinarily, citations either to the
     open literature or to documents in the Mamala Bay collection will
     suffice if methods have not changed. If necessary to save space,
     this material can be put only in the first file of a series;
     however, if that is done the file descriptions of all files
     should indicate the location of the full descriptions.

   * Any other pertinent information

3. Generally follow the data format layout described below. If the
   format is Flat File, be sure that each row of the data table
   contains all of the information needed to describe the measurements
   contained in that row. This format is most suitable for dBase or
   ASCII files.
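To make the difference between the table and flat file structures
concrete, here is a minimal Python sketch that expands a small table
(rows are taxa, columns are stations) into flat records in which every
row carries its own date, station, and taxon. The taxa, station
identifiers, counts, and field names are all invented for the example,
and the single sampling date is assumed to apply to the whole table.

```python
# Hypothetical table: rows are taxa, columns are stations (counts invented).
stations = ["ST1", "ST2", "ST3"]
table = {
    "Taxon A": [4, 0, 7],
    "Taxon B": [1, 2, 0],
}
sampling_date = "10/25/94"  # assumed to apply to every cell of the table


def table_to_flat(table, stations, date):
    """Return one self-describing record per (taxon, station) cell."""
    records = []
    for taxon, counts in table.items():
        if len(counts) != len(stations):
            raise ValueError(
                f"{taxon}: expected {len(stations)} values, got {len(counts)}"
            )
        for station, abundance in zip(stations, counts):
            records.append(
                {
                    "date": date,
                    "station": station,
                    "taxon": taxon,
                    "abundance": abundance,
                }
            )
    return records


flat = table_to_flat(table, stations, sampling_date)
for record in flat:
    print(record)
```

A multi-table file could be flattened the same way by calling the
function once per table and passing that table's sampling date.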
   The table or multi-table format is more suitable for spreadsheet
   files.

   All data fields must be adequately labeled. It must be clear to a
   new reader what has been measured, where, how, when, and how many
   times. Data field descriptions should correspond exactly to the
   descriptions in the header.

All electronic file transfer via the BBS should be done with
compressed (e.g., PKZIP) files to reduce the on-line time and to
conserve hard disk space. Shareware versions of file
compression/de-compression utilities PKZIP and PKUNZIP are available
for download in File Area 9: Utilities and Misc. Area. PKZIP allows
compressing multiple files into one compressed file which makes it
very convenient to include the descriptive ASCII text file with data
file(s).
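
As a rough sketch of bundling a descriptive ASCII text file with a data
file, the Python example below checks that a header has an entry for
each required item and then writes the header and the data file into
one ZIP archive. It uses Python's standard zipfile module rather than
PKZIP itself. The field labels are an assumed paraphrase of the header
list, and the file names and placeholder values are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path

# Labels paraphrase the header list; the exact wording is an assumption.
REQUIRED_FIELDS = [
    "File name",
    "Project number and name",
    "Collected by",
    "Entered by",
    "Contact",
    "File description",
    "Data structure",
    "Time period",
    "Locations",
    "Variables",
    "Methods",
]


def build_header(fields):
    """Return header text, raising if a required field is missing or blank."""
    missing = [
        name for name in REQUIRED_FIELDS
        if not str(fields.get(name, "")).strip()
    ]
    if missing:
        raise ValueError("missing header fields: " + ", ".join(missing))
    return "".join(f"{name}: {fields[name]}\r\n" for name in REQUIRED_FIELDS)


def bundle(archive_path, header_name, header_text, data_files):
    """Write the header file and the data files into one ZIP archive."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(header_name, header_text)
        for path in data_files:
            archive.write(path, arcname=Path(path).name)


# Hypothetical usage with placeholder values and invented file names.
example_fields = {name: "(placeholder)" for name in REQUIRED_FIELDS}
example_fields["File name"] = "SAMPLE.ASC"
example_fields["Project number and name"] = (
    "MB-10, Environmental Impacts of Receptors and Resources"
)

with tempfile.TemporaryDirectory() as workdir:
    data_path = Path(workdir) / "SAMPLE.ASC"
    data_path.write_bytes(b'"ST1",4\r\n')
    archive_path = Path(workdir) / "SAMPLE.ZIP"
    bundle(archive_path, "SAMPLE.TXT", build_header(example_fields), [data_path])
    with zipfile.ZipFile(archive_path) as archive:
        archived_names = archive.namelist()

print(archived_names)
```

Checking the header before building the archive means an incomplete
submission fails on the investigator's machine rather than after
upload.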