Organization of the data set

The complexity of the current ADNI data set reflects more than 20 years’ worth of variability in how data are collected, formatted, and stored.

From the previous discussion of phases and schedules, it should be clear that we expect to see changes in how data are structured across those five phases, and that is indeed the case.

To be clear, the focus of this document is the tabular component of the data set: data that can easily be arranged into rows and columns. This is as opposed to data that do not fit neatly into a conventional table, such as MR images and long genetic sequences.

There are more than 200 tables in the ADNI data set, which can be categorized according to any number of schemes. The most obvious categorization scheme, and often the most useful, is by the type of data being collected.

The categories below closely mirror the organization of ADNI files on the Laboratory of Neuro Imaging Image and Data Archive (LONI-IDA), with some minor differences.

Clinical: tables that record the results of all clinical procedures that are administered during ADNI visits.
- Diagnostic assessment
- Neuropsychiatric assessments
- Clinical questionnaires
- Physical and neurological examination
Subject Characteristics: tables that capture information about participants that is ascertained via self-reported questionnaires upon enrollment into the study.
- Participant demographics
- Family history of AD/dementia
- Self-reported medical history and medications log
- Non-biological determinants of health
PET and MR Imaging: tables that are in some way relevant to PET or MR neuroimaging. This is not to be confused with the actual images themselves, which are also available for download.
- Scan acquisition information
- Quality control information
- Other metadata
- Numerical summary measures derived from images
Biofluid Biomarker: this large section of the data contains the inventory of ADNI biofluids, as well as results derived from the analysis of biofluids collected from ADNI participants, including blood and cerebrospinal fluid (CSF).

The available results (largely based on CSF and plasma) include analyses done by the ADNI Biomarker Core. In addition, ADNI biofluid samples are available to outside investigators for analysis.

The results of these outside analyses are provided by laboratories not directly funded by ADNI. The information supplied by these outside laboratories varies considerably, including the data description and methodology documentation and the data dictionaries.
Genetics and related -omics: There is extensive genetic information available on ADNI participants, including whole genome sequencing, genome-wide association studies, methylation profiles, etc.
Neuropathology: tables that record findings obtained from post-mortem brain tissue. This is one of the newest sections of the data set, and contains one major table that records autopsy findings, as well as a few separate tables that record ancillary information on autopsy particulars and consents to brain donation.
Structural and administrative tables: tables whose content relates less to any one scientific field and more to some aspect of the study itself. These are varied enough to resist easy categorization, and all are useful in their own right. Examples include the registry (a central record of clinical visits) and the roster (a table that serves as a crosswalk between different participant identifiers).
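
To illustrate how a crosswalk table such as the roster might be used, the following minimal sketch attaches a second participant identifier to the rows of another table. The field names (`ID_A`, `ID_B`, `SCORE`) and all values are hypothetical placeholders chosen for illustration, not actual ADNI field names or data.

```python
# Hypothetical roster: one row per participant, two identifier systems.
roster = [
    {"ID_A": 1, "ID_B": "S-0001"},
    {"ID_A": 2, "ID_B": "S-0002"},
]

# Hypothetical table keyed only by the first identifier.
assessments = [
    {"ID_A": 1, "SCORE": 29},
    {"ID_A": 2, "SCORE": 24},
    {"ID_A": 3, "SCORE": 27},  # no matching roster entry
]


def attach_identifier(rows, roster, key="ID_A", new="ID_B"):
    """Left join: add `new` from the roster to each row (None if unmatched)."""
    lookup = {entry[key]: entry[new] for entry in roster}
    return [{**row, new: lookup.get(row[key])} for row in rows]


linked = attach_identifier(assessments, roster)
```

Keeping unmatched rows (with a missing identifier) rather than dropping them makes gaps in the crosswalk visible instead of silently shrinking the table.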

There are other important distinctions between different tables. This is particularly true in the domains of biofluid biomarkers and PET/MR imaging.

Some tables are records of clinical or administrative events, generated from entries made by either site staff or ADNI core researchers into an electronic data capture (EDC) system. Examples include neuropsychiatric assessments and MR quality control (QC) information.
Other tables represent the results of analyses carried out using ADNI data and samples. Examples include the biofluid results tables reporting CSF assays for amyloid beta species, and brain tissue boundary shift integral (BSI) measurements derived from MR images.

Variability across phases

There is a natural relationship between the information captured in tables and the multi-phase structure of the ADNI study. Namely, for any piece of information that was collected across more than one phase of the study, there are two critical questions to keep in mind:

Was the information recorded consistently across phases?
Is the information for every phase recorded in the same table, or split into multiple tables by phase?

When we refer to information as being ‘recorded inconsistently’ in this context, we are referring to inconsistency in the way that data are formatted into tables, and not inconsistency in the actual measurements being collected.
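
One way to start answering these questions for a single table is to check which fields are actually populated in each phase. The sketch below is a minimal illustration, assuming the table has been loaded as a list of row dictionaries and that a field named `PHASE` identifies the study phase; the field names, phase labels, and values are all hypothetical.

```python
from collections import defaultdict


def field_coverage_by_phase(rows, phase_field="PHASE"):
    """Map each phase to the set of fields with at least one non-missing value."""
    coverage = defaultdict(set)
    for row in rows:
        fields = coverage[row[phase_field]]  # ensures every phase is listed
        for field, value in row.items():
            if field != phase_field and value not in (None, ""):
                fields.add(field)
    return dict(coverage)


# Hypothetical rows: FIELD_Y is only populated in the second phase.
rows = [
    {"PHASE": "P1", "FIELD_X": 3, "FIELD_Y": None},
    {"PHASE": "P1", "FIELD_X": 5, "FIELD_Y": ""},
    {"PHASE": "P2", "FIELD_X": 4, "FIELD_Y": "yes"},
]
coverage = field_coverage_by_phase(rows)
```

A field that appears in the coverage set of some phases but not others is a candidate for the "certain fields may only apply to particular phases" situation, and is worth checking against the relevant data dictionary.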

This contingency table expands on the potential consequences of each combination of these two factors:

	Information was recorded consistently across phases	Information was recorded inconsistently across phases
Information is split between more than one table	Multiple tables need to be merged to obtain the complete picture. There may be differences in naming/coding schemes.	Working across multiple phases may be difficult because of missing information or inconsistency in the granularity of information.
Information is aggregated into a single table	All is well, but it’s still important to be on the lookout for data quality issues that aren’t strictly phase-specific.	Certain fields may only apply to observations from particular phases. Some fields may be duplicated with distinct coding schemes.
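
For the case where information is split between more than one table, the merge typically involves renaming fields and recoding values to a shared scheme before stacking the rows. The following is a minimal sketch under stated assumptions: the two phase-specific tables, their field names (`ANSWER`, `RESPONSE`), the phase labels, and the coding schemes are all hypothetical and only stand in for whatever the relevant data dictionaries specify.

```python
def harmonize(rows, rename, recode, phase):
    """Rename fields, recode values to a shared scheme, and tag rows with a phase."""
    harmonized = []
    for row in rows:
        new_row = {"PHASE": phase}
        for field, value in row.items():
            target = rename.get(field, field)
            # Values without a recoding rule are passed through unchanged.
            new_row[target] = recode.get(target, {}).get(value, value)
        harmonized.append(new_row)
    return harmonized


# Hypothetical phase-specific tables recording the same information
# under different field names and coding schemes.
table_p1 = [{"ID": 1, "ANSWER": 1}, {"ID": 2, "ANSWER": 0}]
table_p2 = [{"ID": 3, "RESPONSE": "yes"}, {"ID": 4, "RESPONSE": "no"}]

combined = (
    harmonize(table_p1, rename={}, recode={"ANSWER": {1: "yes", 0: "no"}}, phase="P1")
    + harmonize(table_p2, rename={"RESPONSE": "ANSWER"}, recode={}, phase="P2")
)
```

Tagging each row with its source phase preserves the provenance of every observation, which helps when a later analysis needs to account for phase-specific differences.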