PolyFLAME#

Polymorphic FLexible Analytics and Modelling Engine

This package is part of the Global.health-ISARIC pipeline.

Warning

This library is in prototype phase and subject to major revisions

Context and Problem#

Data processing and transformation (ETL) is done by the FHIRflat library. Once input data is brought into FHIRflat, it is represented as a (optionally zipped) folder of FHIR resources, with a parquet file corresponding to each resource: patient.parquet, encounter.parquet and so on.

Once the data is in FHIRflat, we need a easy to use library that can be used by itself, and as a building block for visualizations such as VERTEX.

Output: PolyFLAME is a library that can be used in Jupyter notebooks and other downstream code to allow querying answers to common research questions in a reproducible analytical pipeline (RAP).

Non-goals: Allow answering arbitrary questions. FHIRflat uses open formats (parquet) that users can query directly using tools such as pandas or the R {arrow} package, and FHIRFLAME allows flexibility in dataframe type as long as the dataframe schema required patterns for plot types (e.g. age pyramid plot should have a numeric age column).

Installing#

You can install PolyFLAME from GitHub

pip install git+https://github.com/globaldothealth/polyflame

Capabilities#

Generic base for building RAP#

PolyFLAME is a generic library for plots and standard analyses that will be used as a basis for constructing visualizations that work with clinical data. The library can generate plots and undertake analyses based on a general dataframe schema (column names and types).

        flowchart LR
OTH[Other formats] --> plot["plot()"]
REDCap --> plot
FHIRflat --> plot --> UP[upset plot]
plot --> FREQ[frequency plot]
plot --> P[pyramid plot]
    

Higher level modules in the form of data standard specific extensions produce a dataframe. Higher level modules do not know about the particulars of how a plot or analysis is performed. The generic plot() function can plot any data as long as it is of the correct ‘shape’:

# Example of a dataframe that can be used in a proportion plot
>>> df = pd.DataFrame({'condition': ['diabetes', 'lung disease'], 'fraction': [0.8, 0.2]})
>>> plot_unpacked(df, "proportion", cols={"label": "condition", "proportion": "fraction"})

which indicates to the underlying plotting and analysis module that the dataframe df is in a format that is acceptable to construct a proportion plot from. A proportion plot requires mapping to ‘label’ and ‘proportion’ columns (if they are not present) which is specified in the cols parameter.

Data standard specific extensions#

Extensions are coded as submodules of the main polyflame library, such as polyflame.fhirflat to read FHIRflat data

>>> from polyflame import load_data
>>> from polyflame.fhirflat import case_hospitalisation_rate, condition_proportions
# Data is always loaded with a checksum for reproducibility
>>> source = use_source('dengue', checksum="55d0b2642ede06e4d1e0137f85f0536a3256895c22b5e96c89bf923e7328606e")  # loads data from dengue folder
>>> source
{
  'N': 458245,
  'id': 'dengue-brazil-data'
  'path': '/Users/example/data/dengue'
  'checksum': '55d0b2642ede06e4d1e0137f85f0536a3256895c22b5e96c89bf923e7328606e'
}
>>> polyflame.fhirflat.case_hospitalization_rate(data)
{
  'mean': 13
  'lower_bound': 5
  'upper_bound': 18
}
>>> plot(condition_proportions(data, ["https://snomed.info/sct|2938499", "https://snomed.info/sct|1025273"]))
[plot shown]