The Neurobagel CLI

The bagel-cli is a simple Python command-line tool to automatically parse and describe subject-level phenotypic and BIDS attributes in an annotated dataset for integration into the Neurobagel graph.

Installation


Option 1 (RECOMMENDED): Pull the Docker image for the CLI from Docker Hub:

docker pull neurobagel/bagelcli

Option 2: Clone the repository and build the Docker image locally:

git clone https://github.com/neurobagel/bagel-cli.git
cd bagel-cli
docker build -t bagel .

Alternatively, to use Singularity, build a Singularity image for bagel-cli from the Docker Hub image:

singularity pull bagel.sif docker://neurobagel/bagelcli

Running the CLI

CLI commands can be accessed using the Docker/Singularity image.

Note

The Docker examples below assume that you are using the official Neurobagel Docker Hub image for the CLI. If you have instead built an image locally, replace neurobagel/bagelcli in the commands with the tag of your built image (e.g., bagel if you followed Option 2 above).

Input files

To run the CLI on a dataset you have annotated, you will need:

A phenotypic TSV
A corresponding phenotypic JSON data dictionary
(Optional) The imaging dataset in BIDS format, if subjects have imaging data available

A valid BIDS dataset is needed for the CLI to automatically generate harmonized subject-level imaging metadata alongside harmonized phenotypic attributes.
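Before running the CLI, it can be useful to confirm that the phenotypic TSV and the data dictionary describe the same columns. The sketch below is a quick pre-flight check, not part of bagel-cli: it assumes the data dictionary is a JSON object whose top-level keys are TSV column names (the BIDS sidecar convention), and the helpers column_mismatches and check_files are hypothetical.

```python
import csv
import json


def column_mismatches(tsv_columns, data_dictionary):
    """Compare TSV column names with the top-level keys of a data dictionary.

    Returns a tuple of two sorted lists:
    (TSV columns with no dictionary entry, dictionary entries with no TSV column).
    """
    tsv_set = set(tsv_columns)
    dict_set = set(data_dictionary)  # iterating a dict yields its keys
    return sorted(tsv_set - dict_set), sorted(dict_set - tsv_set)


def check_files(tsv_path, dictionary_path):
    """Read the TSV header and the JSON data dictionary, then compare them."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # The first row of the phenotypic TSV holds the column names
        header = next(csv.reader(f, delimiter="\t"), [])
    with open(dictionary_path, encoding="utf-8") as f:
        data_dictionary = json.load(f)
    return column_mismatches(header, data_dictionary)
```

For example, check_files("my_pheno.tsv", "my_pheno.json") (hypothetical file names) returns two empty lists when every TSV column has a dictionary entry and vice versa; any names listed are worth double-checking before running the pheno command.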

Viewing CLI commands and options

The bagel-cli has two commands, pheno and bids.

Information about each command can be found by running:


# Note: this is a shorthand for `docker run --rm neurobagel/bagelcli --help`
docker run --rm neurobagel/bagelcli

# Note: this is a shorthand for `singularity run bagel.sif --help`
singularity run bagel.sif

To view the command-line arguments for a specific command:


docker run --rm neurobagel/bagelcli <command-name> -h

singularity run bagel.sif <command-name> -h

Running the CLI on your data

1. cd into the local directory that contains your phenotypic .tsv file, your Neurobagel-annotated data dictionary, and your BIDS directory (if available).
2. Run a bagel-cli container, appending your CLI command and its arguments at the end in the following format:

With Docker:

docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli <CLI command here>

With Singularity:

singularity run --no-home --bind $PWD --pwd $PWD /path/to/bagel.sif <CLI command here>

What is this command doing?

The options --volume=$PWD:$PWD -w $PWD (Docker) and --bind $PWD --pwd $PWD (Singularity) mount your current working directory, which contains all inputs for the CLI, at the same path inside the container, and set the container's working directory to that mounted path so that it matches your location on the host machine. This lets you pass file paths to the containerized CLI exactly as you would on your local machine: both relative paths from your working directory and absolute paths to files inside it will work.
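If you call the CLI from a Python script (for example, to process several datasets), the same mount-and-workdir pattern can be reproduced with subprocess. This is a minimal sketch under stated assumptions: docker_command, singularity_command, and run are hypothetical helpers, not part of bagel-cli, and os.getcwd() stands in for $PWD (it may differ from the shell's $PWD if your path contains symbolic links).

```python
import os
import subprocess


def docker_command(cli_args, workdir=None, image="neurobagel/bagelcli"):
    """Build the `docker run` argument list shown above."""
    workdir = workdir or os.getcwd()
    return [
        "docker", "run", "--rm",
        f"--volume={workdir}:{workdir}",  # mount workdir at the same path in the container
        "-w", workdir,                    # start the container in that directory
        image, *cli_args,
    ]


def singularity_command(cli_args, workdir=None, image="bagel.sif"):
    """Build the `singularity run` argument list shown above."""
    workdir = workdir or os.getcwd()
    return [
        "singularity", "run", "--no-home",
        "--bind", workdir,  # bind workdir at the same path in the container
        "--pwd", workdir,   # start the container in that directory
        image, *cli_args,
    ]


def run(command):
    """Run a command, raising CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(command, check=True)
```

For example, run(docker_command(["pheno", "--help"])) corresponds to running docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli pheno --help in the current directory. Passing arguments as a list rather than a single string also avoids shell quoting issues with values that contain spaces.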

Example

If your dataset lives in /home/data/Dataset1:

home/
└── data/
    └── Dataset1/
        ├── neurobagel/
        │   ├── Dataset1_pheno.tsv
        │   └── Dataset1_pheno.json
        └── bids/
            ├── sub-01
            ├── sub-02
            └── ...

You could run the CLI as follows:


cd /home/data/Dataset1

# 1. Generate phenotypic subject-level graph data (Dataset1_pheno.jsonld)
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli pheno \
    --pheno "neurobagel/Dataset1_pheno.tsv" \
    --dictionary "neurobagel/Dataset1_pheno.json" \
    --name "My dataset 1" \
    --output "neurobagel/Dataset1_pheno.jsonld"

# 2. Add BIDS data to the Dataset1_pheno.jsonld file generated in step 1
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli bids \
    --jsonld-path "neurobagel/Dataset1_pheno.jsonld" \
    --bids-dir "bids" \
    --output "neurobagel/Dataset1_pheno_bids.jsonld"

cd /home/data/Dataset1

# 1. Generate phenotypic subject-level graph data (Dataset1_pheno.jsonld)
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif pheno \
    --pheno "neurobagel/Dataset1_pheno.tsv" \
    --dictionary "neurobagel/Dataset1_pheno.json" \
    --name "My dataset 1" \
    --output "neurobagel/Dataset1_pheno.jsonld"

# 2. Add BIDS data to the Dataset1_pheno.jsonld file generated in step 1
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif bids \
    --jsonld-path "neurobagel/Dataset1_pheno.jsonld" \
    --bids-dir "bids" \
    --output "neurobagel/Dataset1_pheno_bids.jsonld"
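The two steps above must agree on one path: the --output of the pheno step is the --jsonld-path of the bids step. A minimal Python sketch that derives both Docker commands from a single variable, so the paths cannot drift apart, is shown below. It assumes the directory layout of this example (neurobagel/ and bids/ subdirectories, with files named <prefix>_pheno.tsv and <prefix>_pheno.json); build_pipeline and run_pipeline are hypothetical helpers, not part of bagel-cli.

```python
import os
import subprocess


def build_pipeline(dataset_dir, name, prefix, image="neurobagel/bagelcli"):
    """Return the docker commands for step 1 (pheno) and step 2 (bids).

    Assumes the example layout: <dataset_dir>/neurobagel/<prefix>_pheno.tsv,
    <dataset_dir>/neurobagel/<prefix>_pheno.json and <dataset_dir>/bids/.
    """
    workdir = os.path.abspath(dataset_dir)
    # Same mount-and-workdir options as in the shell example
    docker = ["docker", "run", "--rm", f"--volume={workdir}:{workdir}", "-w", workdir, image]
    pheno_jsonld = f"neurobagel/{prefix}_pheno.jsonld"

    pheno_step = docker + [
        "pheno",
        "--pheno", f"neurobagel/{prefix}_pheno.tsv",
        "--dictionary", f"neurobagel/{prefix}_pheno.json",
        "--name", name,
        "--output", pheno_jsonld,
    ]
    bids_step = docker + [
        "bids",
        "--jsonld-path", pheno_jsonld,  # step 2 reads exactly what step 1 wrote
        "--bids-dir", "bids",
        "--output", f"neurobagel/{prefix}_pheno_bids.jsonld",
    ]
    return pheno_step, bids_step


def run_pipeline(dataset_dir, name, prefix):
    """Run both steps in order; check=True stops at the first failing step."""
    for command in build_pipeline(dataset_dir, name, prefix):
        subprocess.run(command, check=True)
```

With this sketch, run_pipeline("/home/data/Dataset1", "My dataset 1", "Dataset1") mirrors the Docker example above. Because the relative paths are resolved inside the container against the -w directory, the script itself can be started from any location.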

Tip

For short forms of CLI command options, see:
docker run --rm neurobagel/bagelcli pheno --help
or
docker run --rm neurobagel/bagelcli bids --help

Speed of the bids command

The bids command of the bagel-cli (step 2) can currently take several minutes or longer for datasets with more than a few hundred subjects, because of the time PyBIDS needs to read the dataset structure. Once this slow initial reading step is complete, you should see the message:

BIDS parsing completed.
...
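Since the reading time depends on the size of the dataset, it may be worth checking how many subjects your BIDS directory contains before starting the bids step. The sketch below is a rough, standard-library-only count of top-level sub-* folders (the BIDS naming convention for subject directories); count_bids_subjects is a hypothetical helper and does not use PyBIDS or the CLI.

```python
from pathlib import Path


def count_bids_subjects(bids_dir):
    """Count top-level subject folders (named sub-<label>) in a BIDS dataset."""
    # glob("sub-*") also matches files, so keep directories only
    return sum(1 for path in Path(bids_dir).glob("sub-*") if path.is_dir())
```

Run from the dataset directory of the example, count_bids_subjects("bids") gives a quick estimate of whether to expect a long wait.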

Upgrading to a newer version of the CLI

Neurobagel is under active, early development, and future releases of the CLI may introduce breaking changes to the data model for subject-level information in .jsonld graph files. Breaking changes will be highlighted in the release notes!

If you have already created .jsonld files for your Neurobagel graph database using the CLI, they can be quickly re-generated under the new data model by following the instructions here so that they will not conflict with dataset .jsonld files generated using the latest CLI version.

Development environment

To ensure that our Docker images are built in a predictable way, we use requirements.txt as a lockfile. That is, requirements.txt includes the entire dependency tree of our tool, with pinned versions for every dependency (for more information, see https://pip.pypa.io/en/latest/topics/repeatable-installs/#repeatability).
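To illustrate what pinned versions for every dependency means in practice, the following sketch flags any line in a requirements file that does not pin an exact version with ==. It is a simplified, hypothetical check (unpinned_requirements is not part of the repository's tooling): it skips blank lines, comments, and pip options such as -r or -e, and it does not handle every requirement syntax that pip accepts.

```python
import re

# A pinned requirement looks like "package==1.2.3", optionally with extras
# such as "package[extra]==1.2.3". This pattern is a simplification.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^\s;]+")


def unpinned_requirements(lines):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For a file, pass the open file object, e.g. inside with open("requirements.txt") as f: call unpinned_requirements(f); an empty list means every listed dependency is pinned.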

Setting up a local development environment

We suggest that you create a development environment that is as close as possible to the environment we run in production.

To do so, we first need to install the dependencies from our lockfile (dev_requirements.txt):

pip install -r dev_requirements.txt

Then, install the CLI in editable mode without installing or changing any dependencies:

pip install --no-deps -e .

Finally, to run the test suite, initialize and fetch the bids-examples and neurobagel_examples submodules:

git submodule init
git submodule update

Confirm that everything works by running the test suite with pytest . (no tests should fail).

Setting up code formatting and linting (recommended)

pre-commit is configured in the development environment for this repository and can be set up to automatically run a set of code linters and formatters on every commit you make, keeping your changes consistent with the project's code style.

Run the following from the repository root to install the configured pre-commit "hooks" for your local clone of the repo:

pre-commit install