The Neurobagel CLI
The bagel-cli is a simple Python command-line tool that automatically parses and describes subject-level phenotypic and BIDS attributes in an annotated dataset for integration into the Neurobagel graph.
Installation
Option 1 (RECOMMENDED): Pull the Docker image for the CLI from DockerHub:
docker pull neurobagel/bagelcli
Option 2: Clone the repository and build the Docker image locally:
git clone https://github.com/neurobagel/bagel-cli.git
cd bagel-cli
docker build -t bagel .
Build a Singularity image for bagel-cli using the DockerHub image:
singularity pull bagel.sif docker://neurobagel/bagelcli
Running the CLI
CLI commands can be accessed using the Docker/Singularity image.
Note
The Docker examples below assume that you are using the official Neurobagel Docker Hub image for the CLI.
If you have instead built an image locally, replace neurobagel/bagelcli in commands with your built image tag.
Input files
To run the CLI on a dataset you have annotated, you will need:
- A phenotypic TSV
- A corresponding phenotypic JSON data dictionary
- (Optional) The imaging dataset in BIDS format, if subjects have imaging data available. A valid BIDS dataset is needed for the CLI to automatically generate harmonized subject-level imaging metadata alongside harmonized phenotypic attributes.
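Before starting a container, it can help to confirm that the inputs listed above are where you expect them. The following is a minimal sketch, not part of the CLI: `check_bagel_inputs` is a hypothetical helper name, and it only checks that the files and directory exist, not that the BIDS dataset is valid.

```shell
# Hypothetical helper (not part of bagel-cli): verify that the CLI inputs exist.
# Arguments: <phenotypic TSV> <data dictionary JSON> [BIDS directory]
check_bagel_inputs() {
    local pheno_tsv="$1" dictionary="$2" bids_dir="${3:-}"
    local status=0

    if [ ! -f "$pheno_tsv" ]; then
        echo "Missing phenotypic TSV: $pheno_tsv" >&2
        status=1
    fi
    if [ ! -f "$dictionary" ]; then
        echo "Missing data dictionary: $dictionary" >&2
        status=1
    fi
    # The BIDS directory is optional; only check it when one was given.
    if [ -n "$bids_dir" ] && [ ! -d "$bids_dir" ]; then
        echo "Missing BIDS directory: $bids_dir" >&2
        status=1
    fi
    return "$status"
}
```

For example, `check_bagel_inputs neurobagel/Dataset1_pheno.tsv neurobagel/Dataset1_pheno.json bids` returns a non-zero status and names each missing input on standard error.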
Viewing CLI commands and options
The bagel-cli has two commands, pheno and bids.
Information about each command can be found by running:
# Note: this is a shorthand for `docker run --rm neurobagel/bagelcli --help`
docker run --rm neurobagel/bagelcli
# Note: this is a shorthand for `singularity run bagel.sif --help`
singularity run bagel.sif
To view the command-line arguments for a specific command:
docker run --rm neurobagel/bagelcli <command-name> -h
singularity run bagel.sif <command-name> -h
Running the CLI on your data
- cd into your local directory containing (1) your phenotypic .tsv file, (2) your Neurobagel-annotated data dictionary, and (3) your BIDS directory (if available).
- Run a bagel-cli container and include your CLI command and arguments at the end, in the following format:
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli <full CLI command here>
What is this command doing?
This combination of options --volume=$PWD:$PWD -w $PWD mounts your current working directory (containing all inputs for the CLI) at the same path inside the container, and also sets the container's working directory to the mounted path (so it matches your location on your host machine).
This allows you to pass paths to the containerized CLI which are composed the same way as on your local machine. (And both absolute paths and relative top-down paths from your working directory will work!)
singularity run --no-home --bind $PWD --pwd $PWD /path/to/bagel.sif <CLI command here>
What is this command doing?
This combination of options --bind $PWD --pwd $PWD mounts your current working directory (containing all inputs for the CLI) at the same path inside the container, and also sets the container's working directory to the mounted path (so it matches your location on your host machine).
This allows you to pass paths to the containerized CLI which are composed the same way as on your local machine. (And both absolute paths and relative top-down paths from your working directory will work!)
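If you run the Docker form of the command often, the mount and working-directory options can be wrapped in a small shell function. This is only a sketch: the function name `bagel` and the environment variables `BAGEL_IMAGE` and `BAGEL_DRY_RUN` are illustrative assumptions, not options provided by the CLI.

```shell
# Hypothetical wrapper (not part of bagel-cli) around the Docker command format.
# BAGEL_IMAGE:   image tag to use; defaults to the Docker Hub image.
# BAGEL_DRY_RUN: if non-empty, print the command instead of running it.
bagel() {
    local image="${BAGEL_IMAGE:-neurobagel/bagelcli}"
    if [ -n "${BAGEL_DRY_RUN:-}" ]; then
        echo docker run --rm --volume="$PWD:$PWD" -w "$PWD" "$image" "$@"
    else
        # Mount the current directory at the same path and work from there.
        docker run --rm --volume="$PWD:$PWD" -w "$PWD" "$image" "$@"
    fi
}
```

With this defined, `bagel pheno -h` would run the same container command as the full form, and `BAGEL_IMAGE=bagel bagel pheno -h` would use a locally built image tagged `bagel`.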
Example
If your dataset lives in /home/data/Dataset1:
home/
└── data/
    └── Dataset1/
        ├── neurobagel/
        │   ├── Dataset1_pheno.tsv
        │   └── Dataset1_pheno.json
        └── bids/
            ├── sub-01
            ├── sub-02
            └── ...
You could run the CLI as follows:
cd /home/data/Dataset1
# 1. Generate phenotypic subject-level graph data (Dataset1_pheno.jsonld)
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli pheno \
    --pheno "neurobagel/Dataset1_pheno.tsv" \
    --dictionary "neurobagel/Dataset1_pheno.json" \
    --name "My dataset 1" \
    --output "neurobagel/Dataset1_pheno.jsonld"
# 2. Add BIDS data to the Dataset1_pheno.jsonld generated by step 1
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli bids \
    --jsonld-path "neurobagel/Dataset1_pheno.jsonld" \
    --bids-dir "bids" \
    --output "neurobagel/Dataset1_pheno_bids.jsonld"
cd /home/data/Dataset1
# 1. Generate phenotypic subject-level graph data (Dataset1_pheno.jsonld)
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif pheno \
    --pheno "neurobagel/Dataset1_pheno.tsv" \
    --dictionary "neurobagel/Dataset1_pheno.json" \
    --name "My dataset 1" \
    --output "neurobagel/Dataset1_pheno.jsonld"
# 2. Add BIDS data to the Dataset1_pheno.jsonld generated by step 1
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif bids \
    --jsonld-path "neurobagel/Dataset1_pheno.jsonld" \
    --bids-dir "bids" \
    --output "neurobagel/Dataset1_pheno_bids.jsonld"
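Because step 2 reads the file that step 1 writes, it may be convenient to derive every file name from a single prefix so the two paths cannot drift apart. The sketch below assumes your phenotypic files share a common prefix (as in the example layout); `run_bagel_steps` and `RUNNER` are hypothetical names introduced here for illustration.

```shell
# Hypothetical sketch (not part of bagel-cli): run both steps from one prefix
# so that step 2 always reads the file step 1 wrote.
# RUNNER is an illustrative variable: RUNNER=echo previews the commands
# instead of running them.
# Arguments: <dataset name> <path prefix of pheno files> <BIDS directory>
run_bagel_steps() {
    local name="$1" prefix="$2" bids_dir="$3"
    local pheno_jsonld="${prefix}.jsonld"

    # 1. Generate phenotypic subject-level graph data; stop here on failure.
    ${RUNNER:-} docker run --rm --volume="$PWD:$PWD" -w "$PWD" neurobagel/bagelcli pheno \
        --pheno "${prefix}.tsv" \
        --dictionary "${prefix}.json" \
        --name "$name" \
        --output "$pheno_jsonld" || return 1

    # 2. Add BIDS data to the file generated by step 1.
    ${RUNNER:-} docker run --rm --volume="$PWD:$PWD" -w "$PWD" neurobagel/bagelcli bids \
        --jsonld-path "$pheno_jsonld" \
        --bids-dir "$bids_dir" \
        --output "${prefix}_bids.jsonld"
}
```

From /home/data/Dataset1, this could be called as `run_bagel_steps "My dataset 1" neurobagel/Dataset1_pheno bids`; prefixing the call with `RUNNER=echo` prints the two commands without starting a container.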
Tip
For short forms of CLI command options, see:
docker run --rm neurobagel/bagelcli pheno --help
or
docker run --rm neurobagel/bagelcli bids --help
Speed of the bids command
The bids command of the bagel-cli (step 2) can currently take several minutes or more for datasets with more than a few hundred subjects, because of the time pyBIDS needs to read the dataset structure.
Once the slow initial dataset reading step is complete, you should see the message:
BIDS parsing completed.
...
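Since the runtime depends on the number of subjects, a quick count of the subject directories gives a rough idea of whether to expect a long wait. This is a minimal sketch under the assumption that subjects are stored as top-level sub-* directories, as in the example layout; `count_bids_subjects` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of bagel-cli): count the top-level sub-*
# directories in a BIDS dataset directory.
count_bids_subjects() {
    local bids_dir="$1"
    # tr strips the padding that some wc implementations add.
    find "$bids_dir" -mindepth 1 -maxdepth 1 -type d -name 'sub-*' | wc -l | tr -d ' '
}
```

For example, `count_bids_subjects bids` prints the number of subject directories under bids/.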
Upgrading to a newer version of the CLI
Neurobagel is under active, early development and future releases of the CLI may introduce breaking changes to the data model for subject-level information in a .jsonld graph file. Breaking changes will be highlighted in the release notes!
If you have already created .jsonld files for your Neurobagel graph database using the CLI, they can be quickly re-generated under the new data model by following the instructions here so that they will not conflict with dataset .jsonld files generated using the latest CLI version.
Development environment
To ensure that our Docker images are built in a predictable way, we use requirements.txt as a lockfile. That is, requirements.txt includes the entire dependency tree of our tool, with pinned versions for every dependency (for more information, see https://pip.pypa.io/en/latest/topics/repeatable-installs/#repeatability).
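One way to sanity-check that a requirements file is fully pinned is to list any line that lacks an exact `==` version. The sketch below is a simple text check rather than a real requirements parser, and `find_unpinned` is a hypothetical helper name; lines such as pip options would also be reported.

```shell
# Hypothetical check (not part of the repository): print every requirement
# line that is not pinned with '=='. Blank lines and comments are ignored.
# Exit status is 0 if unpinned lines were found, and non-zero otherwise.
find_unpinned() {
    grep -v -E '^[[:space:]]*(#|$)' "$1" | grep -v '=='
}
```

For example, `find_unpinned requirements.txt` prints nothing when every dependency line carries an exact version.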
Setting up a local development environment
We suggest that you create a development environment that is as close as possible to the environment we run in production.
To do so, we first need to install the dependencies from our lockfile (dev_requirements.txt):
pip install -r dev_requirements.txt
Then we install the CLI without touching the dependencies:
pip install --no-deps -e .
Finally, to run the test suite we need to install the bids-examples and neurobagel_examples submodules:
git submodule init
git submodule update
Then run the tests (none should fail):
pytest .
Setting up code formatting and linting (recommended)
pre-commit is configured in the development environment for this repository. Once set up, it automatically runs a number of code linters and formatters on every commit you make, following the code style set for this project.
Run the following from the repository root to install the configured pre-commit "hooks" for your local clone of the repo:
pre-commit install