Demonstration scripts and notebooks showing how to use the Data Hub API for data submissions, reporting, and other tasks.
Note: All of these scripts require a Data Hub API key. Instructions for obtaining an API token can be found in the data submission documentation. For security reasons, these tokens should be stored as environment variables on your system. The scripts expect the production and stage API keys to be stored in the environment variables PRODAPI and STAGEAPI, respectively.
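A minimal sketch of how a script might pick up the right token (the environment variable names come from the note above; the helper function name is ours):

```python
import os

def get_api_token(tier: str) -> str:
    """Return the Data Hub API token for the requested tier.

    Tokens are read from the PRODAPI / STAGEAPI environment variables
    rather than being hard-coded into the script.
    """
    env_var = {"prod": "PRODAPI", "stage": "STAGEAPI"}[tier]
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token")
    return token
```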
This Jupyter notebook walks through a basic example of how to do a CRDC submission using the Data Hub APIs. Topics covered in this notebook include:
- Finding the studies you are approved to submit to
- Creating a new submission or working on an existing submission
- Uploading the data submission templates
- Running the data and metadata validations
- Reviewing the results from validations
- Final submission, cancellation, or withdrawal of a submission
This notebook covers several queries that can provide more detailed information on the status of your submissions such as:
- Listing all the submissions you have
- Getting high-level summary information about a specific submission
- Getting detailed information about specific submissions
- Getting a detailed inventory of the data that you've added to a submission
- Deleting specific information from a submission
- Retrieving a populated configuration file for use in uploading data files with the CLI Upload Tool
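For example, a high-level summary over the list-submissions results might be computed like this (the 'status' field name is an assumption for illustration):

```python
from collections import Counter

def summarize_submissions(submissions: list[dict]) -> dict:
    """Count submissions by status (New, In Progress, Submitted, ...).

    `submissions` is the list of records returned by the list-submissions
    query; the 'status' field name is an assumption for illustration.
    """
    return dict(Counter(s.get("status", "Unknown") for s in submissions))
```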
This is a Python Dash application that uses the APIs to create a personal dashboard of your submissions. To use this script, run the script ($ python3 SubmissionReportDashboard.py), then launch a browser and navigate to http://localhost:8050.
Required Python libraries: dash, dash_bootstrap_components, plotly, requests, pandas, datetime, pytz
Similar to the SubmissionReportDashboard, but built with Python Shiny instead of Dash.
Submissions that are inactive for extended periods of time start generating warning emails and, after 180 days, get deleted. The remedy is to log into the Submission Portal and look at the submission; however, this gets burdensome if there are a large number of submissions to check. This script (also available in notebook form) will query for all the submissions that are either New or In Progress and will request information from each of them. This resets the inactivity timer.
Currently, the Submission Portal does not aggregate the warnings that are generated by data updates. This can make it difficult to see whether the updates that are about to be applied are correct. These programs aggregate all the warnings and display them in a paired manner so that it is easier to see what changes are about to be applied.
Both scripts take the following configurations. In the notebook you will find these in a marked cell; edit that cell as needed. For the script, the configurations are provided in a YAML file (see warning_configs.yml for an example).
- subid: A list of the submission IDs to be checked. Submission IDs can be obtained from the Submission Portal
- severity: Should be set to 'All'
- nodelist: A list of the nodes that should be checked for warnings
- outputdirectory: A local directory where output can be written
- tier: The tier to use, should be either 'stage' or 'prod'
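A configuration file along the lines of warning_configs.yml might look like this (all values are placeholders):

```yaml
subid:
  - 11111111-2222-3333-4444-555555555555
severity: All
nodelist:
  - participant
  - sample
outputdirectory: /tmp/warning_reports
tier: stage
```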
Script runtime options
- -c/--configfile: The YAML configuration file
- -v/--verbose: The level of verbosity. Add more v's to be more verbose.
This script resets the inactivity timer for all of your New or In Progress submissions to the current date.
Script runtime options
- -t/--tier: The tier to check. Should be either 'prod' or 'stage'
- -v/--verbose: The level of verbosity. Add more v's to be more verbose.
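The core of the reset logic can be sketched like this (the '_id' and 'status' field names are assumptions for illustration):

```python
def submissions_to_reset(submissions: list[dict]) -> list[str]:
    """Pick out the submission IDs whose inactivity timers need touching.

    Only New and In Progress submissions accumulate inactivity warnings,
    so those are the ones the script requests details for. The '_id' and
    'status' field names are assumptions for illustration.
    """
    active_statuses = {"New", "In Progress"}
    return [s["_id"] for s in submissions if s.get("status") in active_statuses]

# For each returned ID the script then issues a details query; that
# request is enough to reset the submission's inactivity timer.
```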
This is a graphical version (Python Dash) of the SubmissionReset.py script. Select a tier from the drop-down and a table of your current New and In Progress submissions will be generated. Select the submissions you wish to reset the inactivity timer on and then click on the Reset Time on Selected Submissions button below the table. Each submission selected will be reset to the current date.
To use, run the script ($ python3 SubmissionResetGUI.py) and then bring up a browser and navigate to http://localhost:8050.
Required Python libraries: dash, dash_bootstrap_components, requests, pandas, datetime, pytz
When updating a submission that has previously been through Data Hub, it's possible to get a great number of warnings that data is going to be changed. Unfortunately, the current Submission Portal interface doesn't have a way to aggregate and display these warnings, which can make checking them difficult and tedious. This script and notebook aggregate all the warnings in a submission and display alternating old and new lines in a table (notebook) or output a CSV file (script).
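The aggregation step can be sketched like this (the warning record shape, with entry / old / new fields, is an assumption for illustration):

```python
def alternating_rows(warnings: list[dict]) -> list[dict]:
    """Expand each update warning into an 'old' row followed by a 'new'
    row so the before/after values sit on adjacent lines of the table
    or CSV. The warning record shape here is an assumption.
    """
    rows = []
    for w in warnings:
        rows.append({"entry": w["entry"], "version": "old", "value": w["old"]})
        rows.append({"entry": w["entry"], "version": "new", "value": w["new"]})
    return rows
```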
This script addresses a weakness in the Submission Portal: deleting some, but not all, entries in a node can become tedious. The graphical interface nicely supports deleting individual entries as well as entire nodes, but it does not support deleting dozens or hundreds of entries at once.
DeleTron.py takes a Data Hub CSV loading sheet and, instead of adding the information to the submission, deletes all of the listed entries from the submission. This allows a submitter to start with one of their existing loading sheets and edit it down (or copy it to a new load sheet) to just the entries they wish to delete. Like submission, deletion works on a node-by-node basis, and a separate deletion sheet has to be provided for each node to be deleted.
Data Hub will also delete any child nodes that are orphaned by deleting a parent. For example, if a sample is orphaned when a participant is deleted, the sample will also be deleted even though a sample load sheet was never provided. For this reason it's usually useful to understand the existing relationships before deleting, and to consider whether updating the information would be a better approach.
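A minimal sketch of the load-sheet-to-IDs step (the ID column name varies by node, so the caller supplies it; the function name is ours):

```python
import csv

def entries_to_delete(loadsheet_path: str, id_column: str) -> list[str]:
    """Read a Data Hub CSV loading sheet and collect the node IDs it lists.

    Instead of submitting these rows, DeleTron-style code would pass the
    collected IDs to the delete API, one node at a time.
    """
    with open(loadsheet_path, newline="") as fh:
        return [row[id_column] for row in csv.DictReader(fh) if row.get(id_column)]
```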
DeleTron requires a YAML file with the following parameters. It is recommended you start with the delete_configs.yml example:
- tier: The Data Hub tier you wish to use. Likely either stage or prod
- deletefile: The full path to the file that contains the information to be deleted.
- submissionid: The UUID for the submission you are editing. This can be copied from the upper left of the submission view in the GUI.
- node: The node you will be deleting information from. For example: file, diagnosis, or participant
There is additional required information in the mdffiles section that should not be edited. If you create your own yaml configuration file, make sure this section is copied over and not edited.
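Put together, a configuration might look like this (all values are placeholders; the required mdffiles section is not shown and should be copied unchanged from delete_configs.yml):

```yaml
tier: stage
deletefile: /path/to/participants_to_delete.csv
submissionid: 11111111-2222-3333-4444-555555555555
node: participant
# mdffiles: copy this required section unchanged from delete_configs.yml
```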