Clinical Ultrasound Image Repository
This repository contains generic ultrasound data from randomly selected subjects,
acquired for various clinical reasons in three different disciplines:
- Abdominal imaging (patient IDs with prefix "A")
- Cardiac imaging (patient IDs with prefix "C")
- Obstetrics and Gynaecology (patient IDs with prefix "O")
All data is in DICOM format and has been de-identified in accordance with HIPAA regulations:
- Protected personal information was removed from DICOM elements and from burned-in image pixel data.
- Absolute dates are reduced to the year; for patients aged 90 or older, "19xx" or "20xx" is used instead.
- An age of "90+" is recorded for patients aged 90 or older at the time of imaging.
- No data from patients younger than 18 years was used.
- Private groups and overlay groups were removed from the DICOM data.
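Because of these rules, some demographic values are placeholders rather than plain numbers. The following is a minimal sketch for parsing them, assuming ages appear as plain numbers, as DICOM-style strings such as "055Y", or as "90+", and years as four digits or "19xx"/"20xx". Which metadata fields actually carry these values, and the helper names `parse_age` and `parse_year`, are assumptions for illustration.

```python
import re

# Placeholders defined by the de-identification rules above.
AGE_90_PLUS = "90+"
CENSORED_YEARS = {"19xx", "20xx"}

def parse_age(text):
    """Return (age_in_years, is_censored); "90+" maps to (90, True)."""
    text = text.strip()
    if text == AGE_90_PLUS:
        return 90, True
    # Accept e.g. "55", "055Y", "55 yrs" (assumed formats)
    match = re.fullmatch(r"(\d+)\s*(?:y|yrs|years)?", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized age: {text!r}")
    return int(match.group(1)), False

def parse_year(text):
    """Return the year as an int, or None for a "19xx"/"20xx" placeholder."""
    text = text.strip()
    if text.lower() in CENSORED_YEARS:
        return None
    return int(text)
```

Returning a censoring flag instead of silently using 90 keeps "90+" patients distinguishable in any downstream age statistics.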
Contact Information: shuver@nvidia.com
Managed by: MONAI Development Team
Content of the Repository
Studies
| | Abdominal | Cardiac | OB/Gyn | Overall |
|---|---|---|---|---|
| Number of Patients | 667 | 667 | 666 | 2000 |
| Number of Studies | 667 | 667 | 666 | 2000 |
| Number of Image Files | 24283 | 37356 | 25001 | 86640 |
| Years of Imaging | 2012-2021 | 2014-2021 | 2018-2021 | 2012-2021 |
| Age at Exam, median | 55 yrs | 58 yrs | 30 yrs | 39 yrs |
| Age at Exam, [5%, 95%] | [22, 80] | [20, 83] | [21, 39] | [21, 79] |
| Deceased | 12.29% | 17.84% | 0.150% | 10.1% |
| Female | 54.27% | 55.17% | 100.0% | |
| Trimester, 1st : 2nd | | | 48.94% : 51.05% | |
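The "Overall" percentages are pooled over all patients rather than averaged across disciplines. As a sketch, weighting each discipline's deceased percentage by its patient count reproduces the overall figure in the table (the helper name `pooled_percentage` is illustrative):

```python
# Per-discipline patient counts and deceased percentages from the table above
patients = {"Abdominal": 667, "Cardiac": 667, "OB/Gyn": 666}
deceased_pct = {"Abdominal": 12.29, "Cardiac": 17.84, "OB/Gyn": 0.150}

def pooled_percentage(counts, percentages):
    """Patient-weighted average of per-group percentages."""
    total = sum(counts.values())
    weighted = sum(counts[group] * percentages[group] for group in counts)
    return weighted / total

overall = pooled_percentage(patients, deceased_pct)
print(f"{overall:.1f}%")  # 10.1%
```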
Manufacturers (number of studies)
| | Abdominal | Cardiac | OB/Gyn | Overall |
|---|---|---|---|---|
| Philips Medical Systems | 434 | 667 | 396 | 1497 |
| GE Healthcare | 132 | | 269 | 401 |
| Siemens Healthineers | 82 | | | 82 |
| Samsung Medison | 10 | | 1 | 11 |
| B-K Medical | 8 | | | 8 |
| Acuson | 1 | | | 1 |
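Each row of the manufacturer table sums to its "Overall" value, and each discipline column sums to that discipline's study count. A small consistency check, with empty table cells entered as 0, might look like this:

```python
# Studies per manufacturer (Abdominal, Cardiac, OB/Gyn) from the table above;
# empty cells are entered as 0.
by_manufacturer = {
    "Philips Medical Systems": (434, 667, 396),
    "GE Healthcare": (132, 0, 269),
    "Siemens Healthineers": (82, 0, 0),
    "Samsung Medison": (10, 0, 1),
    "B-K Medical": (8, 0, 0),
    "Acuson": (1, 0, 0),
}

# Row totals give the "Overall" column; column totals give studies per discipline.
row_totals = {name: sum(counts) for name, counts in by_manufacturer.items()}
column_totals = [sum(column) for column in zip(*by_manufacturer.values())]

print(row_totals["Philips Medical Systems"])  # 1497
print(column_totals, sum(column_totals))      # [667, 667, 666] 2000
```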
Access and Download
The data is hosted in an Amazon S3 bucket and can be accessed in several ways:
- Using your browser from the download page
- Following Amazon's instructions for accessing S3 buckets as a starting point
- Mounting the S3 bucket as a file system with s3fs, e.g., s3fs-fuse on Linux
- Using the Python code below (or equivalent code) for bulk downloading
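To see which archives exist before downloading, the bucket can in principle be listed over plain HTTP with the S3 ListObjectsV2 request (`?list-type=2&prefix=...`). The sketch below assumes the bucket permits anonymous listing, which is not confirmed here; the helper names `parse_listing` and `list_keys` are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

HOST = "https://clinical-ultrasound-image-repository.s3.amazonaws.com"
NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}  # S3 XML namespace

def parse_listing(xml_text):
    """Return (keys, next_token) from one ListObjectsV2 response page."""
    root = ET.fromstring(xml_text)
    keys = [element.text for element in root.findall("s3:Contents/s3:Key", NS)]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=NS) == "true"
    token = root.findtext("s3:NextContinuationToken", namespaces=NS) if truncated else None
    return keys, token

def list_keys(prefix):
    """Yield all object keys under prefix, following continuation tokens."""
    token = None
    while True:
        params = {"list-type": "2", "prefix": prefix}
        if token:
            params["continuation-token"] = token
        url = HOST + "/?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            keys, token = parse_listing(response.read())
        yield from keys
        if token is None:
            break
```

For example, `for key in list_keys("archives/per-study/"): print(key)` would print one ZIP path per study, if listing is allowed.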
Example Python Code:
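The following downloads the JSON file with the combined meta information and then every study as a per-study ZIP file:
import json
host = "https://clinical-ultrasound-image-repository.s3.amazonaws.com"
# Download the combined meta file; httpGET() saves it as all-meta.json
httpGET(host + "/archives/meta-only/all-meta.json")
with open("all-meta.json", "r") as metafile:
    studies = json.load(metafile)
# Top-level keys are patients; "_ref" names the per-study directory/archive
for patient, entry in studies.items():
    httpGET(host + "/archives/per-study/" + entry["_ref"] + ".zip")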
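Here, httpGET() wraps your favorite HTTP library or tool and must be defined before the code above runs. On Linux, for example, the following function uses curl for the actual HTTP download and saves each file under its base name in the current directory:
import os
import subprocess
def httpGET(url):
    # --fail: curl exits non-zero on HTTP errors; check=True raises on that
    subprocess.run(["curl", "--fail", url, "-o", os.path.basename(url)], check=True)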
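Where curl is not available, a standard-library-only variant of httpGET() could look like the sketch below. It is a suggested drop-in replacement, not part of the repository's own instructions; the optional `dest_dir` parameter is an addition for illustration.

```python
import os
import shutil
import urllib.parse
import urllib.request

def httpGET(url, dest_dir="."):
    """Download url into dest_dir under its base name; return the local path."""
    filename = os.path.basename(urllib.parse.unquote(urllib.parse.urlsplit(url).path))
    target = os.path.join(dest_dir, filename)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk without loading it all
    return target
```

Streaming with `shutil.copyfileobj` keeps memory use flat even for large per-study ZIP files.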
File-Tree Structure
- <patient>_<study-description>
  - images
    - dicom
      - <patient>-<series>-<scan>.dcm: DICOM image file with meta group
  - meta
    - json
      - <patient>-meta.json: hierarchical-dictionary JSON file
    - csv
      - <patient>-meta.csv: alternate comma-separated-value file
- archives
  - per-study
    - <patient>_<study-description>.zip: ZIP file with image and meta data following the above structure
  - meta-only
    - all-meta.json: combined JSON file as a single dictionary
    - all-meta.csv: combined CSV file as a single sheet
    - all-meta.zip: ZIP file with all individual JSON and CSV files
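Once a per-study ZIP file is downloaded, its DICOM members can be found by their `<patient>-<series>-<scan>.dcm` names. The sketch below splits from the right in case a patient ID contains a hyphen (an assumption); the function name `list_dicom_files` and the test file names are hypothetical.

```python
import zipfile

def list_dicom_files(zip_path):
    """Return (patient, series, scan, member_name) for each .dcm file in a ZIP."""
    entries = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            base = name.rsplit("/", 1)[-1]  # ZIP member names always use "/"
            if not base.lower().endswith(".dcm"):
                continue
            parts = base[:-4].rsplit("-", 2)
            if len(parts) != 3:
                continue  # not in <patient>-<series>-<scan>.dcm form
            patient, series, scan = parts
            entries.append((patient, series, scan, name))
    return entries
```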
JSON-Dictionary Structure
- <patient>
  - _ref: <patient>_<study-description> (directory reference)
  - demo
    - <key>: <value> (sub-dictionary with demographic information)
  - exam
    - <key>: <value> (sub-dictionary with specifics on procedures)
  - labs
    - <type>: <value> (not tied to a specific year)
    - <year>
      - <type>: results for the given year, either a single entry or multiple entries (the unit may be empty):
        - [<value>, <unit>]
        - [ [<value>, <unit>], [<value>, <unit>], ... ]
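Because a lab result may be either a single `[<value>, <unit>]` pair or a list of pairs, it helps to normalize both shapes when reading the "labs" sub-dictionary. The sketch below tells `<year>` keys from `<type>` keys by whether the value is a dictionary; that rule, the helper names, and the lab names in the test are assumptions.

```python
def _as_pairs(results):
    """Normalize a single [value, unit] entry or a list of them to a list of pairs."""
    if not results:
        return []
    if isinstance(results[0], list):
        return results   # multiple entries: [[value, unit], ...]
    return [results]     # single entry: [value, unit]

def iter_lab_results(labs):
    """Yield (year, type, value, unit) from one patient's "labs" sub-dictionary.

    year and unit are None for entries not tied to a specific year.
    """
    for key, content in labs.items():
        if isinstance(content, dict):
            # key is a <year>; content maps <type> -> results
            for lab_type, results in content.items():
                for value, unit in _as_pairs(results):
                    yield key, lab_type, value, unit
        else:
            # key is a <type> whose <value> is not tied to a year
            yield None, key, content, None
```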
Comma-Separated-Value Structure
Description of columns:
- <patient>
- Category: demo, exam, or labs
- <key> or <type> (from the JSON dictionary)
- <value>
- <unit> (optional)
- <year> (optional)
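A sketch for reading a meta CSV file into per-patient, per-category records follows. It assumes the columns appear in the order listed above with no header row, and pads missing optional trailing columns with empty strings; `read_meta_csv` and the field names in `COLUMNS` are hypothetical.

```python
import csv
from collections import defaultdict

# Assumed column order, following the description above
COLUMNS = ("patient", "category", "key", "value", "unit", "year")

def read_meta_csv(lines):
    """Group CSV rows as {patient: {category: [record, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.reader(lines):
        if not row:
            continue
        row = row + [""] * (len(COLUMNS) - len(row))  # fill optional columns
        record = dict(zip(COLUMNS, row))
        grouped[record["patient"]][record["category"]].append(record)
    return grouped
```

A file would be read with `with open("<patient>-meta.csv", newline="") as f: meta = read_meta_csv(f)`, where the file name follows the tree above.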
© 2023 The University of Iowa, Creative Commons CC-BY-NC 4.0