Our full dataset takes about 80 GB of storage and is distributed as a multi-part zip archive, with each part smaller than 256 MB. For your convenience, we provide a Python script that downloads the parts and assembles the archive. If you need to fetch parts individually, open the script to find their URLs.
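
If you prefer to fetch and assemble the parts by hand, the steps might look roughly like the sketch below. It is not the provided script. It assumes the parts are plain byte-wise splits that form a valid zip file when concatenated in order, which may not match the actual archive layout. The function names `download_parts`, `assemble_parts`, and `extract_archive` are hypothetical, and the part URLs must be taken from the provided script.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download_parts(part_urls, dest_dir):
    """Download each part into dest_dir, skipping parts already present."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in part_urls:
        target = dest_dir / url.rsplit("/", 1)[-1]
        if not target.exists():
            # Download to a temporary name first so that an interrupted
            # transfer is not mistaken for a complete part on the next run.
            tmp = target.with_name(target.name + ".tmp")
            urllib.request.urlretrieve(url, str(tmp))
            tmp.replace(target)
        paths.append(target)
    return paths


def assemble_parts(part_paths, archive_path):
    """Concatenate the parts, in the given order, into one archive file.

    Assumption: the parts are byte-wise splits of a single zip file.
    """
    archive_path = Path(archive_path)
    with open(archive_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return archive_path


def extract_archive(archive_path, out_dir):
    """Check the archive's CRCs, then extract it into out_dir."""
    with zipfile.ZipFile(archive_path) as zf:
        bad_member = zf.testzip()
        if bad_member is not None:
            raise RuntimeError(f"Corrupt member in archive: {bad_member}")
        zf.extractall(out_dir)
```

Under these assumptions, usage would be `extract_archive(assemble_parts(download_parts(urls, "parts"), "dataset.zip"), "dataset")`, where `urls` is the ordered list of part URLs copied from the script. Note that concatenating this way needs free disk space for the parts, the assembled archive, and the extracted files at the same time.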