Amazon Web Services Seeks Public Data Sets

Amazon is turning to the public for help, asking for public data sets in an effort to build a cloud data service that provides what it describes as a “convenient way to share, access, and use public data.”

Called AWS Hosted Public Data Sets, the service will let you use public data within your Amazon EC2 environment. Selected public data sets will be hosted on AWS for free as Amazon EBS snapshots.
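Because each data set ships as an EBS snapshot, using one from EC2 amounts to creating a volume from the snapshot and attaching it to a running instance. The sketch below assumes the boto3 SDK's EC2 client. The function name, the default device name, and the snapshot, instance, and zone values in the usage comment are hypothetical placeholders, not identifiers published by AWS.

```python
def volume_from_public_snapshot(ec2, snapshot_id, availability_zone,
                                instance_id, device="/dev/sdf"):
    """Create an EBS volume from a snapshot and attach it to an instance.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the ID of the newly created volume.
    """
    # The volume must be created in the same Availability Zone as the instance.
    response = ec2.create_volume(SnapshotId=snapshot_id,
                                 AvailabilityZone=availability_zone)
    volume_id = response["VolumeId"]

    # Wait until the new volume is "available" before trying to attach it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Expose the volume to the instance as a block device (device name is an assumption).
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id

# Hypothetical usage (placeholder IDs):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# volume_from_public_snapshot(ec2, "snap-xxxxxxxx", "us-east-1a", "i-xxxxxxxx")
```

Once attached, the volume still has to be mounted from inside the instance before the data is readable.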

Publicly available data sets do exist, but accessing them can be expensive and tedious. For instance, Project Gutenberg offers its eBook files as a download, but you can expect to wait 48 hours for the 14.5 GB zip file to finish downloading over a 1 Mbit/s DSL connection. If you want the MP3 version, you'll face a nine-day wait for the 91.5 GB file.
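Download times like these follow from dividing file size by link speed. This minimal sketch computes the idealized figure, assuming decimal gigabytes and a constant 1 Mbit/s with no protocol overhead; the helper name is ours. Under those assumptions the results come out somewhat below the estimates above, which is consistent with real DSL throughput falling short of the nominal line rate.

```python
def transfer_hours(size_gb, link_mbit_per_s):
    """Idealized transfer time in hours: size divided by bandwidth, no overhead."""
    bits = size_gb * 1e9 * 8                      # decimal gigabytes -> bits
    seconds = bits / (link_mbit_per_s * 1e6)      # Mbit/s -> bit/s
    return seconds / 3600

print(f"{transfer_hours(14.5, 1):.1f} h")         # eBook archive: 32.2 h
print(f"{transfer_hours(91.5, 1) / 24:.1f} days") # MP3 archive: 8.5 days
```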

Since there is no indication that Project Gutenberg will be added to AWS, we calculated how long it would take to download and upload the 80 GB UGI Virtual Conformer Library, one of the data sets AWS plans to host.

Over a residential cable connection in California, downloading the library from a server in the same state would take 22 hours 36 minutes, and uploading it would take 3 days 36 minutes. If the server were in New York and we accessed it from California, the download would take 3 days 42 minutes and the upload 7 days 14 hours. That is clearly inefficient.
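Working backward from these timings gives the effective throughput each scenario implies. The sketch below assumes that 80 GB means 80 × 10^9 bytes and that each transfer runs at a constant rate; the helper name and the scenario labels are ours.

```python
def implied_mbit_per_s(size_gb, days=0, hours=0, minutes=0):
    """Average throughput (Mbit/s) needed to move size_gb in the given time."""
    seconds = ((days * 24 + hours) * 60 + minutes) * 60
    return size_gb * 1e9 * 8 / seconds / 1e6

# (days, hours, minutes) for the 80 GB UGI Virtual Conformer Library
timings = {
    "CA download": (0, 22, 36),
    "CA upload": (3, 0, 36),
    "NY download": (3, 0, 42),
    "NY upload": (7, 14, 0),
}
for label, (d, h, m) in timings.items():
    print(f"{label}: {implied_mbit_per_s(80, d, h, m):.2f} Mbit/s")
```

Under those assumptions, the in-state transfers imply roughly 7.9 Mbit/s down and 2.4 Mbit/s up, while the coast-to-coast transfers drop to about 2.4 Mbit/s down and just under 1 Mbit/s up.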

People have been searching for better ways to access public data sets for some time, and AWS Hosted Public Data Sets may be the answer they have been looking for, allowing anyone to do the kind of computing that was once limited to large, well-funded organizations.

The data sets Amazon is currently working on include annotated Human Genome data, the PubChem and UGI Virtual Conformer libraries, the U.S. Census, various labor statistics, and various economic and transportation databases.

AWS will continue to add to the collection over time, and this is where you come in.

If you have a public data set and hold the right to distribute it, you can submit a request on the AWS Hosted Public Data Sets site to have it included.

This is huge.

About ReadWrite’s Editorial Process

The ReadWrite Editorial policy involves closely monitoring the tech industry for major developments, new product launches, AI breakthroughs, video game releases and other newsworthy events. Editors assign relevant stories to staff writers or freelance contributors with expertise in each particular topic area. Before publication, articles go through a rigorous round of editing for accuracy, clarity, and to ensure adherence to ReadWrite's style guidelines.
