Due to their unique oversight, governments have a surplus of data at their fingertips. Used properly, this available data could enable them to create beneficial programs that tackle problems in economics, policy, transportation, and civic life. Unfortunately, the majority of that data is untapped. Here are the facts about AI startups needing data, and how that helps governments.
All hope is not lost, though. The good news is that initiatives have been put in place to unlock the power of this data. And startups in a number of industries are poised to capitalize on this wealth of data to optimize services and solve the issues most important to customers.
Thanks to the digitization of government systems, spurred by the launch of data.gov in 2009 and the open data executive order former President Barack Obama signed in 2013, there’s a strong mandate to collect and organize government-owned data. And there’s an even bigger push to make that data more accessible to the American public.
Governments all over the world are replicating this trend toward more accessible data.
Making that data accessible could spark significant job growth, public sector efficiency, and social good.
In the private and academic sectors, more companies are investing in AI and machine learning to solve complex problems in new ways.
Data fuels AI
Data fuels AI, but growing pressure from regulators and customers to maintain user privacy has caused corporations (e.g., social media giants, product review sites, and travel companies) to be wary of sharing large customer datasets. For startups building the next generation of AI tools, this situation presents challenges.
There’s an interesting opportunity at the intersection of these two trends. Governments want to share data, and startups need that data to power technological innovation.
It’s a perfect match.
By leveraging government data, startups can improve working and living conditions for everyone while solving various issues currently plaguing society.
Data for All
The U.S. government boasts more than 200,000 public datasets hosted through data.gov. Public datasets from this and other sources are cataloged in numerous lists and cover a wide range of subjects, including:
- Information on 173 million taxi rides in New York City.
- Weather data from the past 100 years.
- Mexican birth records.
- U.S. government procurement data.
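For readers who want to explore the data.gov catalog programmatically, here is a minimal sketch. It assumes the catalog still exposes a CKAN-style `package_search` action at the URL below; that endpoint and its access requirements are an assumption to check against current data.gov documentation. The helper names are hypothetical and chosen for illustration only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: data.gov's catalog exposes a CKAN-style search action here.
# Verify the URL (and any API key requirement) before relying on it.
CATALOG_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5):
    """Build a search URL using CKAN's `q` (query) and `rows` (limit) parameters."""
    return f"{CATALOG_SEARCH_URL}?{urlencode({'q': query, 'rows': rows})}"


def summarize_results(payload):
    """Return (title, sorted file formats) for each dataset in a CKAN response."""
    if not payload.get("success"):
        return []
    summaries = []
    for dataset in payload["result"]["results"]:
        formats = sorted(
            {res.get("format") for res in dataset.get("resources", []) if res.get("format")}
        )
        summaries.append((dataset.get("title", ""), formats))
    return summaries


def search_catalog(query, rows=5):
    """Query the catalog over the network and summarize the matching datasets."""
    with urlopen(build_search_url(query, rows), timeout=30) as response:
        return summarize_results(json.load(response))
```

Calling `search_catalog("taxi trips")` would return a short list of dataset titles with the file formats each one offers, which is often enough to decide whether a dataset is worth downloading.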
There are also semipublic datasets like the MLS real estate listings. These large and reasonably priced aggregated datasets can be useful for startups looking to transform legacy institutions or disrupt established industries.
Most corporations, nonprofits, and government agencies lack the technical skills to extract useful insights from their data. The same isn’t true of specialized AI startups, which have those skills in abundance but lack the necessary data volume for machine learning tools. Locating relevant data is typically the hardest problem these teams face.
Michael Hiskey, CMO of data management software provider Semarchy, advises startups to locate the data they need before beginning to develop AI solutions. Without that data, a solution can’t solve an actual problem. Once that data is secured, the focus should be on using it safely and effectively.
While a startup can easily set its sights on a particular dataset, its leaders should be conscious of the complexity of licensing private data.
The time it takes to sort out legal, pricing, privacy, and technical integration issues is often better spent rapidly proving value from a small sample of the data or from another public dataset entirely. You can always pull in more data later, and customers will be more likely to let a startup work with private data once it has shown it can extract insights.
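One way to work from a small sample is to draw a fixed-size random subset from a large file without loading it all into memory. The sketch below uses reservoir sampling; the function names and the CSV layout (a header row followed by data rows) are assumptions made for illustration.

```python
import csv
import random


def reservoir_sample(rows, sample_size, seed=0):
    """Keep a uniform random sample of `sample_size` items from any iterable,
    reading it only once (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for index, row in enumerate(rows):
        if index < sample_size:
            # Fill the reservoir with the first `sample_size` items.
            reservoir.append(row)
        else:
            # Replace an existing item with probability sample_size / (index + 1).
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = row
    return reservoir


def sample_csv(path, sample_size, seed=0):
    """Return (header, sampled rows) from a CSV file that starts with a header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        return header, reservoir_sample(reader, sample_size, seed)
```

A fixed `seed` keeps the sample reproducible, so an early prototype can be rerun against the same rows while the full dataset is still being negotiated.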
Data owners have the responsibility to aggregate and anonymize their findings, although that still doesn’t mean everything is completely secure. Recently, a team of urban planners and researchers at MIT conducted a study in which they anonymized datasets that included “location-stamped” cellphone logs and transit trips produced by users in Singapore.
With 11 weeks of data, the team’s algorithm matched data points to specific users with 95% accuracy. The example shows how easy it could be for bad actors to track citizens by merging anonymized datasets with their own.
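To make the risk concrete, here is a toy sketch of a linkage attack. It is not the MIT team's algorithm. It simply assumes two datasets that use different pseudonymous IDs but both record a place and an hour, and it pairs each user in one dataset with the user in the other who shares the most place-and-hour stamps. All names and records are invented for illustration.

```python
from collections import defaultdict


def build_profiles(records):
    """Group (user_id, place, hour) records into a set of (place, hour) stamps per user."""
    profiles = defaultdict(set)
    for user_id, place, hour in records:
        profiles[user_id].add((place, hour))
    return profiles


def link_users(records_a, records_b, min_overlap=2):
    """Pair each ID in dataset A with the ID in dataset B sharing the most stamps."""
    profiles_a = build_profiles(records_a)
    profiles_b = build_profiles(records_b)
    matches = {}
    for id_a, stamps_a in profiles_a.items():
        best_id, best_overlap = None, 0
        for id_b, stamps_b in profiles_b.items():
            overlap = len(stamps_a & stamps_b)
            if overlap > best_overlap:
                best_id, best_overlap = id_b, overlap
        # Only report a match when enough stamps coincide.
        if best_overlap >= min_overlap:
            matches[id_a] = best_id
    return matches


# Invented example data: "anonymized" phone logs and transit card trips.
phone_logs = [
    ("phone_1", "station_A", 8), ("phone_1", "mall_C", 12), ("phone_1", "station_B", 18),
    ("phone_2", "station_B", 9), ("phone_2", "park_D", 13), ("phone_2", "station_A", 19),
]
transit_trips = [
    ("card_x", "station_B", 9), ("card_x", "station_A", 19),
    ("card_y", "station_A", 8), ("card_y", "station_B", 18),
]

print(link_users(phone_logs, transit_trips))
# {'phone_1': 'card_y', 'phone_2': 'card_x'}
```

Even with only two shared stamps per person, the pseudonymous IDs line up, which is why removing names alone does not make a location dataset safe to combine with others.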
Anonymization is only one privacy safeguard, and strict control policies should accompany any comprehensive security strategy.
As we’ve often seen, handling data properly is no trivial task. Businesses must do their due diligence to ensure that government agencies and private-sector organizations take that responsibility seriously. Aside from security, working with massive datasets means spending a significant amount of money on hosting and bandwidth.
Fortunately, major cloud providers have programs that support hosting data of public interest, and their access mechanisms make it easy for startups to get going.
Areas of Interest
The vast majority of work for any AI/machine-learning effort involves understanding, cleaning, and transforming data. In addition to cleaning data, startups must work closely with experts from relevant government agencies to understand how to interpret datasets.
Those agencies’ years of experience will accelerate the process of extracting insights. After all, AI/machine learning is still a tool best used to augment, not replace, human intelligence.
Emerging datasets — emerging companies.
Emerging companies can find usable datasets in almost every field imaginable. AI and machine learning are especially useful for optimization problems, where they offer large advantages over simple heuristic problem-solving methods.
In the healthcare sector, we’ll see companies helping to make significant breakthroughs in areas such as optimizing responses to emergency calls.
Recognizing environmental threats to critical infrastructure and improving strategies for mitigating the spread of pests and diseases are other areas where AI and ML solutions, paired with government data, could make real progress.
Many companies already use government data to improve the overall health and safety of the public. CentralSquare Technologies is an AI startup that connects about 5,000 emergency call centers to patients in large metropolitan areas of the U.S. CentralSquare aims to reduce 911 response times when callers need help outside of their home area codes.
Geospiza, which launched in 2017, uses climate data analytics to help emergency managers and elected officials save lives in disaster situations. Geospiza also helps companies in weather-sensitive industries make smarter decisions about risk.
Citizen uses a combination of human employees and AI technology to pull information directly from police scanners and other sources and alert nearby citizens to incidents or potential threats.
Many local governments don’t have a sophisticated data strategy, which inhibits opportunities for everyone. The team at Stae addresses this by helping city governments, including those in Atlanta and Jersey City, New Jersey, improve their data warehouse solutions.
Plenty of startups already take advantage of government data, but many opportunities remain untapped.
Whether you have an idea that might help save lives or one that will help someone save a little bit of time, the government probably has data that can help you. It would be a shame to let it all go to waste.
Image Credit: Unsplash; joshua-sukoff