Data behind the Inside Airbnb site is sourced from publicly available information on the Airbnb site. The data has been analyzed, cleansed, and aggregated where appropriate to facilitate public discussion.
Explore 100,000 HD video sequences totaling over 1,100 hours of driving experience across many different times of day, weather conditions, and driving scenarios. Video sequences also include GPS locations, IMU data, and timestamps.
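Because each video comes with timestamped GPS and IMU readings, a common first step is matching every video frame to the nearest sensor sample. The sketch below does this with a binary search over sorted timestamps. The record layout (timestamp in milliseconds, latitude, longitude) is a hypothetical simplification, not the dataset's actual file format.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# Hypothetical record layout: (timestamp_ms, latitude, longitude), sorted by timestamp.
GpsSample = Tuple[int, float, float]


def nearest_index(times: Sequence[int], t: int) -> int:
    """Index of the timestamp in the sorted list `times` closest to t (ties go earlier)."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    # Compare the neighbours on either side of the insertion point.
    return i - 1 if t - times[i - 1] <= times[i] - t else i


def align_frames(samples: Sequence[GpsSample], frame_times: Sequence[int]) -> List[GpsSample]:
    """Match every video frame timestamp to its nearest GPS sample."""
    if not samples:
        raise ValueError("no GPS samples")
    times = [s[0] for s in samples]  # extract once, then binary-search per frame
    return [samples[nearest_index(times, t)] for t in frame_times]
```

Binary search keeps each lookup at O(log n), which matters when a sequence has many frames and a dense sensor stream; interpolating between the two neighbours would be a reasonable refinement.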
Caselaw Access Project (“CAP”) expands public access to U.S. law. Our goal is to make all published U.S. court decisions freely available to the public online, in a consistent format, digitized from the collection of the Harvard Law Library.
Data USA puts public US Government data in your hands. Instead of searching through multiple data sources that are often incomplete and difficult to access, you can simply point to Data USA to answer your questions. Data USA provides an open, easy-to-use platform that turns data into knowledge. It allows millions of people to conduct their own analyses and create their own stories about America – its people, places, industries, skill sets and educational institutions.
The European Data Portal harvests the metadata of Public Sector Information available on public data portals across European countries. Information regarding the provision of data and the benefits of re-using data is also included.
Food environment factors (such as store/restaurant proximity, food prices, food and nutrition assistance programs, and community characteristics) interact to influence food choices and diet quality. Research is beginning to document the complexity of these interactions, but more is needed to identify causal relationships and effective policy interventions. The objectives of the Atlas are to assemble statistics on food environment indicators to stimulate research on the determinants of food choices and diet quality, and to provide a spatial overview of a community's ability to access healthy food and its success in doing so.
Community-contributed collection of open data from around the world. Uploaded by the public, the data often come from public and open government websites and sources. The searchable archive includes over 150,000 datasets as GeoJSON.
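Since the archive distributes datasets as GeoJSON, a quick sanity check after downloading one is computing its bounding box. This minimal sketch relies only on the GeoJSON convention that positions are `[longitude, latitude]` arrays nested inside each geometry's `coordinates`; it does not handle collections that cross the antimeridian.

```python
import json
from typing import Iterator, List, Sequence, Tuple


def _positions(coords) -> Iterator[Sequence[float]]:
    """Yield every [lon, lat(, alt)] position from a nested GeoJSON coordinates array."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from _positions(part)


def _geometry_positions(geom) -> Iterator[Sequence[float]]:
    if geom is None:  # a feature's geometry may be null
        return
    if geom["type"] == "GeometryCollection":
        for g in geom["geometries"]:
            yield from _geometry_positions(g)
    else:
        yield from _positions(geom["coordinates"])


def feature_collection_bbox(text: str) -> Tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for a GeoJSON FeatureCollection string."""
    fc = json.loads(text)
    lons: List[float] = []
    lats: List[float] = []
    for feature in fc["features"]:
        for pos in _geometry_positions(feature.get("geometry")):
            lons.append(pos[0])
            lats.append(pos[1])
    if not lons:
        raise ValueError("collection has no coordinates")
    return min(lons), min(lats), max(lons), max(lats)
```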
Produces global spatial information about the human presence on the planet over time, in the form of built-up maps, population density maps, and settlement maps. This information is generated with evidence-based analytics and knowledge using new spatial data mining technologies. The framework uses heterogeneous data, including global archives of fine-scale satellite imagery, census data, and volunteered geographic information. The data is processed fully automatically, and the resulting analytics and knowledge report objectively and systematically on the presence of population and built-up infrastructure.
Maintains a data archive of more than 250,000 files of research in the social and behavioral sciences. It hosts 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.
Capture data from social media sites (Twitter, Facebook, YouTube), RSS feeds, and text/CSV files
Discover popular topics
Find and explore emerging discussion themes
Build, visualize and analyze online social networks using social network analysis
Map geo-coded social media data
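To make "build and analyze online social networks" concrete, the sketch below builds a directed mention graph (author to each account they @-mention) from captured posts and computes in-degree, a basic social network analysis measure. The `(author, text)` input shape and the `@handle` pattern are assumptions for illustration, not the tool's own data model.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

MENTION = re.compile(r"@(\w+)")


def build_mention_graph(posts: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Directed graph: author -> set of accounts the author mentioned (case-insensitive)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for author, text in posts:
        for target in MENTION.findall(text):
            target = target.lower()
            if target != author.lower():  # ignore self-mentions
                graph[author.lower()].add(target)
    return dict(graph)


def in_degree(graph: Dict[str, Set[str]]) -> Counter:
    """Number of distinct accounts that mention each account."""
    counts: Counter = Counter()
    for targets in graph.values():
        counts.update(targets)
    return counts
```

Using a set per author counts each relationship once, so repeated mentions by the same person do not inflate a node's in-degree.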
Open Images is a dataset of ~9 million images that have been annotated with image-level labels and object bounding boxes.
The training set of V4 contains 14.6M bounding boxes for 600 object classes on 1.74M images, making it the largest existing dataset with object location annotations. The boxes have been largely manually drawn by professional annotators to ensure accuracy and consistency. The images are very diverse and often contain complex scenes with several objects (8.4 per image on average). Moreover, the dataset is annotated with image-level labels spanning thousands of classes.
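Bounding-box annotations like these are often stored as corner coordinates normalized to the image size, so they must be scaled before drawing or cropping. This sketch assumes normalized `x_min`, `x_max`, `y_min`, `y_max` values in [0, 1]; check the actual annotation files for their column names and conventions before relying on it.

```python
from typing import NamedTuple, Tuple


class NormBox(NamedTuple):
    # Assumed layout: box corners as fractions of image width/height, in [0, 1].
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def to_pixels(box: NormBox, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box to integer pixel coords (left, top, right, bottom)."""
    for v in box:
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"coordinate {v} outside [0, 1]")
    return (
        round(box.x_min * width),
        round(box.y_min * height),
        round(box.x_max * width),
        round(box.y_max * height),
    )


def box_area_fraction(box: NormBox) -> float:
    """Fraction of the image area covered by the box."""
    return (box.x_max - box.x_min) * (box.y_max - box.y_min)
```

Keeping boxes normalized until the last moment lets the same annotations work with resized images.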
Trends in health, food provision, the growth and distribution of incomes, violence, rights, wars, culture, energy use, education, and environmental changes are empirically analyzed and visualized; for each topic, the quality of the data is discussed and the data sources are provided.
Pew Research Center regularly makes available the full datasets that underlie most of its reports. Topics include:
U.S. Politics & Policy; Journalism & Media; Internet, Science & Tech; Religion & Public Life; Hispanic Trends; Global Attitudes & Trends; Social & Demographic Trends; American Trends Panel
Spatially accurate and up-to-date population and settlement data are widely used in planning and decision making in both the public and private sectors to improve the effectiveness and efficiency of decisions, monitor impacts, and identify those who might otherwise be left behind. Understanding where people live and work, and the type and condition of their housing and other infrastructure, is critical in times of disaster, enabling emergency responders to reach those most in need more quickly with appropriate assistance. Such data can help improve access to public and private services, increase the sustainability of natural resources, and facilitate progress towards meeting the internationally accepted Sustainable Development Goals (SDGs). The POPGRID Data Collaborative aims to bring together and expand the international community of data providers, users, and sponsors concerned with georeferenced data on population, human settlements and infrastructure.
Dedicated archive for storing and sharing digital data (and accompanying documentation) generated or collected through qualitative and multi-method research in the social sciences. QDR provides search tools to facilitate the discovery of data, and also serves as a portal to material beyond its own holdings, with links to U.S. and international archives. The repository’s initial emphasis is on political science.
The corpus is available in the S3 bucket radio-talk at s3://radio-talk/v1.0/. The entire corpus is available as one file of about 9.3 GB at s3://radio-talk/v1.0/radiotalk.json.gz, and there's also a version with one file per month under s3://radio-talk/v1.0/monthly/.
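Once the file has been copied locally (for example with the AWS CLI's `aws s3 cp`), it can be streamed without decompressing it to disk. This sketch assumes the gzipped file holds one JSON object per line; that layout is an assumption, so inspect the first few lines before relying on it. The S3 URI helper simply splits a URI like the ones above into bucket and key.

```python
import gzip
import json
from typing import Iterator, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def iter_records(path: str) -> Iterator[dict]:
    """Stream records from a gzipped file, assuming one JSON object per line."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Streaming line by line keeps memory flat, which matters for a roughly 9.3 GB compressed file.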
Collection of 1161 datasets that were originally distributed alongside the statistical software environment R and some of its add-on packages. The goal is to make these data more broadly accessible for teaching and statistical software development.
Global registry of research data repositories covering many academic disciplines. It presents repositories for the permanent storage of and access to datasets, serving researchers, funding bodies, publishers, and scholarly institutions.
SEDAC, the Socioeconomic Data and Applications Center, is one of the Distributed Active Archive Centers (DAACs) in the Earth Observing System Data and Information System (EOSDIS) of the U.S. National Aeronautics and Space Administration. Focusing on human interactions in the environment, SEDAC's mission is to develop and operate applications that support the integration of socioeconomic and earth science data, and to serve as an "Information Gateway" between the earth sciences and the social sciences.
Has been actively developed since 2004 and is growing organically as a result of our research pursuits in the analysis of large social and information networks. The largest network we have analyzed so far using the library was the Microsoft Instant Messenger network from 2006, with 240 million nodes and 1.3 billion edges.
A community-edited data service aggregating transit networks across metropolitan and rural areas around the world. Aggregates stop, route, and schedule data from transit operators' authoritative GTFS feeds.
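GTFS feeds are zipped collections of CSV files; `stops.txt` lists each stop with `stop_id`, `stop_name`, `stop_lat`, and `stop_lon`. The sketch below parses that file's contents and finds the stop nearest a point using the haversine great-circle distance. It treats the stop name and coordinates as optional per row, since some GTFS location types may omit them; it does not read the other feed files.

```python
import csv
import io
import math
from typing import Dict, Tuple

Stop = Tuple[str, float, float]  # (stop_name, stop_lat, stop_lon)


def load_stops(stops_txt: str) -> Dict[str, Stop]:
    """Parse the contents of a GTFS stops.txt file into {stop_id: (name, lat, lon)}."""
    stops: Dict[str, Stop] = {}
    # Strip a leading byte-order mark so the first header name matches.
    reader = csv.DictReader(io.StringIO(stops_txt.lstrip("\ufeff")))
    for row in reader:
        if not row.get("stop_lat") or not row.get("stop_lon"):
            continue  # skip rows without coordinates
        stops[row["stop_id"]] = (row.get("stop_name") or "", float(row["stop_lat"]), float(row["stop_lon"]))
    return stops


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a sphere of mean Earth radius."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_stop(stops: Dict[str, Stop], lat: float, lon: float) -> str:
    """stop_id of the stop closest to (lat, lon); raises ValueError if stops is empty."""
    return min(stops, key=lambda sid: haversine_km(lat, lon, stops[sid][1], stops[sid][2]))
```

A linear scan is fine for a single feed; a spatial index would be the natural next step across many aggregated feeds.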
TweetsKB is a public RDF corpus of anonymized data for a large collection of annotated tweets. The dataset currently contains data for more than 1.5 billion tweets, spanning almost 5 years (January 2013 to November 2017). Metadata information about the tweets as well as extracted entities, sentiments, hashtags and user mentions are exposed in RDF using established RDF/S vocabularies. For the sake of privacy, we encrypt the usernames and we do not provide the text of the tweets. However, through the tweet IDs, actual tweet content and further information can be fetched.
Original data sets generated by PD&R-sponsored data collection efforts, including the American Housing Survey, median family incomes and income limits, as well as microdata from research initiatives on topics such as housing discrimination, the HUD-insured multifamily housing stock, and the public housing population.
For every year except the current one, downloadable files include all documents received from January 1 through December 31, organized by quarter. For the current year, files include all LD-1 and LD-2 documents received from January 1 to date, organized by quarter.
A public resource that hosts an expanding collection of computable datasets, curated and structured to be suitable for immediate use in computation, visualization, analysis and more. Get Wolfram: https://caligari.dartmouth.edu/public/downloads/mathematica/
Data for Research (DfR) provides datasets of content on JSTOR for use in research and teaching. Researchers may use DfR to define and submit their desired dataset to be automatically processed. Data available through the service includes metadata, n-grams, and word counts for most articles and book chapters, and for all research reports and pamphlets on JSTOR. Datasets are produced at no cost to researchers and may include data for up to 25,000 documents.
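Word counts and n-grams are simple to reproduce for a small sample, which helps when interpreting downloaded counts. This sketch counts contiguous n-grams over a naive tokenizer (lowercase, ASCII letters, digits, and apostrophes); that tokenization is an assumption for illustration and almost certainly differs from the service's own processing.

```python
import re
from collections import Counter
from typing import List

TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Lowercase, then keep runs of ASCII letters, digits, and apostrophes."""
    return TOKEN.findall(text.lower())


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count contiguous n-grams (as tuples); word counts are the n=1 case."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
```

For example, `ngram_counts(tokenize("The cat saw the cat."), 2)` counts the bigram `("the", "cat")` twice.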
Presents a simple, intuitive API for the UCI ML portal, through which users can look up dataset descriptions, search for datasets they are interested in, and even download datasets categorized by size or machine learning task.
Provides APIs and full sets of downloadable files for a number of high-value, high-priority, and scalable structured datasets, including adverse events, drug product labeling, and recall enforcement reports.
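Assuming this entry describes openFDA, its query API is plain HTTP GET with `search` and `limit` parameters. The sketch only builds request URLs (it sends nothing); the base URL and the three endpoint paths are assumptions to verify against the openFDA documentation.

```python
from typing import Dict, Union
from urllib.parse import urlencode

# Assumed openFDA base URL and endpoint paths; verify against the official docs.
BASE = "https://api.fda.gov"
ENDPOINTS = {
    "adverse_events": "/drug/event.json",
    "labeling": "/drug/label.json",
    "enforcement": "/drug/enforcement.json",
}


def build_query(dataset: str, search: str = "", limit: int = 10) -> str:
    """Build a request URL for one of the datasets; no network call is made."""
    if dataset not in ENDPOINTS:
        raise KeyError(f"unknown dataset {dataset!r}")
    params: Dict[str, Union[int, str]] = {"limit": limit}
    if search:
        params["search"] = search
    # urlencode percent-escapes ':' and '"' and turns spaces into '+'.
    return f"{BASE}{ENDPOINTS[dataset]}?{urlencode(params)}"
```

The returned URL can then be fetched with any HTTP client; keeping URL construction separate makes queries easy to log and test.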
Access data from the United Nations Commodity Trade Statistics database, including International Merchandise Trade Statistics (IMTS) and the work of the International Merchandise Trade Statistics Section (IMTSS) of the United Nations Statistics Division.
Currently has three different APIs to provide access to different datasets: one for Indicators (or time series data), one for Projects (or data on the World Bank’s operations), and one for the World Bank financial data (World Bank Finances API).
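As an illustration of the Indicators API, the sketch below builds a time-series request URL and flattens a JSON response into `{year: value}`. The v2 URL pattern and the response shape (a two-element array of paging metadata and observations, with `null` when no data matches) are assumptions based on the public API's usual behavior; confirm them before use. `SP.POP.TOTL` (total population) is one example indicator code.

```python
import json
from typing import Dict, Optional, Union
from urllib.parse import urlencode

BASE = "https://api.worldbank.org/v2"  # assumed Indicators API (v2) base URL


def indicator_url(country: str, indicator: str, date_range: Optional[str] = None,
                  per_page: int = 1000) -> str:
    """URL for one country's indicator series, e.g. date_range='2000:2010'."""
    params: Dict[str, Union[int, str]] = {"format": "json", "per_page": per_page}
    if date_range:
        params["date"] = date_range
    return f"{BASE}/country/{country}/indicator/{indicator}?{urlencode(params)}"


def parse_series(body: str) -> Dict[str, float]:
    """Turn a JSON response body into {year: value}, dropping years with no value."""
    payload = json.loads(body)
    # Assumed shape: [paging metadata, list of observations]; anything else yields {}.
    if len(payload) < 2 or payload[1] is None:
        return {}
    return {row["date"]: row["value"] for row in payload[1] if row.get("value") is not None}
```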
Includes clinical care provider quality information, nationwide health service provider directories, databases of the latest medical and scientific knowledge, consumer product data, community health performance information, and government spending data.
Information about services and procedures provided to Medicare beneficiaries by physicians and other healthcare professionals, with information about utilization, payment, and submitted charges organized by National Provider Identifier (NPI), Healthcare Common Procedure Coding System (HCPCS) code, and place of service.
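Because rows are organized by NPI, HCPCS code, and place of service, a common summary collapses the place-of-service dimension. This sketch sums services per (NPI, HCPCS) pair and computes a service-weighted average payment. The column names (`npi`, `hcpcs_code`, `services`, `avg_payment`) are hypothetical; the published files use their own headers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical column names; map them to the real file's headers before use.
Key = Tuple[str, str]  # (npi, hcpcs_code)


def summarize(rows: Iterable[Mapping[str, str]]) -> Dict[Key, Tuple[float, float]]:
    """Combine places of service: {(npi, hcpcs): (total_services, weighted_avg_payment)}."""
    services: Dict[Key, float] = defaultdict(float)
    paid: Dict[Key, float] = defaultdict(float)
    for row in rows:
        key = (row["npi"], row["hcpcs_code"])
        n = float(row["services"])
        services[key] += n
        # Average payment per service times the count estimates the row's total payment.
        paid[key] += n * float(row["avg_payment"])
    return {k: (services[k], paid[k] / services[k]) for k in services if services[k] > 0}
```

Weighting by service count matters: a plain mean of per-row averages would let a low-volume place of service skew the result.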