
Research Data Management

Background and links to more information about data management issues.

Web of Science 2022 XML dataset

Dartmouth Library has acquired the 2022 Web of Science XML dataset for research and instructional use by Dartmouth faculty, students, and researchers.

The library subscribes to the Web of Science database, which indexes the world’s leading scholarly literature in the sciences, social sciences, arts, and humanities, as published in journals, conference proceedings, symposia, seminars, colloquia, workshops, and conventions across the globe.

We have also purchased access to the 2022 XML dataset, which includes reliable, complete metadata from over 12,500 high-quality journals from around the world, in over 250 science, social science, and humanities disciplines. Conference proceedings and book data are also available. Data are available back to 1900 and include over 63 million article records and 1 billion cited references to date.
 
Dataset:
The 2022 XML Dataset includes:

  • Science Citation Index Expanded XML (SCIE XML) – 1900-2022
  • Social Sciences Citation Index XML (SSCI XML) – 1900-2022
  • Arts & Humanities Citation Index XML (AHCI XML) – 1975-2022
  • Conference Proceedings Citation Index-Science & Technical XML (CPCI-S XML) – 1990-2022
  • Conference Proceedings Citation Index-Social Sciences & Humanities XML (CPCI-SSH XML) – 1990-2022
  • Book Citation Index-Science XML (BKCI-S XML) – 2005-2022
  • Book Citation Index-Social Sciences & Humanities XML (BKCI-SSH XML) – 2005-2022
  • Emerging Sources Citation Index XML (ESCI XML) – 2005-2022

The XML data include cited references and a standalone times-cited file. A separate DAIS ID file will also be provided.
 
Some key data elements:

  • ORCID identifiers are included on over 6.2 million records to support author disambiguation
  • Funding acknowledgements, including agency and grant numbers, are indexed
  • Full author and institutional affiliation information is indexed to improve attribution of research and collaboration analysis
  • Institution names are extensively unified to aggregate complex naming variations and sub-organizations

 
Access:
The 200 GB dataset is available via a DartFS share or by individual download to a local machine. The DartFS share is a good option if you intend to use Dartmouth’s HPC environment or do not want to handle the logistics of storing a dataset of this size yourself. Contact ResearchDataHelp@groups.dartmouth.edu for access.
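
Because the dataset is far too large to load into memory at once, it is usually read as a stream, one record at a time. The following Python sketch shows one way to do that with the standard library. The record element name `REC`, the identifier element name `UID`, and the helper names are assumptions made for illustration; check the documentation supplied with the dataset for the actual schema before relying on them.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{uri}REC' becomes 'REC'."""
    return tag.rsplit("}", 1)[-1]


def iter_records(source, record_tag="REC"):
    """Yield record elements one at a time from a file path or file object.

    record_tag is an assumed element name; adjust it to the real schema.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == record_tag:
            yield elem
            elem.clear()  # free the record's own children
            root.clear()  # detach processed records from the root


def record_uid(record, uid_tag="UID"):
    """Return the text of the first element named uid_tag (assumed name)."""
    for child in record.iter():
        if local_name(child.tag) == uid_tag:
            return child.text
    return None


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real file; not actual Web of Science data.
    sample = io.BytesIO(b"<records><REC><UID>example-1</UID></REC></records>")
    for rec in iter_records(sample):
        print(record_uid(rec))
```

To use it on a real file, pass the file path (or an open binary file object) to `iter_records` in place of the in-memory sample. Each record is cleared after it has been processed, so extract whatever fields you need inside the loop rather than collecting the elements for later.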

License:
Data use is for internal efforts only and cannot be included in any commercial product or distributed to others outside of Dartmouth College without explicit permission from Clarivate.

  • Data is to be used only for academic research, instruction, and data projects by faculty, staff, students and researchers affiliated with Trustees of Dartmouth College. Walk-ins cannot have access to the raw data itself under any circumstances, but can access WoS content via a system that does not allow any export of WoS data.
  • Commercial use of the data set or derived data is strictly prohibited.
  • Further distribution of this data or its derivatives is prohibited.

 
Dartmouth affiliates may not share Web of Science XML datasets or documentation, in whole or part, with any third party.

If you have any questions about whether your use of Web of Science XML data is valid, please contact the Library at researchdatahelp@groups.dartmouth.edu.