Fatcat!
Latest revision as of 18:17, 16 April 2023
fatcat!: perpetual access to the scholarly record | |
URL: | https://fatcat.wiki |
Recent changes: | https://fatcat.wiki/changelog |
Founded by: | unknown |
Status: | Active |
Language: | English |
Edit mode: | ConfirmEmail |
Wiki engine: | Custom engine |
Wiki license: | Copyright to original source author |
Main topic: | Digital preservation |
Wiki size: | 133,991,827 article pages (papers total as of 2023-01-21; source: https://fatcat.wiki/stats) |
Fatcat is a versioned, publicly editable catalog of research publications: journal articles, conference proceedings, pre-prints, blog posts, and so forth. The goal is to improve the preservation of, and access to, these works by providing a manifest of full-text content versions and locations.
This service does not directly contain full-text content itself, but provides basic access for human and machine readers through links to copies in web archives, repositories, and the public web.
Considerably more context and background can be found in The Guide.
Feedback and queries can be directed to [email protected].
- Goals and Features
A few things set Fatcat apart from similar indexing and discovery services:
- inclusion of archival, file-level metadata (hashes) in addition to URLs, which allows automated verification ('do I have the right copy'), reveals content-drift over time, and enables efficient distribution of content through the ecosystem
- native support for 'post-PDF' digital media, including archival web captures and datasets, as well as content stored on the distributed web
- data model that captures the work/edition distinction, grouping pre-print, post-review, published, re-published, and updated versions of a work together
- public editing interface, allowing metadata corrections and improvements from individuals and bots in addition to automated imports from authoritative sources
- focus on providing a stable API and corpus (making integration with diverse user-facing applications simple), while enabling full replication and mirroring of the corpus to reduce the risks of centralized control
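Two of the features above, file-level hashes and the work/edition grouping, can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the `Release` class, its field names, and the helper functions are hypothetical and are not Fatcat's actual schema or API. SHA-256 is used here simply as an example digest.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One version of a work (hypothetical shape, not Fatcat's schema)."""
    work_id: str      # identifier shared by all versions of the same work
    stage: str        # e.g. "pre-print", "published", "updated"
    file_sha256: str  # hex digest recorded for the full-text file


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def is_right_copy(data: bytes, release: Release) -> bool:
    """Check a local copy against the digest recorded in the catalog."""
    return sha256_hex(data) == release.file_sha256.lower()


def group_by_work(releases):
    """Group releases so that all versions of one work sit together."""
    groups = defaultdict(list)
    for release in releases:
        groups[release.work_id].append(release)
    return dict(groups)


# Made-up example data: two versions of one work, plus an unrelated work.
preprint_bytes = b"draft text of the paper"
published_bytes = b"final text of the paper"

catalog = [
    Release("work-1", "pre-print", sha256_hex(preprint_bytes)),
    Release("work-1", "published", sha256_hex(published_bytes)),
    Release("work-2", "published", sha256_hex(b"another paper")),
]

works = group_by_work(catalog)
```

In this sketch, `is_right_copy(preprint_bytes, catalog[0])` succeeds while the same bytes fail against `catalog[1]`. A mismatch of that kind for a copy fetched again from the same URL is how content-drift would show up.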
This service aspires to be a piece of sustainable, long-term, non-profit, open source, collaborative, digital infrastructure. It is primarily designed to support the archival and dissemination roles of scholarly communication. It may also support the registration role (establishing precedence and authorship), but explicitly does not aid with certification of content, and is not intended to be used for evaluation of individuals, institutions, or venues. This service is 'universal', not curated. This means that it includes retracted works (annotated and disclaimed as such) and content some may consider 'predatory publishing'.
- Sources of Metadata
The source of all bibliographic information is recorded in edit history metadata, which allows the provenance of all records to be reconstructed. A few major sources are worth highlighting here:
- Release metadata from Crossref, via their public REST API
- Release metadata and linked full-text content from NIH PubMed and arXiv.org
- Release metadata and linked public domain full-text content from the JSTOR Early Journal Content collection
- Creator names and de-duplication from ORCID, via their annual public data releases
- Journal title metadata from DOAJ, ISSN ROAD, and SHERPA/RoMEO
- Full-text URL lists from CORE, Unpaywall, Semantic Scholar, OA.mg, CiteseerX, and Microsoft Academic Graph
- The Guide lists more major sources
Many thanks for the hard work of all these projects, institutions, and individuals!
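How provenance might be reconstructed from edit history can be sketched in a few lines of Python. The `Edit` record, its fields, and the source labels below are assumptions made for illustration only. They do not reflect Fatcat's real edit-history format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One entry in a hypothetical edit history (not Fatcat's real schema)."""
    record_id: str  # which catalog record the edit touched
    revision: int   # increasing revision number for that record
    source: str     # where the change came from (an import, a person, a bot)
    fields: tuple   # names of the fields the edit set or changed


def provenance(history, record_id):
    """Map each field of a record to the source of its most recent edit."""
    origin = {}
    edits = sorted(
        (e for e in history if e.record_id == record_id),
        key=lambda e: e.revision,
    )
    for edit in edits:
        for field in edit.fields:
            origin[field] = edit.source  # later revisions overwrite earlier ones
    return origin


# Made-up history: an automated import followed by two later corrections.
history = [
    Edit("rec-1", 1, "crossref-import", ("title", "doi", "authors")),
    Edit("rec-2", 1, "arxiv-import", ("title",)),
    Edit("rec-1", 2, "orcid-import", ("authors",)),
    Edit("rec-1", 3, "human-editor", ("title",)),
]

print(provenance(history, "rec-1"))
```

Because every edit carries its source, the origin of each field can be recovered even after several imports and manual corrections have touched the same record.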
- Support and Acknowledgments
Fatcat is a project of the Internet Archive, a US-based non-profit digital library, well known for its Wayback Machine web archive, Open Library, and book digitization and lending services.
Development of Fatcat and related web harvesting, indexing, and preservation efforts at the Archive have been partially funded by a generous grant from the Andrew W. Mellon Foundation (Long-tail Open Access Journal Preservation). Fatcat supports this work by both tracking which open access works are in known archives and providing minimum-viable indexing and access mechanisms for long-tail works which otherwise would lack them.
The service would not be technically possible without hundreds of Free Software components and the efforts of their individual and organizational maintainers, more than can be listed here (please see the source code for full lists). A few major components include the PostgreSQL database, Elasticsearch search engine, Flask Python web framework, Rust programming language, Diesel database library, Swagger/OpenAPI code generators, Kafka distributed log, Ansible configuration management tool, and Ubuntu GNU/Linux operating system distribution.
The name 'Fatcat' can be interpreted as short for 'large catalog', as the service aspires to be a complete catalog of the digital scholarly record.
A list of technical contributors, including volunteers, is maintained in the source code repository (CONTRIBUTORS.md). Thanks everybody!