GITenberg (https://www.gitenberg.org) is an exploration of how Project Gutenberg might work if all the Gutenberg texts were on GitHub, so that tools like version control, continuous integration, and pull-request workflows could be employed.
For forty years, Project Gutenberg has led the way in making public domain text live on the internet as ebooks. GITenberg began in 2013 when Seth Woodworth wanted to improve some Project Gutenberg ebooks. He decided to load the ebooks onto GitHub, a version control and collaborative software development platform.
In 2015, Eric Hellman had the same idea, and after loading a few books he discovered that Seth was 43,000 books ahead of him. The two joined forces and applied to the Knight Foundation's News Challenge competition. The project was chosen from over a thousand proposals to receive a Prototype Fund grant to "explore the applicability of open-source methodologies to the maintenance of the cultural heritage" that is the Project Gutenberg collection.
Big chunks of work were still unfinished when the grant ended. In 2018, six computer-science seniors from Stevens Institute of Technology took up the challenge and brought the project within sight of a major milestone (if not the finish line). There remained only the reprocessing of 57,000 ebooks (with more being created every day!).
- Almost 57,000 texts from Project Gutenberg have been loaded into GitHub repositories.
- EPUB, PDF, and Kindle ebooks have been rebuilt and added to releases for all but about 100 of these.
- GitHub webhooks trigger Dockerized ebook-building machines running on AWS Elastic Beanstalk every time a git repo is tagged.
- Toolchains for AsciiDoc, HTML, and plain-text source files are running on the ebook builders.
- A website at https://www.gitenberg.org/ uses the webhooks to index and link to all of the ebooks.
- www.gitenberg.org presents links to GitHub, Project Gutenberg, LibriVox, and Standard Ebooks.
- Cover images are supplied for every ebook.
- Human-readable metadata files are available for every ebook.
- Syndication feeds for these books are made available in ONIX, MARC and OPDS via Unglue.it.
- All of the software that's been used is open source and content is openly licensed.
- PG's epubmaker software has been significantly strengthened and improved.
- About 200 PG ebooks have had fatal formatting errors remediated to allow for automated ebook file production.
- 1,363 PG ebooks were omitted from this work due to licensing or because they aren't really books.
- PG's RDF metadata files were converted to human-readable YAML and enhanced with data from New York Public Library and from Wikipedia.
- GitHub API throttling limits the build/release rate to about 600 ebooks per hour per login. A full build of the roughly 57,000 ebooks takes about 4 full days with one GitHub login.
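
The build-time figure in the last item follows directly from the throttling rate, and the arithmetic can be sketched as below. This is only an illustration: the function name `full_build_hours` and its parameters are hypothetical, and the idea that several logins would scale the rate linearly is an assumption read from the "per login" wording, not something the project has confirmed.

```python
def full_build_hours(ebook_count: int,
                     rate_per_hour_per_login: int = 600,
                     logins: int = 1) -> float:
    """Estimate the hours needed to build and release every ebook.

    Assumes the throttled rate applies independently to each login,
    so that extra logins add capacity linearly (an assumption).
    """
    if ebook_count < 0 or rate_per_hour_per_login <= 0 or logins <= 0:
        raise ValueError("counts and rates must be positive")
    return ebook_count / (rate_per_hour_per_login * logins)


hours = full_build_hours(57_000)   # figures quoted in the list above
days = hours / 24
print(f"{hours:.0f} hours, about {days:.1f} days")
# prints "95 hours, about 4.0 days"
```

With the numbers quoted above, one login needs about 95 hours of continuous building, which matches the "about 4 full days" estimate.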
Everything in GITenberg has been built in the hope that the bits can be incorporated into Project Gutenberg wherever appropriate. Currently the Free Ebook Foundation is working with Project Gutenberg to make that hope a reality. Some baby steps towards applying version control to Project Gutenberg have begun. Project Gutenberg is a complex organism, and implementing profound changes will require broad consensus-building and resource gathering (both money and talent).
October 30, 2018. Free Ebook Foundation announces the launch of its GITenberg prototype website.