Docear multiple pdf links with one pdf file

11/16/2023

Every time I find a document online (mostly PDFs, but also postscript, sometimes HTML, etc.) I save it to my Documents directory. This directory now contains many files, on many diverse topics from type theory, to particle physics, to AI. Whenever I recall reading some particular fact, I can usually find the reference in that directory (usually via a simple grep). The major problem with this approach is that it’s quite tedious to actually cite any of these files, since they don’t have associated bibliographic information. I keep a “master” BibTeX file called Documents/ArchivedPapers/Bibtex.bib, which I add entries to whenever the need arises, in the following way:

Enter some of the document’s details (title, author, etc.) into a search engine like Google Scholar.
Copy the most likely-looking BibTeX entry.
Move the document file into ArchivedPapers.
Add a localfile key pointing to the document file.

In fact, some of this is made a little smoother by KBibTeX, which combines a BibTeX editor, document viewer and search engine into one tool.

KBibTeX is certainly nice to use as a viewer of the documents which are already in Bibtex.bib, but unfortunately it’s still sort of clunky to do the above kind of import procedure, since it necessarily involves viewing documents which aren’t in the database yet. It certainly makes a decent effort, with Dolphin and Okular built in, but requires an awful lot of context-switching between the different “panes”/tabs.

Recently I decided to automatically import as many of these documents as possible, to see how far I could get. This document describes the various approaches I’ve taken, as well as providing handy commandline snippets which I can use in the future.

Document Properties

Each document can be considered to have a bunch of properties, which can influence how easy or hard it is to import it automatically.

Filetype: I’m only considering PDFs for now, since postscript, HTML, etc. are few enough for me to import manually.
Some documents, e.g. many from the 1960s and earlier, will be scans; essentially, one giant image. This is difficult to handle, since it doesn’t contain any machine-readable strings of text. Some documents may be converted via OCR (optical character recognition), although there may be mis-spellings, etc.
Metadata: PDFs can contain metadata, like author and title, in a similar way to MP3s and JPEGs. If available, this can be extracted very easily.
DOIs: a digital object identifier (DOI) is a form of URI which uniquely identifies a document. If a document contains its DOI on the first couple of pages, it can be extracted easily.

Some of the tools below may work for you straight away, some may require tweaking, some may prove hopeless. I’d give each a try, and move on if you have too many difficulties.

Zotero is a bibliography manager, built around Mozilla’s XUL toolkit. Making it work on NixOS is a little tricky. Zotero has a nice workflow for importing PDF files:

Add to Zotero links to the PDF files we wish to import.
Have Zotero extract metadata from the PDFs, search for it online (e.g. using Google Scholar) and present any BibTeX it finds.
Export the resulting BibTeX and copy into your real BibTeX file.

There are two major problems with this approach:

If there is no metadata to extract, it usually fails (it tries the filename, but this may be unhelpful).
There seems to be a request limit for Google Scholar. Even after filling in some CAPTCHAs, I couldn’t get it to work for more than a couple of dozen files.

Docear is a rather bloated application for managing “projects”, which just-so-happen to contain bibliographies. Its reference management is built on JabRef, but seems to work better in my experience. Similar to Zotero, this can work well for getting the “low hanging fruit”, like PDFs with existing metadata. I’ve made a quick and dirty Nix package for Docear.

Scholar.py is a script for querying Google Scholar from the commandline. Whilst not the most useful thing in the world on its own, it’s great for embedding into scripts.
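As a rough illustration of the DOI case mentioned in this post, here is a minimal Python sketch for pulling a DOI-like string out of text. It assumes the first couple of pages of the PDF have already been converted to plain text by some other tool; that step is not shown. The pattern, the `find_doi` helper and the sample string are all made up for this example, and the pattern is only a loose approximation of what DOIs look like, not a full validator.

```python
import re

# Loose approximation of a DOI: "10.", a numeric registrant code, a slash,
# then a suffix running up to the next whitespace, quote or angle bracket.
# This is an assumption for illustration, not a complete DOI grammar.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-like string found in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation is usually sentence debris rather than part of
    # the identifier, so strip it. This can wrongly shorten the rare DOI
    # which really does end in one of these characters.
    return match.group(0).rstrip('.,;:)]}')


# Hypothetical text, standing in for the first page of a converted PDF.
sample = "Journal of Examples 12(3), 2001. doi:10.1000/xyz123. All rights reserved."
print(find_doi(sample))
```

Anything found this way is only a candidate: it still needs to be looked up to get a BibTeX entry, and scanned documents without OCR won’t yield any text to search in the first place.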
