Old 05-16-2024, 11:40 AM   #366
davidjoseph1
Enthusiast
 
Posts: 30
Karma: 129898
Join Date: May 2011
Device: Onyx Boox M90, M92 (*3),M96, N96,I86ml,C67ml,Kepler,Poke,Poke2,Nova3
Quote:
Originally Posted by davidjoseph1 View Post
I tried to update Library Codes to use the modern LC Catalog language, but I found an old metadata source plugin called SRU that did the same thing, and got it working.

https://www.mobileread.com/forums/sh...14&postcount=5
To follow up on this brief post:

Library Codes generated a query based on LCCN, ISBN, or ISSN that returned a page from the OCLC Classify service, stripped out the table inside that page, and used BeautifulSoup to scan that table for the markup elements denoting the various custom columns (Dewey, LoC classification, etc.). I was trying to figure out how to substitute the LoC’s publicly served query page into the BeautifulSoup routine that DaltonST coded, but that page’s markup is XML, and I couldn’t work out how to do it.

DaltonST implemented a scraper for a resource that wasn’t optimized for data interchange, namely the HTML generated by Classify. However, there *are* much more convenient and machine-readable data formats that can be requested from various services. The Z39.50 protocol was adapted for service over a Web interface, and that adaptation is called SRU. The fields are standardized, and the many SRU and MARCXML services out there can be parsed with regular expressions to extract the data from the labeled fields.
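For illustration, an SRU search is just an HTTP GET with standardized parameters (`version`, `operation=searchRetrieve`, `query`, `recordSchema`). Here is a minimal sketch of building such a request; the endpoint (`lx2.loc.gov:210/LCDB`) and the `bath.isbn` index name are my assumptions, not details from the post, so check your server’s Explain response for the real values.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint -- verify against the server's Explain response.
SRU_BASE = "http://lx2.loc.gov:210/LCDB"

def build_sru_url(isbn):
    """Build a searchRetrieve URL asking for one MARCXML record by ISBN."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": f"bath.isbn={isbn}",  # index name is an assumption
        "recordSchema": "marcxml",
        "maximumRecords": "1",
    }
    return SRU_BASE + "?" + urlencode(params)

url = build_sru_url("9780596007973")
print(url)
```

Because every SRU server takes the same parameter set, swapping endpoints is just a matter of changing `SRU_BASE` and possibly the index name in `query`.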


A long time ago, more than 11 years, a gentleman in Germany coded the metadata source plugin SRU, which does exactly that. Its parser doesn’t use BeautifulSoup; it hits the SRU server with a very fast response time, and the SRU service provides many more data fields than were present on the old Classify webpage. In addition, SRU returns, as one of its fields, a static MARCXML URL that serves the MARCXML very efficiently, and I would almost prefer to parse that, but I have no idea how to do so. The plugin does embed an LoC link as an identifier in the calibre book record.
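On the question of parsing the MARCXML: Python’s standard library can do it without BeautifulSoup or regexes. A minimal sketch, run here against an invented sample record standing in for the document behind that static MARCXML URL; the tag/subfield choices (050 for LoC classification, 082 for Dewey) are standard MARC, but the values are made up.

```python
import xml.etree.ElementTree as ET

# All MARCXML elements live in this namespace.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record for demonstration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="050" ind1="0" ind2="0">
    <subfield code="a">QA76.73.P98</subfield>
  </datafield>
  <datafield tag="082" ind1="0" ind2="0">
    <subfield code="a">005.133</subfield>
  </datafield>
</record>"""

def get_field(root, tag, code="a"):
    """Return the first subfield `code` of datafield `tag`, or None."""
    node = root.find(
        f'marc:datafield[@tag="{tag}"]/marc:subfield[@code="{code}"]', NS
    )
    return node.text if node is not None else None

root = ET.fromstring(SAMPLE)
lcc = get_field(root, "050")    # LoC classification
dewey = get_field(root, "082")  # Dewey Decimal number
print(lcc, dewey)
```

In real use, `SAMPLE` would be replaced by the response body fetched from the MARCXML URL that the SRU record supplies.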

What I am trying to do is transplant the search language and configuration from SRU into Library Codes, so that the significant parsed data fields are returned to the Library Codes Python routine that updates the custom columns.

I’ve never done Python before; I taught myself everything I needed in order to update SRU for calibre 6.17, and I would welcome any help people might provide. SRU as written cannot pass custom column information as a metadata source plugin - I tried that already.
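As a sketch of that transplant step: once the SRU/MARCXML fields are parsed, what remains is mapping them onto custom-column lookup names before writing them to the library. Everything here is hypothetical - the lookup names (`#ddc`, `#lcc`) and the `parsed` dict are stand-ins for whatever Library Codes is configured with - and the actual write would go through calibre’s database API (I believe `db.new_api.set_field`, not shown here).

```python
# Hypothetical mapping from parsed MARC tags to custom-column lookup
# names; adjust to your own Library Codes column configuration.
COLUMN_MAP = {
    "082": "#ddc",  # Dewey Decimal Classification
    "050": "#lcc",  # Library of Congress Classification
}

def columns_to_update(parsed):
    """Turn {MARC tag: value} into {custom column: value}, skipping
    unmapped tags and empty values."""
    return {
        COLUMN_MAP[tag]: value
        for tag, value in parsed.items()
        if tag in COLUMN_MAP and value
    }

updates = columns_to_update(
    {"082": "005.133", "050": "QA76.73.P98", "245": "ignored title field"}
)
print(updates)
```

Keeping this mapping as plain data (rather than hard-coded parsing logic) would also make it easy to expose in the plugin’s configuration dialog.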


Any suggestions?