Yet another project I am working on...
I have a 4K action camera and a suitable tripod setup. I'll record one long video of myself flipping pages, then go back and screenshot each page. That way I can keep the page images and also OCR them into text.
The first thing I'm going to capture is an old loose-leaf book manuscript with handwritten notes all over it, so that it can be edited. That's certainly easier than transcribing it by hand. The author and his daughter, who translated it, wrote this: https://Projex.Wiki/wiki/Wave_of_Terror.
I also have a monster load of books, magazines, art, and documents I'd like to scan, share freely, and then sell or recycle the hard copies of. I wouldn't mind taking on freedom-related document-scanning tasks too. All of this also involves archiving. I don't know where to start or how to follow through.
It's also a shame that InfoGalactic.com seems dead. Vox Day was one of its founders. It looks like he/they might be up to something, or maybe that's just a Grokipedia media analysis.
Very cool. How are you going to identify the relevant frames efficiently? That's something that could be automated. It would be cool to put together a script where you just feed it a video and it gives you a directory of page_0001.txt, page_0002.txt, and so on. It's very possible.
You just have to measure the periods of relative motion and non-motion for a couple of sample pixels that are on the book and not on you, but use enough of them to avoid a false positive on stillness (basically a horizontal bar across the top). Then, for each period of non-motion above a certain length, you export a frame from the middle of it. That way you only need to pause for about 1/10 of a second between pages, which you will do anyway.
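The stillness test described above could be sketched in Python like this. It's a minimal sketch, assuming each frame has already been reduced to the grayscale values (0-255) of the same sample band of pixels, e.g. a horizontal bar across the top of the book. The function name `still_frame_indices` and the default `diff_threshold` and `min_still_frames` values are hypothetical and would need tuning against real footage.

```python
from typing import List, Optional, Sequence


def still_frame_indices(
    band_frames: Sequence[Sequence[float]],
    diff_threshold: float = 2.0,  # hypothetical: max mean pixel change to count as "still"
    min_still_frames: int = 3,    # hypothetical: shortest still run worth exporting
) -> List[int]:
    """Return the middle frame index of each sufficiently long still period.

    band_frames[i] holds the grayscale values of the same sample pixels in frame i.
    """
    # still[i] is True when frame i barely differs from frame i - 1
    still = [False] * len(band_frames)
    for i in range(1, len(band_frames)):
        prev, cur = band_frames[i - 1], band_frames[i]
        mean_abs_diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        still[i] = mean_abs_diff < diff_threshold

    picks: List[int] = []
    start: Optional[int] = None
    # A trailing False acts as a sentinel so a still run at the very end is closed too
    for i, is_still in enumerate(still + [False]):
        if is_still and start is None:
            start = i
        elif not is_still and start is not None:
            if i - start >= min_still_frames:
                picks.append((start + i - 1) // 2)  # middle of the run [start, i - 1]
            start = None
    return picks
```

Reading the video and cutting out the sample band could be done with OpenCV's `cv2.VideoCapture`. The returned indices are the frames you would then export as page images.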
I wonder how big that video file will be. If you give me a 2 minute sample I might see if I can extract the frames.
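For a rough idea of the file size, size is just bitrate times duration. This is a minimal sketch, assuming a constant bitrate. The 60 Mbit/s figure below is an assumed example, not a spec for any particular camera, and many cameras record at a variable bitrate, so treat the result as an estimate.

```python
def video_size_gb(bitrate_mbps: float, minutes: float) -> float:
    """Approximate size in gigabytes (10**9 bytes) at a constant bitrate.

    bitrate_mbps is in megabits per second; 8 bits make one byte.
    """
    total_bits = bitrate_mbps * 1_000_000 * minutes * 60
    return total_bits / 8 / 1_000_000_000


# Assumed example: a 2-minute sample at 60 Mbit/s
print(video_size_gb(60, 2))  # 0.9
```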
A script would be very nice. (A robot would be too.) I think it might be a lot of work though.
A video of me turning pages can't actually take that long, even if I leave each page exposed for one whole second, plus the time for my hand to move in and out and for shadows to clear. Then I review it. All I have to do is screenshot during the "motionless" and shadowless second. All those screenshots can easily be batch-renamed in order. Making a PDF of them along with OCR'd plain or rich text should also be easy.
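The batch-rename step could look like this. It's a minimal sketch, assuming the screenshots sit in one folder as `.png` files whose names already sort in capture order. Timestamps or zero-padded frame numbers sort correctly, but unpadded numbers like `9` and `10` would not. `rename_in_order` and the `page_` prefix are hypothetical choices that match the page_0001 naming suggested above.

```python
from pathlib import Path
from typing import List


def rename_in_order(folder: str, ext: str = ".png", prefix: str = "page_") -> List[str]:
    """Rename every `ext` file in `folder` to page_0001.png, page_0002.png, ...

    Files are taken in sorted-name order, so the names must already sort
    in capture order. Returns the new file names.
    """
    ext = ext.lower()
    files = sorted(
        p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() == ext
    )
    # Pass 1: move everything to temporary names, so a new name such as
    # page_0002.png can never overwrite a file that hasn't been renamed yet.
    temps = []
    for i, p in enumerate(files):
        tmp = p.with_name(f".renaming_{i:06d}{ext}")
        p.rename(tmp)
        temps.append(tmp)
    # Pass 2: give each file its final zero-padded page name.
    new_names = []
    for i, tmp in enumerate(temps, start=1):
        final = tmp.with_name(f"{prefix}{i:04d}{ext}")
        tmp.rename(final)
        new_names.append(final.name)
    return new_names
```

Sorting by file name is the weak point. If your screenshot tool names files differently, sorting by modification time (`p.stat().st_mtime`) is an alternative.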
The size of the 4K video isn't important unless I were to upload it. Why do that when the 4K stills would carry all the information and be much smaller?
My biggest concern would be getting the best focus, along with good lighting and keeping my hand out of the shot.
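One way to help with focus is to score candidate frames for sharpness and keep the crispest one from each still period. A common measure is the variance of the Laplacian. In-focus text has strong edges, so the Laplacian swings widely, while blur flattens it. This is a minimal pure-Python sketch, assuming frames are 2D lists of grayscale values. `laplacian_variance` and `sharpest_index` are hypothetical names. It won't fix lighting or a hand in the shot, and pure Python is far too slow for full 4K frames. With OpenCV the same score is usually computed as `cv2.Laplacian(gray, cv2.CV_64F).var()`.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # rows of grayscale pixel values


def laplacian_variance(gray: Image) -> float:
    """Sharpness score: variance of the 4-neighbour Laplacian over interior pixels.

    Higher means more fine edge detail (crisper text); blur lowers it.
    """
    h, w = len(gray), len(gray[0])
    values: List[float] = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Discrete Laplacian: sum of 4 neighbours minus 4x the centre pixel
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            values.append(lap)
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def sharpest_index(frames: Sequence[Image]) -> int:
    """Index of the frame with the highest sharpness score."""
    return max(range(len(frames)), key=lambda i: laplacian_variance(frames[i]))
```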
Naturally, all this would be for the first book, Emma's dad's manuscript. I don't even remember whether it's in Ukrainian Cyrillic or already in English. I can make a 2-minute sample too, and try to include all the problems that might arise so we can troubleshoot them.
If I were organized and/or prepared, I'd have more books and other things ready to scan too. Of course, before I scan any book I'd first want to be sure it doesn't already exist online, in a torrent, or on my 4TB drive of books, so that I'm not wasting time. I also need to get some new large drives so I can finally set up my DAS, but that's yet another big project. Then there's getting a server set up to share things selectively (versus just sharing/torrenting/IPFS open to all, which I suspect could be problematic). I'm just a little guy who wants to be a large free-libre library.
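For the "is it already on my drive?" check, a quick first pass is a fuzzy match of the title against file names. This is a minimal sketch, assuming your book files are named after their titles. It won't catch renamed files, and it says nothing about what exists online or in torrents. `find_similar_titles` and `normalize` are hypothetical helpers; `difflib.get_close_matches` is from the Python standard library, and 0.6 is its default cutoff.

```python
import difflib
import re
from pathlib import Path
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/underscores to single spaces (Unicode-aware)."""
    tokens = re.split(r"[\W_]+", text.lower())
    return " ".join(t for t in tokens if t)


def find_similar_titles(title: str, library_root: str, cutoff: float = 0.6) -> List[str]:
    """Return paths under library_root whose file names (minus extension) resemble title."""
    by_norm: Dict[str, List[str]] = {}
    for p in Path(library_root).rglob("*"):
        if p.is_file():
            by_norm.setdefault(normalize(p.stem), []).append(str(p))
    matches = difflib.get_close_matches(normalize(title), list(by_norm), n=10, cutoff=cutoff)
    return [path for m in matches for path in by_norm[m]]
```

Because the pattern uses `\W`, Cyrillic titles are normalized too, though a Cyrillic title won't match an English file name.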
A script isn't hard. I've done similar things before AI existed. With AI existing and me being very happy with my current setup it might take as little as 5 minutes to make. The robot would be hard, and would mangle the book until you did enough testing to make sure it didn't do that.
Thanks for the offer! I'll keep you posted. Her manuscript scan is a January project.
When I returned to SF/Oakland in 2005-2007, some friends worked at the Internet Archive (and Google?) in the Presidio. They scanned old and ancient pre-copyright books, before moving on to newer books, which brought scandal. All by hand: flip a page, lower the glass, click the cameras. I won't need their many large industrial setups. I haven't even bothered to check whether our libraries have something to borrow or rent.
I can't imagine a robot ever being able to do that with older books.