

Finally!
pointless


Santagate 2019 Pro for Workgroups
I’ve seen paraphrases of the same thing at least 4 times so far. Multiple mothers confused about the terminology it seems.
Sure if you drag it through the garden.
PyMuPDF is excellent for extracting ‘structured’ text from a pdf page — though I believe ‘pulling out relevant information’ will still be a manual task, UNLESS the text you’re working with allows parsing into meaningful units.
That’s because ‘textual’ content in a pdf is nothing other than a bunch of instructions to draw glyphs inside a rect that represents a page; utilities that come with mupdf or poppler arrange those glyphs (not always perfectly) into ‘blocks’, ‘lines’, and ‘words’ based solely on whitespace separation; the programmer who uses those utilities in an end-user facing application then has to figure out how to create the illusion (so to speak) that the user is selecting/copying/searching for paragraphs, sentences, and so on, in proper reading order.
PyMuPDF comes with a rich collection of convenience functions to make all that less painful, such as dehyphenation and eliminating superfluous whitespace; but you still need some further processing to pick out humanly relevant info.
Built-in regex capabilities of Python can suffice for that parsing; but if not, you might want to look into NLTK tools, which apply sophisticated methods to tokenize words & sentences.
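To make the regex route a bit more concrete, here is a minimal sketch using only the standard library. It assumes the page text has already been extracted into a plain string; the sample text, the ‘Label: value’ layout and the helper names are all made up for illustration, and the hand-rolled dehyphenation merely imitates what an extractor’s own option would do.

```python
import re

# Hypothetical plaintext, as it might come out of a PDF text extractor.
page_text = """Invoice No: 2023-0417
Date: 12 March 2023
The total amount is pay-
able within 30 days.
Total:   1,250.00 EUR
"""

def dehyphenate(text: str) -> str:
    """Join words that were split across a line break with a hyphen."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)

def squeeze_spaces(text: str) -> str:
    """Collapse runs of spaces/tabs, leaving line breaks alone."""
    return re.sub(r"[ \t]+", " ", text)

def extract_fields(text: str) -> dict:
    """Pick out 'Label: value' pairs that sit on a line of their own."""
    pattern = re.compile(r"^(?P<label>[A-Za-z ]+):\s*(?P<value>.+)$", re.MULTILINE)
    return {m["label"].strip(): m["value"].strip() for m in pattern.finditer(text)}

clean = squeeze_spaces(dehyphenate(page_text))
fields = extract_fields(clean)
print(fields)
```

This only works because the sample text has a regular shape; free-running prose is where something like NLTK’s tokenizers starts to pay off.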
EDIT: I really should’ve mentioned some proper full text search tools. Once you have a good plaintext representation of a pdf page, you might want to feed that representation into tools like the following to index them properly for relevant info:
https://lunr.readthedocs.io/en/latest/ – this is easy to use, & set up, esp. in a python project.
… it’s based on principles that are put to use in this full-scale, ‘industrial strength’ full text search engine: https://solr.apache.org/ – it’s a bit of a pain to set up; but python can interface with it through any http client. Once you set up some kind of mapping between search tokens/keywords/tags, the plaintext page, & the actual pdf, you can get from a phrase search, for example, to a bunch of vector graphics (i.e. the pdf) relatively painlessly.
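The ‘mapping’ idea can be sketched with a toy inverted index in plain Python. This is not lunr’s or Solr’s API, just a stand-in for the principle: every token points back to the (pdf, page) pairs it occurs on. The file names and page texts are hypothetical, and there is no stemming, ranking or true phrase matching here (a query matches pages that contain all of its words).

```python
import re
from collections import defaultdict

# Hypothetical corpus: (pdf file, page number) -> plaintext of that page.
pages = {
    ("manual.pdf", 1): "Installing the printer driver on Linux",
    ("manual.pdf", 2): "Troubleshooting the printer: paper jams",
    ("notes.pdf", 1): "Linux kernel notes and driver quirks",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each token to the set of (pdf, page) locations containing it."""
    index = defaultdict(set)
    for location, text in pages.items():
        for token in tokenize(text):
            index[token].add(location)
    return index

def search(index, query):
    """Return locations that contain every token of the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)

index = build_index(pages)
print(search(index, "printer driver"))
```

From a hit like `("manual.pdf", 1)` it’s a short step to opening that page in a viewer; a real search engine adds stemming, scoring and phrase queries on top of the same basic structure.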
What I can’t quite make sense of is how ‘James’ itself is a diminutive of ‘Jacob’.
I believe ‘Harry’ is the Welsh version of English ‘Henry’, & German ‘Heinrich’. … At least that’s the impression I got from Shakespeare’s ‘Henriad’ plays (H. IV 1-2, & H. V)


Another vote for Tesseract – just to clarify the terminology, though: PDF is a fragile format best used read-only; so you really don’t want to edit a pdf, but make a new one using the same (or cleaned-up) bitmaps and a new ocr text layer.
Now, tesseract is excellent at recognizing glyphs; but especially if the scanned image is a little fuzzy, the layout detection falters; and when it falters, you get redundant line breaks, & chunks of text in the wrong order – all of which gets incredibly annoying for searching & copying purposes. So if you can spare the time, and the text requires it, you may need to mark regions (paragraphs & titles mainly) on the bitmap image manually. There exist a few frontends to Tesseract that help with a task like that; check out, e.g., https://github.com/manisandro/gImageReader - inside single paragraph blocks of text, Tesseract doesn’t get as easily confused; and the text output is in the correct reading order, & w/o redundant breaks.
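If you’d rather script the ‘one region at a time’ approach than use a frontend, a rough sketch could look like the following. It assumes the regions have already been cropped into separate image files, listed in reading order (the file names are hypothetical), and it only builds the command lines rather than running them. `--psm 6` is the tesseract page segmentation mode for a single uniform block of text.

```python
import shlex

def tesseract_command(image_path, output_base, lang="eng", psm=6):
    """Build (not run) a tesseract CLI call for one pre-cropped region."""
    return ["tesseract", image_path, output_base, "-l", lang, "--psm", str(psm)]

# Hypothetical region crops, already in the intended reading order.
regions = ["page1_title.png", "page1_para1.png", "page1_para2.png"]

for n, image in enumerate(regions, start=1):
    # Each call would write its text to page1_region<n>.txt
    print(shlex.join(tesseract_command(image, f"page1_region{n}")))
```

Passing each list to `subprocess.run` would execute it; concatenating the resulting text files in order then gives the page text in the reading order you marked up.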
Better cite Wozniak as the one who ‘made’ Apple; but anyway.
I was thinking ‘The Yardbirds’ for the first one; though I wasn’t at all sure, bcs that didn’t take into account the ‘1 … 2 … 3’


Those are Stimpies though. Stimpies of the Ren faire.


Yeah I keep running into similar issues when trying to build pretty much anything on windows; for stuff that can’t be ‘nicely’ configured & dependency-managed through an IDE, windows is pure pain.
It really sounds like PySide would fit your use case better. Check out this website for a great starting point: https://www.pythonguis.com/pyqt6/ – the author also has an entire book on packaging PySide programs for cross-platform distribution.
As for installing Python itself, I think I’d stick with the plain installer from python.org, and afterwards, pip. In case of dependencies that are hard to get through PyPI, I think Anaconda might be worth looking at as well: https://www.anaconda.com/download
msys2 provides a package manager, & several development toolchains; it’s an easy way to get native (mingw) gcc & bash on windows; cross-platform programs rely on it heavily, because it saves them from all the ‘visual studio’ BS: https://www.msys2.org/docs/what-is-msys2/ – I believe any implementation of GTK on windows requires a mingw toolchain.


Am I missing something?
It’s impossible to tell without knowing what specific aspect had failed.
Before we even get to GTK; there are some issues with python wheels under msys2; check out: https://www.msys2.org/docs/python/ – some wheels just can’t be built under msys2 due to various incompatibilities. Not being able to replace such packages with ‘pure’ python equivalents could end up being a (very annoying) roadblock.
The roadblock that I recently ran into with my simple GTK4 app was unpredictable ids on d-bus interface exports. D-bus does work under msys2, though you have to start the user session manually; d-feet and gdbus also work; though, as always, there’s a catch. On Linux I can automatically export ‘action groups’ that belong to GtkApplicationWindow widgets; & their ‘object paths’ show up predictably under the application’s path + / + the window’s id. This makes it really convenient when you want to add basic ‘remote controls’ to your widgets. Under msys2, though, I can’t figure out how to find those paths; which throws a monkey wrench, so to speak, in my ‘remote control’ implementation. Granted, d-bus is a linux-native technology; and expecting it to work w/o issues on windows is probably a bit too much.
– apart from those, I haven’t run into any issues with GTK4 under msys2. The GTK3 packages available in their repos also work just fine.
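For hunting down exported object paths, one thing worth trying is a recursive `gdbus introspect` starting from the application’s own path. The sketch below only builds that command. The application id is hypothetical, and the id-to-path rule (dots become slashes, hyphens become underscores) is my understanding of how GApplication derives its path, so treat it as an assumption; I also can’t say whether the output under msys2 will actually include the window paths.

```python
import shlex

def app_object_path(app_id):
    """Derive the D-Bus object path from an application id.

    Assumption: '.' becomes '/', '-' becomes '_', with a leading '/'.
    """
    return "/" + app_id.replace(".", "/").replace("-", "_")

def introspect_command(app_id):
    """Build (not run) a recursive introspection call on the session bus."""
    return [
        "gdbus", "introspect", "--session",
        "--dest", app_id,
        "--object-path", app_object_path(app_id),
        "--recurse",
    ]

# Hypothetical application id, for illustration only.
print(shlex.join(introspect_command("org.example.RemoteDemo")))
```

On Linux the recursive listing shows every child node under the application’s path, which is a quick way to compare what gets exported on each platform.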
I do agree with the others who recommend PySide, though. Their cross platform support appears to be more robust. Their documentation has been improving as well.
and then try to play Doom on it. https://www.youtube.com/watch?v=D5NTJSfUWDE


Recently I became aware of ‘StarLite’ tablets – the prices are pretty steep, but the specs look really good, esp. wrt the screen.


I was about to quote the same.
… I mean, when you’re this clueless, maybe don’t put out ‘articles’ for others to read – it’s wasting everyone’s time.
I thought the title of this article was intriguing; because in the Linux community certain aspects of the desktop experience do get hyped; & there’s a tendency in general to sweep various usability issues under the rug, with the unwarranted confidence that we’re already “better than everyone else” in every way; though the article doesn’t address any of those.
Not sure what the question is – are you looking to port extensions over yourself, or are you just exclaiming, “it can’t be so hard, so why won’t someone do it!”.
There’s plenty of documentation over at MDN as to writing extensions, writing cross-browser extensions, porting mv2 firefox extensions over to mv3, the differences between Firefox’s mv3 implementation, and that found in Chrome, etc. etc. etc. The following are good starting points: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions & https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Build_a_cross_browser_extension
For ground-level, basic stuff (managing a popup, communicating between popup & a ‘background’ script, between content loaded on the browser & your scripts, managing a context menu, etc.) writing an extension is straightforward once you develop some degree of understanding of the sometimes convoluted paths the data needs to take, the permissions you need to have in order to pass messages through, etc. Larger extensions are full fledged applications in their own right, though, so tackling them introduces difficulties of a different order of magnitude.
The Falkon browser is extensible (in its own way) through QML; and the Nyxt browser is extensible in common lisp. These aren’t ‘webextensions’ in the precise sense of the term, though they could be just as useful. I wrote a basic bookmark manager that I use mainly on Firefox; but I ported its core functionality (just send the current page’s title, url, & selections from the <head> tag over to my database (postgresql via the postgrest http frontend, to which I just make a fetch request)) to QML, and it was pretty straightforward. Falkon is based on Qt’s QtWebEngine, which is Chromium-based; Nyxt is based on WebKit.
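For illustration, the ‘send a bookmark to postgrest’ step boils down to a single JSON POST. The sketch below does it with Python’s standard library rather than the `fetch` call mentioned above, so it is a swapped-in equivalent, not the extension’s actual code. The endpoint URL, the `bookmarks` table and its column names are hypothetical; the request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint and table; PostgREST exposes each table at /<table>.
POSTGREST_URL = "http://localhost:3000/bookmarks"

def build_bookmark_request(title, url, description=""):
    """Build a POST request that would insert one row into the table."""
    payload = {"title": title, "url": url, "description": description}
    return urllib.request.Request(
        POSTGREST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Prefer": "return=minimal"},
        method="POST",
    )

req = build_bookmark_request("Example Domain", "https://example.com/")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

The same payload and headers carry over more or less unchanged to `fetch` in a webextension or to an XMLHttpRequest in QML; only the surrounding plumbing differs.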
edit: There’s also luakit and qutebrowser. The former is extensible via lua 5.1 scripts, the latter via python; there isn’t a wealth of documentation & examples, though (at least there wasn’t last time I checked), so the API can be a bit of a mystery. Luakit has WebKit as its engine; qutebrowser is built on QtWebEngine, just like Falkon.
I mean, seriously; whoever uses integers in 2023? I mean, 2023.0?
xfwm4 could work w/o Xfce, though I doubt that it would be worth the effort to script the missing bits by hand. Xfce is pretty modular; once you turn off the tracker/indexer, and whatever useless package manager gui the distro may have included (e.g., ‘dnfdragora’), it’s pretty lightweight. You can also turn off the compositor. The stock xfce4-panel is also miles ahead (IMHO) of various independent panel programs, in both functionality and looks – and its widgets are also entirely modular.
labwc is a window manager in the vein of openbox; I guess under wayland a window manager has to be a compositor too (?); but it’s no different from sway in this regard.
There’s also wayfire; which is a bit more beefy, and aims to preserve all the compiz plugins. Some of those are notorious for being silly eye candy (windows that burn down on close; wobbly windows, etc.) but others are pretty useful (esp. those that emulate the exposé view from OS X; pinning/grouping windows, etc.) – though in my experience it isn’t as stable as labwc; which is understandable because it’s a lot more complex.
Though ‘finding’ the UDP packet should cost a lot more, because, whoever knows where it is?