Comment on Daily Discussion Thread: Saturday, 11 January, 2025
indisin@aussie.zone 3 weeks ago
"I'm badgering GPT for python code to fulfill a niche but simple purpose"
Curious. What's the purpose?
Also agreed that it is fun seeing a night-time crew around. I miss the check-ins!
Baku@aussie.zone 3 weeks ago
Honestly, it probably already exists, but I just want something that does exactly what I want and no more, and is easy to set up.
I've got a browser extension that extracts all URLs on a webpage and merges them into a JSON file. I use those for archiving large batches of URLs to the Wayback Machine, using another utility they have that archives all the URLs from a Google Sheet.
Google Sheets doesn't allow importing a JSON file, though. So the Python script takes a bunch of little JSON files with a few hundred URLs each, then converts them into a CSV file that I can just import into GSheets. It's like 60 lines of code, with a few extra bells and whistles added in for error handling. Very simple.
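For the curious, the core of that JSON-to-CSV step could look roughly like the sketch below. The folder name and the assumption that each JSON file is just a flat list of URL strings are guesses, since the extension's actual output format isn't shown here:

```python
import csv
import json
from pathlib import Path

# Assumed layout: each JSON file is a flat list of URL strings,
# e.g. ["https://example.com/a", "https://example.com/b"].
INPUT_DIR = Path("url_dumps")   # folder of small JSON files (hypothetical name)
OUTPUT_CSV = Path("urls.csv")   # single CSV ready to import into Google Sheets

def collect_urls(input_dir: Path) -> list[str]:
    urls: list[str] = []
    for json_file in sorted(input_dir.glob("*.json")):
        try:
            data = json.loads(json_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            # Error handling "bells and whistles": skip unreadable files
            print(f"Skipping {json_file.name}: {exc}")
            continue
        # Keep only string entries, ignore anything unexpected
        urls.extend(u for u in data if isinstance(u, str))
    return urls

def write_csv(urls: list[str], output_csv: Path) -> None:
    # One URL per row; the sheet-based archiving tool can read column A
    with output_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for url in urls:
            writer.writerow([url])

if __name__ == "__main__":
    all_urls = collect_urls(INPUT_DIR)
    write_csv(all_urls, OUTPUT_CSV)
    print(f"Wrote {len(all_urls)} URLs to {OUTPUT_CSV}")
```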
indisin@aussie.zone 3 weeks ago
Again, just curious, but why go via Sheets and not their API? Are you crawling (you mentioned a browser extension, so guessing not)? You might have some unnecessary complexity in there, if I'm picturing this correctly at this hour.
Baku@aussie.zone 3 weeks ago
This came out longer than anticipated and I'm a bit too smooth-brained at the moment to remove all the guff and rephrase. Sorry. Not a rant! Just a livestream of consciousness, basically.
I couldn't figure out how to work their API. I got an API key and all that, but things just weren't working.

There's a set of Save Page Now utilities I could use API-free, but they're all Linux shell scripts, and I couldn't figure out how to get them working on Windows without messing around with WSL (a bit beyond my capabilities). When I tried them out on my MacBook they worked, but from memory not how I wanted. I also found the IA's documentation to be missing, difficult to find, or outdated in a lot of areas as well, which meant that when I last tried to get GPT to work it out, it was trying to use deprecated API calls and an outdated authentication method, and I couldn't make it work much better myself.

Could probably give it another go. Having it take the URLs from the CSV could work. But anything before that (like crawling) doesn't work the best, because some of the things I archive require manual intervention anyway to properly extract all URLs. For instance, Lemmy threads start auto-collapsing after 300 comments, so they need to be expanded to retrieve comment links, or photos hidden in spoilers need to be expanded to retrieve the image URL. That sort of thing is possible to automate, but it would probably take more time to automate than it saves compared to just doing it manually.

I did actually attempt to get GPT to make a crawler for a completely different purpose once, and it didn't work. I don't remember exactly what went wrong with it, but from memory it was misinterpreting status codes and couldn't parse the HTML properly. Easier to just fork somebody else's crawler and modify it to work with the other scripts, I guess.

Also, importing it into a sheet doesn't actually take that much work. It's basically 3 mouse clicks, then heading to the IA's sheet batch archiving page and pasting in the URL. Their batch processing is a bit inefficient and can take a few days, which, if done through the API, could definitely be done faster, with a bit of smart logic put in to avoid going over daily archive caps and with a queueing system. But those few days don't require any active energy on my part. They keep processing it in the background at a rate of a row or 2 a minute, then send me an email once it's done.
indisin@aussie.zone 3 weeks ago
Those clicks require effort, though, and the dev in me would not be clicking anything.
But the dev and mentor in me is now desperately wanting to jump on a call with you to pair, and so I'm closing this tab.