Because I can. Because it's amusing. And because it might actually save me (and perhaps someone else?) time, by helping to decide what to read.
Every day at 16:00 UTC, the script fetches the top 30 HN posts via the HN API, downloads those that point to HTML pages, cleans them up with Crux, and sends each one to GPT-3 with the prompt: “Summarize the following article in one sentence: [article]”. That's it, really. To keep the request and response sizes within API limits, I trim each article at a sentence boundary at around 3000 characters.
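The trimming and prompt-building step can be sketched roughly as follows. This is an illustration in Python, not the site's actual code: the function names, the regular expression used to spot sentence ends, and the exact layout of the prompt string are all assumptions.

```python
import re

MAX_CHARS = 3000  # approximate cut-off mentioned above


def trim_at_sentence_boundary(text: str, limit: int = MAX_CHARS) -> str:
    """Cut text at the last sentence end that fits within `limit` characters."""
    if len(text) <= limit:
        return text
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    # Looking one character past the limit lets a sentence end exactly at it.
    window = text[: limit + 1]
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s)", window)]
    # Fall back to a hard cut when no sentence boundary is found.
    return text[: ends[-1]] if ends else text[:limit]


def build_prompt(article: str) -> str:
    """Prefix the (trimmed) article with the one-sentence instruction."""
    return (
        "Summarize the following article in one sentence: "
        + trim_at_sentence_boundary(article)
    )
```

For example, `trim_at_sentence_boundary("One. Two. Three.", limit=12)` keeps the first two sentences and drops the third, which would not fit.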
So it's only updated once every 24 hours? Why not more often?
My free GPT-3 allowance has ended and I'm now on a paid plan. The costs can quickly add up: with the current settings, a single run can cost as much as $1.53, although in practice it's more like $0.70. And I'm not monetising this in any way.
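One way to arrive at a worst-case figure in that range is sketched below. The price per 1,000 tokens and the characters-per-token ratio are assumptions that are not stated on this page; the other numbers (30 posts, about 3000 characters per article, up to 100 response tokens) come from the answers here. The few tokens of the instruction itself are ignored.

```python
ARTICLES_PER_RUN = 30        # top 30 HN posts
MAX_ARTICLE_CHARS = 3000     # trim limit per article
MAX_RESPONSE_TOKENS = 100    # response cap from the model settings
CHARS_PER_TOKEN = 4          # assumption: rough rule of thumb for English text
USD_PER_1K_TOKENS = 0.06     # assumption: price is not given on this page


def worst_case_run_cost() -> float:
    """Estimate the cost of one run if every article hits both limits."""
    prompt_tokens = MAX_ARTICLE_CHARS / CHARS_PER_TOKEN
    tokens_per_article = prompt_tokens + MAX_RESPONSE_TOKENS
    return ARTICLES_PER_RUN * tokens_per_article * USD_PER_1K_TOKENS / 1000
```

Under these assumptions the estimate comes to about $1.53 per run. Real runs cost less because some posts have no URL, some pages fail to scrape, and many articles and summaries are shorter than the limits.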
OpenAI's rules don't allow fully automated summarization, so there needs to be a human in the loop. I need to review the results daily and remove any that violate the OpenAI content policy. That's a condition under which OpenAI approved this site, and it imposes a time obligation on me.
Before the site went live, I generated updates once every 2 hours. Those are still available.
As a human in the loop, will you be editing the summaries?
No. I see value in having the summaries fully authored by GPT-3: it helps the audience learn what the model can and can't do. But I will occasionally remove summaries that I deem inappropriate, although this should be rare.
Why are some articles not summarised or have nonsensical summaries?
Possible reasons include: the site is not amenable to scraping; the Crux heuristics that detect the “meaty” content of the page yield wrong results; or maybe GPT-3 just had a hiccup. Failures in scraping or summarization are not retried. Also, HN items without URLs (such as most Ask HNs) are not summarised.
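The skip-and-don't-retry behaviour might look something like this sketch. It assumes each HN item is a dict with an `id` and an optional `url` field, as returned by the HN API; `fetch_article` and `summarise` are hypothetical callables standing in for the scraping and GPT-3 steps.

```python
from typing import Callable, Optional


def summarise_items(
    items: list[dict],
    fetch_article: Callable[[str], str],   # hypothetical: URL -> cleaned text
    summarise: Callable[[str], str],       # hypothetical: text -> summary
) -> dict[int, Optional[str]]:
    """Map item id to its summary, or None if the single attempt failed."""
    results: dict[int, Optional[str]] = {}
    for item in items:
        url = item.get("url")
        if not url:
            continue  # items without a URL (e.g. most Ask HNs) are skipped
        try:
            results[item["id"]] = summarise(fetch_article(url))
        except Exception:
            results[item["id"]] = None  # one attempt only, no retry
    return results
```

Recording `None` rather than dropping the item keeps it possible to show the post in the list without a summary.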
What model settings are you using?
The text-davinci-002 model with temperature 0.4 and up to 100 tokens in response. See the code for details.
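As an illustration, those settings would translate into a request body along these lines. The field names follow OpenAI's legacy Completions API; the helper name is hypothetical, and the real code may set further parameters.

```python
def build_completion_request(prompt: str) -> dict:
    """Build a Completions-style request body with the settings listed above."""
    return {
        "model": "text-davinci-002",
        "prompt": prompt,
        "temperature": 0.4,   # fairly low, for less random summaries
        "max_tokens": 100,    # upper bound on the length of the response
    }
```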