What is this?
Hacker News, but with one-sentence summaries automatically generated by GPT-3.
Why?
Because I can. Because it's amusing. And because it might actually save me (and perhaps someone else?) time, by helping to decide what to read.
How does it work?
Every day at 16:00 UTC, the script fetches the top 30 HN posts (with the help of the HN API), downloads those that point to HTML pages, cleans them up (with Crux), and then, for each of them, tells GPT-3: "Summarize the following article in one sentence: [article]". That's it, really. To keep the request/response sizes within API limits, I trim each article at a sentence boundary at around 3,000 characters.
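The trimming step is roughly this (the actual implementation is in Clojure; this Python sketch, including the function name, is illustrative only):

```python
def trim_at_sentence_boundary(text: str, limit: int = 3000) -> str:
    """Trim text to at most `limit` characters, cutting at the last
    sentence boundary (". ") found before the limit."""
    if len(text) <= limit:
        return text
    head = text[:limit]
    cut = head.rfind(". ")
    if cut == -1:
        return head  # no sentence boundary found; hard cut at the limit
    return head[:cut + 1]  # keep the trailing period
```
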
So it's only updated once every 24 hours? Why not more often?
A couple of reasons:
  • My free GPT-3 allowance has ended and I'm now on a paid plan. The costs can quickly add up: with the current settings, a single run can cost as much as $1.53, although in practice it's more like $0.70. And I'm not monetising this in any way.
  • OpenAI's rules don't allow fully automated summarization, so there needs to be a human in the loop. I need to review the results daily and remove any that violate the OpenAI content policy. That's a condition under which this site was approved by OpenAI, and it imposes a time obligation on me.
Before the site went live, I generated updates once every 2 hours. Those are still available.
But I want more frequent updates!
You can always run it yourself! The source code is available on GitHub.
As a human in the loop, will you be editing the summaries?
No. I see value in having the summaries fully authored by GPT-3: it helps the audience learn what the model can and can't do. But I will remove summaries that I deem inappropriate, although this should be rare.
Why are some articles not summarised, or summarised nonsensically?
Possible reasons include: the site is not amenable to scraping; the Crux heuristics that detect the “meaty” content of the page yield wrong results; or maybe GPT-3 just had a hiccup. Failed scrapes and summarizations are not retried. Also, HN items without URLs (such as most Ask HNs) are not summarised.
What model settings are you using?
The text-davinci-002 model with temperature 0.4 and up to 100 tokens in response. See the code for details.
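Put together, a request with these settings would look roughly like this (the real code is Clojure; the function name here is made up, and the payload shape assumes OpenAI's completions API as it existed at the time):

```python
def build_summary_request(article: str) -> dict:
    """Build the body of a completions request with the settings above.

    Illustrative sketch only: the payload shape assumes the OpenAI
    completions endpoint (POST /v1/completions) as of 2022.
    """
    return {
        "model": "text-davinci-002",
        "prompt": f"Summarize the following article in one sentence:\n\n{article}",
        "temperature": 0.4,
        "max_tokens": 100,
    }
```
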
I'm offended by one of the summaries!
Let me know by email.
Do you intend to run this indefinitely?
No. I'm treating this as an experiment. I intend to run it throughout August 2022, after which there will be no more updates – but I'll keep the already generated summaries indefinitely.
What's the tech stack?
A Clojure script is run periodically by cron and generates static HTML pages. I cache the articles as well as the GPT-3 output, so hopefully I'm not going to go bankrupt.
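The scheduling boils down to a single crontab entry along these lines (the script path is hypothetical):

```
# Run the generator every day at 16:00 (assuming the server clock is UTC)
0 16 * * * /usr/local/bin/generate-summaries
```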
What's your OpenAI bill?
Dunno yet. I've capped my bill at $20, so if it stops working it's probably because I hit the limit.
Who are you?
Daniel Janus.