How does the Official Node.js News Feeder work?
Written by Ulises Gascón
Jul 06, 2023
Node.js has a new RSS feed that consolidates all the releases and news from the different teams, working groups, and projects inside the org.
The Challenge
As an organization, Node.js always has a lot going on: many projects, teams, and working groups are working on different things at the same time. It is hard to keep track of everything that is happening, so the community keeps asking for a better way to stay informed. This discussion has been going on for a while, and there are many ideas on how to solve the problem, but we decided that RSS is a good place to start, as it will also help promote our activities and achievements outside the Node.js org itself.
Requirements
- The teams and working groups should be able to add their own news without having to change their way of working (no PRs, forms, etc.).
- The information should be available in a valid RSS feed.
- The feed should be updated automatically, but allow for manual news additions and easy content curation.
Decisions made
- Use GitHub as the source of truth for the news, so we will use the GitHub API to fetch the relevant information from issues, discussions, releases...
- Use a GitHub Action to generate the RSS feed and publish it to GitHub Pages.
- Use a GitHub Action to update the feed automatically every week or manually when needed, generating a new commit with the changes in a PR that will be reviewed and curated by the team.
- Avoid external dependencies as much as possible, so the solution should be self-contained and easy to maintain.
The Solution
The full source code can be found in this repository. I will explain the most relevant parts of the solution here.
The Architecture
In general terms, the solution is composed of the following parts:
Community
The community is the source of the news. They are the ones who reply to specific issues or discussions related to the news feed, as well as manage the new releases.
Curators
The curators review the changes and merge the PRs that update the feed. The feed is updated automatically every week, but it can also be updated manually when needed. Several scripts collect, process, validate, and publish the feed.
Readers
The readers consume the feed; they can be humans or bots. Readers can subscribe to the feed using the following URL: https://nodejs.github.io/nodejs-news-feeder/feed.xml. We also provide a Slack channel where the feed is published automatically, so the community can keep up with the latest news.
The Structure
Configuration
There is a config.json file that stores all the references to the external resources (discussions, issues, releases, etc.), the API pagination limits, and the last execution time (lastCheckTimestamp).
The last execution time (lastCheckTimestamp) prevents us from including already processed information in the feed, so we don't need third-party software or a reconciliation step to avoid duplicates.
{
  "lastCheckTimestamp": 1688584036809,
  "reposPaginationLimit": 250,
  "releasePaginationLimit": 10,
  "commentsPaginationLimit": 100,
  "breakDelimiter": "</image>",
  "discussionsInScope": [],
  "issuesInScope": []
}
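The lastCheckTimestamp value is plain milliseconds since the epoch (the format Date.now() returns), so working with it takes very little code. The following is a minimal sketch of the two operations around it: keeping only items newer than the last run, and producing the next config with a fresh timestamp. isNewerThanLastCheck and nextConfig are hypothetical names, not functions from the repository.

```javascript
// Hypothetical helpers illustrating the lastCheckTimestamp pattern.
// lastCheckTimestamp is stored in milliseconds since the epoch (as Date.now() returns).

// True when an ISO 8601 date string (as returned by the GitHub APIs)
// is strictly newer than the last execution time.
function isNewerThanLastCheck (isoDate, lastCheckTimestamp) {
  return new Date(isoDate).getTime() > lastCheckTimestamp
}

// Returns a copy of the config with lastCheckTimestamp moved forward,
// so the next run skips everything processed in this one.
function nextConfig (config, now = Date.now()) {
  return { ...config, lastCheckTimestamp: now }
}

// Usage sketch with the values from config.json shown above
const config = {
  lastCheckTimestamp: 1688584036809, // 2023-07-05T19:07:16.809Z
  reposPaginationLimit: 250,
  releasePaginationLimit: 10,
  commentsPaginationLimit: 100,
  breakDelimiter: '</image>',
  discussionsInScope: [],
  issuesInScope: []
}

console.log(isNewerThanLastCheck('2023-07-06T10:00:00Z', config.lastCheckTimestamp)) // true
console.log(isNewerThanLastCheck('2023-07-01T10:00:00Z', config.lastCheckTimestamp)) // false
```

Returning a new object instead of mutating the loaded config keeps the update explicit: a script could serialize it with JSON.stringify(nextConfig(config), null, 2) and write it back to config.json only after all collectors have succeeded.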
Modularity
The solution is split into small scripts, each with a single job, which makes the code easier to reuse and maintain.
The structure is easiest to see in the scripts section of package.json:
{
  "scripts": {
    "collect:releases": "node scripts/collect-releases.js",
    "collect:issues": "node scripts/collect-issues.js",
    "collect:discussions": "node scripts/collect-discussions.js",
    "rss:validate": "node scripts/validate.js",
    "rss:build": "node scripts/build.js",
    "rss:format": "node scripts/format.js",
    "rss:format-check": "node scripts/format-check.js"
  }
}
Fetching content from GitHub
Releases
Node.js uses GitHub Releases to publish new versions of different projects. There are many projects in the organization, and we keep adding more on a regular basis.
So, this script does the following:
- Fetches all the repositories in the organization.
- Fetches the latest releases for each repository.
- Keeps only the releases that are newer than the last execution time (lastCheckTimestamp).
- Formats the releases to be included in the feed.
- Adds the releases to the feed.
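The filter and format steps above can be sketched as a pure function over release objects. This is a minimal sketch, assuming objects shaped like the GitHub REST API release response (name, tag_name, html_url, published_at, body); the releasesToItems and escapeXml names and the exact `<item>` layout are hypothetical and not copied from the repository.

```javascript
// Escape the five XML special characters so release text cannot break the feed.
// & must be replaced first so the other entities are not double-escaped.
function escapeXml (text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

// Hypothetical helper: keep releases published after lastCheckTimestamp and
// render each one as an RSS 2.0 <item>. Drafts have no published_at, so they are skipped.
function releasesToItems (repoName, releases, lastCheckTimestamp) {
  return releases
    .filter(release => release.published_at &&
      new Date(release.published_at).getTime() > lastCheckTimestamp)
    .map(release => `
<item>
  <title>${escapeXml(`${repoName} ${release.name || release.tag_name}`)}</title>
  <link>${escapeXml(release.html_url)}</link>
  <guid>${escapeXml(release.html_url)}</guid>
  <pubDate>${new Date(release.published_at).toUTCString()}</pubDate>
  <description>${escapeXml(release.body ?? '')}</description>
</item>`)
    .join('')
}
```

Escaping matters because release notes routinely contain `<`, `>`, and `&`, which would otherwise make the feed invalid XML. toUTCString() produces an RFC 822 style date (for example "Thu, 06 Jul 2023 10:00:00 GMT"), which is the format RSS expects for pubDate.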
Issues
Each project publishes its news as comments on specific GitHub issues.
So, this script:
- Fetches all the comments in the issues that are in scope.
- Keeps only the comments that are newer than the last execution time (lastCheckTimestamp).
- Formats the comments to be included in the feed.
- Adds the comments to the feed.
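One way to reduce the amount of data fetched is to let GitHub pre-filter: the REST endpoint for listing an issue's comments accepts a since parameter (ISO 8601) and a per_page parameter. Note that since compares against a comment's last update time, so edited old comments also come back; filtering again by created_at keeps only genuinely new ones. This is a sketch under those assumptions and may differ from what the repository does (the steps above fetch all comments and filter locally); issueCommentsUrl, newComments, and the example issue number are hypothetical.

```javascript
// Hypothetical helpers for the issue-comments step.

// Build the REST URL for one issue's comments, asking GitHub to return only
// comments updated after the last run (since expects an ISO 8601 timestamp).
function issueCommentsUrl ({ owner, repo, issueNumber }, lastCheckTimestamp, perPage = 100) {
  const url = new URL(`https://api.github.com/repos/${owner}/${repo}/issues/${issueNumber}/comments`)
  url.searchParams.set('since', new Date(lastCheckTimestamp).toISOString())
  url.searchParams.set('per_page', String(perPage))
  return url.toString()
}

// since matches on updated_at, so edited old comments are also returned;
// keep only comments created after the last run and tag them with their team.
function newComments (comments, lastCheckTimestamp, team) {
  return comments
    .filter(comment => new Date(comment.created_at).getTime() > lastCheckTimestamp)
    .map(comment => ({ body: comment.body, createdAt: comment.created_at, url: comment.html_url, team }))
}

// Usage sketch (Node.js 18+ global fetch; issue number 42 is a placeholder):
// const res = await fetch(issueCommentsUrl({ owner: 'nodejs', repo: 'node', issueNumber: 42 }, lastCheckTimestamp), {
//   headers: { authorization: `token ${process.env.GITHUB_TOKEN}`, accept: 'application/vnd.github+json' }
// })
// const items = newComments(await res.json(), lastCheckTimestamp, 'some-team')
```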
Discussions
Discussions are very similar to issues, but they are not supported by the GitHub REST API, so we use the GitHub GraphQL API to fetch the comments.
// Assumed client: graphql from the @octokit/graphql package
import { graphql } from '@octokit/graphql'

// discussionsInScope and lastCheckTimestamp come from config.json
const comments = await Promise.all(discussionsInScope.map(async ({ discussionId, team }) => {
  const { repository } = await graphql(
    `
    {
      repository(name: "node", owner: "nodejs") {
        discussion(number: ${discussionId}) {
          comments(last: 100) {
            edges {
              node {
                body
                publishedAt
                updatedAt
                databaseId
              }
            }
          }
        }
      }
    }
    `,
    {
      headers: {
        authorization: `token ${process.env.GITHUB_TOKEN}`
      }
    }
  )
  // Keep only comments published after the last run and tag them with their team
  return repository.discussion.comments.edges
    .filter(comment => new Date(comment.node.publishedAt).getTime() > lastCheckTimestamp)
    .map(comment => ({ ...comment.node, team, discussionId }))
}))
See the full file for more details
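The query above reads only the last 100 comments of each discussion, so if a discussion ever received more than 100 new comments between runs, the oldest of them would be missed. Below is a minimal sketch of cursor-based pagination, assuming GitHub returns discussion comments in chronological order: it walks backwards with before/startCursor and stops once a page reaches comments older than lastCheckTimestamp. It also passes the discussion number as a GraphQL variable instead of interpolating it. The GraphQL client is injected so the logic can be tested offline; fetchNewDiscussionComments and the query are hypothetical, not the repository's code.

```javascript
// Query using GraphQL variables instead of string interpolation.
const DISCUSSION_COMMENTS_QUERY = `
  query ($owner: String!, $name: String!, $number: Int!, $before: String) {
    repository(owner: $owner, name: $name) {
      discussion(number: $number) {
        comments(last: 100, before: $before) {
          pageInfo { hasPreviousPage startCursor }
          nodes { body publishedAt updatedAt databaseId }
        }
      }
    }
  }
`

// Hypothetical helper: page backwards through a discussion's comments,
// 100 at a time, collecting those published after lastCheckTimestamp.
// graphqlFn(query, variables) must resolve to the query's data object.
async function fetchNewDiscussionComments (graphqlFn, { owner, name, number }, lastCheckTimestamp) {
  const collected = []
  let before = null
  while (true) {
    const { repository } = await graphqlFn(DISCUSSION_COMMENTS_QUERY, { owner, name, number, before })
    const { nodes, pageInfo } = repository.discussion.comments
    const fresh = nodes.filter(c => new Date(c.publishedAt).getTime() > lastCheckTimestamp)
    // Older pages go in front so the result stays in chronological order.
    collected.unshift(...fresh)
    // Stop when nothing older exists, the page is empty, or it already contained old comments.
    if (!pageInfo.hasPreviousPage || nodes.length === 0 || fresh.length < nodes.length) break
    before = pageInfo.startCursor
  }
  return collected
}
```

In a real script, graphqlFn could wrap the @octokit/graphql client, for example `(query, variables) => graphql(query, { ...variables, headers: { authorization: `token ${process.env.GITHUB_TOKEN}` } })`, since that client forwards extra options as query variables.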
Updating the feed
To update the feed, we split the current feed content at the breakDelimiter defined in the config.json file and insert the new items right after it.
// ...OMITTED...
// Split the feed at the delimiter and insert the new items right after it
const feedContent = getFeedContent()
const [before, after] = feedContent.split(breakDelimiter)
const updatedFeedContent = `${before}${breakDelimiter}${relevantReleases}${after}`
overwriteFeedContent(updatedFeedContent)
See the full file for more details
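A plain destructuring split silently drops content if the delimiter ever appears twice, and produces "undefined" in the output if it is missing. Below is a minimal sketch of a stricter variant; insertAfterDelimiter is a hypothetical helper, not the repository's code. With </image> as the delimiter, and assuming the channel's <image> element precedes the items as in this feed, new items land at the top of the list.

```javascript
// Hypothetical variant of the update step: insert new <item> entries right
// after the delimiter, failing loudly if the delimiter is missing or duplicated.
function insertAfterDelimiter (feedContent, breakDelimiter, newItems) {
  const parts = feedContent.split(breakDelimiter)
  if (parts.length !== 2) {
    throw new Error(`Expected exactly one "${breakDelimiter}" in the feed, found ${parts.length - 1}`)
  }
  const [before, after] = parts
  return `${before}${breakDelimiter}${newItems}${after}`
}

// Usage sketch:
// insertAfterDelimiter(
//   '<rss><channel><image><url>u</url></image><item>old</item></channel></rss>',
//   '</image>',
//   '<item>new</item>'
// )
// returns '<rss><channel><image><url>u</url></image><item>new</item><item>old</item></channel></rss>'
```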
Formatting the feed
We use the xml-formatter library to normalize the feed content, which makes the changes easier to curate later on when reviewing the PR.
import xmlFormat from 'xml-formatter'
import { getFeedContent, overwriteFeedContent } from '../utils/index.js'
const xml = getFeedContent()
const formattedXml = xmlFormat(xml, { indentation: ' ', collapseContent: true })
overwriteFeedContent(formattedXml)
See the full file for more details
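The scripts import getFeedContent and overwriteFeedContent from ../utils/index.js, which the post does not show. Below is a minimal sketch of what such helpers could look like, assuming the feed is a single feed.xml file read and written synchronously; the default path is a guess. It also includes isFormatted, a hypothetical core for the rss:format-check script: the feed counts as formatted when running the formatter over it changes nothing. The formatter is passed in so the check does not depend on xml-formatter's options.

```javascript
import { readFileSync, writeFileSync } from 'node:fs'

// Assumption: the feed lives at ./feed.xml relative to where the npm scripts
// run; the actual repository may store it elsewhere.
const DEFAULT_FEED_PATH = 'feed.xml'

// Read the whole feed as a UTF-8 string.
export function getFeedContent (feedPath = DEFAULT_FEED_PATH) {
  return readFileSync(feedPath, 'utf8')
}

// Replace the feed file's content; the resulting diff is reviewed in the PR.
export function overwriteFeedContent (content, feedPath = DEFAULT_FEED_PATH) {
  writeFileSync(feedPath, content)
}

// Hypothetical format check: format is injected, e.g.
// xml => xmlFormat(xml, { indentation: '  ', collapseContent: true })
export function isFormatted (xml, format) {
  return format(xml) === xml
}
```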
Validate the feed
To validate the feed, we call the W3C Feed Validation Service directly: we send an HTTP request that simulates the form submission (using the got library) and then parse the HTML response.
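import got from 'got'
import { JSDOM } from 'jsdom'
import { getFeedContent } from '../utils/index.js'

const xml = getFeedContent()

// Simulate the validator's form submission and read the HTML response as text
const data = await got.post('https://validator.w3.org/feed/check.cgi', {
  form: {
    rawdata: xml,
    manual: 1
  }
}).text()

// Avoid importing CSS in the document
const dom = new JSDOM(data.replace(/@import.*/gm, ''))
const title = dom.window.document.querySelector('h2').textContent
// The recommendations list may be absent, so fall back to an empty string
const recommendations = dom.window.document.querySelector('ul')?.textContent ?? ''
console.log(recommendations)

if (title === 'Sorry') {
  console.log('🚨 Feed is invalid!')
  process.exit(1)
} else {
  console.log('✅ Feed is valid!')
}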
Note: To scrape the HTML response with jsdom, we first strip the @import statements from the CSS.
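Since one goal of the project is to avoid external dependencies, the same request could be sent without got by using the fetch and URLSearchParams globals available in Node.js 18 and later. The sketch below is a hypothetical variant, not the script the project uses: it builds the same rawdata/manual form and, instead of jsdom, reads the first `<h2>` with a regular expression. That is an assumption about the validator page's markup and is more fragile than a real HTML parser.

```javascript
const VALIDATOR_URL = 'https://validator.w3.org/feed/check.cgi'

// Build the same form submission as the got version: rawdata + manual=1.
function buildValidationRequest (xml) {
  return {
    url: VALIDATOR_URL,
    init: {
      method: 'POST',
      // A URLSearchParams body is sent as application/x-www-form-urlencoded.
      body: new URLSearchParams({ rawdata: xml, manual: '1' })
    }
  }
}

// Mirror the original check: the first <h2> reading "Sorry" means invalid.
// Tags inside the heading are stripped to approximate textContent.
function isValidFeedPage (html) {
  const match = html.match(/<h2[^>]*>([\s\S]*?)<\/h2>/i)
  if (!match) throw new Error('Unexpected validator response: no <h2> found')
  return match[1].replace(/<[^>]*>/g, '').trim() !== 'Sorry'
}

// Hypothetical end-to-end helper (network access required).
async function validateFeed (xml) {
  const { url, init } = buildValidationRequest(xml)
  const response = await fetch(url, init)
  if (!response.ok) throw new Error(`Validator responded with HTTP ${response.status}`)
  return isValidFeedPage(await response.text())
}
```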
The GitHub Action
Cron Job and Manual Trigger
The GitHub Action is configured to run every week, but it can also be triggered manually through the workflow_dispatch event. This is useful when we want to update the feed by hand, for example to add a news item that is not available on GitHub or to promote some news quickly.
on:
  workflow_dispatch:
  schedule:
    # Every Sunday at 00:00 UTC
    - cron: '0 0 * * 0'
# ...OMITTED...
See the full file for more details
API Limits
The GitHub API rate-limits requests, and this process makes many of them, so we need to authenticate with a GitHub token.
One option is for a user to create a token and add it to the repository secrets. The GitHub Action then uses this token to authenticate its API requests, which get a much higher limit than anonymous requests.
A better option is to use the token that GitHub Actions already provides to every workflow run, limiting its permissions in the workflow file as follows:
# ...OMITTED...
permissions:
  contents: write
  pull-requests: write
  issues: read
  packages: none

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # ...OMITTED...
      - name: Collect Releases
        run: npm run collect:releases
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      # ...OMITTED...
See the full file for more details
We pass secrets.GITHUB_TOKEN to the scripts as the GITHUB_TOKEN environment variable.
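Whichever token is used, GitHub reports the remaining budget in response headers: x-ratelimit-remaining holds the requests left in the current window, and x-ratelimit-reset holds the reset time in UTC epoch seconds. Below is a minimal sketch of reading them; readRateLimit and msUntilReset are hypothetical helpers, and the headers argument is assumed to be a fetch Headers object (Node.js 18+).

```javascript
// Hypothetical helper: parse GitHub's rate-limit headers.
// Returns null when the headers are absent (e.g. a non-API response).
function readRateLimit (headers) {
  const remainingHeader = headers.get('x-ratelimit-remaining')
  const resetHeader = headers.get('x-ratelimit-reset')
  if (remainingHeader === null || resetHeader === null) return null
  return {
    remaining: Number(remainingHeader),
    // x-ratelimit-reset is in epoch seconds; Date expects milliseconds.
    resetAt: new Date(Number(resetHeader) * 1000)
  }
}

// Milliseconds to wait before the next request: zero while budget remains,
// otherwise the time left until the window resets.
function msUntilReset ({ remaining, resetAt }, now = Date.now()) {
  return remaining > 0 ? 0 : Math.max(0, resetAt.getTime() - now)
}

// Usage sketch:
// const res = await fetch('https://api.github.com/rate_limit', {
//   headers: { authorization: `token ${process.env.GITHUB_TOKEN}` }
// })
// console.log(readRateLimit(res.headers))
```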
Slack Notifications
The feed is published on Slack using the RSS app, which watches the feed and pushes new items to the configured channel(s). In our case, that is the #nodejs-news-feed channel.
Acknowledgment
Thanks a lot to the Node.js Next 10 team for the support and feedback on this project, especially to Michael Dawson for the guidelines, reviews, and suggestions.