Build a Bluesky Bot: Automate Social Media Posting & Web Scraping with this In-Depth Tutorial
What if you could automate product discovery and effortlessly deliver new product launch notifications directly to your social media feed — all without spending a single penny?
If you’re familiar with basic programming, building a bot that automates web scraping and social media posting isn’t as hard as you think, especially when you can orchestrate everything using free tools — Jina.ai, Google Gemini, and GitHub Actions.
In this tutorial, we’ll explore building your very own Launchbot — an automated web scraper and social media poster that will change the way you track and share product launch updates. Forget manual social media posting or paying exorbitant fees for automated online services — with a little bit of effort and a sprinkle of code, we’ll create a fully automated system on our own.
I recommend having at least some prior programming knowledge in TypeScript and Node.js before starting. While you don’t strictly need to know how to code to build software, programming knowledge helps tremendously, and I can’t emphasise this point enough. That said, I’ve made this guide easy for beginners to follow.
Note: You can download the code for the bot here.
An Introduction
Before we begin, let me provide some background…
For months, I’ve been meaning to get my hands dirty with the Bluesky API. A bot seemed like the best way forward. I’ve built basic bots before, but this would take a much more hands-off approach — complete automation with minimal to no human oversight needed once the bot was deployed.
It’s not as terrifying as it sounds, especially since I intended to keep things very simple.
I hadn’t quite figured out what I wanted the bot to do, what actions it would perform, or what the end result would look like. I had a basic idea in mind. I wanted the bot to scrape content off the Internet and publish news posts about new advances in AI research and software development — similar to a news feed (like MarkTechPost or MIT News). After some more thought, I realised I would have to spend a lot of time curating the feed, and the scraping logic itself would take too long to develop.
To keep things simple, I landed on a concept that wouldn’t require too much effort to put together. My project goal shifted to simply tracking new product launches. The task seemed simple enough because plenty of websites already publish this content (like Product Hunt), so all I really had to do was scrape the front page to have material to work with.
The workflow for the bot would be simple enough — scrape the landing page, retrieve product names and their respective websites, scrape these websites, summarise details for each product, and then post this to Bluesky. Easy!
The entire process itself — from initial scrape to final social media post — would take no more than a few minutes. Now the only question was what technologies to use.
The Tools and Tech Stack
(And why I chose what I chose) …
Node.js and TypeScript
For this bot, Node.js and TypeScript are excellent choices for several reasons:
- Scalability for Web Scraping and API Interactions: Node.js’s non-blocking nature is ideal for handling web scraping tasks (making requests to multiple websites) and interacting with APIs. It can manage concurrent operations really well, which is crucial for scraping data and posting updates without slowing down.
- Strong Typing for Reliability: In a project that involves data processing, API interactions, and posting to social media, reliability is key. TypeScript’s static typing helps us write more robust code, reducing the chances of runtime errors and making the program easier to maintain and debug over time.
- Large Ecosystem of Libraries (npm): Node.js has a vast package ecosystem called npm (Node Package Manager). We can easily find and use libraries for web scraping (axios, cheerio, puppeteer), interacting with APIs (@atproto/api for Bluesky), database access (mongodb), and many other tasks. This saves us a lot of time and effort by letting us use existing, well-tested code.
Google Gemini
Gemini was the go-to LLM for a few key reasons:
- It’s Free: Google offers its frontier models with a generous free tier. While Gemini is arguably not the best for every use case, for fast, experimental development and smaller-scale projects, it’s a cost-effective option.
- HUGE Context Window: Gemini 2.0 Flash, the model we will be using, can process a significant amount of text at once — 1 million tokens, to be precise. In comparison, other free models don’t come anywhere close to that. This is an important consideration, as the amount of scraped content we will be feeding the model may sometimes be huge!
MongoDB (NoSQL Database)
For this project, MongoDB is a good choice for storing product data because:
- Flexibility to Store Product Data: Over time, the structure of our data might evolve. MongoDB’s flexible schema allows us to store diverse product data without forcing it into a rigid table structure.
- Ease of Use with Node.js: Since our program is built with Node.js, using MongoDB is pretty convenient. The MongoDB Node.js driver makes it easy to connect to and interact with the database from our JavaScript/TypeScript code.
- It’s Free: MongoDB Atlas comes with a free tier that lets us deploy a single cluster (a group of databases) and store up to 512MB. For this project, that seems more than sufficient.
GitHub Actions
GitHub Actions offer a critical piece of infrastructure for our bot for several reasons:
- Automation: We are using GitHub Actions to automate the execution of our scripts. We can set up a scheduled workflow to run regularly to automatically scrape for new products and publish information to Bluesky.
- Automate testing (optional, but good practice): We could also use GitHub Actions to automatically run tests whenever we make changes to the code, ensuring that our program is working correctly.
- Free for Public Repositories (and often sufficient for private projects): GitHub Actions offers a generous free tier, especially for public repositories. For private projects, there’s also a free tier that is often sufficient for smaller automation tasks.
Architectural Overview: Modules and Data Flow
To keep the project code organized and maintainable, I structured it into modules. Each module is responsible for a specific set of tasks. This modular design follows the Single Responsibility Principle, making the code easier to understand, test, and modify.
Here’s an overview of the file structure with the files we’ll be focusing on highlighted in bold:
├── .github/workflows # GitHub Workflow configurations
│ ├── scrape.yml # Workflow for running the scraper script (src/main.ts)
│ └── post.yml # Workflow for running the poster script (src/post.ts)
├── src # Source code directory
│ ├── main.ts # Entry point for the scraper script
│ ├── post.ts # Entry point for the poster script
│ ├── modules # Modular components
│ │ ├── bluesky.ts # Handles posting product updates to Bluesky
│ │ ├── filter.ts # Filters out unwanted product listings
│ │ ├── llm.ts # Uses Gemini AI for parsing and summarization
│ │ ├── scraper.ts # Handles scraping of product listings and pages
│ │ └── storage.ts # Manages database interactions (MongoDB)
│ └── utils # Utility functions
│ │ └── logger.ts # Logging utility
├── .env # Environment variables (API keys, database URIs, etc.)
├── package-lock.json
├── package.json # Project dependencies and metadata
├── README.md # Project documentation
└── tsconfig.json # TypeScript configuration
Project Modules Breakdown
Here’s a breakdown of each module and its responsibility:
Web Scraping Module (src/modules/scraper.ts)
Purpose: To gather raw data from the web, which is then processed by other modules.
Responsibility: Performs all web scraping activities. It handles fetching HTML content from websites and extracting relevant data.
Functions:
- scrapeNewProducts(): Scrapes listings of new products from target websites. This is used in the daily scraping workflow (src/main.ts).
- scrapeProductPage(product_url): Scrapes the detailed content of an individual product page given its URL. This is used in the posting workflow (src/post.ts).
- getProductPageScreenshot(product_url): Takes a screenshot of a product page. This is also used in the posting workflow (src/post.ts) to include a visual with the Bluesky post.
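The scraping functions themselves are covered in the Jina AI section later on. For the screenshot, here is a minimal sketch of what getProductPageScreenshot() could look like using puppeteer — this is an assumption for illustration; the actual project may capture screenshots differently.
// Hypothetical sketch of getProductPageScreenshot() using puppeteer
import puppeteer from "puppeteer";

export async function getProductPageScreenshot(product_url: string): Promise<string> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });
    await page.goto(product_url, { waitUntil: "networkidle2", timeout: 30000 });
    // Capture the visible viewport as a base64-encoded PNG
    const base64 = await page.screenshot({ encoding: "base64", type: "png" });
    // Return a Data URI, which the Bluesky module later converts to a Uint8Array
    return `data:image/png;base64,${base64}`;
  } finally {
    await browser.close();
  }
}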
LLM Module (src/modules/llm.ts)
Purpose: To use Gemini to understand and structure the scraped text data, making it usable for our program.
Responsibility: Handles all interactions with the Large Language Model. It’s responsible for parsing unstructured text data using the LLM and generating summaries.
Functions:
- parseProducts(rawData): Takes raw scraped data as input and uses the LLM to parse it into a structured format, extracting key product information. Used in src/main.ts.
- parseProductPageContent(productPageContent): Takes the content of a product page and uses the LLM to generate a concise summary of the product. Used in src/post.ts.
Data Storage Module (src/modules/storage.ts)
Purpose: To persist product data, track which products have been posted, and manage the database connection.
Responsibility: Manages all interactions with our data storage, which is MongoDB Atlas in this case. It handles saving, retrieving, and updating product data in the database.
Functions:
- getProducts(): Retrieves the MongoDB collection to allow for querying the database directly. Used in src/post.ts to find unposted products.
- saveProducts(products): Saves a list of product objects to the MongoDB database, checking for duplicates and appending new products. Used in src/main.ts.
- markProductAsPosted(product_url): Updates a product in the database, marking it as "posted" (by setting the posted field to true). Used in src/post.ts.
- closeDatabaseConnection(): Closes the connection to the MongoDB database. It's important to close connections to free up resources. Used in both src/main.ts and src/post.ts in the finally blocks to ensure connection closure even if errors occur.
Product Filtering Module (src/modules/filter.ts)
Purpose: To refine the list of discovered products, removing any that are not relevant or desired. This helps to ensure the quality of products posted to Bluesky.
Responsibility: Implements logic to filter products based on certain criteria. In the provided src/main.ts, it's used to filter out products from specific domains.
Functions:
- filterProducts(parsedProducts): Takes a list of parsed product objects and applies filtering rules. It returns both the filteredProducts (products that passed the filters) and removedProducts (products that were filtered out). Used in src/main.ts.
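As a rough illustration, a domain-based filter could look something like the sketch below. The blocked domains here are placeholders; the real rules live in src/modules/filter.ts.
// Hypothetical sketch of src/modules/filter.ts
interface Product {
  title: string;
  url: string;
  posted?: boolean;
}

// Placeholder list of domains to exclude; adjust to your own rules
const BLOCKED_DOMAINS = ["example-spam.com", "example-ads.net"];

export function filterProducts(parsedProducts: Product[]): {
  filteredProducts: Product[];
  removedProducts: Product[];
} {
  const filteredProducts: Product[] = [];
  const removedProducts: Product[] = [];

  for (const product of parsedProducts) {
    const blocked = BLOCKED_DOMAINS.some((domain) => product.url.includes(domain));
    (blocked ? removedProducts : filteredProducts).push(product);
  }

  return { filteredProducts, removedProducts };
}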
Bluesky API Module (src/modules/bluesky.ts)
Purpose: To automate posting to the Bluesky social media platform.
Responsibility: This module handles all interactions with the Bluesky API.
Functions:
- postToBluesky(product_title, product_summary, product_url, product_screenshot): Takes product information (title, summary, URL, screenshot) and uses the Bluesky API to create a new post on Bluesky. Used in src/post.ts.
Logging Utility Module (src/utils/logger.ts)
Purpose: To aid in monitoring, debugging, and understanding the program’s behavior over time. Logging is crucial for identifying and resolving issues, especially in automated systems.
Responsibility: Provides a simple logging function to record events and errors during program execution.
Functions:
- writeLog(message): Writes a log message to a log file or console (depending on implementation). Used throughout both src/main.ts and src/post.ts to track the program's progress and any issues.
GitHub Workflows
Scrape Workflow (scrape.yml)
Script: src/main.ts
Schedule: Daily at midnight UTC
Purpose:
- Scrapes new product listings from predefined sources.
- Parses raw data using Gemini to extract product details.
- Filters out products from unwanted domains.
- Saves filtered and parsed product data to a MongoDB database.
Post Workflow (post.yml)
Script: src/post.ts
Schedule: Every 6 hours starting at 12:30AM UTC
Purpose:
- Retrieves unprocessed product listings from the MongoDB database.
- Scrapes individual product pages for detailed content.
- Generates a concise product summary using Gemini.
- Posts the product summary to a Bluesky social media account, including a screenshot of the product page.
- Updates the product status in the database to mark it as “posted”.
Getting Started: Setting Up Your Project
This section will guide you through setting up the necessary accounts, API keys, and environment configurations to run the bot.
First, the basics.
Install Node and npm
- Download and install Node.js from https://nodejs.org/. This will also install npm (Node Package Manager), which we’ll use to manage project dependencies. Choose the LTS (Long-Term Support) version for stability.
- After installation, verify that Node.js and npm are installed correctly by running the following commands in your terminal. You should see version numbers for both Node.js and npm.
node -v
npm -v
Building the Scraper with Jina AI
We are using Jina AI to simplify web scraping.
Jina’s Reader API lets us retrieve website content in various formats, including Markdown, which is ideal for our project as it removes HTML clutter and reduces the text volume for LLM processing. There are also some more hidden benefits to using their service.
- No API Key Required: One key advantage here is that it does not require an account or API key for basic usage.
- Content Format: Markdown: We request content in Markdown format by setting the X-Return-Format: markdown header in our requests. This simplifies data extraction and processing in later steps.
// Example from src/modules/scraper.ts
const response = await axios.get(url, {
  headers: {
    "X-No-Cache": "true",
    "X-Return-Format": "markdown", // Request Markdown format
  },
});
- Bypassing Cache: To ensure we get fresh product listings each time we scrape, we use the X-No-Cache: true header. This tells Jina AI's service to bypass its cache and fetch the latest content from the target website.
// Example from src/modules/scraper.ts
  headers: {
    "X-No-Cache": "true", // Bypass cache for fresh content
    "X-Return-Format": "markdown",
  },
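Jina's Reader API works by prefixing the target URL with https://r.jina.ai/ and making a plain GET request. Putting the pieces together, a scrapeNewProducts() sketch could look like the following — the target URL is a placeholder; substitute the listing page you want to track.
// Hypothetical sketch of scrapeNewProducts() in src/modules/scraper.ts
import axios from "axios";

const JINA_READER_PREFIX = "https://r.jina.ai/";
const TARGET_URL = "https://www.producthunt.com/"; // placeholder listing page

export async function scrapeNewProducts(): Promise<string> {
  // Jina Reader fetches the page and returns it in the requested format
  const response = await axios.get(JINA_READER_PREFIX + TARGET_URL, {
    headers: {
      "X-No-Cache": "true", // bypass Jina's cache for fresh listings
      "X-Return-Format": "markdown", // clean Markdown instead of raw HTML
    },
  });
  return response.data; // raw Markdown, handed off to the LLM module for parsing
}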
Plugging into Gemini API
To parse the scraped product listings and summarize product page content, we are using the Google Gemini API.
Here’s how to set it up:
Get a Gemini API Key:
- Go to Google AI Studio.
- Sign in with your Google account and create a new project if you haven’t already.
- In your project, navigate to “API keys” in the sidebar and click on “Create API key”.
- Copy the generated API key. This is your Gemini API key. Treat this key like a password and keep it secure.
- Pro-tip: In case you don’t want to use Google, OpenRouter offers a huge selection of models that you can access with a single API key.
Install the Gemini API Client Library:
- In your project directory, open your terminal and run the following command to install the @google/generative-ai npm package:
npm install @google/generative-ai
- This library simplifies interacting with the Gemini API from our Node.js application.
Set up an Environment Variable:
- Create a .env file in the root of your project directory if you don't already have one.
- Add the following line to your .env file, replacing YOUR_GEMINI_API_KEY with the API key you copied from Google AI Studio (without quotes):
GEMINI_API_KEY=YOUR_GEMINI_API_KEY
Using the Gemini API in src/modules/llm.ts:
- The src/modules/llm.ts file demonstrates how to use the @google/generative-ai library and your API key to interact with the Gemini API.
Example of initializing Gemini API client:
// src/modules/llm.ts
import { GoogleGenerativeAI } from "@google/generative-ai";
import dotenv from 'dotenv';
dotenv.config();
const geminiApiKey = process.env.GEMINI_API_KEY;
if (!geminiApiKey) {
  throw new Error("GEMINI_API_KEY is not set in environment variables.");
}
const genAI = new GoogleGenerativeAI(geminiApiKey);
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
Building Prompts: The parseProducts() and parseProductPageContent() functions in src/modules/llm.ts construct prompts to send to the Gemini API.
parseProducts() Prompt: This prompt is designed to extract product titles and URLs from the scraped Markdown content and return the result in JSON format. The generation config is set to encourage JSON output.
// Example from src/modules/llm.ts (parseProducts)
const generationConfigForProducts = {
  responseMimeType: "application/json", // Encourage JSON output
};
const promptForProducts = `You are a product listing parser. ...`; // Detailed prompt
const chatSession = model.startChat({
  generationConfig: generationConfigForProducts,
  history: [],
});
const result = await chatSession.sendMessage(promptForProducts);
const responseText = result.response.text();
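Because the model returns the JSON as plain text, the response still needs to be turned back into objects. A minimal sketch, assuming the prompt asks for an array of objects with title and url fields:
// Hypothetical sketch: turning the model's text response into product objects
interface ParsedProduct {
  title: string;
  url: string;
}

function parseJsonResponse(responseText: string): ParsedProduct[] {
  // Strip Markdown code fences in case the model wraps its output in ```json ... ```
  const cleaned = responseText.replace(/```json|```/g, "").trim();
  try {
    return JSON.parse(cleaned) as ParsedProduct[];
  } catch (error) {
    throw new Error(`Failed to parse LLM response as JSON: ${error}`);
  }
}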
parseProductPageContent() Prompt: This prompt is for generating a concise summary of a product page. It's designed to return regular text.
// Example from src/modules/llm.ts (parseProductPageContent)
const promptForSummary = `Summarize the following product page content ...`; // Detailed prompt
const result = await chatSession.sendMessage(promptForSummary);
let responseText = result.response.text().trim();
Creating a Database with MongoDB Atlas
We use MongoDB Atlas to store the scraped product data. Here’s how to set up a MongoDB Atlas cluster:
Create a MongoDB Atlas Account:
- Go to MongoDB Atlas.
- Click on “Try Free”.
- Sign up for an account using your email or Google account.
Create a Free Tier Cluster:
- Once logged in, you’ll be guided to create a new project and cluster.
- Choose the “Free” option.
- Select a cloud provider (e.g., AWS, Google Cloud, Azure) and a region. The default recommendations are usually fine for the free tier.
- Click “Create Cluster”. It will take a few minutes for the cluster to be provisioned.
Set up Database User and Password:
- After the cluster is created, navigate to “Database Access” in the sidebar.
- Click “Add New Database User”.
- Choose a username and set a secure password.
- For “User Privileges”, “Read and write to any database” is usually sufficient for this project.
- Click “Add User”. Remember the username and password you set.
Configure Network Access:
- Navigate to “Network Access” in the sidebar.
- Click “Add IP Address”.
- Select “Allow access from anywhere”. This sets the IP address to 0.0.0.0/0, allowing connections from any IP address. This is necessary for the GitHub Actions runners to access the database. For a basic tutorial and personal projects, this is acceptable for simplicity. However, for production applications, it's crucial to restrict network access to specific IP addresses for security.
- Click “Confirm”.
Get the Connection URI:
- Go to “Database” in the sidebar.
- Click “Connect” for your cluster.
- Choose “Drivers” as the connection method.
- Select “Node.js” as the driver and keep the default version.
- Copy the Connection String. It will look something like:
mongodb+srv://<username>:<password>@<your-cluster-url>/<your-database-name>?retryWrites=true&w=majority
- Replace <username> with the username and <password> with the password you created in step 3.
- Note the database name in the connection string (e.g., <your-database-name>). We'll use this later.
Set up MongoDB Environment Variables:
- In your project’s .env file, add the following lines, replacing the placeholders with your actual MongoDB connection details:
MONGODB_URI="YOUR_MONGODB_CONNECTION_STRING" # Paste the connection string here
MONGODB_DBNAME="your-database-name" # Use the database name from the connection string
MONGODB_COLLNAME="products" # Or choose a different collection name
- Important: Ensure that the MONGODB_URI is correctly formatted and includes your username and password. The MONGODB_DBNAME should match the database name in your connection string, and MONGODB_COLLNAME is the name you want to give to your MongoDB collection (e.g., "products").
Install MongoDB Node.js Driver:
- If you haven’t already, install the mongodb npm package in your project directory:
npm install mongodb
- The src/modules/storage.ts file uses this driver to connect to and interact with your MongoDB Atlas database.
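To give a sense of what that looks like, here is a minimal sketch of src/modules/storage.ts built on the official driver. The field names posted and url follow the ones used elsewhere in this guide; treat everything else as an assumption rather than the project's exact code.
// Hypothetical sketch of src/modules/storage.ts
import { MongoClient, Collection } from "mongodb";
import dotenv from "dotenv";

dotenv.config();

const client = new MongoClient(process.env.MONGODB_URI as string);
const dbName = process.env.MONGODB_DBNAME as string;
const collName = process.env.MONGODB_COLLNAME as string;

interface Product {
  title: string;
  url: string;
  posted: boolean;
}

async function getCollection(): Promise<Collection<Product>> {
  await client.connect(); // safe to call repeatedly; reuses the connection
  return client.db(dbName).collection<Product>(collName);
}

export async function getProducts(): Promise<Collection<Product>> {
  return getCollection();
}

export async function saveProducts(products: Product[]): Promise<void> {
  const collection = await getCollection();
  for (const product of products) {
    // Upsert by URL so re-running the scraper doesn't create duplicates
    await collection.updateOne(
      { url: product.url },
      { $setOnInsert: { ...product, posted: false } },
      { upsert: true }
    );
  }
}

export async function markProductAsPosted(product_url: string): Promise<void> {
  const collection = await getCollection();
  await collection.updateOne({ url: product_url }, { $set: { posted: true } });
}

export async function closeDatabaseConnection(): Promise<void> {
  await client.close();
}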
Plugging into the Bluesky API
To enable our bot to post product discoveries to Bluesky, we need to set up a Bluesky account and configure the API credentials. Here’s how:
Create a Bluesky Account:
- If you don’t already have one, go to https://bsky.app and sign up for a Bluesky account.
Set up an App Password (Recommended for Security):
- Instead of using your main Bluesky account password directly in the bot (which is less secure), it’s highly recommended to create an App Password. App passwords are specifically generated for applications and can be revoked without affecting your main account password.
- In Bluesky, open Settings, find the App Passwords section, and create a new app password.
- Copy the generated app password and keep it in a safe place. You will use this app password instead of your main account password for the bot.
Set up Bluesky Environment Variables:
- In your project’s .env file, add the following lines, replacing the placeholders with your Bluesky account details:
BLUESKY_SERVICE="https://bsky.social" # Default Bluesky service URL
BLUESKY_IDENTIFIER="your-bluesky-username-or-email" # Your Bluesky username or email
BLUESKY_PASSWORD="your-bluesky-app-password" # The App Password you just created
- Important:
- BLUESKY_SERVICE: For most users, "https://bsky.social" is the correct Bluesky service URL. You generally don't need to change this unless you are using a different Bluesky endpoint.
- BLUESKY_IDENTIFIER: This should be your Bluesky username (handle, e.g., yourusername.bsky.social) or the email address you used to sign up for Bluesky.
- BLUESKY_PASSWORD: Use the App Password you generated in step 2, not your main Bluesky account password.
Install the Bluesky API Client Library:
- If you haven’t already, install the @atproto/api npm package in your project directory:
npm install @atproto/api
- The src/modules/bluesky.ts file uses this library to interact with the Bluesky API.
Using the Bluesky API in src/modules/bluesky.ts:
- The src/modules/bluesky.ts file handles authentication and posting to Bluesky.
- Example of initializing AtpAgent and logging in:
// src/modules/bluesky.ts
import { AtpAgent } from "@atproto/api";
import dotenv from 'dotenv';
dotenv.config();
// Read service and credentials from environment variables
const service = process.env.BLUESKY_SERVICE;
const identifier = process.env.BLUESKY_IDENTIFIER;
const password = process.env.BLUESKY_PASSWORD;
if (!service) {
  throw new Error("BLUESKY_SERVICE environment variable is not defined");
}

const agent = new AtpAgent({ service });

async function login(): Promise<void> {
  if (!agent.session) {
    if (!identifier || !password) {
      throw new Error("BLUESKY_IDENTIFIER or BLUESKY_PASSWORD environment variable is not defined");
    }
    await agent.login({ identifier, password });
    writeLog("Logged in to Bluesky successfully.");
  }
}
Posting to Bluesky: The postToBluesky() function in src/modules/bluesky.ts demonstrates how to:
- Log in to Bluesky using the login() function.
- Compose the post text, including product title and summary.
- Use RichText from @atproto/api to handle text formatting and facet detection (for links and mentions, though not explicitly used in this basic example).
- Convert the product page screenshot Data URI to a Uint8Array for uploading.
- Upload the screenshot as a blob using agent.uploadBlob().
- Construct an "external embed" (website card) using app.bsky.embed.external to include a link to the product page and the screenshot.
- Create a post record using agent.post() with the text, facets, and embed.
Pro-tip: Check out the official Bluesky API documentation to find more information about their service.
Creating the Scripts (main.ts and post.ts)
You’ve already been provided with the code for src/main.ts and src/post.ts. These scripts are designed as separate workflows for daily scraping and product posting, respectively.
src/main.ts (Daily Scraping Script): This script is responsible for:
- Scraping new product listings using scrapeNewProducts() from scraper.ts.
- Parsing the scraped data with Gemini using parseProducts() from llm.ts.
- Filtering products using filterProducts() from filter.ts.
- Saving new products to MongoDB using saveProducts() from storage.ts.
- Logging progress and errors using writeLog() from utils/logger.ts.
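In outline, src/main.ts ties these modules together roughly like this — a simplified sketch of the flow, not the full script:
// Simplified sketch of src/main.ts
import { scrapeNewProducts } from "./modules/scraper";
import { parseProducts } from "./modules/llm";
import { filterProducts } from "./modules/filter";
import { saveProducts, closeDatabaseConnection } from "./modules/storage";
import { writeLog } from "./utils/logger";

async function main(): Promise<void> {
  try {
    writeLog("Starting daily scrape...");
    const rawData = await scrapeNewProducts();
    const parsedProducts = await parseProducts(rawData);
    const { filteredProducts, removedProducts } = filterProducts(parsedProducts);
    writeLog(`Filtered out ${removedProducts.length} products.`);
    await saveProducts(filteredProducts);
    writeLog(`Saved ${filteredProducts.length} products to the database.`);
  } catch (error) {
    writeLog(`Scrape failed: ${error}`);
    process.exitCode = 1;
  } finally {
    await closeDatabaseConnection();
  }
}

main();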
src/post.ts (Posting Workflow Script): This script handles posting products to Bluesky:
- Retrieving an unposted product from MongoDB using getProducts() from storage.ts.
- Scraping the product page and taking a screenshot using functions from scraper.ts.
- Generating a product summary with Gemini using parseProductPageContent() from llm.ts.
- Posting to Bluesky using postToBluesky() from bluesky.ts.
- Marking the product as posted in MongoDB using markProductAsPosted() from storage.ts.
- Logging and error handling.
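And src/post.ts follows a similar shape — again a simplified sketch under the same assumptions as the snippets above:
// Simplified sketch of src/post.ts
import { scrapeProductPage, getProductPageScreenshot } from "./modules/scraper";
import { parseProductPageContent } from "./modules/llm";
import { getProducts, markProductAsPosted, closeDatabaseConnection } from "./modules/storage";
import { postToBluesky } from "./modules/bluesky";
import { writeLog } from "./utils/logger";

async function main(): Promise<void> {
  try {
    const collection = await getProducts();
    const product = await collection.findOne({ posted: false }); // next unposted product
    if (!product) {
      writeLog("No unposted products found.");
      return;
    }
    const pageContent = await scrapeProductPage(product.url);
    const summary = await parseProductPageContent(pageContent);
    const screenshot = await getProductPageScreenshot(product.url);
    await postToBluesky(product.title, summary, product.url, screenshot);
    await markProductAsPosted(product.url);
    writeLog(`Posted: ${product.title}`);
  } catch (error) {
    writeLog(`Posting failed: ${error}`);
    process.exitCode = 1;
  } finally {
    await closeDatabaseConnection();
  }
}

main();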
Creating Workflows for GitHub Actions
Time for some GitHub action. To automate the daily scraping and posting, you need to set up two workflow files in your repository: scrape.yml and post.yml inside the .github/workflows directory.
Create Workflow Files:
- In your project repository, create a directory named .github at the root level.
- Inside .github, create another directory named workflows.
- Create two YAML files within .github/workflows: scrape.yml and post.yml.
scrape.yml (Daily Scraping Workflow):
- This workflow will automate the execution of src/main.ts daily.
name: Daily Scraper Run
on:
  schedule:
    - cron: '0 0 * * *' # Runs daily at midnight UTC
  workflow_dispatch: # Also allow manual trigger
jobs:
  run-scraper:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3
      - name: Set Up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
      - name: Install Dependencies
        run: npm install
      - name: Run Build
        run: npm run build
      - name: Run Scraper
        run: npm run scrape
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
          MONGODB_DBNAME: ${{ secrets.MONGODB_DBNAME }}
          MONGODB_COLLNAME: ${{ secrets.MONGODB_COLLNAME }}
          MONGODB_URI: ${{ secrets.MONGODB_URI }}
post.yml (Posting Workflow):
name: Product Poster Run
on:
  schedule:
    - cron: '30 */6 * * *' # Runs every 6 hours, at minute 30
  workflow_dispatch: # Also allow manual trigger
jobs:
  run-poster:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3
      - name: Set Up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
      - name: Install Dependencies
        run: npm install
      - name: Run Build
        run: npm run build
      - name: Run Poster
        run: npm run post
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
          MONGODB_DBNAME: ${{ secrets.MONGODB_DBNAME }}
          MONGODB_COLLNAME: ${{ secrets.MONGODB_COLLNAME }}
          MONGODB_URI: ${{ secrets.MONGODB_URI }}
          BLUESKY_SERVICE: ${{ secrets.BLUESKY_SERVICE }}
          BLUESKY_IDENTIFIER: ${{ secrets.BLUESKY_IDENTIFIER }}
          BLUESKY_PASSWORD: ${{ secrets.BLUESKY_PASSWORD }}
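Note that both workflows assume your package.json defines build, scrape, and post scripts. Yours may differ; a sketch of the scripts section could look like this (the tsc output directory and entry file names are assumptions, so match them to your own setup):
{
  "scripts": {
    "build": "tsc",
    "scrape": "node dist/main.js",
    "post": "node dist/post.js"
  }
}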
Set up a GitHub Repository:
- Create a new repository on GitHub for your project.
- Push your project code to the repository.
Set up GitHub Actions Secrets:
- Go to your GitHub repository.
- Navigate to “Settings” > “Secrets and variables” > “Actions”.
- Click “New repository secret”.
Create the following repository secrets, using the exact names as used in your workflow files (specifically post.yml in the env: section):
- GEMINI_API_KEY: Paste your Gemini API key as the value.
- MONGODB_URI: Paste your MongoDB connection string as the value.
- MONGODB_DBNAME: The MongoDB database name (e.g., "product-database").
- MONGODB_COLLNAME: Enter your MongoDB collection name (e.g., "products").
- BLUESKY_SERVICE: https://bsky.social (unless you're using a different service).
- BLUESKY_IDENTIFIER: Your Bluesky handle (e.g., yourusername.bsky.social).
- BLUESKY_PASSWORD: Enter your Bluesky app password (not your main account password).
Important: The secret names in GitHub Actions must exactly match the environment variable names expected in your code and workflow files.
Manually Trigger Workflows (Initial Test):
- Go to the “Actions” tab in your GitHub repository.
- You should see the “Daily Scraper Run” and “Product Poster Run” workflows listed (or whatever names you set in the .yml files shared earlier).
- Click on a workflow name and then click “Run workflow” on the right to manually trigger it.
- Monitor the workflow runs in the “Actions” tab to see if they complete successfully. Check the logs for any errors.
- If the workflows run without any errors, that means they will run fine on their own. You don’t need to do anything more for them to fire automatically.
Conclusion and Next Steps
Congratulations! You should now have a Bluesky bot ready and at your disposal. I hope I’ve covered everything and haven’t missed a step. If I have, please do let me know in the comments.
Here’s the bot in action…
You can customise the bot to your liking by changing the scraping targets, refining the LLM prompts, extending the posting functionality, and adding more features.
Some features I have in mind for future versions include:
- Product Categorization: Use the LLM to categorize products into different categories and add category tags to Bluesky posts.
- Sentiment Analysis: Use the LLM to analyze product descriptions or reviews and include sentiment information in the Bluesky posts (e.g., “This product is getting great reviews!”).
- User Interface: Build a simple web interface or command-line interface to manage the bot’s settings, view scraped products, or manually trigger posting.
Further Reading
- AI-Assisted Software Development: A Comprehensive Guide with Practical Prompts
- Build an Automated Instagram Image Uploader with Python using Instagrapi
- Making your first AI-generated animation with Deforum Stable Diffusion
Happy coding!