Seeding Frenzy and Migration Moment

05 May, 2024

Part 1. Seeding

Background

Database seeding is a really cool name for a simple task. It's basically adding dummy data to our database with a script. We can add hundreds or even thousands of table rows with this script, making manual testing interactions with the database that we have much easier and faster, as we don't need to add table rows manually.

How to Seed?

Let's say we want to seed events data. We will make this structure:

// events.seed.ts
// Reference: https://gist.github.com/ironbyte/6f6e2e8d5f25c4188ce32fb4dcf3a115

// get the file name to check whether the seeder file has been run
// reference from https://stackoverflow.com/a/43192292
const fileName = __filename.substring(
  __dirname.length + 1,
  __filename.lastIndexOf('.'),
);

// instantiate Drizzle client
const client = postgres(process.env.DATABASE_URL || '');

const db = drizzle(client, {
  schema,
  logger: true,
});

const generateEvents = async (number: number) => {
  const eventRows: Event[] = [];
  const eventTicketRows: InferSelectModel<typeof eventTickets>[] = [];
  const eventScheduleRows: InferSelectModel<typeof eventSchedules>[] = [];
  const eventSocialMediaRows: InferSelectModel<typeof eventSocialMedias>[] = [];
  const eventContactPersonRows: InferSelectModel<typeof eventContactPersons>[] =
    [];

  const dummyPhotos = [
// dummy photos
  ];

  const randomCreator = await db.query.users.findFirst({
    where: eq(users.role, 'EVENT_CREATOR'),
    columns: { id: true },
  });

  // create event rows with its related data rows too
  for (let i = 0; i < number; i++) {
    const fakeEventId = faker.string.uuid();
    eventRows.push({
      creatorId: randomCreator.id,
      // ...event data
    });

    const fakeTimezone = faker.helpers.arrayElement(INDONESIA_TIMEZONES);

    for (let j = 0; j < 3; j++) {
      eventTicketRows.push({
        eventId: fakeEventId,
        // ...event ticket data
      });

      eventScheduleRows.push({
        eventId: fakeEventId,
        // ...event schedule data
      });

      eventSocialMediaRows.push({
        eventId: fakeEventId,
        // ...event social media data
      });

      eventContactPersonRows.push({
        eventId: fakeEventId,
        // ...event contact person data
      });
    }
  }

  return {
    eventRows,
    eventTicketRows,
    eventScheduleRows,
    eventSocialMediaRows,
    eventContactPersonRows,
  };
};

// main seed function
const seedEvents = async () => {
  try {
    // check if seed file has been run
    await db.insert(seeds).values({ name: fileName });

    console.log('Seeding events...');

    const {
      eventRows,
      eventTicketRows,
      eventScheduleRows,
      eventSocialMediaRows,
      eventContactPersonRows,
    } = await generateEvents(50);

    await db.transaction(async (tx) => {
      await tx.insert(events).values(eventRows);
      await tx.insert(eventTickets).values(eventTicketRows);
      await tx.insert(eventSchedules).values(eventScheduleRows);
      await tx.insert(eventSocialMedias).values(eventSocialMediaRows);
      await tx.insert(eventContactPersons).values(eventContactPersonRows);
    });
  } catch (err) {
    console.log('Event seeder has been executed.');
  }
};

seedEvents()
  .catch((err) => {
    console.error(err);
    process.exit(1);
  })
  .finally(async () => {
    console.log('Seeding events done.');
    process.exit(0);
  });

We'll set up our package.json to have a seed command to run the seeding files (including the one we wrote before):

...
"scripts": {
   "seed": "yarn ts-node src/seed/execute.ts"
},
...

This script will then be run during deployment:

FROM node:21-alpine3.18

...

RUN yarn seed

RUN yarn build

CMD [ "yarn", "start" ]

Before building the app, seeding will happen and our database will be loaded beforehand.

Part 2. Migration

Background

Migrations are files that keep track of our database changes. This helps manage our database and keeps consistency of database states in different environments.

How to Migrate?

Let's say I create this new table:

import { pgTable, text, timestamp } from "drizzle-orm/pg-core";

export const seeds = pgTable('seeds', {
    name: text('name').primaryKey(),
    createdAt: timestamp('created_at', {
        mode: 'date',
        withTimezone: true,
    }).defaultNow(),
});

To create a migration file, I would run yarn generate:pg, which will run yarn drizzle-kit generate:pg (this has been set up in our package.json file. This will create a new SQL file:

CREATE TABLE IF NOT EXISTS "seeds" (
	"name" text PRIMARY KEY NOT NULL,
	"created_at" timestamp with time zone DEFAULT now()
);

We don't have to write this manually. That's a really good thing as it makes our database setup faster.

Now, to run this file (and other migration files) to update our database, there is a special file called migrate.ts:

// migrate.ts

import { PostgresJsDatabase, drizzle } from 'drizzle-orm/postgres-js';
import { migrate } from 'drizzle-orm/postgres-js/migrator';
import * as postgres from 'postgres';

const client = postgres(process.env.DATABASE_URL!);
const db: PostgresJsDatabase = drizzle(client);

const main = async () => {
  try {
    console.log('Running migrations...');
    await migrate(db, { migrationsFolder: 'drizzle' });
    console.log('Migration complete!');
    process.exit(0);
  } catch (err) {
    console.log(err);
  }
};

main();

This will run all of our migration files. In other words, this will run the SQL scripts generated by Drizzle.

Will we run this script manually in deployment? Of course not. We'll use the same strategy as seeding by utilizing our Dockerfile (Docker is really cool):

FROM node:21-alpine3.18

...

RUN yarn migrate

RUN yarn seed

RUN yarn build

CMD [ "yarn", "start" ]

This will migrate our database before seeding to ensure that no errors would happen when adding rows to our database while seeding.

Conclusion

Database seeding and migration can (and should) be done through a CI/CD pipeline for development effectiveness. Although it requires some time for setup, in my opinion it's quite an investment and will be useful in the future of the development of the application.