Import from GitHub

Multi-Repo catalog population script (GitHub)

In large engineering organizations, service sprawl across hundreds or even thousands of GitHub repositories is common. Manually onboarding each service into the Harness Software Catalog, either by creating catalog-info YAMLs individually or configuring one catalog location per repository, quickly becomes unmanageable. Moreover, using the default discovery plugins to register each repository as a location can lead to fragility; a single failure during sync can prevent the entire catalog from updating correctly.

This script provides a scalable and GitOps-friendly solution by automating the end-to-end onboarding process using the Harness IDP 2.0 Entities API and Git Experience (GitX). It programmatically fetches all repositories from your GitHub organization, dynamically generates a valid idp.yaml for each service, and pushes that file into a centralized GitHub repository. Each YAML is committed to a unique path using the Git connector you have configured in Harness. This ensures that service metadata is not only standardized but also version-controlled in Git, promoting visibility and auditability.

Script source

Source Code

curl -o idp-catalog-population-multirepo-github.py https://raw.githubusercontent.com/harness-community/idp-samples/main/IDP-2.0-Samples/catalog-scripts/idp-catalog-population-multirepo-github.py

Before you begin

Define the following values in a .env file:

GITHUB_TOKEN = '<github-token>'
HARNESS_API_KEY = '<harness-api-key>'
HARNESS_ACCOUNT_ID = '<harness-account-id>'
CONNECTOR_REF = '<harness-git-connector-ref>'
ORG_IDENTIFIER = '<harness-org-id>'
PROJECT_IDENTIFIER = '<harness-project-id>'
CENTRAL_REPO = '<name-of-central-repo-to-store-yamls>'
GITHUB_ORG = '<github-org-name>'

GITHUB_TOKEN must have the repo and read:org scopes. HARNESS_API_KEY must be a user API key with write access to IDP entities.
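Checking the configuration before the first API call avoids a half-finished run. The following is a minimal sketch, not the script's own code: it assumes the values above have been exported as environment variables, and `load_settings` and `REQUIRED_SETTINGS` are hypothetical names introduced for illustration.

```python
import os

# Setting names come from the list above; the loader is a hypothetical helper.
REQUIRED_SETTINGS = (
    "GITHUB_TOKEN",
    "HARNESS_API_KEY",
    "HARNESS_ACCOUNT_ID",
    "CONNECTOR_REF",
    "ORG_IDENTIFIER",
    "PROJECT_IDENTIFIER",
    "CENTRAL_REPO",
    "GITHUB_ORG",
)


def load_settings(env=None):
    """Return the required settings, or raise if any is missing or a placeholder."""
    env = os.environ if env is None else env
    settings = {}
    missing = []
    for name in REQUIRED_SETTINGS:
        value = (env.get(name) or "").strip()
        # Treat untouched template values such as '<github-token>' as missing.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
        else:
            settings[name] = value
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return settings
```

Calling `load_settings()` with no arguments reads from the process environment; passing a dictionary makes the check easy to exercise in isolation.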

Execution

After creating your .env file, run the script:

python3 idp-catalog-population-multirepo-github.py

This will:

  1. Fetch all repositories from your GitHub org.

  2. For each repo:

    • Sanitize the identifier.
    • Generate a valid idp.yaml.
    • Push it to the specified folder in CENTRAL_REPO.
    • Register the entity in Harness using the Entities API.
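The "sanitize the identifier" step can be sketched as follows. This is an illustration under stated assumptions rather than the rule the script applies: it assumes identifiers may contain only ASCII letters, digits, and underscores, must not start with a digit, and are capped at 128 characters. `sanitize_identifier` is a hypothetical name.

```python
import re


def sanitize_identifier(repo_name, max_length=128):
    """Map a GitHub repository name to an identifier-safe string (assumed rules)."""
    # Replace every character outside [0-9A-Za-z_] with an underscore.
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", repo_name)
    # Prefix an underscore when the result is empty or starts with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned[:max_length]
```

For example, `sanitize_identifier("my-service.api")` returns `my_service_api`. Two different repository names can map to the same identifier under a scheme like this, so a real run may need a collision check.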

Output structure

The catalog YAML files will be stored in the following pattern inside your central GitHub repository:

central-repo/
├── service-one/
│   └── idp.yaml
├── service-two/
│   └── idp.yaml
└── ...

Each YAML will look like:

apiVersion: harness.io/v1
kind: component
orgIdentifier: <your-org>
projectIdentifier: <your-project>
type: Service
identifier: sanitized_unique_id
name: repo-name
owner: group:account/IDP_Test
spec:
  lifecycle: production
metadata:
  description: "repo description from GitHub"
  annotations:
    backstage.io/source-location: url:https://github.com/<your-org>/<repo-name>
    backstage.io/techdocs-ref: dir:.
  tags:
    - auto-onboarded
Logs & troubleshooting

  • Script output includes a status message for each repo (success/failure).
  • Failures are logged with full error messages from the Harness API.
  • For personal GitHub accounts, change the GitHub API URL from:
https://api.github.com/orgs/{GITHUB_ORG}/repos

to:

https://api.github.com/users/{GITHUB_ORG}/repos
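The switch between the two endpoints, together with paging through the results, can be sketched like this. It is a minimal illustration, not the script's code: `repos_url`, `fetch_page`, and `list_repos` are hypothetical names, and the sketch assumes the GitHub REST API's `per_page` and `page` query parameters and that a page past the end comes back as an empty list.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def repos_url(owner, personal_account=False):
    """Build the repository listing URL for an organization or a user."""
    kind = "users" if personal_account else "orgs"
    return f"{API_ROOT}/{kind}/{owner}/repos"


def fetch_page(url, token, page, per_page=100):
    """Fetch one page of repositories and return the decoded JSON list."""
    request = urllib.request.Request(
        f"{url}?per_page={per_page}&page={page}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_repos(owner, token, personal_account=False, fetch=fetch_page):
    """Collect repositories page by page until an empty page is returned."""
    url = repos_url(owner, personal_account)
    repos, page = [], 1
    while True:
        batch = fetch(url, token, page)
        if not batch:
            return repos
        repos.extend(batch)
        page += 1
```

The `fetch` parameter exists so the paging loop can be tried with a stub instead of live network calls.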

If you want to try other requests for working with entities, you can use the new Entities APIs.