The limit is real
GitHub's REST and GraphQL Search APIs return a maximum of 1,000 results per query. This is documented, confirmed by developers, and has no official bypass. For enterprise teams building AI tooling discovery systems, this constraint matters.
AI agents like Claude, OpenAI Codex, and GitHub Copilot use SKILL.md files to define capabilities: PDF handling, Excel formulas, brand guidelines. These files are scattered across GitHub in ~/.claude/skills/, .github/skills/, ad-hoc skills/ folders, and personal dotfiles repos.
A single search for filename:SKILL.md hits the 1,000-result ceiling immediately.
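The ceiling is not a soft truncation: GitHub rejects any request for a page past result 1,000, no matter how large total_count is. A minimal sketch of the arithmetic (the cap and the 100-per-page maximum come from GitHub's search API docs):

```python
GITHUB_SEARCH_CAP = 1_000   # hard ceiling on retrievable results per query
PER_PAGE_MAX = 100          # largest page size the search API accepts

def reachable_pages(per_page: int = PER_PAGE_MAX) -> int:
    """Number of result pages you can fetch before the cap cuts you off."""
    return GITHUB_SEARCH_CAP // per_page

def lost_results(total_count: int) -> int:
    """Matches reported by total_count that no amount of paging will return."""
    return max(0, total_count - GITHUB_SEARCH_CAP)
```

Requesting page 11 at per_page=100 returns HTTP 422 ("Only the first 1000 search results are available"), so a filename:SKILL.md query reporting tens of thousands of matches silently loses everything past the first thousand.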
The workaround: query segmentation
The team behind SkillHub, an open-source skill marketplace, sidestepped the limit instead of fighting it: many narrow searches, each scoped tightly enough that its result set fits under 1,000.
Path-based chunking: Separate queries for path:skills, path:.claude, path:.github, path:.codex. Four queries, up to 4,000 potential results.
File size segmentation: size:<1000, size:1000..5000, size:>5000 partition the matching files into disjoint buckets, each with its own 1,000-result budget.
Topic filtering: Repos tagged claude-skills, agent-skills, ai-skills get deep-scanned individually.
Curated list crawling: Parse awesome-lists for linked repositories, then index those.
Fork traversal: Check popular repo forks for modified skills that never merged upstream.
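Mechanically, the first two strategies amount to crossing qualifiers to partition one oversized query into many small ones, then deduplicating the union. A sketch under that reading (the qualifier syntax is GitHub's; the paths and size buckets mirror the list above):

```python
from itertools import product

BASE = "filename:SKILL.md"
PATHS = ["path:skills", "path:.claude", "path:.github", "path:.codex"]
SIZES = ["size:<1000", "size:1000..5000", "size:>5000"]

def segmented_queries() -> list:
    """Cross path and size qualifiers: 12 queries, each with its own 1,000-result budget."""
    return [f"{BASE} {p} {s}" for p, s in product(PATHS, SIZES)]

def dedupe(results: list) -> dict:
    """Segments can overlap, so key each hit by (repo, path) and keep one copy."""
    return {(r["repository"]["full_name"], r["path"]): r for r in results}
```

The result keys match the shape of GitHub's code-search response items, where each hit carries a repository object and a path.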
The system runs daily incremental crawls and weekly full discovery passes. They handle GitHub's 1,000 requests per hour per token limit by rotating credentials.
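Credential rotation can be as simple as round-robining a token pool and parking any token whose quota is spent until its reset time, both of which GitHub reports in the X-RateLimit-Remaining and X-RateLimit-Reset response headers. A minimal sketch, with hypothetical token names (the project's actual rotation logic is not published):

```python
import time
from itertools import cycle

class TokenPool:
    """Round-robin over tokens, skipping any that are rate-limited until reset."""

    def __init__(self, tokens):
        self._tokens = tokens
        self._cycle = cycle(tokens)
        self._cooldown = {}   # token -> epoch seconds when it becomes usable again

    def next(self, now=None):
        now = time.time() if now is None else now
        for _ in range(len(self._tokens)):
            tok = next(self._cycle)
            if self._cooldown.get(tok, 0) <= now:
                return tok
        raise RuntimeError("all tokens rate-limited; wait for the earliest reset")

    def mark_exhausted(self, token, reset_epoch):
        """Call when X-RateLimit-Remaining hits 0; reset_epoch is X-RateLimit-Reset."""
        self._cooldown[token] = reset_epoch
```

A crawler would call next() before each request and mark_exhausted() whenever a 403/429 response or a zeroed remaining-quota header comes back.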
What they built
The stack: Next.js 15 frontend, PostgreSQL for metadata, Meilisearch for typo-tolerant search, Redis/BullMQ for background jobs. Every skill gets scanned for shell commands, prompt injection patterns, and data exfiltration attempts.
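A scanning pass like that can start as plain pattern matching before anything heavier. SkillHub's actual rule set is not published, so the patterns below are illustrative assumptions, one per threat class named above:

```python
import re

# Illustrative rules only; a production scanner needs a far larger, tested set.
RULES = {
    "shell_command": re.compile(r"\brm\s+-rf\b|curl\s+[^\n|]*\|\s*(?:ba)?sh"),
    "prompt_injection": re.compile(r"ignore\s+(?:all\s+)?previous\s+instructions", re.I),
    "data_exfiltration": re.compile(r"\b(?:AWS_SECRET|api[_-]?key)\b.*https?://", re.I),
}

def scan_skill(text: str) -> list:
    """Return the name of every rule this skill file trips."""
    return [name for name, rx in RULES.items() if rx.search(text)]
```

Regexes give cheap first-pass triage; anything flagged would then justify a slower, more careful review step.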
Current index: 172,000+ skills, 4,000+ contributors, 30 categories.
CLI available via npm install -g skillhub. Web interface at skills.palebluedot.live. MIT licensed.
Worth noting
The 172,000 figure is unverified in public GitHub stats. The approach works but developers in forums call similar techniques "dirty hacks" prone to API variability and incomplete coverage. PyGithub's inefficient paging (29 API calls for 1,000 results) pushes some teams toward raw requests library implementations.
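The paging complaint largely comes down to page size: the search API accepts per_page=100, so a raw HTTP client can cover the full 1,000-result window in 10 calls. A sketch using requests (the token is a placeholder; the import is deferred so the call-count arithmetic stays dependency-free):

```python
import math

def calls_needed(results: int, per_page: int) -> int:
    """API calls required to page through `results` hits at a given page size."""
    return math.ceil(results / per_page)

def fetch_window(query: str, token: str) -> list:
    """Fetch every retrievable hit for one query at the maximum page size."""
    import requests  # deferred so the arithmetic above has no dependency

    url = "https://api.github.com/search/code"
    headers = {"Authorization": f"token {token}",
               "Accept": "application/vnd.github+json"}
    items = []
    for page in range(1, calls_needed(1000, 100) + 1):
        resp = requests.get(url, headers=headers,
                            params={"q": query, "per_page": 100, "page": page})
        resp.raise_for_status()
        batch = resp.json().get("items", [])
        items.extend(batch)
        if len(batch) < 100:   # short page means no more results
            break
    return items
```

PyGithub also accepts a per_page argument on its client constructor, which narrows the gap without leaving the library; the raw-requests route just makes the page size (and the retry/rotation logic around it) fully explicit.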
For CTOs evaluating AI agent infrastructure: GitHub's search constraints are real, workarounds exist but require multi-query orchestration, and no method guarantees complete coverage. If your team needs large-scale GitHub discovery, budget for query strategy, not just rate limit handling.