Universal Code Search Combinations

More potential searches than there are stars in the universe.

Source code search is not like searching for text on the web; it is more complex and has many more nuances. When we set out to create a Universal Code Search engine, we knew there were a great many permutations to account for, so we began estimating just how many.

First, let’s cover what Universal Code Search is. Search today is not just “Grep and go.” The modern era of big code requires code search to cover many dimensions: repositories, file formats, programming languages, etc. Universal Code Search ties together information from many tools, from repositories on your code host to dependency relationships among your projects and application runtime information, to give developers a single place to explore and understand all code quickly.

A Universal Code Search must satisfy the following requirements:

Search through repositories on any major code host.
Search across all commits, making both the current (up-to-date) and past codebases discoverable.
Support all programming languages.
Support code intelligence for all major languages.

To quantify how expansive Universal Code Search is, the best metric is the number of search combinations.

Developers can search across more than 18 dimensions of code to navigate, explore and understand code.

Repositories

Sourcegraph can search across any repository, spanning all Git providers as well as other version control systems. GitHub hosts over 100 million repositories. Assuming GitHub holds a 17.31% share of all repositories, we can estimate that there are 100,000,000 / 0.1731 ≈ 577,700,751 searchable repositories.
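The repository estimate can be checked with a few lines of arithmetic. This is a minimal sketch that assumes, as the text implies, that GitHub's 100 million repositories make up 17.31% of all repositories; the variable names are illustrative.

```python
# Assumptions taken from the text: GitHub hosts 100 million repositories
# and represents 17.31% of all repositories.
github_repos = 100_000_000
github_market_share = 0.1731

# Scale GitHub's count up to the whole market and round to whole repositories.
estimated_total_repos = round(github_repos / github_market_share)

print(f"{estimated_total_repos:,}")
```

Changing either input shows how sensitive the estimate is to the assumed market share.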

Commits

Sourcegraph can search any repository, and any commit within the repository. For instance, if the average repository has five developers each contributing five commits per day, 300 days per year, over five years, Sourcegraph would be able to search all 37,500 commits (5 × 5 × 300 × 5) using repo:<REPONAME>@<COMMIT>.
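The commit count follows directly from those assumptions. The sketch below wraps the multiplication in a small function so the inputs can be swapped for your own team's numbers; the function and parameter names are hypothetical.

```python
def estimate_commits(developers: int, commits_per_day: int,
                     working_days_per_year: int, years: int) -> int:
    """Estimate the total commits in a repository from simple team assumptions."""
    return developers * commits_per_day * working_days_per_year * years

# Assumptions from the text: five developers, five commits each per day,
# 300 working days per year, over five years.
total_commits = estimate_commits(developers=5, commits_per_day=5,
                                 working_days_per_year=300, years=5)

print(f"{total_commits:,}")
```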

Branches

As with commits, Sourcegraph can search all of a repository’s branches. Using the same methodology, we can assume that each of the five developers works on five feature branches per month over five years. This leads to 1,500 branches (5 × 5 × 12 × 5) on the repository that could be searched using repo:<REPONAME>@<BRANCHNAME>.
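As a rough illustration, the branch estimate and the revision-scoped filter can be expressed together. The repository and branch names below are hypothetical, and the helper only formats a string in the repo:<REPONAME>@<BRANCHNAME> shape shown above; it does not call Sourcegraph.

```python
# Assumptions from the text: five developers, five feature branches each
# per month, twelve months per year, over five years.
developers = 5
branches_per_developer_per_month = 5
months_per_year = 12
years = 5

total_branches = (developers * branches_per_developer_per_month
                  * months_per_year * years)


def repo_at_revision(repo_name: str, revision: str) -> str:
    """Format a repo filter scoped to a revision (a branch name or a commit)."""
    return f"repo:{repo_name}@{revision}"


# Hypothetical repository and branch names, for illustration only.
scoped_filter = repo_at_revision("example/project", "feature-login")

print(total_branches)
print(scoped_filter)
```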

File Type

Sourcegraph supports all programming languages and developer files, in addition to other text files. Between text and developer files, we conservatively estimate this number to be 1,155 total file types supported.

Pattern Type

Sourcegraph allows your search to be interpreted three ways: literally, as a regular expression (regex) or as a structural search pattern. Structural search lets you match richer syntax patterns specifically in code and structured data formats like JSON.

Search Type

Four search types are permitted: file, repo, diff and commit. Whether you need to search all your code at a given point in time, or changes to code and commit messages (diffs or commits), Sourcegraph gives you access to your entire codebase, past and present.

Repo Forks and Archives

Include repository forks in your search, or don’t. Include repository archives, or don’t. Together these give four options for searching forks and archives among your codebase in Sourcegraph search. (Each filter takes the values yes, no or only; this count treats each as a simple include-or-exclude choice.)
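The four options are simply the cross product of two include-or-exclude choices. The sketch below enumerates them; the filter names fork: and archived: are an assumption about how these choices are written in a query, and the count deliberately ignores any additional values the filters may accept.

```python
from itertools import product

# Each dimension is treated as a binary choice, as the text counts it.
choices = ("yes", "no")

# Assumed filter names: fork: and archived:
combinations = [
    f"fork:{fork} archived:{archived}"
    for fork, archived in product(choices, choices)
]

for combo in combinations:
    print(combo)
```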

Case

Sourcegraph search supports both case-insensitive and case-sensitive search. When performing a diff or commit search, there are five more filters you can use: two for the developer, two for time and one for the commit message.

Developer

Search for the author of the code, or for the committer. These are the two options when performing a diff or commit search.

Time

Sourcegraph supports searching both before and after a point in time, such as before: “last Thursday,” after: “6 weeks ago” or after: “November 1, 2019.”

Message

In diff and commit searches, you can also match commits whose message contains a particular string.
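To show how the developer, time and message filters combine, here is a minimal sketch of a query-string builder. It only assembles text; it does not call Sourcegraph, and the exact filter spellings (type:, author:, before:, after:, message:) and quoting are assumptions to verify against the query syntax of your Sourcegraph version. The function name and sample values are hypothetical.

```python
from typing import Optional


def build_history_query(pattern: str, search_type: str = "commit",
                        author: Optional[str] = None,
                        before: Optional[str] = None,
                        after: Optional[str] = None,
                        message: Optional[str] = None) -> str:
    """Assemble a diff or commit search string from optional filters."""
    if search_type not in ("diff", "commit"):
        raise ValueError("these filters apply only to diff or commit searches")

    parts = [f"type:{search_type}"]
    filters = (("author", author), ("before", before),
               ("after", after), ("message", message))
    for name, value in filters:
        if value is not None:
            # Quote every value so phrases like "6 weeks ago" stay together.
            parts.append(f'{name}:"{value}"')
    parts.append(pattern)
    return " ".join(parts)


# Hypothetical example: diffs by one author during the last six weeks.
query = build_history_query("fix typo", search_type="diff",
                            author="alice", after="6 weeks ago")
print(query)
```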

IDE

Four IDE integrations are supported: VS Code, Atom, IntelliJ and Sublime Text. These help improve developer productivity by reducing context-switching. Sourcegraph’s IDE integrations have limited ability to perform structural searches, use filters and search past commits. Therefore, most of the dimensions described above don’t apply to IDEs.

What does this add up to? 2,702,343,531,371,460,001,448 potential code searches across all repos, all programming languages, all code changes, all file formats and all code hosts. Universal Code Search makes it easier and faster to explore, navigate and better understand all code, everywhere. Don’t believe it? Go ahead and see for yourself. Change the assumptions to match your organization.

— Quinn Slack
