More potential searches than there are stars in the universe.
Source code search is not like searching for text on the web: it is more complex and has many more nuances. When we set out to create a Universal Code Search engine, we knew there were many permutations to account for, so we began estimating just how many.
First, let’s cover what Universal Code Search is. Search today is not just “Grep and go.” The modern era of big code requires code search to cover many dimensions: repositories, file formats, programming languages, etc. Universal Code Search ties together information from many tools, from repositories on your code host to dependency relationships among your projects and application runtime information, to give developers a single place to explore and understand all code faster.
A Universal Code Search must satisfy the following requirements:
- Search through repositories on any major code host.
- Search across all commits, making both the current (up-to-date) and past codebases discoverable.
- Support all programming languages.
- Support code intelligence for all major languages.
To quantify how expansive Universal Code Search is, the best metric is the number of search combinations.
Developers can search across more than 18 dimensions of code to navigate, explore and understand code.
Repositories
Sourcegraph can search across any repository, spanning all Git providers in addition to other source code version control systems. GitHub hosts over 100 million repositories. Assuming GitHub holds a 17.31% share of all repositories, dividing 100 million by 0.1731 gives an estimate of 577,700,751 searchable repositories.
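The repository estimate can be reproduced with a short calculation. This is a minimal sketch that assumes the round figure of 100 million is used as-is and that 17.31% is GitHub's share of all repositories; the variable names are illustrative and not part of Sourcegraph.

```python
# Assumptions taken from the paragraph above; names are illustrative.
GITHUB_REPOS = 100_000_000     # "over 100 million", used as a round lower bound
GITHUB_MARKET_SHARE = 0.1731   # assumed share of all repositories hosted on GitHub

# If GitHub holds 17.31% of all repositories, the total is
# GitHub's repository count divided by its share.
estimated_total_repos = int(GITHUB_REPOS / GITHUB_MARKET_SHARE)

print(f"{estimated_total_repos:,}")  # 577,700,751
```

Changing either constant shows how sensitive the estimate is to the market share assumption.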
Commits
Sourcegraph can search any repository, and any commit within the repository. For instance, if the average repository has five developers who each contribute five commits a day, 300 days per year, over five years, that comes to 5 × 5 × 300 × 5 = 37,500 commits, all of which Sourcegraph can search using repo:<REPONAME>@<COMMIT>.
Branches
As with commits, Sourcegraph can search all of a repository’s branches. Using the same methodology, we can assume that each of the five developers works on five feature branches per month over five years. This would lead to 1,500 branches on the repository that could be searched using repo:<REPONAME>@<BRANCHNAME>.
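The commit and branch estimates follow the same multiplication, which can be sketched as below. The constants come from the two sections above, with each developer assumed to make five commits per working day; the names are illustrative, so swap in figures that match your own team.

```python
# Shared assumptions from the Commits and Branches sections.
DEVELOPERS = 5
YEARS = 5

# Commits: five commits per developer per working day, 300 working days per year.
COMMITS_PER_DEV_PER_DAY = 5
WORKING_DAYS_PER_YEAR = 300
commits = DEVELOPERS * COMMITS_PER_DEV_PER_DAY * WORKING_DAYS_PER_YEAR * YEARS

# Branches: five feature branches per developer per month.
BRANCHES_PER_DEV_PER_MONTH = 5
MONTHS_PER_YEAR = 12
branches = DEVELOPERS * BRANCHES_PER_DEV_PER_MONTH * MONTHS_PER_YEAR * YEARS

print(f"commits:  {commits:,}")   # commits:  37,500
print(f"branches: {branches:,}")  # branches: 1,500
```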
File Type
Sourcegraph supports all programming languages and developer files, in addition to other text files. Between text and developer files, we conservatively estimate this number to be 1,155 total file types supported.
Pattern Type
Sourcegraph allows your search to be interpreted three ways: literally, as a regular expression (regex) or as a structural search pattern. Structural search lets you match richer syntax patterns specifically in code and structured data formats like JSON.
Search Type
Four search types are permitted: file, repo, diff and commit. Whether you need to search over all your code at a given point in time, or over changes to code or commit messages (diffs or commits), Sourcegraph gives you access to all depths, past and present, of your codebase.
Repo Forks and Archives
Include repository forks in your search, or don’t. Include repository archives, or don’t. Combining the two choices gives the four options for searching forks and archives among your codebase in Sourcegraph search. (The filter values are yes, no and only.)
Case
Sourcegraph search supports both case-insensitive and case-sensitive search.
When performing a diff or commit search, there are five more filters you can use: two for the developer, two for time and one for the commit message.
Developer
Search for the author of the code, or for the committer. These are the two options when performing a diff or commit search.
Time
Sourcegraph supports searching both before and after a time frame, such as before: “last Thursday,” after: “6 weeks ago” or after: “November 1, 2019.”
Message
You can also restrict diff and commit searches to commits whose message contains a particular string.
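To see how the five diff and commit filters combine into a single query, here is a small, hypothetical helper that assembles a query string. The repo:, before: and after: keywords appear in this article; type:, author:, committer: and message: are assumed spellings based on Sourcegraph's query syntax, so check the search documentation before relying on them. The function name and the example author are made up for illustration.

```python
# The five diff/commit filters described above. Keyword spellings are assumptions.
HISTORY_FILTERS = ("author", "committer", "before", "after", "message")


def quote(value: str) -> str:
    """Wrap values containing spaces in double quotes so they stay one token."""
    return f'"{value}"' if " " in value else value


def build_history_query(repo: str, search_type: str, **filters: str) -> str:
    """Assemble a diff or commit search query from optional filters."""
    if search_type not in ("diff", "commit"):
        raise ValueError("these filters apply only to diff and commit searches")
    unknown = set(filters) - set(HISTORY_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    parts = [f"repo:{repo}", f"type:{search_type}"]
    # Emit filters in a fixed order so the output is predictable.
    for name in HISTORY_FILTERS:
        if name in filters:
            parts.append(f"{name}:{quote(filters[name])}")
    return " ".join(parts)


# "alice" is a hypothetical author name.
query = build_history_query("<REPONAME>", "diff", after="6 weeks ago", author="alice")
print(query)  # repo:<REPONAME> type:diff author:alice after:"6 weeks ago"
```

Rejecting these filters for other search types mirrors the point above that they only apply to diff and commit searches.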
IDE
Four IDE integrations are supported: VS Code, Atom, IntelliJ and Sublime Text. This helps improve developer productivity by reducing context-switching. Sourcegraph’s IDE integrations are limited, however, in performing structural searches, using filters and searching past commits. Therefore, most of the dimensions described above don’t apply to IDEs.
What does this add up to? 2,702,343,531,371,460,001,448 potential code searches across all repos, all programming languages, all code changes, all file formats and all code hosts. Universal Code Search makes it easier and faster to explore, navigate and better understand all code, everywhere. Don’t believe it? Go ahead and see for yourself. Change the assumptions to match your organization.