The autocomplete system is composed of two programs: generateAutocompleteGraph.ts
and autocomplete.ts
. These files compose the autocomplete pipeline from building the radix tree to using it on API calls to return suggestions.
Building the Radix Tree: scripts/generateAutocompleteGraph.ts
The process of building the tree happens every time npm run dev
or npm run build
is run. The predev
/prebuild
scripts in package.json
run generateAutocompleteGraph.ts
via ts-node.
Input Data Format
The first step is to read in JSON data of every section searchable. This is currently from a local file (autocomplete-min.json
) but hopefully soon will be from an API call. This data is stored in a nested format where a plural key (A) stores a list of objects. Each object stores a singular key (B) denoting the data and all nested entries matching the singular keys' (B) data. The first list is assumed to be subject_prefixes
. For example:
{ "course_numbers": [ // A { ...data matching 1200 course number "course_number": "1200" // B }, { ...data matching 2305 course number "course_number": "2305" // B }, ...other course numbers ] }
It uses the following minified keys nested in this order:
Data type | Plural Key | Plural Minified Key | Singular Key | Singular Minified Key |
---|---|---|---|---|
Subject prefix | N/A | N/A | subject_prefix | sp |
Course numbers | course_numbers | cns | course_number | cn |
Academic sessions | academic_sessions | ass | academic_session | as |
Sections | sections | s | section_number | sn |
Professors | professors | p | N/A | N/A |
At the end of the nesting the professor type is assumed so there is no need for a singular key and the first and last name are just listed.
Building the Radix Tree
generateAutocompleteGraph.ts
then uses the Graphology library to build, first a directed acyclic graph (DAG), then a radix tree. The DAG is built such that each layer of the tree corresponds to a search character. Each of these nodes has an optional data parameter that is filled if the search is a valid finished section. For example the “S“ node in the second layer would have the data to return the suggestion “CS“ and the “0“ node, the suggestion “CS 1200“. To make DAG into a radix tree nodes in a straight line (with no branches) and with no data in the middle are combined. For example the nodes “1“->”2”->”0”->”0” could turn into one node with “1200“, although in practice there will certainly be branches off this for example 1 will branch to 1200 and 1300. Below I have illustrated the format of the tree in all search formats accepted.
(This is the Microsoft Visio file used to make this diagram)
This data is exported to autocomplete_graph.json
Using the Radix Tree: pages/api/autocomplete.ts
The autocomplete route receives a query to provide suggestions for and uses the tree which it reads in from autocomplete_graph.json
to provide these. In essence it does a breadth first search down the radix tree returning the “closest“ results.
This recursion is accomplished using a priority queue and two functions: bfsRecursion
which recurses in a normal way and bfsRecursionToNextData
which recurses until it finds the next node with data then stops. The following table describes the conditions for recursing through the tree by bfsRecursion
. This table is based on the following three criteria: whether the node has data, whether the search query matches the node, and whether the end of the search query has been reached. In the table I call these data, match, and last respectively. Based on these criteria we do any or multiple of a few things: queue all children nodes for bfsRecursion
, queue all children nodes for bfsRecursionToNextData
, and/or return the data of the node to the suggestion results. In the table I call these bfsRecursion
, queue to next, and return data.
No data | Data | |||
---|---|---|---|---|
Full match or match to end of query | Last |
| Last |
return data |
Not last |
| Not last |
| |
Partial match | Last |
| Last |
return data |
Not last |
| Not last |
return data |
The priorities of elements in the priority queue are equal to the depth, in number of characters, that has been reached in the tree. So the search “CS 1200.100“ that has been evaluated up to “CS 1200“ would have priority 7 so other results will be evaluated first. The bfsRecursionToNextData
function, after it has already found its first element, will continue to queue children with priority+100 so that more results can be returned when few are available.
Once we have reached the passed in limit number of suggestions we remove duplicates and return the results.