/
Autocomplete Documentation

Autocomplete Documentation

The autocomplete system is composed of two programs: generateAutocompleteGraph.ts and autocomplete.ts. These files compose the autocomplete pipeline from building the radix tree to using it on API calls to return suggestions.

Building the Radix Tree: scripts/generateAutocompleteGraph.ts

The process of building the tree happens every time npm run dev or npm run build is run. The predev/prebuild scripts in package.json run generateAutocompleteGraph.ts via ts-node.

Input Data Format

The first step is to read in JSON data of every section searchable. This is currently from a local file (autocomplete-min.json) but hopefully soon will be from an API call. This data is stored in a nested format where a plural key (A) stores a list of objects. Each object stores a singular key (B) denoting the data and all nested entries matching the singular keys' (B) data. The first list is assumed to be subject_prefixes. For example:

{ "course_numbers": [ // A { ...data matching 1200 course number "course_number": "1200" // B }, { ...data matching 2305 course number "course_number": "2305" // B }, ...other course numbers ] }

It uses the following minified keys nested in this order:

Data type

Plural Key

Plural Minified Key

Singular Key

Singular Minified Key

Data type

Plural Key

Plural Minified Key

Singular Key

Singular Minified Key

Subject prefix

N/A

N/A

subject_prefix

sp

Course numbers

course_numbers

cns

course_number

cn

Academic sessions

academic_sessions

ass

academic_session

as

Sections

sections

s

section_number

sn

Professors

professors

p

N/A

N/A

At the end of the nesting the professor type is assumed so there is no need for a singular key and the first and last name are just listed.

Building the Radix Tree

generateAutocompleteGraph.ts then uses the Graphology library to build, first a directed acyclic graph (DAG), then a radix tree. The DAG is built such that each layer of the tree corresponds to a search character. Each of these nodes has an optional data parameter that is filled if the search is a valid finished section. For example the “S“ node in the second layer would have the data to return the suggestion “CS“ and the “0“ node, the suggestion “CS 1200“. To make DAG into a radix tree nodes in a straight line (with no branches) and with no data in the middle are combined. For example the nodes “1“->”2”->”0”->”0” could turn into one node with “1200“, although in practice there will certainly be branches off this for example 1 will branch to 1200 and 1300. Below I have illustrated the format of the tree in all search formats accepted.

(This is the Microsoft Visio file used to make this diagram)

This data is exported to autocomplete_graph.json

Using the Radix Tree: pages/api/autocomplete.ts

The autocomplete route receives a query to provide suggestions for and uses the tree which it reads in from autocomplete_graph.json to provide these. In essence it does a breadth first search down the radix tree returning the “closest“ results.

This recursion is accomplished using a priority queue and two functions: bfsRecursion which recurses in a normal way and bfsRecursionToNextData which recurses until it finds the next node with data then stops. The following table describes the conditions for recursing through the tree by bfsRecursion. This table is based on the following three criteria: whether the node has data, whether the search query matches the node, and whether the end of the search query has been reached (ex. only typing “CS 12“ and reaching the “2“ node). In the table I call these data, match, and last respectively. Based on these criteria we do any or multiple of a few things: queue all children nodes for bfsRecursion, queue all children nodes for bfsRecursionToNextData, and/or return the data of the node to the suggestion results. In the table I call these bfsRecursion, queue to next, and return data.

 

No data

Data

 

No data

Data

Full match or match to end of query

Last

bfsRecursionToNextData

Last

bfsRecursionToNextData

return data

Not last

bfsRecursion

Not last

bfsRecursion

Partial match

Last

bfsRecursionToNextData

Last

bfsRecursionToNextData

return data

Not last

bfsRecursionToNextData

Not last

bfsRecursionToNextData

return data

The priorities of elements in the priority queue are equal to the depth, in number of characters, that has been reached in the tree. So the search “CS 1200.100“ that has been evaluated up to “CS 1200“ would have priority 7 so other results will be evaluated first. The bfsRecursionToNextData function, after it has already found its first element, will continue to queue children with priority+100 so that more results can be returned when few are available.

Once we have reached the passed in limit number of suggestions we remove duplicates and return the results.