Auto-populated tables are used to define, execute, and coordinate computations in a DataJoint pipeline.
Tables in the initial portions of the pipeline are populated from outside the pipeline. In subsequent steps, computations are performed automatically by the DataJoint pipeline in auto-populated tables.
Auto-populated tables belong to one of two data tiers: dj.Imported and dj.Computed.
DataJoint does not enforce the distinction between imported and computed tables: the difference is purely semantic, a convention for developers to follow.
If populating a table requires access to external files such as raw storage that is not part of the database, the table is designated as imported.
Otherwise it is computed.
For auto-populated tables, data should never be entered using insert directly.
Instead, these tables must define the callback method make(self, key). The insert method can then only be called on self inside this callback method.
Imagine that there is a table test.Image that contains 2D grayscale images in its image attribute.
Let us define the computed table test.FilteredImage, which filters the image in some way and saves the result in its filtered_image attribute.
The class will be defined as follows.
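    @schema
    class FilteredImage(dj.Computed):
        definition = """
        # Filtered image
        -> Image
        ---
        filtered_image : longblob
        """

        def make(self, key):
            # fetch the upstream image, restricted by the key
            img = (test.Image & key).fetch1('image')
            # compute the missing attribute and add it to the key
            key['filtered_image'] = myfilter(img)
            # insert1 inserts a single entity; insert expects a sequence of entities
            self.insert1(key)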
The make method receives one argument: key, of type dict in Python and struct in MATLAB, containing the primary key value of the element of the key source to be worked on.
The key represents the partially filled entity, usually already containing the primary key attributes of the key source.
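The make callback does three things:
1. Fetches data from tables upstream in the pipeline, using key as the restriction.
2. Computes and adds any missing attributes to the fields already in key.
3. Inserts the entire entity into self.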
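The three steps can be sketched without a database. The following is a toy model in plain Python, not DataJoint code: dicts stand in for tables, and the names image_table, filtered_table, and invert are hypothetical, chosen only for illustration.

```python
# Toy stand-ins for tables: each maps a primary-key tuple to a stored entity.
image_table = {(1,): {"image_id": 1, "image": [[0, 128], [255, 64]]}}
filtered_table = {}


def invert(img):
    """Hypothetical filter: invert an 8-bit grayscale image."""
    return [[255 - px for px in row] for row in img]


def make(key):
    # 1. Fetch upstream data, using the key as the restriction.
    img = image_table[(key["image_id"],)]["image"]
    # 2. Compute the missing attribute and add it to the fields already in key.
    entity = dict(key, filtered_image=invert(img))
    # 3. Insert the complete entity into the table being populated.
    filtered_table[(key["image_id"],)] = entity


make({"image_id": 1})
print(filtered_table[(1,)]["filtered_image"])  # [[255, 127], [0, 191]]
```

In a real pipeline the fetch and insert would go through DataJoint table objects rather than dicts; only the order of the steps carries over.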
The make method may populate multiple entities in one call when key does not specify the entire primary key of the populated table.
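As a toy illustration of this case (plain Python, not DataJoint code; recordings, channel_means, and the attribute names are hypothetical), suppose the key identifies a recording while the primary key of the populated table also includes a channel number. A single make call then inserts one entity per channel.

```python
# Upstream data: one recording with three channels.
recordings = {1: {"channels": [[1.0, 3.0], [2.0, 6.0], [5.0, 5.0]]}}
# Populated table, keyed by the full primary key (recording_id, channel).
channel_means = {}


def make(key):
    # The key only specifies recording_id, so make fills in the channel
    # attribute itself and inserts several entities in one call.
    traces = recordings[key["recording_id"]]["channels"]
    for channel, trace in enumerate(traces):
        entity = dict(key, channel=channel, mean=sum(trace) / len(trace))
        channel_means[(key["recording_id"], channel)] = entity


make({"recording_id": 1})
print(sorted(channel_means))  # [(1, 0), (1, 1), (1, 2)]
```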
The populate method of dj.Computed automatically calls make for every key for which the auto-populated table is missing data.
The FilteredImage table can be populated by calling FilteredImage.populate().
The progress of long-running calls to populate() in datajoint-python can be visualized by adding the display_progress=True argument to the populate call.
Note that it is not necessary to specify which data needs to be computed.
DataJoint will call make, one-by-one, for every key in Image for which FilteredImage has not yet been computed.
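The bookkeeping described here can be pictured with a small toy model. This is a sketch in plain Python under simplifying assumptions (dicts instead of tables, integer keys), not DataJoint's implementation; the function and variable names are hypothetical.

```python
def populate(key_source, populated, make):
    """Call make once for every key in key_source that is not yet populated."""
    called = []
    for key in key_source:
        if key not in populated:
            populated[key] = make(key)
            called.append(key)
    return called


images = {1: "img-a", 2: "img-b", 3: "img-c"}  # stands in for Image
filtered = {1: "IMG-A"}                         # key 1 was computed earlier


def make(key):
    # Stand-in computation: "filter" the image by upper-casing it.
    return images[key].upper()


first = populate(images, filtered, make)   # computes only the missing keys
second = populate(images, filtered, make)  # nothing left to do
print(first, second)  # [2, 3] []
```

Because the set of pending keys is derived from what is already stored, calling populate repeatedly is safe: the second call finds no missing keys and does nothing.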
Chains of auto-populated tables form computational pipelines in DataJoint.
The populate method accepts a number of optional arguments that provide more features and allow greater control over the method’s behavior.
restrictions - A list of restrictions, restricting as (tab.key_source & AndList(restrictions)) - tab.proj(). Here target is the table to be populated, usually tab itself.
suppress_errors - If True, encountering an error will cancel the current make call, log the error, and continue to the next make call. Error messages will be logged in the job reservation table (if reserve_jobs is True) and returned as a list. See also return_exception_objects and reserve_jobs. Defaults to False.
return_exception_objects - If True, error objects are returned instead of error messages. This applies only when suppress_errors is True. Defaults to False.
reserve_jobs - If True, reserves each job to signal to other distributed processes that it is being worked on. The job reservation table may be accessed as schema.jobs. Errors are logged in the jobs table. Defaults to False.
order - The order of execution, either "original", "reverse", or "random". Defaults to "original".
display_progress - If True, displays a progress bar. Defaults to False.
limit - If not None, checks at most this number of keys. Defaults to None.
max_calls - If not None, populates at most this many keys. Defaults to None, which means no limit.
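To make the interaction of these options concrete, here is a toy model in plain Python. It is a sketch, not DataJoint's implementation: toy_populate and its dict-based tables are hypothetical, job reservation is omitted, and the exact sequence in which limit, order, and max_calls are applied is an assumption made for illustration.

```python
import random


def toy_populate(key_source, populated, make, *, suppress_errors=False,
                 order="original", limit=None, max_calls=None):
    """Toy model of populate options (assumed semantics, no job reservation)."""
    # Keys that are still missing from the populated table.
    pending = [k for k in key_source if k not in populated]
    if limit is not None:
        pending = pending[:limit]          # check at most `limit` keys
    if order == "reverse":
        pending.reverse()
    elif order == "random":
        random.shuffle(pending)
    elif order != "original":
        raise ValueError(f"unknown order: {order!r}")
    if max_calls is not None:
        pending = pending[:max_calls]      # call make at most `max_calls` times
    errors = []
    for key in pending:
        try:
            populated[key] = make(key)
        except Exception as exc:
            if not suppress_errors:
                raise                      # default: the first error stops the run
            errors.append((key, str(exc)))  # otherwise record it and continue
    return errors


def make(key):
    if key == 2:
        raise ValueError("bad key 2")
    return key * 10


done = {}
errors = toy_populate([1, 2, 3, 4], done, make, suppress_errors=True)

partial = {}
toy_populate([1, 2, 3, 4], partial, make, order="reverse", max_calls=2)
```

In this model, the first call skips the failing key and returns its message, so done holds results for keys 1, 3, and 4; the second call only reaches keys 4 and 3.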
The method table.progress reports how many key_source entries have been populated and how many remain.
Two optional parameters allow more advanced use of the method.
A parameter of restriction conditions can be provided, specifying which entities to consider.
A Boolean parameter display (default is True) allows disabling the output, such that the numbers of remaining and total entities are returned but not printed.
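The behavior described above can be modeled in a few lines. This is a toy sketch in plain Python, not DataJoint's implementation: the restriction is modeled as a hypothetical predicate function, and the printed message format is an assumption.

```python
def progress(key_source, populated, restriction=None, display=True):
    """Return (remaining, total) for the keys selected by the restriction."""
    keys = [k for k in key_source if restriction is None or restriction(k)]
    total = len(keys)
    remaining = sum(1 for k in keys if k not in populated)
    if display:
        print(f"Completed {total - remaining} of {total}")
    return remaining, total


key_source = [1, 2, 3, 4, 5]
populated = {1, 2}

# Silent call: the counts are returned but nothing is printed.
overall = progress(key_source, populated, display=False)
# Restricted call: only keys greater than 3 are considered.
subset = progress(key_source, populated, restriction=lambda k: k > 3, display=False)
```

Here overall is (3, 5) and subset is (2, 2), since neither of the two restricted keys has been populated yet.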