An extractor creates meta information from the original file or from the files of other extractors.
For example: The Exif extractor reads the image and provides the Exif data as meta data to the storage. The geo reverse plugin reads the Exif meta data, requests the address from a remote service and stores this address as further meta information.
Another example: The image resizer reads the image and stores the preview file in the storage. The AI extractor reads a small preview image, sends it to the API service and stores similarity vectors as new meta data.
The extractor has the following phases:
The meta phase reads basic meta data for each file.
The raw phase receives a file grouped with its sidecars and can extract images from raw files. The assumption is that extracting from a raw file is expensive and should only be done if no image sidecar is available.
The file phase is called again for each file (sidecar files are flattened again).
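
The phases can be illustrated with a minimal sketch. The helpers `orderByPhase` and `needsRawExtraction`, the extractor names `rawPreview` and `basicMeta`, and the entry shape `{ type, sidecars }` are assumptions made for this example and are not part of the plugin API. The sketch only shows the idea that extractors are grouped by their phase and that the raw phase skips files that already have an image sidecar.

```javascript
// Phase order as listed above. The helper names and the entry shape
// are assumptions for illustration, not part of the plugin API.
const PHASES = ['meta', 'raw', 'file']

// Sorts extractors so that meta extractors come first and file extractors last
function orderByPhase(extractors) {
  return PHASES.flatMap(phase => extractors.filter(e => e.phase === phase))
}

// Raw phase check: extract from the raw file only if no image sidecar exists.
// Assumes entries look like { type, sidecars: [{ type }] }.
function needsRawExtraction(entry) {
  const sidecars = entry.sidecars || []
  return entry.type === 'rawImage' && !sidecars.some(s => s.type === 'image')
}

const ordered = orderByPhase([
  { name: 'acmeExtractor', phase: 'file' },
  { name: 'rawPreview', phase: 'raw' },
  { name: 'basicMeta', phase: 'meta' }
]).map(e => e.name)

const withImageSidecar = needsRawExtraction({ type: 'rawImage', sidecars: [{ type: 'image' }] })
const rawOnly = needsRawExtraction({ type: 'rawImage', sidecars: [] })

console.log(ordered, withImageSidecar, rawOnly)
```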
An extractor object therefore has a name property, a phase property and a create function. The async create() function returns one of the following: a function (entry) => Promise<void>, an object with an optional test?: (entry) => boolean, a required task: (entry) => Promise<void> and an optional end: () => Promise<void> function, or a Transform object.

// plugin definition as above
async function factory(manager) {
  await manager.register('extractor', acmeExtractor(manager))
}

function acmeExtractor(manager) {
  const pluginConfig = manager.getConfig().plugin?.acme || {}
  const suffix = 'acme.json'
  const log = manager.createLogger('plugin.acme.extractor')
  return {
    name: 'acmeExtractor',
    phase: 'file',
    async create(storage) {
      // plugins can provide properties or functions on the context
      const created = new Date().toISOString()
      const value = 'Acme'
      // Read property from plugin's configuration plugin.acme.property for customization
      const property = pluginConfig.property || 'defaultValue'
      log.debug(`Creating Acme extractor task`)
      return {
        test(entry) {
          // Execute task if the storage file is not present
          return !storage.hasFile(entry, suffix)
        },
        async task(entry) {
          log.debug(`Processing ${entry}`)
          const data = { created, value, property }
          // Write plugin data to storage. Data can be a buffer, string or object
          return storage.writeFile(entry, suffix, data)
        }
      }
    }
  }
}

The storage object has functions to read data from and write data to the object storage.
type TStorage = {
  // Evaluates if the entry has the given storage file
  hasFile(entry, suffix): boolean
  // Reads a file from the storage
  readFile(entry, suffix): Promise<Buffer | any>
  // Writes extracted data to the storage.
  //
  // If the suffix ends with `.json` or `.json.gz` the data is automatically serialized and compressed.
  // The storage file is added to the entry `.files` array and the json data is added to the `.meta` object
  writeFile(entry, suffix, data): Promise<void>
  // Copies a local file to the storage
  copyFile(entry, suffix, file): Promise<void>
  // Creates a symbolic link from a local file
  symlink(entry, suffix, file): Promise<any>
  // Removes a file from the storage directory
  removeFile(entry, suffix): Promise<any>
  // Creates a local file handle for new or existing storage files.
  //
  // The file handle should be committed or released after usage
  createLocalFile(entry, suffix): Promise<TLocalStorageFile>
  // Creates a local directory to create files in
  createLocalDir(): Promise<TLocalStorageDir>
}
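
How one extractor can build on the data written by another can be sketched with an in-memory stand-in for the storage. `createMemoryStorage`, the `lookupAddress` stub, the suffixes `exif.json` and `geo.json`, the extractor names, the chosen phases and the entry shape `{ id }` are all hypothetical. Only `hasFile`, `readFile` and `writeFile` are taken from the type above, and the real storage may behave differently. The sketch also shows two of the return forms of `create()`: a plain function and an object with `test`, `task` and `end`.

```javascript
// Hypothetical in-memory stand-in for the storage. It only mimics
// hasFile, readFile and writeFile from TStorage.
function createMemoryStorage() {
  const files = new Map()
  const key = (entry, suffix) => `${entry.id}-${suffix}`
  return {
    hasFile: (entry, suffix) => files.has(key(entry, suffix)),
    readFile: async (entry, suffix) => files.get(key(entry, suffix)),
    writeFile: async (entry, suffix, data) => { files.set(key(entry, suffix), data) }
  }
}

// Stub for a remote reverse geocoding service (assumption for this sketch)
async function lookupAddress({ lat, lon }) {
  return { city: 'Example City', lat, lon }
}

// Function form: create() returns (entry) => Promise<void>
const exifSketch = {
  name: 'exifSketch',
  phase: 'meta',
  async create(storage) {
    // Writes fixed coordinates instead of reading real Exif data
    return async entry => storage.writeFile(entry, 'exif.json', { lat: 48.1, lon: 11.5 })
  }
}

// Object form: optional test, required task, optional end
const geoSketch = {
  name: 'geoSketch',
  phase: 'file',
  async create(storage) {
    let processed = 0
    return {
      // Run only if Exif data exists and the address is not stored yet
      test: entry => storage.hasFile(entry, 'exif.json') && !storage.hasFile(entry, 'geo.json'),
      async task(entry) {
        const exif = await storage.readFile(entry, 'exif.json')
        await storage.writeFile(entry, 'geo.json', await lookupAddress(exif))
        processed++
      },
      async end() {
        console.log(`geoSketch processed ${processed} entries`)
      }
    }
  }
}

async function run() {
  const storage = createMemoryStorage()
  const entry = { id: 'abc123' }

  const exifTask = await exifSketch.create(storage)
  await exifTask(entry)

  const geo = await geoSketch.create(storage)
  if (geo.test(entry)) await geo.task(entry)
  // After the first run the storage file exists, so test() rejects the entry
  const secondRun = geo.test(entry)
  await geo.end()

  return { address: await storage.readFile(entry, 'geo.json'), secondRun }
}

const result = run()
```

Because `test` checks for the storage file before running `task`, a repeated run skips entries that were already processed, which mirrors the `test` function of the Acme extractor.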