Tutorial

Storck is built with Django and Python 3. Its core functionality is exposed as a Django REST API server. Users can log in through the web interface with CERN SSO and interact with storck from their browser. Users can also write programs that communicate directly with the server through the REST API, authenticating with a set of tokens. The API mainly serves as an information centre about the stored files, but it can also transfer files to and from the web server. The suggested way of handling file transfer, however, is the filesystem exchange.

How files are stored

When storck receives a file, it first checks the system for duplicates (see the deduplication subsection below). The file path is separated from the file content, and the MD5 sum of the content is calculated. The file path is saved to the database along with the file's detailed information and the hash (MD5 sum) of its content. The content is then saved to the filesystem and named with that hash.
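The storage step can be illustrated with a minimal sketch. It is not storck's actual code: the function `store_file`, the record fields, and the use of a plain list as a stand-in for the database are assumptions made for illustration. Only the idea of naming the stored content after its MD5 sum comes from the description above.

```python
import hashlib
import tempfile
from pathlib import Path


def store_file(storage_dir: Path, database: list, file_path: str, content: bytes) -> dict:
    """Save content under its MD5 hash and record the path in a stand-in database."""
    content_hash = hashlib.md5(content).hexdigest()
    # The content is named after its hash, not after the user-facing path.
    (storage_dir / content_hash).write_bytes(content)
    # Hypothetical record layout; the real database schema may differ.
    record = {
        "id": len(database) + 1,
        "filepath": file_path,
        "hash": content_hash,
        "size": len(content),
    }
    database.append(record)
    return record


with tempfile.TemporaryDirectory() as tmp:
    db = []
    rec = store_file(Path(tmp), db, "run1/data.txt", b"hello storck")
    # Read the content back through the hash-based name.
    stored = (Path(tmp) / rec["hash"]).read_bytes()
```

Separating the user-facing path from the stored name is what allows several database records to share one file on disk.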

How versioning works

For versioning, storck uses a list structure in which each version points to the previous one. When storck receives a file whose file path matches a file it already holds, it saves the new file together with the id of the database record that holds the previous version. When the file under this path is requested, storck responds with the newest version, and the detailed information also shows the id of the previous version.
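A minimal sketch of this chain of versions follows. The names `FileRecord`, `VersionedIndex`, and `previous_version` are hypothetical and chosen for illustration; an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileRecord:
    id: int
    filepath: str
    content_hash: str
    previous_version: Optional[int] = None  # id of the older record, if any


class VersionedIndex:
    def __init__(self):
        self.records = {}  # id -> FileRecord
        self.latest = {}   # filepath -> id of the newest record

    def add(self, filepath: str, content_hash: str) -> FileRecord:
        record = FileRecord(
            id=len(self.records) + 1,
            filepath=filepath,
            content_hash=content_hash,
            # Point at whatever was newest for this path before this upload.
            previous_version=self.latest.get(filepath),
        )
        self.records[record.id] = record
        self.latest[filepath] = record.id
        return record

    def newest(self, filepath: str) -> FileRecord:
        return self.records[self.latest[filepath]]

    def history(self, filepath: str) -> list:
        """Follow previous_version links from the newest record to the oldest."""
        ids = []
        current = self.latest.get(filepath)
        while current is not None:
            ids.append(current)
            current = self.records[current].previous_version
        return ids


index = VersionedIndex()
index.add("run1/data.txt", "hash-a")
index.add("run1/other.txt", "hash-b")
index.add("run1/data.txt", "hash-c")
```

Here `index.newest("run1/data.txt")` returns the third record, whose `previous_version` is the id of the first upload under the same path.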

What is deduplication, and how does it work

When a new file arrives in storck, its MD5 sum is calculated. Storck then searches its database for a file with the same MD5 sum. If one is found, storck does not process the file's content any further; instead, it creates a database record that points to the file already existing in the filesystem.
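The deduplication check can be sketched as follows. This is an illustration under stated assumptions, not storck's implementation: the class `DedupStore` is hypothetical, a dictionary stands in for the filesystem, and a list stands in for the database.

```python
import hashlib


class DedupStore:
    def __init__(self):
        self.blobs = {}    # MD5 hex digest -> content (stand-in for the filesystem)
        self.records = []  # stand-in for database rows

    def add(self, filepath: str, content: bytes) -> bool:
        """Record the file; return True only if new content had to be written."""
        digest = hashlib.md5(content).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            # First time this content is seen: store it once.
            self.blobs[digest] = content
        # A record is always created; duplicates simply point at the existing blob.
        self.records.append({"filepath": filepath, "hash": digest})
        return is_new


store = DedupStore()
first = store.add("a/report.txt", b"same bytes")
second = store.add("b/copy.txt", b"same bytes")
```

After both calls the store holds two records but only one copy of the content, since the second upload matched the hash of the first.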

How the filesystem exchange of files works

The filesystem exchange is simple. We assume that the data stored on storck are open for anyone to read, so storck's filesystem is open for read access. The REST API can be used to find out which files should be downloaded. The copy can then be made directly from storck's drive with a simple cp command such as cp storck/filesystem/storage/path/file target/destination/of/file.
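For programs that prefer not to shell out to cp, the same copy can be sketched in Python with `shutil.copyfile`. The helper `fetch_from_storage`, the storage layout, and the stored file name below are assumptions for illustration; in practice the stored name would come from the information returned by the REST API.

```python
import shutil
import tempfile
from pathlib import Path


def fetch_from_storage(storage_root: Path, stored_name: str, destination: Path) -> Path:
    """Copy a file straight from the readable storage directory, like cp."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(storage_root / stored_name, destination)
    return destination


with tempfile.TemporaryDirectory() as tmp:
    # A temporary directory stands in for storck's readable storage.
    storage_root = Path(tmp) / "storage"
    storage_root.mkdir()
    # Pretend the REST API reported that the file is stored under this name.
    stored_name = "0123abcd"
    (storage_root / stored_name).write_bytes(b"payload")

    target = fetch_from_storage(storage_root, stored_name, Path(tmp) / "target" / "file.txt")
    copied = target.read_bytes()
```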

How exactly does metadata work

Storck's metadata relies on the Postgres JSON field and Django queries. In short, the metadata is stored as a JSON field alongside the file's information in storck's database. Queries to storck can be made with a JSON object that is unpacked as keyword arguments to Django's filter method.
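The unpacking step can be shown without a running Django project. In the sketch below, `filter_records` is a tiny hypothetical stand-in for a queryset's filter method that supports only exact matches on double-underscore key paths; the field name `metadata` and the sample records are likewise assumptions. With a real Django model, the corresponding call would have the shape `Model.objects.filter(**query)`.

```python
import json

_MISSING = object()  # sentinel for keys that do not exist in a record


def filter_records(records, **lookups):
    """Stand-in for a queryset filter: exact match on double-underscore paths."""
    def resolve(record, path):
        value = record
        for key in path.split("__"):
            if not isinstance(value, dict) or key not in value:
                return _MISSING
            value = value[key]
        return value

    return [
        record
        for record in records
        if all(resolve(record, path) == wanted for path, wanted in lookups.items())
    ]


# Illustrative records; the metadata keys are made up for this example.
records = [
    {"filepath": "run1/a.root", "metadata": {"detector": "A", "run": 1}},
    {"filepath": "run2/b.root", "metadata": {"detector": "A", "run": 2}},
    {"filepath": "run2/c.root", "metadata": {"detector": "B", "run": 2}},
]

# A query arrives as JSON and is unpacked into keyword arguments.
query = json.loads('{"metadata__detector": "A", "metadata__run": 2}')
matches = filter_records(records, **query)
```

Because the JSON keys become keyword argument names, the client controls which lookups are applied, which is what makes the metadata queries flexible.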