ProcessCore

Creating A Dataset

This walkthrough builds a small process graph from F# objects. The goal is to show the model shape rather than every field in the specification.

Dataset

Start with a dataset. Administrative metadata such as a name and a description is optional; the identifier is required and serves as the dataset's stable handle.

let dataset = Dataset("demo-dataset")
dataset.Name <- Some "Minimal ProcessCore example"
dataset.Description <- Some "One extraction process with nested quality control."

LabProtocol

A protocol describes the method. Its formal parameters declare the settings the method expects; a value for each should be supplied whenever the protocol is executed.

let protocol = LabProtocol()
let temperature = FormalParameter("temperature")
protocol.Name <- Some "Extraction"
protocol.IntendedUse <- Some (DefinedTerm("sample extraction"))
protocol.AddParameter(temperature)

Components are entities that a protocol uses but does not transform, such as machines or reagents. They are described as PropertyValue objects and attached with AddLabEquipment.

let centrifuge = PropertyValue(name = "centrifuge", value = "Eppendorf 5420")
let buffer = PropertyValue(name = "buffer", value = "PBS")

protocol.AddLabEquipment(centrifuge)
protocol.AddLabEquipment(buffer)

LabProcess

LabProcesses are the core of the process graph. They are concrete executions of a protocol, with specific parameter values, and input and output entities.

First, define the input and output nodes. Each node is either a Material or a Data node.

let leaf = Material("Leaf tissue")

let extractData = Data("raw/extract.csv")
extractData.EncodingFormat <- Some "text/csv"

A LabProcess connects those inputs to outputs. Parameter values are attached to the process as well; each should correspond to one of the protocol's formal parameters, and the instanceOf argument makes that link explicit.

let extraction = LabProcess("Extraction")
let degrees25 = PropertyValue(name = "temperature", value = "25", unit = "degree Celsius", instanceOf = temperature)
extraction.ExecutesProtocol <- Some protocol
extraction.AddInputMaterial(leaf)
extraction.AddOutputData(extractData)
extraction.AddParameterValue(degrees25)

dataset.AddProcess(extraction)
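
The correspondence between parameter values and formal parameters can be checked mechanically. The following is a minimal, self-contained sketch of such a check. It uses plain records instead of ProcessCore types, and FormalParam, ParamValue, missingParameters, and the "duration" slot are hypothetical names introduced only for illustration.

```fsharp
// Hypothetical stand-ins for illustration; these are not ProcessCore types.
type FormalParam = { SlotName: string }
type ParamValue = { Name: string; Value: string; InstanceOf: FormalParam option }

/// Returns the names of formal parameters that no value refers to.
let missingParameters (formals: FormalParam list) (values: ParamValue list) =
    // Collect the slot names that at least one value is an instance of.
    let provided =
        values
        |> List.choose (fun v -> v.InstanceOf)
        |> List.map (fun fp -> fp.SlotName)
        |> Set.ofList
    formals
    |> List.map (fun fp -> fp.SlotName)
    |> List.filter (fun name -> not (Set.contains name provided))

let temperatureSlot = { SlotName = "temperature" }
let durationSlot = { SlotName = "duration" }

// Only the temperature slot receives a value.
let values =
    [ { Name = "temperature"; Value = "25"; InstanceOf = Some temperatureSlot } ]

let missing = missingParameters [ temperatureSlot; durationSlot ] values
printfn "%A" missing   // prints ["duration"]
```

A check of this kind would flag a process that executes a protocol without supplying every expected value. Whether ProcessCore itself enforces this is not stated here.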

Nested Datasets

Datasets can contain child datasets. When a child dataset is added, its process nodes are re-canonicalized against the root dataset.

let child = Dataset("qc-dataset")
child.Name <- Some "Quality control"

let qcReport = Data("qc/extract-report.tsv")

let qc = LabProcess("Quality Control")
qc.AddInputData(extractData)
qc.AddOutputData(qcReport)
let threshold = FormalParameter("threshold")
let threshold95 = PropertyValue(name = "threshold", value = "0.95", instanceOf = threshold)
qc.AddParameterValue(threshold95)

child.AddProcess(qc)
dataset.AddPart(child)

The parent process output and the child process input are the same logical Data node: same path, no selector. After AddPart, they are also the same object instance in the root dataset.

let qcInputAfterAttach =
    match qc.Inputs.[0] with
    | DataNode d -> d
    | MaterialNode _ -> failwith "Expected data input"

let sharedDataIdentity =
    obj.ReferenceEquals(extractData, qcInputAfterAttach)

sharedDataIdentity
true
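
One way to picture re-canonicalization is a registry keyed by node identity: the first instance registered under a key becomes the canonical one, and later lookups with an equal key return it. The sketch below is an assumption about the general technique, not ProcessCore's implementation. ToyData, canonicalize, and the key format are hypothetical.

```fsharp
open System.Collections.Generic

// Hypothetical stand-in for a data node; not a ProcessCore type.
type ToyData(path: string, ?selector: string) =
    member this.Path = path
    member this.Selector = selector
    // Two nodes are the same logical node when path and selector agree.
    member this.Key = sprintf "D:%s#%s" path (defaultArg selector "")

/// Returns the instance already registered under the node's key,
/// or registers and returns the given node if the key is new.
let canonicalize (registry: Dictionary<string, ToyData>) (node: ToyData) =
    match registry.TryGetValue(node.Key) with
    | true, existing -> existing
    | false, _ ->
        registry.[node.Key] <- node
        node

let registry = Dictionary<string, ToyData>()

// Two separately constructed nodes that describe the same file.
let parentOutput = canonicalize registry (ToyData("raw/extract.csv"))
let childInput = ToyData("raw/extract.csv")

// Distinct instances before canonicalization, one shared instance after.
let before = obj.ReferenceEquals(parentOutput, childInput)
let after = obj.ReferenceEquals(parentOutput, canonicalize registry childInput)
printfn "%b %b" before after   // prints false true
```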

The graph is now queryable from either the dataset or any node.

let finalNodes =
    dataset.FinalNodes()
    |> Seq.map (fun n -> n.Key())
    |> Seq.toList

finalNodes
["D:qc/extract-report.tsv"]
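
The result above is consistent with reading a final node as one that some process produces and no process consumes. Under that assumption, the query can be sketched over plain key lists. ToyProcess and finalKeys are hypothetical, and the "M:" prefix for the material key is a guess modeled on the "D:" prefix in the output.

```fsharp
// Hypothetical stand-in: a process reduced to the keys of its inputs and outputs.
type ToyProcess = { Name: string; Inputs: string list; Outputs: string list }

/// Keys that some process produces and no process consumes.
let finalKeys (processes: ToyProcess list) =
    let consumed =
        processes
        |> List.collect (fun p -> p.Inputs)
        |> Set.ofList
    processes
    |> List.collect (fun p -> p.Outputs)
    |> List.distinct
    |> List.filter (fun key -> not (Set.contains key consumed))

// The two processes from this walkthrough, written as key lists.
let processes =
    [ { Name = "Extraction"
        Inputs = [ "M:Leaf tissue" ]
        Outputs = [ "D:raw/extract.csv" ] }
      { Name = "Quality Control"
        Inputs = [ "D:raw/extract.csv" ]
        Outputs = [ "D:qc/extract-report.tsv" ] } ]

printfn "%A" (finalKeys processes)   // prints ["D:qc/extract-report.tsv"]
```

The extract file drops out because the quality control process consumes it, which leaves only the report.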

What To Use When

Task                            API
Create a container              Dataset(identifier)
Add a process                   dataset.AddProcess(process)
Add nested datasets             dataset.AddPart(child)
Connect materials or files      process.AddInputMaterial, process.AddOutputData
Attach process parameters       process.AddParameterValue
Attach characteristics/factors  node.AddAdditionalProperty
Attach protocol components      protocol.AddLabEquipment
