Data Structure Manipulations - 1

Process modelling - especially of minerals systems - is mostly about manipulating data in arrays.

This is an exercise in using a few APL symbols (primitive functions) to push numbers (or characters) around into different structures.

In APL the function "rho" ( Alt + R ) can be used either to interrogate the "shape" of a data variable or, given a left argument, to reshape data into the structure we want.
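As a minimal sketch of those two uses, here is how it might look in an APL session. I am assuming the usual session layout, where what we type is indented six spaces and the result comes back at the left margin. The symbol ⍴ is "rho", and the data is just a literal list of four numbers chosen for illustration.

```apl
      ⍴ 1 2 3 4          ⍝ rho with data on the right only: ask for the shape
4
      2 2 ⍴ 1 2 3 4      ⍝ rho with a left argument: reshape into 2 rows, 2 columns
1 2
3 4
```

The same symbol does both jobs. What matters is whether there is anything to its left.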

In APL, " A gets 1 2 3 4 " creates a vector. Imagine our variable A sitting in sheet1 of Excel. It has four values in the first row, one in each of columns 1 to 4, so in Excel the data range is R1C1..R1C4. But now we see how APL and Excel differ. In Excel, when we look at the data we also see all the other available rows and columns, and even the other sheets. We can't tell whether A is a vector, an array (which could have multiple rows and columns), or something that can occupy other sheets as well. The data in Excel doesn't have shape or structure - it simply has elements occupying some cells.

In APL every variable has structure as well as (optionally) data elements. Our variable A is a vector. It does not have rows. As APLers we would prefer its Excel range to be C1..C4, and to have all the other rows (and the other sheets) greyed out as not available, at least not for A.
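To see that "does not have rows" is meant literally, here is a small sketch comparing A with a one-row matrix built from the same numbers. The name M is only an illustration and is not used elsewhere in this post.

```apl
      A ← 1 2 3 4        ⍝ " A gets 1 2 3 4 " - a vector
      ⍴ A                ⍝ one number in the shape: columns only
4
      M ← 1 4 ⍴ A        ⍝ the same four numbers as a matrix with 1 row, 4 columns
      ⍴ M                ⍝ two numbers in the shape: rows, then columns
1 4
```

A and M hold the same data elements, but APL treats them as different structures. A has no row axis at all, while M has a row axis of length 1.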

Our vector A can "be extended along the direction of increasing column count" - because that is what vectors can do - but nothing else.
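One way to extend a vector along its only axis is to join another element on with the comma (catenate). A small sketch, where the value 5 is just an example:

```apl
      A ← 1 2 3 4
      A ← A , 5          ⍝ join a fifth element on the end
      A
1 2 3 4 5
      ⍴ A                ⍝ the one axis is now 5 long
5
```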

The APL description of the shape of A ( " rho A " ) is a vector with, in this case, one element - the number 4. It describes the shape with a vector, as it may need more numbers to show the length of other dimensions (or axes) - if there were rows, planes etc. It doesn't need to say "4 columns" as "columns" is the basic building block for data structure, and if there are any "higher" dimensions they would be added earlier (starting from the left) in the "shape vector".
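A short sketch of how the shape vector grows from the left as axes are added. I am using the index generator ⍳ (iota) here only as a convenient way to make 12 or 24 numbers to fill the arrays. It is not otherwise part of this exercise.

```apl
      A ← 1 2 3 4
      ⍴ ⍴ A              ⍝ the shape of the shape: one axis
1
      ⍴ 3 4 ⍴ ⍳ 12       ⍝ a matrix: 3 rows, 4 columns
3 4
      ⍴ 2 3 4 ⍴ ⍳ 24     ⍝ add planes: 2 planes, 3 rows, 4 columns
2 3 4
```

In each case the last number is the column count, and each new axis appears to its left.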

So the APL "shape" vector is a completely unambiguous definition of variable structure - a vector of numbers, with each number indicating that a dimension (or axis for extension) exists, and giving the current length of that axis (or the number of elements of that axis that are available for now).

Understanding how APL variables have shape, and being able to use "rho" to interrogate the shape of a variable, is the first step to being able to use data from variables and to build variables with the required structure for specific modelling requirements.

Cheers, Jim