is it allowed way to have such validation ? | The Gleam Programming Language | Page 1

torpid moss Oct 2, 2024, 4:00 PM

#

I am looking for a way to handle data from an untrusted source that can be parsed structurally.
But there may be validation rules, so a record can be either valid or invalid.
Invalid data can still be processed: serialized, transferred, or maybe even fixed to make it valid.

I came up with the following solution. Is it acceptable?
Any drawbacks?

pub opaque type Validated(a, b) {
  Valid(a)
  Invalid(a, b)
}

pub type SchoolPerson {
  Teacher(name: String, subject: String)
  Student(name: String)
}

pub fn new_teacher(name, subject) {
  Teacher(name, subject)
}

pub fn new_student(name) {
  Student(name)
}

pub fn validate_person(person: SchoolPerson) -> Validated(SchoolPerson, String) {
  case person {
    Teacher(name, subject) -> {
      case name {
        "Mr" <> _ | "Ms" <> _ -> Valid(Teacher(name, subject))
        _ ->
          Invalid(
            Teacher(name, subject),
            "Teacher name must start with 'Mr' or 'Ms'",
          )
      }
    }
    Student(name) -> Valid(Student(name))
  }
}

// In a separate module that imports the code above as variant_a:
import gleam/io
import variant_a

pub fn main() {
  variant_a.new_teacher("Mr Schofield", "Physics")
  |> io.debug
  |> variant_a.validate_person
  |> io.debug

  variant_a.new_teacher("Dr Smith", "Maths")
  |> io.debug
  |> variant_a.validate_person
  |> io.debug

  variant_a.new_student("John")
  |> io.debug
  |> variant_a.validate_person
  |> io.debug
}

Teacher("Mr Schofield", "Physics")
Valid(Teacher("Mr Schofield", "Physics"))
Teacher("Dr Smith", "Maths")
Invalid(Teacher("Dr Smith", "Maths"), "Teacher name must start with 'Mr' or 'Ms'")
Student("John")
Valid(Student("John"))

lean zinc Oct 2, 2024, 4:12 PM

#

i'm sure that someone will say something relating to parse don't validate
https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

but barring that, in the same vein of what you've proposed above i wonder if it might be slightly more ergonomic to use a phantom type so you get compile-time guarantees that something has been validated

import gleam/io

pub fn main() {
  new_teacher("Mr. Glob", "History")
  |> validate_person
  |> io.debug
}

pub type Valid

pub type NotValidatedYet


pub type SchoolPerson(validity) {
  Teacher(name: String, subject: String)
  Student(name: String)
}

pub fn new_teacher(name, subject) -> SchoolPerson(NotValidatedYet) {
  Teacher(name, subject)
}

pub fn new_student(name) -> SchoolPerson(NotValidatedYet) {
  Student(name)
}

pub fn validate_person(
  person: SchoolPerson(NotValidatedYet),
) -> Result(SchoolPerson(Valid), String) {
  case person {
    Teacher(name, subject) -> {
      case name {
        "Mr" <> _ | "Ms" <> _ -> Ok(Teacher(name, subject))
        _ -> Error("Teacher name must start with 'Mr' or 'Ms'")
      }
    }
    Student(name) -> Ok(Student(name))
  }
}

#

i'm not suggesting you do this i'm just adding to the conversation a bit and providing an alternative

torpid moss Oct 2, 2024, 4:40 PM

#

@lean zinc Can I somehow benefit on the caller side from relying on this SchoolPerson(Valid) in the result?

lean zinc Oct 2, 2024, 4:42 PM

#

If you make the SchoolPerson type opaque, then valid data can only be constructed and validated within the library, so on the user side you have a guarantee that the data has been validated. In your user-side code you can have functions accept only a SchoolPerson(Valid), so that you know anything passed in has gone through validation.
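
A minimal sketch of what that could look like, assuming everything lives in one library module. The names `person_name` and `enroll` are hypothetical and only illustrate the idea; they do not come from the thread.

```gleam
import gleam/io

// Phantom marker types: no constructors, used only at the type level.
pub type Valid

pub type NotValidatedYet

// Opaque: other modules cannot call Teacher/Student directly, so the only
// way for them to obtain a SchoolPerson(Valid) is through validate_person.
pub opaque type SchoolPerson(validity) {
  Teacher(name: String, subject: String)
  Student(name: String)
}

pub fn new_teacher(
  name: String,
  subject: String,
) -> SchoolPerson(NotValidatedYet) {
  Teacher(name, subject)
}

pub fn validate_person(
  person: SchoolPerson(NotValidatedYet),
) -> Result(SchoolPerson(Valid), String) {
  case person {
    Teacher(name, subject) ->
      case name {
        "Mr" <> _ | "Ms" <> _ -> Ok(Teacher(name, subject))
        _ -> Error("Teacher name must start with 'Mr' or 'Ms'")
      }
    Student(name) -> Ok(Student(name))
  }
}

// Hypothetical accessor: callers outside this module cannot read fields of
// an opaque type, so the library exposes what they need.
pub fn person_name(person: SchoolPerson(validity)) -> String {
  person.name
}

// Hypothetical consumer: the signature alone rules out unvalidated input.
pub fn enroll(person: SchoolPerson(Valid)) -> String {
  person_name(person) <> " is on the roster"
}

pub fn main() {
  let unchecked = new_teacher("Mr Schofield", "Physics")
  // enroll(unchecked) would not compile: the function expects
  // SchoolPerson(Valid) but unchecked is SchoolPerson(NotValidatedYet).
  case validate_person(unchecked) {
    Ok(person) -> io.println(enroll(person))
    Error(reason) -> io.println("Rejected: " <> reason)
  }
}
```

The benefit for the caller is that `enroll` never has to re-check anything: the type checker rejects any call that skipped `validate_person`.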

torpid moss Oct 3, 2024, 6:18 PM

#

Yes, that is a very interesting approach.
It works well for validating a single type. But I struggle to imagine this approach with a composition of types. I'll demonstrate my actual needs:

import gleam/io
import gleam/list

pub type Valid

pub type Unchecked

pub opaque type Lat(v) {
  Lat(value: Float)
}

pub opaque type Lon(v) {
  Lon(value: Float)
}

pub opaque type Geometry(v) {
  Point(coordinates: Position(v))
  Line(coordinates: List(Position(v)))
  Polygon(coordinates: List(List(Position(v))))
}

pub fn new_lat(value: Float) -> Lat(Unchecked) {
  Lat(value)
}

pub fn new_lon(value: Float) -> Lon(Unchecked) {
  Lon(value)
}

pub fn parse_lat(lat: Lat(Unchecked)) -> Result(Lat(Valid), Lat(Unchecked)) {
  case lat.value >=. -90.0 && lat.value <=. 90.0 {
    True -> Ok(Lat(lat.value))
    False -> Error(lat)
  }
}

pub fn use_valid_lat(lat: Lat(Valid)) {
  io.debug(lat)
}

pub fn parse_lon(lon: Lon(Unchecked)) -> Result(Lon(Valid), Lon(Unchecked)) {
  case lon.value >=. -180.0 && lon.value <=. 180.0 {
    True -> Ok(Lon(lon.value))
    False -> Error(lon)
  }
}

pub fn use_valid_lon(lon: Lon(Valid)) {
  io.debug(lon)
}

pub opaque type Position(v) {
  Position(lon: Lon(v), lat: Lat(v))
}

pub fn new_position(lat: Float, lon: Float) -> Position(Unchecked) {
  Position(Lon(lon), Lat(lat))
}

pub fn parse_position(
  position: Position(Unchecked),
) -> Result(Position(Valid), Position(Unchecked)) {
  case parse_lon(position.lon), parse_lat(position.lat) {
    Ok(lon), Ok(lat) -> Ok(Position(lon, lat))
    _, _ -> Error(Position(position.lon, position.lat))
  }
}

pub fn new_point(lon: Float, lat: Float) -> Geometry(Unchecked) {
  Point(Position(Lon(lon), Lat(lat)))
}

pub fn new_line(positions: List(#(Float, Float))) -> Geometry(Unchecked) {
  positions
  |> list.map(fn(x) { Position(Lon(x.0), Lat(x.1)) })
  |> Line
}

#

So I can parse and maintain the validity of lat and lon. But with Position it already starts to get complicated. Position(Valid) can only hold a valid lat and a valid lon, and that is fair. Conversely, you have to reset both lat and lon to unchecked if either of them is invalid. That approach might still make sense. But in reality my type allows recursive inclusions, so I don't believe it scales.

lean zinc Oct 3, 2024, 6:22 PM

#

conversely to what i suggested earlier (i was using your original post as a base for that one), what i would likely do in reality is have functions like new_position return Result(Position, Something) and do whatever validation steps there, that way you can only ever have a Position that you know is good

#

it's much less fancy but much more straightforward

#

this to me is where the idea of making impossible states unrepresentable really comes into play, if you can guarantee that you never construct an invalid Position, then you can be a lot more flexible with how you operate with them

#

essentially just moving the logic from parse_position into new_position

pub fn new_position(lat: Float, lon: Float) -> Result(Position, Nil) {
  case parse_lon(lon), parse_lat(lat) {
    Ok(lon), Ok(lat) -> Ok(Position(lon, lat))
    _, _ -> Error(Nil)
  }
}

#

this way of doing things is much more in keeping with gleam's philosophy of simplicity imo

torpid moss Oct 3, 2024, 6:29 PM

#

The problem is that the impossible state is representable in the external world. And we can offer functions to fix it. Even with valid positions we can build an unclosed polygon, or a geometry that crosses the antimeridian and is rendered as garbage, despite being structurally "valid".

#

We can be more strict on production values, but we need to represent parsed values as well

#

The problem is that impossible state is representable in external world
More correctly: "unwanted state is representable in the external world", but it is still possible.

lean zinc Oct 3, 2024, 6:35 PM

#

you may need input from someone more clever than i am 😅

my default is to leverage Result for anything that needs to be checked or can result in something with an invalid state, for example your new_line function i would change to something along the lines of the following to guarantee that no invalid Geometry is ever returned to the user

pub fn new_line(positions: List(Position)) -> Result(Geometry, Nil) {
  let line = line(positions)
  ... some line validation function/logic that returns a Result
}
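
As one possible shape for such a validation step, here is a sketch of a check that a polygon ring is closed. The types `Position` and `RingError` and the function `check_closed_ring` are hypothetical and simplified; the rule itself (four or more positions, first equal to last) is the GeoJSON linear ring rule.

```gleam
import gleam/list

// Simplified stand-in for the thread's Position type (assumption).
pub type Position {
  Position(lon: Float, lat: Float)
}

// Hypothetical error type for this sketch.
pub type RingError {
  TooFewPositions
  UnclosedRing
}

// Returns the positions unchanged when they form a closed ring,
// otherwise an error describing which rule failed.
pub fn check_closed_ring(
  positions: List(Position),
) -> Result(List(Position), RingError) {
  case
    list.length(positions) >= 4,
    list.first(positions),
    list.last(positions)
  {
    False, _, _ -> Error(TooFewPositions)
    True, Ok(first), Ok(last) if first == last -> Ok(positions)
    _, _, _ -> Error(UnclosedRing)
  }
}
```

Because the function hands back the original list on success, it can be chained with `result.try` after the positions themselves have been checked.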

torpid moss Oct 3, 2024, 7:08 PM

#

Not a problem. I am still receiving valuable input, and our discussion will probably be noticed by someone who is ready to share their expertise.

Just as a reminder we have two directions to solve problems here.
One is constructing values, another is parsing and representing existing data.
Let's review them separately.
Constructing. Let's define an API:

pub fn new_lat(value: Float) -> Result(Lat, OutOfRange)
pub fn new_lon(value: Float) -> Result(Lon, OutOfRange)
pub fn new_position_2d(lon: Float, lat: Float) -> Result(Position, OutOfRange) // uses new_lat, new_lon
pub fn point(position: Position) -> Result(GeoJSON(Geometry, Nil), InvalidGeoJSON)
pub fn polygon(positions: List(Position)) -> Result(GeoJSON(Geometry, Nil), InvalidGeoJSON)

Now let's try to use it:

fn build() {
  use pos0 <- result.try(new_position_2d(0.0, 0.0))
  use pos1 <- result.try(new_position_2d(10.0, 0.0))
  use pos2 <- result.try(new_position_2d(10.0, 10.0))
  use pos3 <- result.try(new_position_2d(0.0, 10.0))
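  // The final call is the result of the whole use chain. Note: result.try
  // needs one shared error type, so OutOfRange and InvalidGeoJSON would have
  // to be unified first (for example with result.map_error).
  polygon([pos0, pos1, pos2, pos3])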
}
build() // Error(UnclosedPolygon)

And that is a very simple figure, yet it introduces deep callback nesting that we don't see only thanks to use.
If there is a better way to consume this API and I am just unaware of it, let me know.
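
One way the nesting might be avoided is to build all positions from a list in a single step with `list.try_map`, which stops at the first error. The sketch below is only an illustration under assumptions: it collapses the thread's error types into one hypothetical `BuildError`, lets `polygon` return the plain list of positions instead of a GeoJSON value, and only checks that the first and last positions match.

```gleam
import gleam/list
import gleam/result

// Simplified stand-in for the thread's Position type (assumption).
pub type Position {
  Position(lon: Float, lat: Float)
}

// Hypothetical single error type so every step can be chained.
pub type BuildError {
  OutOfRange
  UnclosedPolygon
}

pub fn new_position_2d(lon: Float, lat: Float) -> Result(Position, BuildError) {
  let lon_ok = lon >=. -180.0 && lon <=. 180.0
  let lat_ok = lat >=. -90.0 && lat <=. 90.0
  case lon_ok && lat_ok {
    True -> Ok(Position(lon, lat))
    False -> Error(OutOfRange)
  }
}

// Simplified check: only verifies that the ring's first and last positions match.
pub fn polygon(
  positions: List(Position),
) -> Result(List(Position), BuildError) {
  case list.first(positions), list.last(positions) {
    Ok(first), Ok(last) if first == last -> Ok(positions)
    _, _ -> Error(UnclosedPolygon)
  }
}

// Validates every coordinate pair, then the polygon as a whole.
pub fn build(
  coordinates: List(#(Float, Float)),
) -> Result(List(Position), BuildError) {
  coordinates
  |> list.try_map(fn(pair) { new_position_2d(pair.0, pair.1) })
  |> result.try(polygon)
}
```

With this shape the caller passes raw tuples, for example `build([#(0.0, 0.0), #(10.0, 0.0), #(10.0, 10.0), #(0.0, 10.0)])`, and gets a single Result back regardless of how many positions there are.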
