Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> My gut tells me that's not quite right (for some reasonable definition of 'usable'), since we can always build elimination forms into the language (after all, 'null' must be built in, and few take objection to building in elimination forms like if/then/else).

True, you could go the Go array route and have Optional be a special type that is generic, without otherwise supporting generic data structures.

> I really like comprehensions in Python, but hardly ever use them elsewhere.

I think code like possiblyMissing.map(value => code) introduces lots of unpleasant imbrication, and breaking out functions for every possibly-missing value also seems to me to lead to bad readability. I suppose this may be a matter of taste, or it may be related to other language features.

> just make sure to distinguish them from other string-like types, in a way that's statically enforceable, and where the only API to convert a 'String' to an 'SQL' is the escaping function.

I think this is the part that doesn't really work, because, in order to escape a piece of user input, you need to understand what it's meant to be. For example, a common (unsafe) idiom is:

  sqlFormatString = "SELECT %s FROM %s WHERE %s;";
  filterFormatString = "%s LIKE ?";
  filterString = sprintf(filterFormatString, userInputColumnName);
  sqlQuery = sprintf(sqlFormatString, columnName, tableName, filterString);
  preparedQuery = dbImpl.prepare(sqlQuery, userInputValue);
Which of course does the right thing for userInputValue; may or may not be doing the right thing with columnName and tableName, depending on the source; and does the really wrong thing with userInputColumn.

Now, you could replace this with a type safe dbImpl.prepare(), which only takes a NotUserGeneratedSQL type. But now you need to have some way to build a NotUserGeneratedSQL.

One option is to go all in on an SQL AST library, where you could build the query above like

  query = selectQuery();
  query.columnNames = [ escapeColName(columnName) ];
  query.from = escapeTableName(tableName); 
  queryFilter = equalityComparison();
  queryFilter.left = escapeColName(userInputColumn);
  queryFilter.right = preparedArg();
  query.filter = queryFilter;

  preparedQuery = dbImpl.prepare(query, userInputValue)
But this, as I said, gets tedious for the user and complex for the implementer.

However, I don't think there is any alternative that can actually enforce validation of user input in constructed queries. You can help users think about it by forcing them to use some kind of string -> NotUserGeneratedSQL function, but this function can't actually be written to reject the example I gave initially. A function that creates an empty minimal NotUserGeneratedSQL and offers other functions to build it up to a useful query will either accept strings (making it possible to introduce SQL injection) or have to accept structured SQL, making it fall into the AST problem.

Of course, the problem of NotUserGeneratedSQL is easy if you don't want to support any kind of dynamic query, other than prepared statements. But if you want to dynamically generate queries based on user input (e.g. user chooses which columns they want to see, which columns they want to filter by etc.), then I don't think it's possible to statically ensure that SQL injection is not possible without an ORM or SQL AST API.

Edit: if you believe otherwise, please show me a type-safe, SQL-injection-safe library in any language that isn't an ORM/EDSL or a straight up SQL AST manipulator. I am quite certain none exists, but I would be happy to be proven wrong.



What I had in mind was something along these lines:

    allowedColumns: Map[UserInput, SQL] = {
      "name": "name",
      "age": "age",
      ...
    }

    strict: SQL = strictComparison? "=" : "LIKE"
    
    query: Maybe[SQL] = allowedColumns
      .get(userCol)
      .map(col => "SELECT " + col + " FROM tbl WHERE tbl.foo " + strict + addParameter(":?", userFoo))
In particular:

- We can write literals which look like strings but have type SQL

- We can append fragments of SQL together (potentially making it invalid; oh well)

- We can include user input via parameterised queries, in locations where arbitrary strings/ints/etc. are allowed

- Anything 'structural', like identifiers, choice of comparison operations, building up sub-expressions, etc. must be done programatically, using the above features. In this case we select the column name (written as a static, literal SQL value) by looking up the user's input in a map. We also allow choosing between the type of comparison to use (again, both are SQL literals).

It seems to me that requirements like 'user chooses which columns they want to see, which columns they want to filter by etc.' is fundamentally incompatible with a safe, string-like representation. Instead, the options are:

- Fully representing the structure of the language. This results in an AST approach, which allows safe dynamic queries. I think any alternative, like tracking the offset of each delimiter in a string, etc. will turn out to be equivalent to maintaining an AST.

- A "flat", string-like representation, whose dynamism is limited to choosing between some combination of pre-supplied fragments. This is what I've shown above. This is safe, but the 'dynamism' is inherently limited up-front (i.e. it's overly conservative).

- A "flat", string-like representation, which has unrestricted dynamism, but hence is also inherently unsafe (i.e. it's overly liberal).


> - We can write literals which look like strings but have type SQL

What I'm not clear is: what prevents me from accidentally/stupidly doing:

  filter : SQL = "WHERE " + userInputCol + " = ?"
Is this special string handling some compiler magic that distinguishes literal strings from string variables? If so then I think that in a language that supports something like this you can indeed make a safe library. The main downside is that you need to work entirely with compile-time constructs - e.g. you can't use something like printf to take a compile-time format string and turn it runtime into a query; and you can't take queries from a separate file, they must be in source code. But these may be acceptable trade-offs.

Do you know of any library that implements this?


> Is this special string handling some compiler magic that distinguishes literal strings from string variables?

Ah, maybe I should have made it clearer that I was overloading the double-quote syntax, so we can write:

    "foo": String
    "bar": SQL
    "baz": UserInput
    "quux": Shell
    etc.
I was also relying on type inference to figure out which is which, and on '+' returning the same type as both arguments, e.g.

    +: String  -> String  -> String
    +: SQL     -> SQL     -> SQL
    +: Int     -> Int     -> Int
    +: Float   -> Float   -> Float
    +: List[T] -> List[T] -> List[T]
    etc.
This way, we see how your example fails to typecheck:

    // Code as written
    filter : SQL = "WHERE " + userInputCol + " = ?"

    // Right-hand-side must have type SQL, to match left-hand-side
    "WHERE " + userInputCol + " = ?": SQL

    // Resolving order-of-operations of the two '+' operations
    ("WHERE " + userInputCol) + " = ?" : SQL
  
    // Arguments to outer '+' have same type as return value, which is SQL
    "WHERE " + userInputCol : SQL
    " = ?" : SQL

    // Arguments to inner '+' have same type as return value, which is SQL
    "WHERE " : SQL
    userInputCol : SQL
We've inferred that userInputCol must have type SQL, so it will fail for String/UserInput/whatever.

> The main downside is that you need to work entirely with compile-time constructs - e.g. you can't use something like printf to take a compile-time format string and turn it runtime into a query; and you can't take queries from a separate file, they must be in source code.

Yep, although macros could help with that sort of thing, e.g. Haskell's quasiquotation https://wiki.haskell.org/Quasiquotation

A couple of Google hits for 'haskell quasiquote sql':

https://hackage.haskell.org/package/postgresql-simple-0.6.2/...

https://hackage.haskell.org/package/postgresql-query

> Do you know of any library that implements this?

Not completely. De-coupling double-quoted literal syntax from a single String type can be done with Haskell's OverloadedStrings feature, but that relies on a function 'fromString : String -> t', which is what we're trying to avoid https://hackage.haskell.org/package/base-4.6.0.1/docs/Data-S...

Scala's custom interpolators are similar, but they rely on a function from 'StringContext -> t', and StringContext is easily created from a String ( https://www.scala-lang.org/api/current/scala/StringContext.h... )

To ensure safety, we would need some way to encapsulate the underlying fromString/StringContext implementation to prevent it being called by anything other than the literal-expansion at compile-time.

Of course, if we're willing to use macros then it's pretty easy, like those quasiquote examples above.

Haskell's module system is famously mediocre, so it might be possible to do this overloading + encapsulation with Idris https://idris2.readthedocs.io/en/latest/reference/overloaded...

(Of course, Idris also has an incredibly expressive type system, and a very powerful macro system, AKA "elaborator reflection", so it can definitely be done; but I haven't figured out the cleanest way)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: