Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

PRQL is a similar idea, and it compiles to SQL: https://prql-lang.org/

    from invoices
    filter invoice_date >= @1970-01-16
    derive {
      transaction_fees = 0.8,
      income = total - transaction_fees
    }
    filter income > 1


SQL pipe syntax is a great step in the right direction but what I think sets PRQL apart (in my very biased opinion, contributor) is the ability to define functions.

Here's a simple example:

    # define the "take_smallest" function
    let take_smallest = func n col tbl<relation> -> (
     from tbl
     sort col
     take n
     )
    
    # find smallest 3 tracks by milliseconds
    from tracks
    take_smallest 3 milliseconds
You can try this now in the online playground: https://prql-lang.org/playground/

That's simple enough and there's not that much gained there but say you now want to find the 3 smallest tracks per album by bytes?

That's really simple in PRQL and you can just reuse the "take_smallest" function and pass a different column name as an argument:

    from tracks
    group album_id (
     take_smallest 3 bytes
     )


Here's a more complex example for sessionizing user events:

    # specify the target SQL dialect
    prql target:sql.duckdb
    
    # define the "sessionize" function
    let sessionize = func user_col date_col max_gap:365 tbl<relation> -> (  
      from tbl
      group user_col (
        window rows:-1..0 (
          sort date_col
          derive prev_date=(lag 1 (date_col|as date))
          )
      )
      derive {
        date_diff = (date_col|as date) - prev_date,
        is_new_session = case [date_diff > max_gap  || prev_date==null => 1, true => 0],
        }
      window rows:..0 (
        group user_col (
          sort {date_col}
          derive user_session_id = (sum is_new_session)
        )
        sort {user_col, date_col}
        derive global_session_id = (sum is_new_session)
      )
      select !{prev_date, date_diff, is_new_session}
    )
    
    # main query
    from invoices
    select {customer_id, invoice_date}
    sessionize customer_id invoice_date max_gap:365
    sort {customer_id, invoice_date}
You can also try that in the playground: https://prql-lang.org/playground/


SQL with pipe syntax can also do functions like that. An sequence of pipe operators can be saved as a table-valued function, and then reused by invoking in other queries with CALL.

Example:

  CREATE TEMP TABLE FUNCTION ExtendDates(input ANY TABLE, num_days INT64)
  AS
  FROM input
  |> EXTEND date AS original_date
  |> EXTEND max(date) OVER () AS max_date
  |> JOIN UNNEST(generate_array(0, num_days - 1)) diff_days
  |> SET date = date_add(date, INTERVAL diff_days DAY)
  |> WHERE date <= max_date
  |> SELECT * EXCEPT (max_date, diff_days);

  FROM Orders
  |> RENAME o_orderdate AS date, o_custkey AS user_id
  |> CALL ExtendDates(7)
  |> LIMIT 10;
from the script here: https://github.com/google/zetasql/blob/master/zetasql/exampl...


Oh nice! Thanks for setting that straight. Apologies, I must have either missed that or forgotten about it.


BigQuery has table-valued functions already, which can be used with pipes with a CALL clause.


I'm tempted to give it a try just for the "For HackerNews enthusiasts" section of their landing page, which states "The PRQL compiler is written in Rust" and "We talk about “orthogonal language features” a lot". Love me some banter in technical documentation :D


I love the design behind PRQL, was a little dissapointed when I tried using it with an MSSQL server and found there was no decent way to filter for `LIKE '%something%'`.

(PRQL uses regex for stuff like this, which sounds great but then isn't supported on some databases like MSSQL)


I'm very surprised to learn that PRQL does not natively support `like`, but you can add it yourself: https://github.com/PRQL/prql/issues/1123#issuecomment-135385...


Huh, it's super cool they have a built-in way of extending to support missing engine features!


Oh man that's awesome!


I think I like this new pipe syntax a lot better than PRQL. Feels like it has the potential to become officially supported by DBs like MySQL since it's a relatively small change to existing SQL syntax.


IIRC the SQL pipe google paper references PRQL. I think both make sense in their own context. SQL pipe is a smaller leap for users familiar with SQL and prql sort of takes on more change. I do wonder if the smaller leap might cause more deja-vu displacement initially than a bigger shift like PRQL. I don't know the answer other than look at users of both over time and see how they do with the different syntaxes.


there's an duckdb extension[0] for it too

0 - https://duckdb.org/community_extensions/extensions/prql.html


“Malloy” is one one in the same spirit, but probably more comprehensive https://github.com/malloydata/malloy


In general in languages like PRQL / Pipe syntax, though conciseness is a big advantage, do they help in query optimisation in any regard ?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: