Query Source Code with Tree-sitter

TreeQuery is a new command-line tool for querying source code with Tree-sitter

Jul 02, 2021

Tree-sitter is a really cool project. It’s primarily designed for code syntax-highlighting use cases (in editors and IDEs), but it also exposes a Query API for selecting portions of a parsed syntax tree, using an S-Expression based query syntax.

Tree-sitter is a parser generator tool and an incremental parsing library.

TreeQuery is a new CLI that makes it easier to run Tree-sitter queries against local source code files. It installs as tq and looks something like this:

> tq -q some_file.go "(function_declaration name: (identifier) @method_name)"

init
handleErr
someFunc
main

(This prints all the function names in a Go file.)

I think this is pretty cool. We’re still exploring use cases and getting a feel for what’s possible to query, so go ahead and give it a try! It’s still rough around the edges, and more language support will be added soon. See the open issues for a sense of the roadmap.

It will soon be integrated into askgit.

What Comes After Serverless: How About Codeless?

Serverless lets you forget about servers; what if you could forget about the code too?

Patrick DeVivo

Mar 31, 2021

Yes, this joke has been made before, but I’m serious. In the same way that serverless lets backend developers “forget” about servers, what if “codeless” did the same, but for your application code?

In other words, what if API requests to your service included the code that the caller wants to be executed on the server?

Your backend could essentially become a “runtime” for client-provided code. GraphQL APIs in some ways approach this - the backend is an “execution environment” for a (well defined and structured) GraphQL query. What if the “query language” became another programming language instead?

Example

A simple example is fairly easy to implement in Deno, which has built-in sandboxing features.

        
import { listenAndServe } from "https://deno.land/std@0.91.0/http/server.ts"
import { Status } from "https://deno.land/std@0.91.0/http/http_status.ts"

const HTTP_PORT = 8080;
const options = { hostname: "0.0.0.0", port: HTTP_PORT }
console.log(`HTTP server running on localhost:${HTTP_PORT}`)

listenAndServe(options, (request) => {
  if (request.method !== "GET") {
    request.respond({ status: Status.MethodNotAllowed, body: "must GET" })
    return
  }

  const start = new Date()
  const url = `https://${request.url.substring(1)}`
  let completed = false

  const p = Deno.run({
    cmd: [ "deno", "run", url ],
    stdout: "piped",
    stderr: "null"
  });

  // pipe the stdout of the process to the http response
  request.respond({ body: p.stdout })

  // whenever the process exits, mark it as done
  p.status().finally(() => {
    completed = true
    let elapsed = (new Date()).getTime() - start.getTime()
    console.log({ url, time: `${elapsed}ms` })
  })

  // once the request is done (or canceled)
  Deno.readAll(request.r).then(async () => {
    // if the process hasn't completed, end it
    if (!completed) p.kill(2)
  })

  // don't let requests run indefinitely
  setTimeout(() => {
    if (!completed) p.kill(2)
  }, 10*1000)
});


This is just an example to illustrate the point, use at your own risk!

deno run --allow-net --allow-run --unstable main.ts

This command starts an HTTP server that responds to GET requests, parses the path, and runs a Deno subprocess (deno run with no additional permissions) pointed at the remote code specified by the request path. In other words, you’ll be able to:

> curl localhost:8080/raw.githubusercontent.com/patrickdevivo/codeless/main/examples/hello_x.ts

hello, world!
hello, patrick!
hello, deno!
hello, reader!

Where the path (with an https:// preceding it) resolves to a file with the following contents (which is a Deno script):

        
const names = ["world", "patrick", "deno", "reader"]

names.forEach(n => console.log(`hello, ${n}!`))


If you don’t want to run locally, try:

curl https://codeless-deno-ex-vgetngw32a-uc.a.run.app/raw.githubusercontent.com/patrickdevivo/codeless/main/examples/hello_x.ts

This hits an instance of this service running in Google Cloud Run (a container-based serverless platform).
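The path-to-URL step in main.ts is a one-liner, and it will happily point deno run at any host. Here is a minimal sketch of how that step might be pulled out and tightened. The function name scriptUrlFromPath and the ALLOWED_HOSTS list are assumptions for illustration; neither is part of the original example.

```typescript
// Hypothetical allowlist: the original main.ts accepts any host.
const ALLOWED_HOSTS = ["raw.githubusercontent.com"];

// Turn a request path like "/host/some/file.ts" into an https URL,
// or return null if it does not parse or the host is not allowed.
function scriptUrlFromPath(path: string): string | null {
  // Drop the leading "/" and prepend the scheme, as main.ts does.
  const candidate = "https://" + path.substring(1);
  let parsed: URL;
  try {
    parsed = new URL(candidate);
  } catch (_err) {
    return null;
  }
  // Compare the parsed hostname rather than a string prefix, so a path
  // such as "/raw.githubusercontent.com@evil.example/x.ts" is rejected.
  if (ALLOWED_HOSTS.indexOf(parsed.hostname) === -1) {
    return null;
  }
  return parsed.href;
}
```

In the server, a null result would presumably map to a 400 response before any subprocess is started.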

Why…?

Why not? As more tools make it easier and possible to sandbox external code, maybe this isn’t such a weird (or dangerous) idea.

You only need to update your API runtime; changes to backend business logic can be made in client implementations, which means that…
Clients can call very specific versions of code since the code-to-run is described with the request (pin to a git SHA, version tag, etc.)
Clients can arbitrarily shape the output to their need (only fetch the data they care about, one of the benefits of GraphQL)
No need to agree on explicit frontend <> backend API contracts (beyond the runtime) - frontends can incrementally change the behavior of their backend calls without synchronizing with the API maintainers

Additional Ideas

Be stricter about API resource consumption (could even be based on request status: authed vs unauthed, paying vs non-paying), time out calls after N seconds, limit memory usage - treat it more like a “cloud service”
Map HTTP headers to ENV variables (or some other place) in the execution environment for contextual information like API keys or auth tokens
Enable (permissioned) access to other backend services - like a database - to use directly in the sandboxed code
Support execution in different languages (WebAssembly?)
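As a rough sketch of the header-to-ENV idea above, here is one way the mapping might look. The function name headersToEnv and the REQ_ prefix are made up for illustration. The result could be passed as the env option of Deno.run, though the sandboxed script would then need permission to read environment variables.

```typescript
// Map request headers to environment-variable-style names.
// The "REQ_" prefix is an arbitrary choice to avoid clobbering real variables.
function headersToEnv(
  headers: Record<string, string>,
  prefix: string = "REQ_",
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    // Uppercase the name and replace anything that is not A-Z or 0-9 with "_".
    const key = prefix + name.toUpperCase().replace(/[^A-Z0-9]/g, "_");
    env[key] = headers[name];
  }
  return env;
}

const env = headersToEnv({ "Authorization": "Bearer abc", "x-api-key": "k" });
console.log(env); // keys: REQ_AUTHORIZATION, REQ_X_API_KEY
```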

Listing Git Authors

How can I find (unique) contributors to a git repo? Here's a way using SQL...

Patrick DeVivo

Jul 03, 2020

How can I find (unique) contributors to a git repo?

The first answer on this StackOverflow question offers:

git log --format="%an" | sort -u

Which is pretty cool. Simple and easy to understand. The git log command takes a --format flag, which lets us supply a format string (where %an indicates author name).

It returns just the author name for every commit in the current history; sort -u then sorts the piped input and de-duplicates it line by line.
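If pipes aren’t your thing, here is a rough TypeScript sketch of what sort -u does to that output. The function name uniqueSorted is made up for illustration, and JavaScript’s default sort compares UTF-16 code units, so the ordering may differ from what sort gives you under your locale.

```typescript
// Sort the lines, then drop any line equal to the one before it.
// Because the input is sorted first, duplicates are always adjacent.
function uniqueSorted(lines: string[]): string[] {
  const sorted = lines.filter((l) => l.length > 0).sort();
  return sorted.filter((l, i) => i === 0 || l !== sorted[i - 1]);
}

// One author name per commit, as git log --format="%an" would print them.
console.log(uniqueSorted(["Bob", "Alice", "Bob", "Carol", "Alice"]));
// prints [ 'Alice', 'Bob', 'Carol' ]
```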

But here’s another way! And it’s a shameless plug for a project I’ve started called gitqlite, which provides a SQL interface for ad-hoc querying of git information.

gitqlite "SELECT DISTINCT author_name FROM commits"

This returns similar output. If alphabetical ordering matters, try:

gitqlite "SELECT DISTINCT author_name FROM commits ORDER BY author_name"

Or if you want to sort by total number of commits:

gitqlite "SELECT author_name, count(*) FROM commits GROUP BY author_name ORDER BY count(*) DESC"
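For comparison, here is a minimal sketch of what that GROUP BY ... ORDER BY count(*) DESC query computes, written as plain TypeScript over a list of author names (one per commit). The function name commitsByAuthor is hypothetical, and this is not how gitqlite works internally.

```typescript
// Count commits per author, then order by count, highest first.
function commitsByAuthor(authors: string[]): Array<[string, number]> {
  const names: string[] = [];
  const counts: number[] = [];
  for (const author of authors) {
    const i = names.indexOf(author);
    if (i === -1) {
      // First commit seen for this author: start a new group.
      names.push(author);
      counts.push(1);
    } else {
      counts[i] += 1;
    }
  }
  return names
    .map((name, i): [string, number] => [name, counts[i]])
    .sort((x, y) => y[1] - x[1]);
}

console.log(commitsByAuthor(["a", "b", "a", "c", "a", "b"]));
// prints [ [ 'a', 3 ], [ 'b', 2 ], [ 'c', 1 ] ]
```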

That’s just the beginning! I hope to collect more use cases as I continue working on gitqlite. If you have ideas, please reach out!

Stop caring about your code

It's only the outcome that matters, not how you got there

Patrick DeVivo

Jun 19, 2020

The title of this post is extreme, and maybe clickbait-y, but I’ve come to agree with the sentiment. I think it’s made me a better developer.

A manager I know likes to describe software engineers on a spectrum, between “cowboy” and “astronaut.” Cowboys commit to master frequently, leaving a trail of bugs and TODOs. Astronauts ruminate on open PRs for weeks and over-architect for every possible (unnecessary) scenario. Good developers are somewhere in the middle. I’d say the best are the ones who can play both roles, depending on the requirements at hand.

Sometimes you just need to cowboy 🤠 some code to spike an experiment, other times you need to astronaut 🧑‍🚀 a foundational solution. The extremes are never the answer, but knowing which hat (helmet?) to wear and when is an invaluable skill.

I’d call myself astronaut-leaning, and learning to become more of a cowboy has made me a better developer, which is the sentiment of this post’s title.

Caring less about code doesn’t mean caring less about its outcome. In fact, putting the outcome before the implementation can really unlock you to think about the problems that “matter.” The value your software delivers, not the software itself.

It’s similar to the trap of building what you think your users care about vs what they actually want. No one cares about the abstractions or elegance of the code beneath your product; they care about the product.

Good abstractions, maintainability, elegance, reliability, the “behind the scenes” features of code astronauts like to overthink, can also be the outcome, if that’s what the problem calls for. They just don’t always need to be part of a particular solution.

Learning to care less about code can make you a better developer. It will make you more willing to throw code out to try new ideas, and less likely to get caught up in minutiae (though often it’s the minutiae that make our jobs fun!).

