jierenchen · on Feb 17, 2021

Great post.

I see two big challenges.

1. How do you do this without changing the way we program? The most value I would get from this kind of tool is being able to spelunk in a huge codebase with lots of legacy gunk. The cost of entry for this cannot be "rewrite all the codes."

2. What does the query language look like and how do you expose this to the user? This is especially challenging because the query language needs to serve so many different roles.

I'm excited to see what comes of this as I see this problem as a blocker for the next generation of software development. We've more or less solved infrastructure, yet taming complexity still eludes us.

Shameless plug, with a bit of fortuitous timing:

I just released an alpha CLI for my project SourceScape.

It's a tool that indexes your TypeScript and JavaScript code (Ruby coming soon). You can then query your code by code structure instead of just raw text. As a trivial example, you can search for all classes with a render method that returns a JSX element named div.
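To make "query by structure" concrete, here is a minimal sketch of that example query run over a toy AST. The types and the `classesRendering` function are hypothetical and invented for illustration; this is not SourceScape's query language or data model.

```typescript
// Toy AST: just enough structure to express the example query.
// These types are hypothetical and far simpler than a real parser's output.
type Expr =
  | { kind: "JsxElement"; tagName: string }
  | { kind: "Other"; text: string };

interface MethodDecl {
  name: string;
  returns: Expr[]; // expressions that appear in `return` statements
}

interface ClassDecl {
  name: string;
  methods: MethodDecl[];
}

// "All classes with a render method that returns a JSX element named <tag>."
function classesRendering(classes: ClassDecl[], tag: string): ClassDecl[] {
  return classes.filter((cls) =>
    cls.methods.some(
      (m) =>
        m.name === "render" &&
        m.returns.some((e) => e.kind === "JsxElement" && e.tagName === tag)
    )
  );
}

const index: ClassDecl[] = [
  { name: "Header", methods: [{ name: "render", returns: [{ kind: "JsxElement", tagName: "div" }] }] },
  { name: "Link", methods: [{ name: "render", returns: [{ kind: "JsxElement", tagName: "a" }] }] },
  // A raw text search for "render" and "div" could hit this class; the structural query does not.
  { name: "Logger", methods: [{ name: "log", returns: [{ kind: "Other", text: "'render div'" }] }] },
];

const matches = classesRendering(index, "div").map((c) => c.name);
```

Here `matches` contains only `"Header"`, because the query constrains where "render" and "div" appear in the tree rather than whether the strings occur in the file.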

Install instructions here: https://github.com/sourcescapeio/sourcescape-cli#install

Marketing materials (a bit outdated, but looks pretty): https://sourcescape.io

tabtab · on Feb 18, 2021

Re: "How do you do this without changing the way we program?"

Let's change the way we program. If we outgrow trees, we've outgrown trees. I don't know if existing code will be easy to convert. It's kind of like when OOP IDEs fell out of favor for web stacks: the old OOP classes couldn't be reshaped for web because the web is state-poor and OOP is state-rich. We had to dump OOP libraries and stacks into the trash.

Re: "What does the query language look like and how do you expose this to the user? This is especially challenging because the query language needs to serve so many different roles."

I suggest using existing RDBMS and SQL. There's no reason to reinvent the wheel unless inherent flaws can be found in RDBMS for code management purposes. I haven't found any yet in my little experiments in "table oriented programming".

I take that back a bit. Different UI components/widgets will have different attribute structures. It's hard to change the schema every time a new widget type is added. Possible solutions include Dynamic Relational, or a Windows-Registry-like "attribute tree" for UI models. Attributes of UI widgets then could be accessed like:

     setAttrib("screenX.widgetY.maxHeight", 37);
     // context-based shortcut:
     setAttrib("currentWidget.maxHeight", 37);

The second form is handy because instead of writing explicit loops, event triggers may be used to traverse and customize per-widget behavior. The traversal mechanism will create a reference to the "current" item on each pass of its behind-the-scenes loop to simplify path references. More samples:

     // sequence control (display order):
     setAttrib("screenX.widgetY.sequence", 47.5);
     // deactivate:
     setAttrib("screenX.widgetY.active", false);
     // alternative:
     setAttrib("screenX.widgetY.status", "inactive");
     // more uses of "status" attrib:
     setAttrib("screenX.widgetY.status", "hidden");
     // Grid samples:
     setAttrib("screenX.grid4.row.7.col.9.value", "foo");
     setAttrib("screenX.grid4.row.7.backGroundColor", "green");

Helper APIs would probably simplify grid path management.
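A minimal sketch of how such an attribute tree might be backed, assuming dotted paths map onto nested nodes and that a context name like "currentWidget" is an alias for a full path prefix. The `aliases` table, `resolve`, and `getAttrib` are assumptions added for illustration; only the `setAttrib(path, value)` call shape comes from the samples above.

```typescript
// Nested attribute tree keyed by path segment. Any node may hold a value.
type AttribValue = string | number | boolean;

interface AttribNode {
  value?: AttribValue;
  children: Map<string, AttribNode>;
}

const root: AttribNode = { children: new Map() };

// Hypothetical alias table: lets "currentWidget" stand in for a full path prefix.
const aliases = new Map<string, string>();

// Expand a leading alias, then split the path into segments.
function resolve(path: string): string[] {
  const parts = path.split(".");
  const head = parts[0] ?? "";
  const target = aliases.get(head);
  return target === undefined ? parts : [...target.split("."), ...parts.slice(1)];
}

function setAttrib(path: string, value: AttribValue): void {
  let node = root;
  for (const key of resolve(path)) {
    let child = node.children.get(key);
    if (child === undefined) {
      child = { children: new Map() }; // create missing levels on demand
      node.children.set(key, child);
    }
    node = child;
  }
  node.value = value;
}

function getAttrib(path: string): AttribValue | undefined {
  let node: AttribNode | undefined = root;
  for (const key of resolve(path)) {
    node = node.children.get(key);
    if (node === undefined) return undefined;
  }
  return node.value;
}

setAttrib("screenX.widgetY.maxHeight", 37);
// A traversal mechanism would update this alias before firing each widget's trigger.
aliases.set("currentWidget", "screenX.widgetY");
setAttrib("currentWidget.active", false);
```

Because nodes are created on demand, a new widget type can introduce new attributes without any schema change, which is the property the registry-style tree is meant to buy.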

jierenchen · on Feb 23, 2021

I'm all for a rewrite of existing systems, but I think that's going to be a long-term thing (5-10 years) that will happen in parallel to developments in treating code as a database.

I've found RDBMS is quite limited in terms of graph traversal. Maybe you've found a way around this? Would be happy to riff on that.
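A rough sketch of where the friction comes from, assuming code relationships are stored as rows in a hypothetical edge table `calls(caller, callee)`. One join answers "who does X call directly?", but "everything reachable from X" needs the join repeated until no new rows appear, since the number of hops is not known up front.

```typescript
// Hypothetical edge table: one row per "caller calls callee" fact.
interface CallRow {
  caller: string;
  callee: string;
}

const calls: CallRow[] = [
  { caller: "main", callee: "loadConfig" },
  { caller: "main", callee: "serve" },
  { caller: "serve", callee: "handle" },
  { caller: "handle", callee: "log" },
];

// Everything reachable from `start`: repeat the self-join until it stops producing new rows.
function reachable(rows: CallRow[], start: string): Set<string> {
  const seen = new Set<string>();
  let frontier: string[] = [start];
  while (frontier.length > 0) {
    const next: string[] = [];
    for (const row of rows) {
      // One pass over the table corresponds to one more join, i.e. one more hop.
      if (frontier.includes(row.caller) && !seen.has(row.callee)) {
        seen.add(row.callee);
        next.push(row.callee);
      }
    }
    frontier = next;
  }
  return seen;
}

const fromServe = reachable(calls, "serve"); // contains "handle" and "log"
```

Many SQL dialects can express this loop with a recursive common table expression, though each hop is still a pass over the edge table.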

jierenchen · on Sept 16, 2020

Hey there HN, I'm Jieren, and I'm building this app here.

I submitted this two months ago and I wanted to share again as there's been some significant progress. Instead of a fairly janky interface where you're clicking and dragging things around, now it's just a text editor where you type in code and you get search results that mirror this code.

All this is driving towards a vision of a "no-code" [0] builder for static analysis and making static analysis as easy as code search.

Looking forward to your feedback!

jieren at sourcescape.io @jierenchen

[0] Extreme air-quotes because you're literally searching with code blocks

jierenchen · on July 19, 2020

Same reaction here: "How is this a problem?"

I think it's in the nature of big companies to view everything as a process optimization problem and losing out on "innovators" is viewed as a gap in optimality that needs to be filled. An environment tuned to value extraction is inherently going to be hostile to exploration. Can't have your cake and eat it too, buddy.

jierenchen · on July 19, 2020

This is very cool! It's really awesome to see this code-as-data concept gaining a lot of traction recently. Hope to see this project developed further.

I'm working on a similar project here: https://sourcescape.io/, but intended for use outside the IDE on larger collections of code (like the codebase of a large company.)

Agreed on the Prolog/Datalog approach of expressing a query as a collection of facts. CodeQL does the same. From one datastore nerd to another, I actually think this is a relatively unexplored area of querying knowledge graphs (code being a very complex, dense knowledge graph.)

Very excited to see where you go next with this "percolate search" functionality in the IDE.

geordimort · on July 19, 2020

There’s a lot of research in this area. The fact that Semmle was acquired by GitHub/Microsoft is a testament to the maturity of the field.

jierenchen · on July 17, 2020

Back up now. Was it a 502 or did it lock up?

jierenchen · on July 16, 2020

Hi there HN,

I'm Jieren, creator of SourceScape. SourceScape is a query engine for source code that lets you build up constraints for the code you want to see, much like GraphQL in a GUI. It's a no-code builder for static analysis.

Static analysis is a powerful tool for understanding and verifying code. I want to make writing static code analysis as easy as code search so that you'll be using it all the time. Instead of creating migration spreadsheets, you'll just write queries. Instead of doing nitpicks for code reviews, you'll just write queries.

Happy to answer any questions you have! Also, any feedback would be much appreciated.

jieren at sourcescape.io @jierenchen

cthonicthulu · on July 17, 2020

This is pretty neat -- is there any way to extend this to tree transformations to make this a tool for refactoring?

jierenchen · on July 17, 2020

Thanks!

Yes, that's part of the ultimate vision. The challenge will be figuring out how to represent these transformations.

jierenchen · on July 6, 2020

Figuring out how to fill the gap between code search and static analysis (code checks.)

Right now the tools we have for programmatically reading through code are:

1. Code search, which is fast, but inaccurate/heuristic.

2. Static analysis, which is slow to run and difficult to write, but very accurate.

I'm building a tool that is as fast and easy to use as code search, and is as accurate and expressive as static analysis.

Still just a landing page [0]. Looking to get a public playground people can mess around with this week.

[0] https://sourcescape.io/

jierenchen · on July 3, 2020

Hey there HN,

I'm Jieren, creator of SourceScape. SourceScape is a query engine for source code that lets you build up constraints for the code you want to see, much like a SQL query. You can also think of it as a very fast no-code builder for static analysis checks.

Throughout my career, I've always felt that there was this gap in our ability to dig through source code quickly. Text search is good, but very unreliable. Static analysis takes too long to write. I wanted something that could, for example, find all `.create` calls from any instance of `UserServiceClient` with 100% accuracy, and quickly.

Earlier this year, I was working on a microservices logging migration that took way longer than it should have. The frustrations I encountered on that project around coordination and verification of the migration became the impetus for me to build SourceScape.

Would love to get your thoughts on this.

jierenchen · on May 22, 2020

We should draw a distinction between FP as religion and FP as tool kit.

FP as religion has failed to gain acceptance because it imposes too much cost on the user. I have to rethink my whole stack in terms of category theory AND deal with your terrible ecosystem? Hard pass.

FP as toolkit, on the other hand, has been a smashing success. Most of the core ideas of FP are mainstream now and some of the latest advances in non-FP ecosystems (React, for example) are based on FP ideas.

eru · on May 22, 2020

FP ideas have been seeping into the mainstream for quite some time now.

The oldest: GC was originally invented for Lisp. It's common now.

Type inference was big in FP before it made the jump to languages like Java or C++ much more recently.

Generics were a natural idea in a typed FP context. Mainstream languages have them now.

I'm looking forward to algebraic data types becoming really common. (The simplest explanation is that they are C-style unions with tags to tell you which case you are in. The compiler enforces that the tags correspond to how you use them.) Some mainstream languages are starting to add them.

jierenchen · on May 23, 2020

> The oldest: GC was originally invented for Lisp. It's common now.

LISP was so ahead of its time, its parents still haven't met yet. Not very FP, but another gem from SBCL: saving and restoring program state for later use. Now there's the CRIU [0] project for doing this with Linux and Docker containers.

> I'm looking forward to algebraic data types becoming really common.

I'm not too familiar with the full scope of algebraic data types. Wondering: does TypeScript have this or is it still missing a few key components? Really like how it has union types, which I wish Scala had.

[0] https://www.youtube.com/watch?v=LrHW7Vvbie4

eru · on May 23, 2020

Algebraic data types mostly just mean union types.

(That's the + in the algebra. The * comes from bundling multiple values together, like in a tuple or in a C-style record; virtually all languages already have that.)
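As a small illustration of the + and the *, here is a sketch in TypeScript, where the tagged form is usually called a discriminated union. The `Point`, `Shape`, and `area` names are made up for the example.

```typescript
// Product type (the "*"): several values bundled together.
interface Point {
  x: number;
  y: number;
}

// Sum type (the "+"): exactly one of several cases, told apart by a tag.
type Shape =
  | { tag: "circle"; center: Point; radius: number }
  | { tag: "rect"; topLeft: Point; width: number; height: number };

function area(shape: Shape): number {
  switch (shape.tag) {
    case "circle":
      // Inside this case the compiler knows `radius` exists.
      return Math.PI * shape.radius * shape.radius;
    case "rect":
      return shape.width * shape.height;
    default: {
      // If a new case is added to Shape and not handled above,
      // this assignment stops type-checking.
      const unreachable: never = shape;
      return unreachable;
    }
  }
}

const rectArea = area({ tag: "rect", topLeft: { x: 0, y: 0 }, width: 2, height: 3 });
```

The tag check is what lets the compiler enforce that each case only touches the fields that case actually has.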

There are also Generalized Algebraic Data Types (GADTs). They are a bit more complicated, so I don't expect mainstream languages to pick them up anytime soon.

About GC: you _can_ do pure functional programming without a GC. But it requires lots of big guns from more advanced theory. (Mostly stuff like linear typing.) However, imperative programming without a GC is comparatively simple.

So it's no wonder that historically, GC was invented for FP first, and GC-free FP was only discovered later.

(And for general CRUD or web programming, or basically anything outside of low level systems programming, GC is more productive in terms of programmer time than other approaches. At least given currently known techniques.)

StreamBright · on May 22, 2020

Exactly. You can use FP ideas in pretty much any language (and most people who like reliability do). People who do not see value in immutability do all sorts of tricks to avoid the pitfalls of sharing mutable state. One example is in Java design patterns when they recommend creating a copy of the object you are handling. I can't remember the exact name of this pattern, but it is kind of funny.

jierenchen · on May 23, 2020

Or the visitor pattern, which is... the map function.

One thing I will note from recent experience: programming for a long time with immutability cripples your ability to think about mutable things. I was writing up a streaming merge sort this week and it was a brutal nightmare because of all the state I had to deal with. Seems like a call to action to deal with a bit of mutable state every now and then. Everything is too pure these days. We're programmers, not mathematicians dammit.
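For a sense of the state involved, here is a minimal sketch of just the merge step over two already-sorted streams, written with generators. It is an illustration under simple assumptions (numeric items, two inputs), not the implementation described above; `mergeSorted` is a made-up name.

```typescript
// Merge two already-sorted streams lazily, holding one pending item per input.
function* mergeSorted(a: Iterable<number>, b: Iterable<number>): Generator<number> {
  const itA = a[Symbol.iterator]();
  const itB = b[Symbol.iterator]();
  let curA = itA.next(); // mutable cursor for stream A
  let curB = itB.next(); // mutable cursor for stream B

  while (!curA.done && !curB.done) {
    if (curA.value <= curB.value) {
      yield curA.value;
      curA = itA.next();
    } else {
      yield curB.value;
      curB = itB.next();
    }
  }
  // Drain whichever stream still has items.
  while (!curA.done) {
    yield curA.value;
    curA = itA.next();
  }
  while (!curB.done) {
    yield curB.value;
    curB = itB.next();
  }
}

const merged = Array.from(mergeSorted([1, 4, 9], [2, 3, 10, 11]));
```

Even in this small version there are two cursors that must be advanced in exactly the right branch, which is the kind of bookkeeping that a purely immutable style rarely exercises.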