
That's a great visual presentation, but not really an innovation. The CodeScene tool has that built in together with a set of deep analyses on top (see https://codescene.com/).

There are several public conference talks that cover this visualization and related use cases: https://www.youtube.com/watch?v=fl4aZ2KXBsQ


Fully agree! That's why I recommend monitoring the code health trends continuously -- the earlier we can catch potential issues, the better: https://codescene.io/projects/167/jobs/55946/results/warning...

We have these code health trends supervised as part of our pull requests and use them as (soft) quality gates. [1]

[1] https://codescene.com/blog/measure-code-health-of-your-codeb...


Yes, there's an on-prem version of CodeScene that can be run on private servers. The latest release is described here: https://codescene.com/blog/architectural-analyses-simplified...

To analyse Perforce, you set up an automated conversion to a read-only Git repo that you then point the analysis to.


CodeScene isn't built on the open source tool. CodeScene's engine is implemented from scratch in order to deal with large-scale codebases. The basic metrics are the same, but CodeScene adds plenty of information and pattern detection on top of them. If you're interested, the story of CodeScene is written down here: https://empear.com/blog/happy-birthday-3-years/


A large codebase under active development presents a moving target; even if you knew how something worked last week, that code might have changed twice since then. Detailed knowledge of the solution domain gets outdated fast.

To address these issues, I work with something I call behavioral code analysis. In a behavioral code analysis, you prioritize the code based on its relative importance and the likelihood that you will have to work with it and, hence, need to understand that part. Behavioral code analysis is based on data on how the organization works with the code, and I use version-control data (e.g. Git) as the primary data source. More specifically, I look to identify hotspots. A hotspot is complicated code that the organization has to work with often. So it's a combination of static properties of the code (complexity, dependencies, abstraction levels, etc.) and -- more importantly -- a temporal dimension like change frequency (how often do you need to modify the code?) and evolutionary trends.
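As a minimal sketch of that temporal dimension (this is not CodeScene's actual engine -- the file names, sample log, and weighting here are made up for illustration), the change frequencies can be mined from plain git log output:

```python
from collections import Counter


def change_frequencies(git_log: str) -> Counter:
    """Count how often each file appears in the output of
    `git log --name-only --format=""` (one file path per line,
    blank lines between commits)."""
    return Counter(
        line.strip() for line in git_log.splitlines() if line.strip()
    )


def hotspot_score(change_count: int, lines_of_code: int) -> int:
    """Crude hotspot proxy: effort (change frequency) times a
    cheap complexity stand-in (lines of code)."""
    return change_count * lines_of_code


# Hypothetical log output: three commits touching three files.
sample = (
    "src/engine.py\nsrc/util.py\n\n"
    "src/engine.py\n\n"
    "src/engine.py\nREADME.md\n"
)

change_frequencies(sample).most_common(1)  # -> [('src/engine.py', 3)]
```

The point is that a file changed three times a week matters far more than a complex file nobody touches, which is why the temporal dimension drives the prioritization.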

I have found that identifying and visualizing hotspots speeds up my on-boarding time significantly, as I can focus my learning on the parts of the code that are likely to be central to the solution. In addition, a hotspot visualization provides a mental map that makes the codebase easier to fit into your head.

There are a set of public examples and showcases based on the CodeScene tool here: https://codescene.io/showcase

I have an article that explains hotspots and behavioral code analysis in more depth here: https://empear.com/blog/prioritize-technical-debt/

I also have a book, Software Design X-Rays: Fix Technical Debt with Behavioral Code Analysis, that goes into more details and use cases that you might find useful for working with large codebases: https://pragprog.com/book/atevol/software-design-x-rays


Nice! I'm working on internal tooling for us that does a lot of the same things - gonna buy the book, thanks for that, weird I've never heard about it. For now I'm measuring: churn, complexity, linting, test coverage, test quality and am going to add a dependency graph. It seems to me that churn, complexity and dependencies are the biggest indicators of a hotspot.
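For the complexity part, even something as crude as indentation depth seems to work as a language-agnostic proxy for nesting -- roughly along these lines (the tab size is an assumption):

```python
def indentation_complexity(source: str, tab_size: int = 4) -> int:
    """Sum of indentation levels across non-blank lines -- a crude,
    language-agnostic proxy for nesting depth, cheap to compute."""
    total = 0
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no complexity signal
        expanded = line.expandtabs(tab_size)
        total += (len(expanded) - len(expanded.lstrip(" "))) // tab_size
    return total


indentation_complexity("def f():\n    if x:\n        return 1\n")  # -> 3
```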

Got any tips for possible problems I'll encounter along the way?


Cool - thanks! While the measures are simple in theory, there are some practical challenges; Git repositories tend to be messy. So part of the practical challenge is cleaning the input data (e.g. filtering out auto-generated content and checked-in third-party libraries).
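A rough sketch of that cleaning step (the glob patterns here are illustrative examples, not CodeScene's actual filters -- you'd tune them per repository):

```python
from fnmatch import fnmatch

# Hypothetical exclusion patterns for a typical messy repository.
EXCLUDE_GLOBS = [
    "*.min.js",           # minified/auto-generated artifacts
    "vendor/*",           # checked-in third-party libraries
    "*generated*",        # code-generator output
    "package-lock.json",  # lockfiles inflate churn without insight
]


def keep(path: str) -> bool:
    """True if a file path should be included in the analysis."""
    return not any(fnmatch(path, glob) for glob in EXCLUDE_GLOBS)


files = ["src/app.js", "vendor/lodash.js", "dist/app.min.js",
         "src/api_generated.py"]
[f for f in files if keep(f)]  # -> ['src/app.js']
```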

Another challenge is that version-control data is quite file-centric, while many actionable insights live at a higher architectural level. In CodeScene we solve this by aggregating files into logical components that can then be presented and scored.
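A minimal sketch of that aggregation (the component names and path prefixes below are invented for illustration):

```python
# Hypothetical mapping from path prefixes to architectural components.
COMPONENTS = {
    "src/payments/": "Payments",
    "src/auth/": "Authentication",
    "ui/": "Frontend",
}


def component_of(path: str) -> str:
    """Map a file path to its logical component, defaulting to 'Other'."""
    for prefix, name in COMPONENTS.items():
        if path.startswith(prefix):
            return name
    return "Other"


def aggregate(file_changes: dict) -> dict:
    """Roll up per-file change counts into per-component totals."""
    totals: dict = {}
    for path, count in file_changes.items():
        comp = component_of(path)
        totals[comp] = totals.get(comp, 0) + count
    return totals


aggregate({"src/auth/login.py": 5, "src/auth/token.py": 2,
           "ui/app.tsx": 3})
# -> {'Authentication': 7, 'Frontend': 3}
```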


I'm already onto data cleanup; tbh I'm focusing on Android repos for now. But great idea -- now that I think of it, as an architect I'd mostly like the option to group a few classes/packages (let's call them "modules") together and then visually see which pieces depend too heavily on outside sources, and which are the hotspots/connections/dependencies inside that group.

There goes my weekend...


What is nice about this approach is that it gives you an instant map of the codebase (where the important parts are, etc.) that you really can't get any other way.

