
From W3Techs:

> PHP is used by 77% of all the websites whose server-side programming language we know.

I had a quick look at the methodology section, but it’s not clear to me how accurate this data is. Determining whether a site uses PHP can be relatively straightforward (especially with default extensions / if Wordpress is used / etc), but if a site (potentially using a different language) is behind a reverse proxy/uses an API/etc then it is less clear. Does anyone know whether PHP is over-represented in the results because it’s easy to identify?

No doubt PHP is still huge, but 77% seems almost too huge. There is also a very good chance that PHP is actually that big and I’m just in a different crowd.



Saying 77% of the web is run by PHP and concluding therefore that PHP is well-liked for websites is like saying most of banking runs on COBOL and therefore COBOL is well-liked for banking.

The conclusion doesn't follow from the premise.

I guess that most of the web runs on PHP (because it runs on Wordpress) if counted by page-view. But I'm not sure that's the proper measure.


In default configuration PHP identifies itself in response headers, but you can turn that off. People who know how to set up and administer web back ends generally do that from a security best practice checklist. So if anything PHP runs more sites than the number discovered crawling the web looking for identifying evidence.
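For anyone who wants to see what that default tell looks like, here is a minimal Python sketch. It assumes the crawler already has the response headers as a plain dict; the function name `php_header_tells` is made up for illustration and is not how W3Techs or any other survey actually works. The header in question is `X-Powered-By`, which PHP emits when the `expose_php` ini setting is on.

```python
def php_header_tells(headers):
    """Return the header lines that openly advertise PHP.

    `headers` is a plain dict of response headers (hypothetical input
    shape). Names are compared case-insensitively.
    """
    tells = []
    for name, value in headers.items():
        key = name.lower()
        # With expose_php on, PHP adds "X-Powered-By: PHP/<version>".
        # Some Apache setups also list PHP in the Server header.
        if key in ("x-powered-by", "server") and "php" in value.lower():
            tells.append(f"{name}: {value}")
    return tells


default_setup = {"Server": "nginx", "X-Powered-By": "PHP/8.2.1"}
hardened_setup = {"Server": "nginx"}

print(php_header_tells(default_setup))   # the PHP header is reported
print(php_header_tells(hardened_setup))  # nothing to report
```

A hardened site returns an empty list here, which is exactly why a header-only count would miss PHP sites rather than invent them.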

Wordpress and similar CMSs, e-commerce platforms, and the Laravel framework give themselves away in other ways that take more work to hide. Likewise non-PHP back-ends have easy to see fingerprints in the headers and page source. There’s no reason to think PHP appears to dominate because crawlers can easily identify PHP but can’t identify other back-ends.

Banks and large companies with large COBOL code bases mainly maintain their legacy code; they don't write a lot of new code in COBOL or RPG. And you won't find COBOL powering web sites, which makes the argument irrelevant to the topic. The last time I worked on a big legacy back-end I was putting a web front end on it, using PHP and ColdFusion.


I agree. I always doubted these figures (I'm a PHP dev myself, so I wouldn't mind these figures being true). I think the methodology is shady. I wonder if they use what the server indicates. I think some servers, like Apache with mod_php, send this information to the client in a header. But most servers don't. So maybe they report this as "of all the servers giving back-end language information, PHP represents 77%", which wouldn't be surprising. The question is how many websites in the data don't give any information about the language used under the hood?

I think we should stop using these numbers. GitHub uses ruby on rails but we know it from the developer team, not from what the server tells us. How many websites communicate about their backend infrastructure?

I don't doubt WordPress powers many websites out there. But I'm tired of these figures which don't mean anything to me. Especially since, if you look at job ads, PHP isn't so big (except in some PHP-centric countries like France).

You can't just make up numbers. If you give me statistics, give me the methodology you used and all the details. Otherwise I suggest we all start saying Haskell powers 87% of the web. After all, if you can invent what suits you, I can do the same.


Identifying the technologies behind a web site involves a lot more than looking at Apache's mod_php headers (which you can and should turn off for security reasons). The tools for figuring out what runs a site actually do a really good job by looking for multiple identifying features. Marketers and SEO people use tools like BuiltWith and Wappalyzer (and many others, most of them not free). You may not know about those tools or how well they work, but a quick browse of that space will disabuse you of the idea that these surveys just crawl looking at server headers.
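To make "multiple identifying features" concrete, here is a rough Python sketch of rule-based fingerprinting. The handful of rules below are illustrative assumptions based on well-known defaults (the `PHPSESSID` and `JSESSIONID` session cookies, WordPress's `/wp-content/` paths and generator meta tag, Django's `csrftoken` cookie). They are not the actual rule sets of BuiltWith, Wappalyzer, or W3Techs, and the names `RULES`, `IMPLIES`, and `fingerprint` are hypothetical.

```python
import re

# (technology, where to look, pattern). Illustrative rules only;
# real tools ship far larger rule sets.
RULES = [
    ("PHP", "header", re.compile(r"^x-powered-by:\s*php", re.I | re.M)),
    ("PHP", "cookie", re.compile(r"\bPHPSESSID=")),
    ("WordPress", "html", re.compile(r"/wp-(content|includes)/")),
    ("WordPress", "html",
     re.compile(r'<meta name="generator" content="WordPress', re.I)),
    ("Java", "cookie", re.compile(r"\bJSESSIONID=")),
    ("ASP.NET", "cookie", re.compile(r"\bASP\.NET_SessionId=")),
    ("Django", "cookie", re.compile(r"\bcsrftoken=")),
]

# Detecting a product can imply its language even with no direct tell.
IMPLIES = {"WordPress": "PHP"}


def fingerprint(header_text, cookie_text, html):
    """Return the set of technologies whose rules match any signal."""
    sources = {"header": header_text, "cookie": cookie_text, "html": html}
    found = {tech for tech, src, pat in RULES if pat.search(sources[src])}
    # Add implied technologies (one level is enough for this sketch).
    found |= {IMPLIES[tech] for tech in found if tech in IMPLIES}
    return found


# No PHP header at all, but the page source still gives it away.
page = '<link rel="stylesheet" href="/wp-content/themes/x/style.css">'
print(fingerprint("Server: nginx", "", page))
```

The point of the sketch is that turning off one header removes one rule's match, not the detection as a whole.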

Multiple independent surveys of web back-end technologies by different outlets, across many years, have reached the same conclusion: PHP powers approximately 3/4ths of public web sites/applications. I do a lot of PHP work and I see PHP used heavily in restricted/private web applications as well -- internal sites that won't show in these kinds of surveys. One school I work for has one public WordPress-powered site and several internal-only WordPress sites, and multiple internal PHP-powered sites not based on WordPress, including Moodle (learning management system) and their student management system.

The large ecosystem, relatively large population of experienced developers, and ease of deployment play into the decision process. Sometimes it comes down to hosting costs or other non-technical factors.

Deducing the number of jobs for PHP developers based on job ads will mislead you. Most jobs get filled internally, informally, or by recruiters before they get posted online (because that costs). If you don't see a lot of ads for PHP developers that might mean few jobs exist (which wouldn't match the experience of anyone who works with PHP). It may also mean the jobs got filled before the employer had to pay to advertise them. A position for a 5+ yrs experience Elixir dev may sit open for months, but I can and have filled PHP dev openings in a few days, from a large list of applicants acquired by a free posting in a local PHP users' group forum, without having to post in public job forums or do LinkedIn email blasts.

We should also consider that web developers with more than a few years of experience have likely worked with multiple tech stacks, and those of us with 10+ years very likely cut our teeth on PHP. I started in the '90s with ASP and ColdFusion, with some Perl, and then saw employers move to PHP (and a few to Rails a few years later) mainly because ASP (which predates .NET) and ColdFusion required increasingly expensive licenses whereas PHP did not. Among experienced web developers you will find many/most of them have worked with PHP, and could work with it again, though they may prefer something else. Likewise I know COBOL and could fall back on that if more interesting work dried up for me, but I don't call myself a COBOL developer or look for jobs in that space.


Thank you for your comment. I was definitely wrong about the different methods used. This is why I love Hacker News. Always nice to learn something.

I still think the stats they provide are a bit weird since there is no "unknown" category. If they can't find the backend technology used for 5% of websites, it changes the whole result, and from what I have seen they don't provide this information.

But your really nice and detailed answer tells me I might be wrong once more.


I don’t see an “unknown” category. A large number of unknowns could skew the results if we had reason to believe those mostly represent non-PHP sites. Do we have any reason to think those unknown sites show a different distribution than the known sites? Do enough unknown sites exist to meaningfully affect the results?

Using your 5% example, supposing that 5% unknown includes no PHP sites, that only brings the PHP percentage down a little. It doesn’t change the main point that PHP dominates by a wide margin.
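The arithmetic behind that, as a small Python sketch. It assumes the worst case described above (every unknown site is non-PHP) and takes the 77% figure at face value as the share among known sites; the function name is made up for illustration.

```python
def worst_case_php_share(known_share, unknown_fraction):
    """PHP share of all sites if none of the unknown sites run PHP.

    known_share:      PHP share among sites with a detected language
    unknown_fraction: fraction of all sites with no detected language
    """
    return known_share * (1.0 - unknown_fraction)


share = worst_case_php_share(0.77, 0.05)
print(f"{share:.2%}")  # 0.77 * 0.95, a drop of under four points
```

The same function works for any other unknown fraction, so anyone who finds a real figure for the undetected sites can plug it in.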


Well, it seems like a good part of the analyzer is about some "leaks" or specific behaviors from a language that could give us some tells about what technology is used. I checked Wappalyzer's code (at least the last commit before it went private: https://github.com/dochne/wappalyzer) and PHP gives more tells (https://github.com/dochne/wappalyzer/blob/main/src/technolog...) than Python for example (https://github.com/dochne/wappalyzer/blob/main/src/technolog...).

Some technologies seem to give more tells than others. Which means some technologies could be way more invisible than others. I am not sure we can suppose the known and unknown technologies have the same ratio.
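A quick way to see how much uneven detection could matter, sketched in Python. Every number here is invented purely to illustrate the mechanism. Nobody in this thread has measured real detection rates, and `observed_share` is a hypothetical helper, not anything a survey publishes.

```python
def observed_share(true_share, detect_php, detect_other):
    """PHP share among *detected* sites under unequal detection rates.

    true_share:   real PHP share of all sites (unknown in practice)
    detect_php:   probability that a PHP site gets identified
    detect_other: probability that a non-PHP site gets identified
    """
    detected_php = true_share * detect_php
    detected_other = (1.0 - true_share) * detect_other
    return detected_php / (detected_php + detected_other)


# Made-up scenario: PHP sites are three times easier to spot.
print(observed_share(0.50, 0.90, 0.30))  # 0.45 / 0.60

# With equal detection rates the reported share matches the true one.
print(observed_share(0.77, 0.60, 0.60))
```

So the reported share only equals the true share if the detection rates are about the same, which is the assumption I'm not sure we can make.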

I quickly checked some websites with BuiltWith and Wappalyzer and, from my personal, totally unscientific, small sample, they seem to detect PHP more easily than other languages like Python.

Again, I don't know. But I took 5% to be optimistic. It could be 30% or 50%. And then the whole picture changes.

Edit: Funny thing, it even adds PHP to some sites I know (almost for sure) don't use PHP. Like GitHub using Ruby (true) and PHP with Drupal (???).


PHP probably gets used in low-price shared hosting setups or managed hosting more than other languages, where those tells will show up because the developer can't change the Apache or PHP configuration files. And of course the big PHP numbers come from WordPress, which has its own tells. Developers don't necessarily choose WordPress -- the customer chooses it or starts with it and that's what developers have to work with.

More interesting to me than the actual breakdown by language is the reaction from developers when this article or something like it gets published. I look at PHP and every other language I've had to learn and use in my career (a lot of them, I started 40 years ago) as tools to get a job done. I don't get personally invested in languages or tools. I don't identify as a "PHP developer" or "Go developer" or "Javascript developer." I write code and manage systems to make money. What language or tools I decide to use (or more often have to use because someone got to decide already) makes little difference. PHP got popular because it was free (as opposed to ASP and ColdFusion, which were not) and was less inscrutable than Perl. As a long-time C programmer with experience on ASP, ColdFusion, and Perl I had no trouble learning PHP. I likewise had no trouble learning Ruby and Rails, Python, Javascript, and Go. All of those will eventually fade into the world of legacy tools, along with everything else I've learned and used. I don't care, I don't have any part of my personality or ego invested in them. I don't get how other developers get so invested, call themselves "Rust programmers" or whatever and then get hostile and defensive. I guess that's human nature -- I see Tesla owners identifying with their car brand, and I remember the cult around the Saturn cars. I interpret a lot of the apparent insecurity and hostility as inexperience, but it seems to go beyond that into a kind of programmer identity politics.

If PHP dominates public web sites, so what? That has no effect on what tools I choose to learn or use, or how I value my skills, or much of anything that I might care about. I read all the time about how many people use Python, how many jobs are out there for Python programmers, and that makes no difference to me. If I get a client using Python all I care about is solving their business problems and getting paid for my expertise, not what language I have to use.


I agree. I don't think this way either. And I wouldn't be bothered with someone telling me X or Y has the main marketshare on the web. I just don't like when figures are thrown at me and the methodology is questionable. Would be the case for any topic.


> I don't get how other developers get so invested

That's because your investment is means-to-an-end. For others, such as DHH for example, their chosen language is a means of expression and consequently they have a very strong personal investment. The Ruby and Clojure communities, for example, largely consist of developers who have a very strong personal attachment to the language and that's understandable since these languages are much more expressive whereas a language like Java will tend to attract developers who see it merely as a tool and maybe use it because it is a market leader. Languages like Ruby and Clojure demand a broadening of the mind so tend to attract users who are looking for more in a language than the standard fare of Algol-based features.


You could just click a couple of times to find the W3Techs methodology and data gathering process described, here:

https://w3techs.com/faq

No need to speculate about how it works.

tl;dr A lot more than looking at Apache headers or WordPress meta tags.


A lot of news outlets, e-learning and e-commerce websites are running on CMS/Frameworks made with PHP. Just to mention a few of them:

- WordPress
- Joomla!
- Magento
- Moodle
- Zend Framework/Cart
- Laravel
- Symfony
- Open E-commerce

If you count all the websites using one of the above items, you will come up with a huge list of websites.

And way too many Academic sites are running on PHP.

The LAMP or LNMP combo (nginx instead of Apache httpd) is strong.


The vast majority of websites are probably small business websites which almost all run Wordpress.

That number doesn't surprise me.


People have already questioned the validity of this number. Do a search and you'll find people who looked into this and concluded that the number is very unreliable. Whether you agree or not is up to you.

Also I want to point out that almost any time people quote a number about PHP's popularity, this is the only number, which is strange -- for metrics like iOS market share you can always find multiple numbers from multiple sources which don't fully agree with each other but are within a certain range. Not for this PHP number. In other words, W3Techs' number is not cross-validated by any other source. I wouldn't use it to "prove" anything.


"People" questioning the numbers published by multiple outlets over at least a decade? Who? What data do they have to "conclude that the number is very unreliable?"

Whether PHP runs 77% or 69% of public web sites, how does that offend anyone or make them feel insecure? No one is trying to "prove" anything, there's no race to the one ultimate tech stack that requires winners and losers. You can accept the fact that PHP objectively runs a large majority of public web sites without interpreting that as a threat to your choices, your job, your image of yourself as a professional.

Having so much PHP out there may look like a problem, but programmers attaching their ego and identity to languages and tools and frameworks accounts for a lot more wasted time and crappy code than a popular language that has some obvious and well-known flaws.


Sorry to inform you that the original article definitely tries to use this number to prove that PHP is still relevant.


Idle musing, but this is likely correct. If someone who isn't tech-minded is throwing together a quick blog in WordPress or something chances are they aren't going the extra mile to change the headers or add a reverse proxy just to obfuscate that they're using a stock WordPress install.


That quote has a really big problem.

I run a couple of services that are accessible through an API built in Symfony (PHP), but the data is generated with software built with JS (event driven -> lambda, cloudflare workers), Python (depending on GDAL mostly) and also PHP.


It sounds like you're ultimately just saying that the 77% number doesn't feel right to you.

I agree that it's an interesting challenge to try to determine which language a large number of sites are using for the backend, or at least it would be a challenge in some small but maybe not insignificant percentage of cases. And no doubt whichever way they solved it involved compromises.

But that by itself doesn't give us enough information to draw conclusions about accuracy.


WordPress is one of the few CMSs that sends a header saying it's running WordPress and otherwise makes it very obvious. PHP is so high on the known list because with every other back-end you can't tell what it is.


Not true. Identifying a tech stack involves more than looking at a couple of headers. Almost all of them leave enough fingerprints in the headers and page source to identify them. The tools for identifying what runs a web site have a lot more sophistication than you think, in the same way Facebook and Google don't need to see my driver's license to know a whole lot about me.


Agreed, these stats should always include the servers for which they could not determine the language - anything else is just lying with statistics.



