moonshadow565's comments

moonshadow565 · 2025-08-06T02:10:43 1754446243

Just because it's old doesn't mean it's more portable. If anything, it makes me think it's even less portable.

moonshadow565 · 2025-07-06T20:59:43 1751835583

What about encoding it in such a way that we don't need huge tables to figure out the category of each code point?

lifthrasiir · 2025-07-06T21:06:30 1751835990

It means that you are encoding those categories into the code point itself, which is a waste for every single use of the character encoding.

panpog · 2025-07-06T21:33:05 1751837585

It seems plausible that this could be made efficiently doable byte-wise. For example, C3 xx could be made to uppercase to C4 xx. Unicode actually does structure its codespace to make certain properties easier to compute, but those properties are mostly related to legacy encodings, and things are designed with UCS-2 or UTF-32 in mind, not UTF-8.
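For one block, something like this already works in today's UTF-8. The Latin-1 Supplement (U+0080 to U+00FF) keeps ISO 8859-1's layout, where lowercase letters sit exactly 0x20 above their uppercase forms, so in UTF-8 the lead byte stays C3 and only the second byte changes. A minimal sketch, assuming valid UTF-8 input; the function name is hypothetical and this is not a general Unicode case mapping (ÿ and ß, for example, are deliberately left alone):

```cpp
#include <cstddef>
#include <string>

// Uppercases ASCII a-z and the Latin-1 Supplement letters encoded as
// C3 A0..C3 BE (U+00E0..U+00FE), leaving every other byte untouched.
// Assumes valid UTF-8; NOT a general Unicode case mapping.
std::string upper_ascii_and_c3(const std::string& in) {
    std::string out = in;
    for (std::size_t i = 0; i < out.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(out[i]);
        if (b >= 'a' && b <= 'z') {
            out[i] = static_cast<char>(b - 0x20);  // ASCII: a-z -> A-Z
        } else if (b == 0xC3 && i + 1 < out.size()) {
            unsigned char c = static_cast<unsigned char>(out[i + 1]);
            // U+00E0..U+00FE -> U+00C0..U+00DE, except U+00F7 (division sign).
            // U+00FF (y with diaeresis) is excluded: its uppercase is U+0178.
            if (c >= 0xA0 && c <= 0xBE && c != 0xB7) {
                out[i + 1] = static_cast<char>(c - 0x20);  // lead byte stays C3
            }
            ++i;  // skip the continuation byte we just inspected
        }
    }
    return out;
}
```

For example, "café" (ending in C3 A9) becomes "CAFÉ" (ending in C3 89) with only one byte rewritten.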

It’s also not clear to me that the code point is a good abstraction in the design of UTF-8. Usually, what you want is either the byte or the grapheme cluster.
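To make the three levels concrete: "é" written as "e" followed by U+0301 COMBINING ACUTE ACCENT is 3 bytes in UTF-8, 2 code points (2 units in UTF-32), and 1 grapheme cluster. A minimal sketch of counting code points in UTF-8, assuming valid input; the function name is hypothetical, and counting graphemes would need the segmentation rules of UAX #29, which are not shown:

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes
// (bytes of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char b : utf8) {
        if ((b & 0xC0) != 0x80) ++n;  // every non-continuation byte starts a code point
    }
    return n;
}
```

With `"e\xCC\x81"`, `size()` is 3 and `count_code_points` returns 2, while `std::u32string(U"e\u0301").size()` is also 2, even though a reader sees one character.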

karteum · 2025-07-06T22:13:34 1751840014

> Usually, what you want is either the byte or the grapheme cluster.

Exactly! That's what I understood after reading this great post: https://tonsky.me/blog/unicode/

"Even in the widest encoding, UTF-32, [some grapheme] will still take three 4-byte units to encode. And it still needs to be treated as a single character. If the analogy helps, we can think of the Unicode itself (without any encodings) as being variable-length."

I tend to think it's the biggest design decision in Unicode (but maybe I just don't fully see the need and use cases beyond emojis. Of course, I read the section saying it's used in actual languages, but the few examples described could have been handled with a dedicated 32-bit code point...)

panpog · 2025-07-07T02:11:41 1751854301

Can you fit everything into 32 bits? I have no idea, but Hangul and Indic scripts seem like they might have a combinatoric explosion of infrequently used characters.

eviks · 2025-07-07T04:22:35 1751862155

But they don't have that explosion if you only encode the combinatoric primitives those characters are made of and then use composing rules?

panpog · 2025-07-07T06:48:27 1751870907

You still get the combinatoric explosion, but you have more bits to work with. Imagine if you could combine any 9 jamo into a single Hangul syllable block. (The real combinatorics is more complicated, and I don't know if it's this bad.) Encoding just the 24 jamo and a control character requires 25 code points. Giving each syllable block its own code point would require 24^9 (about 2.6 × 10^12) code points, far more than 2^32 (about 4.3 × 10^9).

eviks · 2025-07-07T06:52:05 1751871125

> Giving each syllable block its own codepoint

That's the thing: you wouldn't do that! Only a small subset of frequently used combos would get its own ID; the rest would only be composable.
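For reference, this is roughly how Unicode treats modern Hangul: every modern syllable block (19 leading consonants × 21 vowels × 28 trailing options, counting "none", = 11,172) has a precomposed code point in U+AC00 to U+D7A3, computable arithmetically from the conjoining jamo, while archaic combinations exist only as jamo sequences. A minimal sketch of the standard composition formula; the function name is hypothetical:

```cpp
#include <optional>

// Constants from the Unicode Standard's Hangul syllable composition algorithm.
constexpr char32_t SBase = 0xAC00, LBase = 0x1100, VBase = 0x1161, TBase = 0x11A7;
constexpr int LCount = 19, VCount = 21, TCount = 28;  // 19 * 21 * 28 = 11172 syllables

// Composes a leading consonant (L), a vowel (V) and an optional trailing
// consonant (T) into one precomposed syllable. Pass t = 0 for "no trailing
// consonant". Returns nullopt for jamo outside the modern composable ranges.
std::optional<char32_t> compose_hangul(char32_t l, char32_t v, char32_t t = 0) {
    if (l < LBase || l >= LBase + LCount) return std::nullopt;
    if (v < VBase || v >= VBase + VCount) return std::nullopt;
    int t_index = 0;
    if (t != 0) {
        // Valid trailing jamo are TBase+1 .. TBase+TCount-1 (index 0 means "none").
        if (t <= TBase || t >= TBase + TCount) return std::nullopt;
        t_index = static_cast<int>(t - TBase);
    }
    int l_index = static_cast<int>(l - LBase);
    int v_index = static_cast<int>(v - VBase);
    return static_cast<char32_t>(SBase + (l_index * VCount + v_index) * TCount + t_index);
}
```

For example, U+1112 + U+1161 + U+11AB composes to U+D55C (한), and U+1100 + U+1161 composes to U+AC00 (가).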

duskwuff · 2025-07-06T23:08:14 1751843294

Character case is a locale-dependent mess; trying to represent it in the values of code points (which need to be universal) is a terrible idea.

For example: in English, U+0049 and U+0069 ("I" and "i") are considered an uppercase/lowercase pair. In the Turkish locale, these are considered two separate characters with their own uppercase and lowercase versions: U+0049/U+0131 ("I" / "ı") and U+0130/U+0069 ("İ" / "i").
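The Turkish example can be sketched as a default mapping plus a locale-specific override. This is a toy covering only ASCII letters and the dotted/dotless i; the enum and function names are hypothetical, and real code would typically rely on a library such as ICU:

```cpp
// Hypothetical locale tag for this sketch only.
enum class CaseLocale { Default, Turkish };

// Uppercases a single code point, covering only ASCII letters and the
// Turkish dotted/dotless i. Everything else is returned unchanged.
char32_t to_upper_sketch(char32_t c, CaseLocale loc) {
    // Turkish tailoring: dotted lowercase i (U+0069) uppercases to U+0130.
    if (loc == CaseLocale::Turkish && c == U'i') return U'\u0130';
    // Dotless lowercase i (U+0131) uppercases to plain I (U+0049) by default too.
    if (c == U'\u0131') return U'I';
    // Untailored default for ASCII a-z.
    if (c >= U'a' && c <= U'z') return static_cast<char32_t>(c - (U'a' - U'A'));
    return c;
}
```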

panpog · 2025-07-07T01:48:21 1751852901

Of course, you sometimes need tailoring for a particular language. On the other hand, I don't see how encoding untailored casing would make tailored casing harder.

moonshadow565 · 2025-07-03T00:10:49 1751501449

Holy order

moonshadow565 · 2025-06-16T03:54:45 1750046085

It's not that complex if you remove all the stuff you don't use: https://godbolt.org/z/rM9ejojv4 .

The main things you would need to understand are specialization (think of it as pattern matching, but at compile time) and pack expansion (the three dots).

moonshadow565 · 2025-02-12T15:16:03 1739373363

> League of Legends runs on a custom game engine developed in 2009.

Developed by Sergey Titov (same engine that powers Big Rigs).

killerteddybear · 2025-02-12T17:18:49 1739380729

Big Rigs: Over the Road Racing?

moonshadow565 · 2025-02-12T17:56:15 1739382975

Yes, the Angry Video Game Nerd made a very funny video about it. Another game that I know runs on the same engine is WarZ.

moonshadow565 · 2025-01-24T21:18:10 1737753490

P1061 is in C++26, so you can instead do:

  const auto [...I] = std::make_index_sequence<INPUT_COUNT>{};
  ((SetupInput<I>(options, transport_manager, subscriber_queues[I], thread_pool, templated_topic_to_runtime_topic)),...);

yay!
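For comparison, before C++26 the index pack is usually obtained by deducing it from `std::index_sequence` in a helper (or a C++20 template lambda). A minimal sketch; `SetupInput` here is a hypothetical stand-in that only records its index, not the thread's real function:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the thread's SetupInput<I>(...): it just records I.
template <std::size_t I>
void SetupInput(std::vector<std::size_t>& log) {
    log.push_back(I);
}

// Pre-C++26 idiom: the helper's parameter deduces the index pack I...
template <std::size_t... I>
void SetupAllInputsImpl(std::vector<std::size_t>& log, std::index_sequence<I...>) {
    // Comma fold: calls SetupInput<0>, SetupInput<1>, ... left to right.
    (SetupInput<I>(log), ...);
}

template <std::size_t N>
void SetupAllInputs(std::vector<std::size_t>& log) {
    SetupAllInputsImpl(log, std::make_index_sequence<N>{});
}
```

After `SetupAllInputs<3>(log)`, `log` holds 0, 1, 2. The C++26 structured-binding pack removes the need for the separate helper.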

foota · 2025-01-24T22:23:51 1737757431

Oh, that is nice!

a_t48 · 2025-01-25T07:32:50 1737790370

That's better, yeah. I still prefer plain ole for loops, but that's much better.

moonshadow565 · 2025-01-02T18:31:34 1735842694

I don't think you can copyright lists of publicly available information (iirc there was some case with phone numbers before). That being said, they also stole code...

gs17 · 2025-01-02T21:47:31 1735854451

ProCD, Inc. v. Zeidenberg was sort of about this:

> For Zeidenberg's argument, the circuit court assumed that a database collecting the contents of one or more telephone directories was equally a collection of facts that could not be copyrighted. Thus, Zeidenberg's copyright argument was valid.[1] However, this did not lead to a victory for Zeidenberg, because the circuit court held that copyright law does not preempt contract law. Since ProCD had made the investments in its business and its specific SelectPhone product, it could require customers to agree to its terms on how to use the product, including a prohibition on copying the information therein regardless of copyright protections.

https://en.wikipedia.org/wiki/ProCD,_Inc._v._Zeidenberg

maxloh · 2025-01-02T19:18:48 1735845528

Moreover, it doesn't seem like static linking to me.

A similar example would be using a GPLv3-licensed JavaScript library on a website. What it implies for the other HTML/JS/CSS code is controversial [0]. The FSF actually believed that they should not be "infected" [1], and the legal implications may need to be tested in court.

[0]: https://opensource.stackexchange.com/q/4360/15873

[1]: https://www.gnu.org/licenses/gpl-faq.en.html#WMS

darthwalsh · 2025-01-02T23:09:31 1735859371

The FSF question is about templates, but the Chrome extension in question also seems to have copied nontrivial JS.

I don't think Chrome extensions can be modified by the user; there's probably some integrity check. So to be GPL-compliant, they would need to publish the source files needed to rebuild the extension?

RobotToaster · 2025-01-02T19:15:15 1735845315

Depends on the country https://en.wikipedia.org/wiki/Database_right

moonshadow565 · 2025-01-02T19:22:27 1735845747

Thanks for the list! It seems that unfortunately copyright applies to databases in EU.

onli · 2025-01-02T18:34:58 1735842898

Right, or: maybe. Depends on where you are (or maybe better: where they are), and whether data collections fall under copyright or some other protection that is translatable enough for the GPL to apply. But if they really also used code, that point is moot.

jillyboel · 2025-01-02T19:26:58 1735846018

https://www.rvo.nl/onderwerpen/octrooien-ofwel-patenten/vorm...

moonshadow565 · 2024-12-17T22:42:14 1734475334

Look up the FICLONERANGE ioctl.
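For anyone unfamiliar with it: FICLONERANGE (documented in ioctl_ficlonerange(2), Linux 4.5+) asks the filesystem to make a range of the destination file share the source file's extents instead of copying the data. A minimal sketch; `clone_range` is a hypothetical wrapper name, and error handling is reduced to returning errno:

```cpp
#include <cerrno>
#include <cstdint>
#include <linux/fs.h>   // FICLONERANGE, struct file_clone_range
#include <sys/ioctl.h>  // ioctl

// Shares (reflinks) len bytes of src_fd starting at src_off into dst_fd at
// dst_off, on filesystems that support it (e.g. Btrfs, or XFS with reflink).
// Returns 0 on success or the errno value on failure (e.g. EOPNOTSUPP, EXDEV, EINVAL).
int clone_range(int src_fd, std::uint64_t src_off, std::uint64_t len,
                int dst_fd, std::uint64_t dst_off) {
    file_clone_range range{};
    range.src_fd = src_fd;
    range.src_offset = src_off;
    range.src_length = len;  // 0 means "through the end of the source file"
    range.dest_offset = dst_off;
    // The ioctl is issued on the destination file descriptor.
    return ioctl(dst_fd, FICLONERANGE, &range) == 0 ? 0 : errno;
}
```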

DrFrugal · 2024-12-17T22:54:10 1734476050

This is either a very big coincidence, or you are in the datamining Discord as well. The original archive I base my project on uses RMAN to store everything :D Thanks for the hint about the FICLONERANGE ioctl... it seems to be fine-grained enough to allow me to deduplicate at arbitrary offsets, not just whole blocks. Will give it a go.

moonshadow565 · 2024-12-17T23:23:04 1734477784

Btrfs is something I originally wanted to use, but other people were not fans of Linux, so custom ad-hoc tooling (RMAN) it was.

DrFrugal · 2024-12-17T23:30:37 1734478237

I tried FICLONERANGE via a Python wrapper, btw. It turns out that I can only clone ranges aligned to block boundaries :(
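The block-alignment restriction means that only the block-aligned interior of a duplicated region can be shared, and only when the region starts at the same offset within a block in both files. A small sketch of that arithmetic; the struct and function names are hypothetical, `block` is assumed to be the filesystem block size (e.g. 4096), and the end-of-file case is not modeled:

```cpp
#include <cstdint>
#include <optional>

struct CloneRange {
    std::uint64_t src_off;
    std::uint64_t dst_off;
    std::uint64_t len;
};

// Given len identical bytes at src_off in one file and dst_off in another,
// returns the largest block-aligned sub-region a block-granular clone could
// share, or nullopt if there is none.
std::optional<CloneRange> aligned_subrange(std::uint64_t src_off, std::uint64_t dst_off,
                                           std::uint64_t len, std::uint64_t block) {
    if (block == 0) return std::nullopt;
    // Both offsets must sit at the same position within a block; otherwise
    // no block of one file lines up with a block of the other.
    if (src_off % block != dst_off % block) return std::nullopt;
    std::uint64_t head = (block - src_off % block) % block;  // bytes until next boundary
    if (len <= head) return std::nullopt;
    std::uint64_t body = (len - head) / block * block;  // whole blocks only
    if (body == 0) return std::nullopt;
    return CloneRange{src_off + head, dst_off + head, body};
}
```

For example, with 4096-byte blocks, a 10000-byte match starting 100 bytes into a block in both files (offsets 100 and 8292) yields only one shareable block: source offset 4096, destination offset 12288, length 4096.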

BTRFS is very neat per se, but documentation and help (especially for very niche cases like this one, lol) are not that easy to come by. My plan would be to properly process the data set and then make it available as a BTRFS snapshot... you can also export a btrfs send stream to a file for storage, etc.

If all my attempts to use BTRFS fail, I might write my own tooling and virtual filesystem as well, but optimized for my use case (MPQ files and such). Thanks for your input so far.

moonshadow565 · on July 3, 2023

How exactly does this make their search engine better?

cbsmith · on July 3, 2023

You forget that their real business is ads.

moonshadow565 · on Sept 10, 2022

League of Legends is one such game as well.