this post was submitted on 06 Sep 2024
42 points (85.0% liked)

Fediverse

A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, KBin, etc.).

I wish Persian-letter usernames were allowed, maybe even symbols. They look really cool and would increase the username pool as well.

top 29 comments
[–] skullgiver@popplesburger.hilciferous.nl 21 points 1 week ago (1 children)

ActivityPub users need to be identified by some identifier in the URL, and Lemmy chose the user name to be that identifier. As a result, non-Latin usernames become… complicated. The right-to-left nature of scripts like Arabic alone would break UI design. Technically you could hex-encode usernames and assume them to be UTF-8, but it'd be a massive pain that would undoubtedly break compatibility with other services.
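To make the right-to-left concern concrete, here's a minimal Python sketch (standard library only) that inspects the Unicode bidirectional class of each character. The Persian name and the `example.social` instance are hypothetical. A full handle like `@نیما@example.social` mixes strong right-to-left (`AL`) and left-to-right (`L`) runs, which is exactly what a UI has to lay out carefully.

```python
import unicodedata

def bidi_classes(text: str) -> list[str]:
    """Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]

def is_mixed_direction(text: str) -> bool:
    # 'L' = strong left-to-right (Latin); 'R'/'AL' = strong right-to-left
    # ('AL' is the class used for Arabic-script letters, including Persian).
    classes = set(bidi_classes(text))
    return "L" in classes and bool(classes & {"R", "AL"})

persian_name = "نیما"                        # hypothetical username
handle = f"@{persian_name}@example.social"  # hypothetical instance

print(bidi_classes(persian_name))           # ['AL', 'AL', 'AL', 'AL']
print(is_mixed_direction(handle))           # True
print(is_mixed_direction("@skullgiver@example.social"))  # False
```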

You can use your display name to enter just about any name you like. As far as I'm aware, usernames are almost entirely irrelevant to Lemmy; I think they only matter in mentions (although that's a choice as well; at the ActivityPub level there's no need to do that). The display name should cover the "it looks really cool" part. As you may have seen already, you can include names, flags, and emoji in there as well!
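The split between a strict identifier and a free-form label can be sketched like this. The regex, the 3 to 20 length range and the 50-character display-name limit are hypothetical rules for illustration, not Lemmy's actual validation code.

```python
import re

# Hypothetical rules for illustration only; Lemmy's real validation may differ.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")  # ASCII-only identifier
DISPLAY_NAME_MAX = 50                            # assumed length limit

def valid_username(name: str) -> bool:
    # fullmatch: every character must be an ASCII letter, digit or "_"
    return USERNAME_RE.fullmatch(name) is not None

def valid_display_name(name: str) -> bool:
    # Any Unicode text is accepted (Persian script, flags, emoji...);
    # only emptiness and length are checked.
    stripped = name.strip()
    return 0 < len(stripped) <= DISPLAY_NAME_MAX

print(valid_username("skullgiver"))    # True
print(valid_username("نیما"))          # False
print(valid_display_name("نیما 🇮🇷"))  # True
```

The username stays URL- and mention-friendly, while the display name carries the "looks cool" part.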

With the current username system, each server has more possible usernames than there are grains of sand on Earth. I don't think we need a bigger username pool. We may need a better way to tag people, though, but that's true regardless of character set.
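The "grains of sand" comparison is easy to check with a little arithmetic. The character set and length limits below are assumptions for illustration (lowercase letters, digits and underscore; 3 to 20 characters), not Lemmy's documented rules, and 7.5 × 10^18 is only a widely quoted rough estimate for the number of sand grains on Earth.

```python
# Assumptions (hypothetical, not Lemmy's documented rules):
# usernames use lowercase a-z, digits 0-9 and "_" (37 symbols), length 3 to 20.
ALPHABET_SIZE = 26 + 10 + 1
MIN_LEN, MAX_LEN = 3, 20

# Number of distinct strings of each allowed length, summed over all lengths.
possible_usernames = sum(ALPHABET_SIZE ** n for n in range(MIN_LEN, MAX_LEN + 1))

# Widely quoted rough estimate of sand grains on Earth.
SAND_GRAINS_ESTIMATE = 7.5e18

print(f"{possible_usernames:.2e}")
print(possible_usernames > SAND_GRAINS_ESTIMATE)  # True
```

Even under these conservative assumptions, the length-20 term alone is on the order of 10^31.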

[–] SorteKanin@feddit.dk 3 points 1 week ago* (last edited 1 week ago) (3 children)

> ActivityPub users need to be identified by some identifier in the URL, and Lemmy chose the user name to be that identifier. As a result, non-Latin usernames become… complicated.

Sorry, but this is just false. URIs can easily encode UTF-8 characters, and the standard way to do so is percent-encoding. Example: https://en.wikipedia.org/wiki/😂. Your browser will automatically convert that 😂 into the appropriate percent-encoding and will still display the emoji in the address bar, even though that is not the "true" URI.

This is, if you ask me, an unnecessary limitation in Lemmy.

[–] Redjard@lemmy.dbzer0.com 8 points 1 week ago

The link is detected without the emoji in my app. You might want to hardcode it with Markdown link syntax:
[https://en.wikipedia.org/wiki/😂](https://en.wikipedia.org/wiki/😂)

Link detection is flaky as hell, especially with special characters; it rarely works reliably. URLs themselves don't contain Unicode. They use basic ASCII, and anything beyond that needs to be encoded in some form. The link you posted isn't a spec-compliant URI; it only works because Lemmy apps and browsers are nice and do the conversion to the real URL for you. According to the spec (RFC 3986, section 2.5):

> When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded. For example, the character A would be represented as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be represented as "%C3%80", and the character KATAKANA LETTER A would be represented as "%E3%82%A2".
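The two-step rule in that quote (UTF-8 first, then percent-encode every octet outside the unreserved set) fits in a few lines. This is a hand-written sketch of the algorithm for illustration; `percent_encode` is a hypothetical helper name, and in practice Python's `urllib.parse.quote(s, safe="")` does the same job.

```python
# RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Encode text as UTF-8 octets, then percent-encode every octet
    that is not an unreserved ASCII character."""
    out = []
    for octet in text.encode("utf-8"):
        if octet in UNRESERVED:
            out.append(chr(octet))
        else:
            out.append(f"%{octet:02X}")
    return "".join(out)

print(percent_encode("A"))       # A
print(percent_encode("\u00C0"))  # %C3%80  (LATIN CAPITAL LETTER A WITH GRAVE)
print(percent_encode("\u30A2"))  # %E3%82%A2  (KATAKANA LETTER A)
print(percent_encode("😂"))      # %F0%9F%98%82
```

The three spec examples come out exactly as quoted, and the emoji from the Wikipedia link becomes four percent-encoded octets.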

If you use usernames as identifiers (which, again, is optional) like Lemmy does, databases and external entities will use the percent-encoded URLs, not the readable ones. Unicode domains will have their xn-- form stored as well. It's up to apps and browsers to decode those and turn them back into Unicode. What apps and browsers show you isn't really relevant to the technical interoperability of users.
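Here's a minimal sketch of the "decode for display" step, using only Python's standard library. The stored URL (a Punycode host plus a percent-encoded `/u/` path) is a hypothetical example, and `display_form` is an illustrative helper that ignores ports, queries and fragments for brevity.

```python
from urllib.parse import unquote, urlsplit

def display_form(stored_url: str) -> str:
    """Sketch: turn a stored ASCII URL into a human-readable one, for display only."""
    parts = urlsplit(stored_url)
    # Python's built-in "idna" codec (IDNA 2003) decodes xn-- labels.
    host = parts.hostname.encode("ascii").decode("idna")
    # unquote() reverses UTF-8 percent-encoding in the path.
    path = unquote(parts.path)
    return f"{parts.scheme}://{host}{path}"

# Hypothetical stored actor URL: Punycode host, percent-encoded path.
stored = "https://ign.xn--fiqs8s/u/%D9%86%DB%8C%D9%85%D8%A7"
print(display_form(stored))  # https://ign.中国/u/نیما
```

The database keeps the ASCII form; only the presentation layer converts it back.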

ActivityPub itself has wide support for various languages, including having different names and content for different languages. The username (actually preferredUsername) is transmitted through JSON, which is by definition UTF-8, so most encodings in use today (not that weird Japanese one and that other Asian encoding that's not UTF compatible) will Just Work™ assuming the necessary URL encoding and decoding logic is added in the right places.
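As a sketch of how little the JSON side cares, here's a minimal ActivityStreams `Person` object with a Persian `preferredUsername`. The instance URL, the `/u/` path layout and the names are hypothetical; the point is that the `id` must be a percent-encoded URI while `preferredUsername` and `name` are ordinary UTF-8 JSON strings.

```python
import json
from urllib.parse import quote

INSTANCE = "https://example.social"  # hypothetical instance
username = "نیما"                    # hypothetical Persian username

actor = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Person",
    # The id must be a valid URI, so the path segment is percent-encoded.
    "id": f"{INSTANCE}/u/{quote(username, safe='')}",
    # preferredUsername and name are plain JSON strings: any Unicode is fine.
    "preferredUsername": username,
    "name": "Nima 🌙",
}

# Serialize without \u escapes and send the result as UTF-8 bytes.
payload = json.dumps(actor, ensure_ascii=False).encode("utf-8")

received = json.loads(payload.decode("utf-8"))
print(received["preferredUsername"])  # نیما
print(received["id"])  # https://example.social/u/%D9%86%DB%8C%D9%85%D8%A7
```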

I think Lemmy could be patched to accept Unicode characters in usernames, since the current limitations in the code and UI are just choices made during development. I don't think it'd add much, though.

[–] Asudox@lemmy.world 2 points 1 week ago (1 children)

Using ASCII in URLs is simple and less error-prone than "supporting" Unicode via percent-encoding. It is also just a convention to use ASCII for usernames in many platforms. ASCII is also supported out of the box in major OSes while some unicode characters might not. What about impersonation? And what about people trying to type in the username of someone who uses Unicode? It is not logical to use Unicode in this case.
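The "typing in someone's username" problem is real, and a minimal sketch shows why a lookup would at least need Unicode normalization. NFC fixes composed vs. decomposed accents, but look-alike letters such as ARABIC LETTER FARSI YEH (U+06CC, produced by the standard Persian keyboard layout) and ARABIC LETTER YEH (U+064A, produced by Arabic layouts) remain distinct under every normalization form. The names used are hypothetical.

```python
import unicodedata

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

# "é" typed as one code point vs. "e" + COMBINING ACUTE ACCENT
composed, decomposed = "ren\u00e9", "rene\u0301"
print(composed == decomposed)            # False: different code points
print(nfc(composed) == nfc(decomposed))  # True after NFC normalization

# Same-looking Persian name typed with FARSI YEH vs. ARABIC YEH:
# no normalization form maps one onto the other.
farsi, arabic = "\u0646\u06cc\u0645\u0627", "\u0646\u064a\u0645\u0627"
print(unicodedata.normalize("NFKC", farsi) == unicodedata.normalize("NFKC", arabic))  # False
```

A real implementation would need its own folding table for cases like this, on top of standard normalization.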

[–] SorteKanin@feddit.dk 6 points 1 week ago (1 children)

> It is also just a convention to use ASCII for usernames in many platforms.

That's only true for platforms that only cater to the English-speaking world. The fediverse should be, and is, much broader than that.

> ASCII is also supported out of the box in major OSes while some unicode characters might not.

What? There is no major OS that does not support Unicode out of the box.

Percent encoding is perfectly fine and users won't even see it.

Also, please stop downvoting twice with your alt accounts; that's not cool.

[–] sznowicki@lemmy.world 1 points 1 week ago (1 children)

I think Punycode would work better here, as it's plain ASCII with no special characters except a dash, if I recall correctly.

[–] SorteKanin@feddit.dk 1 points 1 week ago

Punycode is not solving the same problem. Punycode solves Unicode in domain names; percent-encoding is for Unicode in URL paths. Lemmy only needs to worry about the paths; Punycode should be "supported" out of the box without any special handling.
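The division of labour can be sketched in a few lines of Python: Punycode (via the standard library's `idna` codec, which implements IDNA 2003) for the host, UTF-8 percent-encoding for the path. `to_ascii_uri` is a hypothetical helper; it leaves queries and fragments untouched and ignores user info, so treat it as an illustration rather than a complete IRI-to-URI converter.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def to_ascii_uri(iri: str) -> str:
    """Sketch: Punycode the host, percent-encode the path."""
    parts = urlsplit(iri)
    # Domain names: IDNA / Punycode.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Paths: UTF-8 + percent-encoding, keeping "/" as the segment separator.
    path = quote(parts.path, safe="/")
    # Query and fragment are passed through unchanged in this sketch.
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))

print(to_ascii_uri("https://ign.中国/u/نیما"))
# https://ign.xn--fiqs8s/u/%D9%86%DB%8C%D9%85%D8%A7
```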

[–] Blaze@feddit.org 14 points 1 week ago (1 children)

I see Arabic used from time to time

[–] Live_Let_Live@lemmy.world 5 points 1 week ago (2 children)
[–] AbouBenAdhem@lemmy.world 10 points 1 week ago (2 children)

This user’s name is displayed in Arabic, although the characters in the URL are Latin.

[–] RobotToaster@mander.xyz 10 points 1 week ago

Looks like his username is in latin characters, but he has an arabic display name.

[–] Live_Let_Live@lemmy.world -1 points 1 week ago* (last edited 1 week ago) (2 children)

Is it possible to make it something other than Latin?

[–] originalucifer@moist.catsweat.com 11 points 1 week ago (2 children)

Have you ever seen a URL with non-Latin characters, ever?

The fediverse is incredibly URL/DNS dependent. Labels and content can be in any language (Mbin uses Weblate to allow for dozens of languages), but the underlying URLs that control everything probably require Latin characters.

[–] Aatube@kbin.melroy.org 3 points 1 week ago (1 children)

https://ign.中国 ? There's been a standard to encode it as xn-- for a while.

[–] originalucifer@moist.catsweat.com 7 points 1 week ago (1 children)

Cool! Except it redirected immediately to https://www.ign.com.cn/

[–] Aatube@kbin.melroy.org 4 points 1 week ago

Welp, at least it works. It's called Punycode, and some browsers show the raw xn-- form instead of the Unicode name (at least for suspicious-looking domains) because look-alike Cyrillic letters pose a security risk. For non-domains, percent-encoding is available.

[–] Live_Let_Live@lemmy.world -2 points 1 week ago (2 children)

Looks good because URL beautifiers catch that, but the actual URL you linked is https://ar.m.wikisource.org/wiki/%D8%AA%D9%81%D8%B3%D9%8A%D8%B1_%D8%A7%D8%A8%D9%86_%D9%83%D8%AB%D9%8A%D8%B1 and relying on URL detection has proven to be very unreliable. For instance, Wikipedia links often lack closing parentheses, or closing parentheses and other punctuation get added to links accidentally.

Hex-encoded URLs are technically valid, but not very readable to humans.
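Decoding the link above with Python's standard `urllib.parse.unquote` shows what the "beautified" version actually is, and how much longer the real, stored form is.

```python
from urllib.parse import unquote

url = ("https://ar.m.wikisource.org/wiki/"
       "%D8%AA%D9%81%D8%B3%D9%8A%D8%B1_%D8%A7%D8%A8%D9%86_%D9%83%D8%AB%D9%8A%D8%B1")

# Each Arabic letter is two UTF-8 octets, i.e. six characters once percent-encoded.
readable = unquote(url)
print(readable)  # https://ar.m.wikisource.org/wiki/تفسير_ابن_كثير
print(len(url), len(readable))
```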

[–] intensely_human@lemm.ee 0 points 1 week ago

Not from an ASCII

[–] Blaze@feddit.org 7 points 1 week ago

Communities display names: https://lemmy.ca/c/maroc

I guess it could work for user display names too.

[–] nutomic@lemmy.ml 13 points 1 week ago

I believe there is still an open issue on GitHub for this, but no one was interested in helping to implement and test it. So use the search function and contribute!

[–] Asudox@lemmy.world 9 points 1 week ago (1 children)

You won't get non-Latin usernames anytime soon, but you can change your display name using non-Latin characters.

[–] turkalino@lemmy.yachts 3 points 1 week ago (2 children)

This thread is news to me. Unicode is Unicode, no? Why restrict to Latin letters?

[–] Asudox@lemmy.world 4 points 1 week ago* (last edited 1 week ago) (1 children)

Because URLs are usually in ASCII. That was the standard; check RFC 1738 and RFC 3986. Now, you can use percent-encoding, but why use it? It just complicates things.

[–] SorteKanin@feddit.dk 4 points 1 week ago* (last edited 1 week ago)

There is a standard way to encode Unicode into URLs; it definitely doesn't have to be ASCII. Percent-encoding is used all over the place.

EDIT: I don't mind a downvote, but double-downvoting me from your alt, @Asudox@lemmy.world, is not cool. That's sockpuppetry/vote manipulation.

[–] Tanoh@lemmy.world 4 points 1 week ago

There is also the risk of homograph attacks. The link below is about domain names encoded via IDN, but the same applies to usernames: you could easily impersonate another user with characters that look similar.

https://en.wikipedia.org/wiki/IDN_homograph_attack
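A common mitigation is to reject names that mix scripts. The sketch below is a rough heuristic (it takes the first word of each letter's Unicode character name as its "script"); `scripts_used` and `looks_suspicious` are hypothetical helpers, and real systems would use the Unicode Script property and the confusable checks from UTS #39 instead.

```python
import unicodedata

def scripts_used(name: str) -> set[str]:
    """Rough script detection: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in name if ch.isalpha()}

def looks_suspicious(name: str) -> bool:
    # Flag names that mix letters from more than one script.
    return len(scripts_used(name)) > 1

real = "admin"
fake = "\u0430dmin"  # first letter is CYRILLIC SMALL LETTER A (U+0430)

print(real == fake)                                    # False, though they render alike
print(sorted(scripts_used(fake)))                      # ['CYRILLIC', 'LATIN']
print(looks_suspicious(real), looks_suspicious(fake))  # False True
```

Note that a single-script check alone does not catch a name written entirely in look-alike Cyrillic letters, which is why UTS #39 also defines whole-script confusable detection.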

[–] Kolanaki@yiffit.net 3 points 1 week ago* (last edited 1 week ago)

Use the Display Name field. You can put whatever you want there, even emojis. The feature is already in Lemmy, but not every instance has it available. Lemmy.World does use it, though.