What a joke. A sovereign cloud in the EU has to be owned by an EU company without ties to foreign entities, especially those that host data or other services for it.
AWS's "foreign cloud" will just be another AI training pot for the NSA.
Empty of MPs (members of parliament)? Is the public allowed to attend such hearings? If so, I doubt it'll be empty.
If the EU commission hears about this, it might trigger another investigation. Hopefully Malus gets whacked over the head repeatedly.
That might also be the case, but that then raises the question of the quality of PRs in order to judge the contribution quality of "anonymous" contributors.
From the post's link:
We hypothesized that pull requests made by women are less likely to be accepted than those made by men. Prior work on gender bias in hiring – that women tend to have resumes less favorably evaluated than men (5) – suggests that this hypothesis may be true.
To evaluate this hypothesis, we looked at the pull status of every pull request submitted by women compared to those submitted by men. We then calculate the merge rate and corresponding confidence interval, using the Clopper-Pearson exact method (15), and find the following:
| | Open | Closed | Merged | Merge Rate | 95% Confidence Interval |
|---|---|---|---|---|---|
| Women | 8,216 | 21,890 | 111,011 | 78.6% | [78.45%, 78.87%] |
| Men | 150,248 | 591,785 | 2,181,517 | 74.6% | [74.56%, 74.67%] |
4 percentage point difference overall.
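The overall gap can be re-derived from the counts in the table. This is only a quick sketch: it treats open, closed, and merged PRs together as the denominator, which is my assumption about how the paper computes the merge rate, and the variable names are my own.

```python
# Counts copied from the table quoted from the paper.
counts = {
    "women": {"open": 8_216, "closed": 21_890, "merged": 111_011},
    "men": {"open": 150_248, "closed": 591_785, "merged": 2_181_517},
}


def merge_rate(c):
    """Merged PRs divided by all PRs (assumed denominator: open + closed + merged)."""
    return c["merged"] / (c["open"] + c["closed"] + c["merged"])


rates = {gender: merge_rate(c) for gender, c in counts.items()}

# Difference in percentage points, not in percent.
gap_pp = (rates["women"] - rates["men"]) * 100

for gender, rate in rates.items():
    print(f"{gender}: {rate:.1%}")
print(f"gap: {gap_pp:.1f} percentage points")
```

The gap comes out at roughly 4 percentage points, in line with the table.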
Pull requests can be made by anyone, including both insiders (explicitly authorized owners and collaborators) and outsiders (other GitHub users). If we exclude insiders from our analysis, the women’s acceptance rate (64.4%) continues to be significantly higher than men’s (62.7%) (χ²(df = 2, n = 2,473,190) = 492, p < .001)
Emphasis mine. That's 1.7 percentage points.
The final paragraph also omits how the acceptance rate changes after gender is "revealed" (username, profile image). The graph doesn't help either.
For outsiders, we see evidence for gender bias: women’s acceptance rates are 71.8% when they use gender neutral profiles, but drop to 62.5% when their gender is identifiable. There is a similar drop for men, but the effect is not as strong. Women have a higher acceptance rate of pull requests overall (as we reported earlier), but when they’re outsiders and their gender is identifiable, they have a lower acceptance rate than men.
So women drop from 71.8% to 62.5%, which is 9.3 percentage points, and they say that's more than for men, but they don't state the difference. Only the graph gives an indication (unless I'm missing a table), and it may be about 5 (?) percentage points for men. That would leave roughly 4 percentage points between the genders.
Figure 5: Pull request acceptance rate by gender and perceived gender, with 95% Clopper-Pearson confidence intervals, for insiders (left) and outsiders (right)
The conclusion:
Our results suggest that although women on GitHub may be more competent overall, bias against them exists nonetheless.
That's quite exaggerated for <=5 percentage points. Especially for the number of people involved.
Out of 4,037,953 GitHub user profiles with email addresses, we were able to identify 1,426,121 (35.3%) of them as men or women through their public Google+ profiles.
Maybe I missed it, but how many of those were women and how many made PRs?
in a 2013 survey of the more than 2000 open source developers who indicated a gender, only 11.2% were women
Let's compare the PR rate per gender:
Let's say the percentage of women did not increase since 2013, which I'd find difficult to believe. That's 1,269,247 men and 156,873 women. Men made 150,248 + 591,785 + 2,181,517 = 2,923,550 PRs. Women made 8,216 + 21,890 + 111,011 = 141,117 PRs. That's ~2.3 PRs per man and ~0.9 PRs per woman. If the percentage changed and more women became contributors, that would decrease the PRs per woman.
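For anyone who wants to redo that back-of-the-envelope math, here is a small sketch. It assumes the 11.2% share from the 2013 survey applies unchanged to the identified profiles, and that every PR in the table came from one of them. The head counts above look like they were derived from a rounded 11%, so the values differ slightly. The variable names are mine.

```python
# Figures quoted from the paper.
identified_profiles = 1_426_121
share_women = 0.112  # 2013 survey figure, assumed unchanged

# PR counts per gender: open + closed + merged.
prs_women = 8_216 + 21_890 + 111_011
prs_men = 150_248 + 591_785 + 2_181_517

# Rough head counts under the assumption above.
women = round(identified_profiles * share_women)
men = identified_profiles - women

prs_per_woman = prs_women / women
prs_per_man = prs_men / men

print(f"PRs per man:   {prs_per_man:.2f}")
print(f"PRs per woman: {prs_per_woman:.2f}")
```

Either way it lands at roughly 2.3 PRs per man and 0.9 per woman.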
That leads me to ask:
I very much encourage humans to contribute to open source. So, while this paper says something about the current state of things, it doesn't seem like it's saying much. The differences in pull request acceptance are not very significant (<5 percentage points) to me.
It's dead easy. Yet GitHub didn't do it when training Copilot and is now being sued because of it.
It is also easy to build a database of copyrighted material and check whether revealed training data matches it. The copyright licence doesn't necessarily need to be attached. It just makes it easier to spot.
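To show what I mean by "easy", here's a minimal sketch of such a check. It hashes overlapping word sequences ("shingles") of the known works and looks for the same hashes in a suspect output. The function names and the shingle length are my own choices for illustration, not anything GitHub or anyone else is known to use, and a real system would need normalisation, tokenisation suited to code, and a proper index.

```python
import hashlib


def shingles(text, n=8):
    """Return the set of overlapping n-word sequences in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def fingerprint(text, n=8):
    """Hash each shingle so the database stores digests, not the raw text."""
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in shingles(text, n)}


def overlap_ratio(candidate, known_fingerprints, n=8):
    """Fraction of the candidate's shingles that also occur in the known works."""
    fp = fingerprint(candidate, n)
    if not fp:
        return 0.0
    return len(fp & known_fingerprints) / len(fp)


# Hypothetical "database" built from one protected text.
known = fingerprint("a b c d e f g h i j", n=3)

print(overlap_ratio("a b c d e", known, n=3))  # every shingle is in the database
print(overlap_ratio("x y z w v", known, n=3))  # nothing matches
```

A high ratio only flags text for a closer look. It doesn't prove infringement by itself.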
Also, what are you arguing here? That because copyright is easy to ignore, it should be or that it's pointless? Is that the advice you'd give anybody else too? "You know what Disney, everyone ignores copyright, so why not make everything public domain?"
From what I understand, LLMs are just large heuristic machines. They gather a lot of statistics on token order and answer with whatever is statistically more likely than the other options. There's no "understanding". So to answer your question, no, they don't understand the license.
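As a toy illustration of "statistics on token order", here's a bigram counter that predicts the next word purely from how often it followed the previous one. This is a deliberately crude stand-in and my own simplification: real LLMs are neural networks trained on far longer contexts, not lookup tables, but the point about there being no notion of a license in there carries over.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, token):
    """Return the most frequent follower of the token, or None if it was never seen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]


corpus = "the cat sat on the mat the cat ran".split()
model = train_bigrams(corpus)

print(most_likely_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

The model happily continues any text it has seen, license header or not, because all it stores is counts.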
Content is most likely scraped wholesale from websites, possibly run through some cleanup to filter out absolute garbage, and fed into an LLM to train it. An LLM can be tricked into revealing its training data (e.g. repeat "fruit" forever). It's in those cases that copyright infringement is detected and action can be, and has been, taken. There are court cases currently under review, the most prominent being the one against GitHub Copilot for infringing the licenses of the source code it ingested.
Never heard of it, but I use DeepL, which isn't OSS, but at least it's not Google and it's better at translating.
They might not even have to. I bet there are bots already having entire discussions by themselves on there.