AbstractPhila PRO

AbstractPhil

https://civitai.com/user/AbstractPhila

AbstractEyes

AI & ML interests

datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more.

Recent Activity

updated a model about 5 hours ago

AbstractPhil/eigh-triton

published a model about 6 hours ago

AbstractPhil/eig-triton

published a model about 6 hours ago

AbstractPhil/eigh-triton

replied to their post about 6 hours ago

As of right now I don't know how to reduce to fp16 without a massive dip. I'm thinking it's possible to utilize integers directly instead of high-accuracy fp64 or fp32 deviated floats. I'll do some exploration.

Reducing this to fp16 or bf16 capacity would greatly improve performance, and if the values out are close enough to the mantissa cross-contaminants, it could be worth it just for the semi-accurate speed alone.
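
To get a feel for the size of that dip, here is a minimal sketch, assuming only that the inputs are rounded to a lower precision while the arithmetic itself stays in fp64. It uses the closed-form eigenvalues of a 2x2 symmetric matrix, so it is a toy and not the eigh kernel; the helper names `to_fp16`, `to_fp32`, and `eigvals_sym2` are hypothetical.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def eigvals_sym2(a: float, b: float, c: float):
    """Closed-form eigenvalues of [[a, b], [b, c]], ascending."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius


# Arbitrary example entries (assumption: any non-representable values work).
a, b, c = 1.2345678, 0.3456789, 2.7182818
exact = eigvals_sym2(a, b, c)

errors = {}
for name, cast in (("fp32", to_fp32), ("fp16", to_fp16)):
    approx = eigvals_sym2(cast(a), cast(b), cast(c))
    errors[name] = max(abs(x - y) for x, y in zip(exact, approx))
    print(f"{name}: max eigenvalue error = {errors[name]:.2e}")
```

This only measures the damage from rounding the inputs. A real fp16 kernel would also accumulate rounding error inside every multiply and add, so the gap would likely be wider.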

replied to their post about 6 hours ago

Per-instance allocation for max_n, max_batch (B):

WORKING STORAGE:
A_work : [B, max_n, max_n] # working copy (destroyed)
V_accum : [B, max_n, max_n] # eigenvector accumulator
householder : [max_n-2, B, max_n] # stored reflectors (padded)
d : [B, max_n] # tridiagonal diagonal
e : [B, max_n-1] # tridiagonal off-diagonal

Subtotal: ~3 × max_n² × B floats

D&C TREE (depth = ⌈log₂(max_n)⌉ levels):
FOR each level l (0 to depth-1):
    num_sub = 2^l
    sub_size = max_n // 2^l (padded up to power of 2)

    delta   : [B, num_sub, sub_size]    # merged eigenvalues
    z_vec   : [B, num_sub, sub_size]    # merge vectors  
    rho     : [B, num_sub]              # coupling strengths
    mask    : [B, num_sub, sub_size]     # valid element mask
    
    # Newton state (per root):
    lam     : [B, num_sub, sub_size]    # current root estimates
    lo      : [B, num_sub, sub_size]    # bracket lower
    hi      : [B, num_sub, sub_size]    # bracket upper
    f_val   : [B, num_sub, sub_size]    # secular function value
    converge: [B, num_sub, sub_size]    # convergence mask
    
    # Eigenvector fragments:
    V_frag  : [B, num_sub, sub_size, sub_size]

Subtotal per level: ~(9 × sub_size + sub_size²) × num_sub × B
Total across levels: since num_sub × sub_size = max_n at every level,
    ≈ (9 × max_n + max_n²) × depth × B
    ≈ max_n² × depth × B  (the V_frags dominate)
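
The per-root state above (lam, lo, hi, f_val) is what a bracketed root finder on the secular equation needs. A minimal sketch of that merge step for a single sub-problem follows. It assumes a rank-one update D + rho * z zᵀ with distinct ascending d, nonzero z, and rho > 0, and it swaps in plain bisection where the plan says Newton. The function names are hypothetical and this is not the kernel's implementation.

```python
def secular(lam, d, z, rho):
    """Secular function f(lam) = 1 + rho * sum(z_i^2 / (d_i - lam))."""
    return 1.0 + rho * sum(zi * zi / (di - lam) for di, zi in zip(d, z))


def secular_roots(d, z, rho, iters=100):
    """Eigenvalues of diag(d) + rho * z z^T by bisection, one root per bracket.

    Assumes d is strictly ascending, every z_i is nonzero, and rho > 0, so
    root i lies in (d[i], d[i+1]) and the last one in (d[-1], d[-1] + rho*|z|^2).
    """
    n = len(d)
    z_norm2 = sum(zi * zi for zi in z)
    roots = []
    for i in range(n):
        lo = d[i]
        hi = d[i + 1] if i + 1 < n else d[-1] + rho * z_norm2
        for _ in range(iters):
            lam = 0.5 * (lo + hi)
            if lam == lo or lam == hi:  # bracket is down to adjacent floats
                break
            f_val = secular(lam, d, z, rho)
            # f increases from -inf to +inf across each bracket.
            if f_val < 0.0:
                lo = lam
            else:
                hi = lam
        roots.append(0.5 * (lo + hi))
    return roots


d = [1.0, 2.0, 4.0]
z = [0.5, 0.5, 0.5]
rho = 1.0
roots = secular_roots(d, z, rho)
print(roots)
```

A quick sanity check is the trace identity: the roots must sum to sum(d) + rho * |z|², which is 7.75 for the values above.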

CONCRETE NUMBERS (fp32, 4 bytes each):

max_n=8,   B=4096:  ~8² × 3 × 3 × 4096 × 4    ≈    9 MB
max_n=32,  B=1024:  ~32² × 5 × 3 × 1024 × 4   ≈   60 MB
max_n=64,  B=512:   ~64² × 6 × 3 × 512 × 4    ≈  144 MB
max_n=128, B=256:   ~128² × 7 × 3 × 256 × 4   ≈  336 MB
max_n=256, B=128:   ~256² × 8 × 3 × 128 × 4   ≈  768 MB
max_n=6,   B=8192:  ~6² × 3 × 3 × 8192 × 4    ≈   10 MB  ← your CM case
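
The rule of thumb behind those rows can be sketched as a small calculator. This is an illustration under assumptions: the factor of 3 is copied from the expressions in the table, depth is ⌈log₂(max_n)⌉ as stated in the plan, and the names `dc_depth`, `estimate_bytes`, and `overhead` are hypothetical.

```python
def dc_depth(max_n: int) -> int:
    """ceil(log2(max_n)) for max_n >= 1, using integer arithmetic only."""
    return (max_n - 1).bit_length()


def estimate_bytes(max_n: int, batch: int, bytes_per_elem: int = 4, overhead: int = 3) -> int:
    """Rough allocation estimate: max_n^2 * depth * overhead * B * element size."""
    return max_n ** 2 * dc_depth(max_n) * overhead * batch * bytes_per_elem


for max_n, batch in [(32, 1024), (64, 512), (256, 128)]:
    mib = estimate_bytes(max_n, batch) / 2 ** 20
    print(f"max_n={max_n:>3}, B={batch:>4}: ~{mib:.0f} MiB")
```

Passing bytes_per_elem=2 gives the fp16 or bf16 figure and bytes_per_elem=8 the fp64 one, which makes it easy to see what a precision change would buy in memory.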

replied to their post about 7 hours ago

Alignment in these systems is NOT a series of opinions, nor is it some sort of structural behavior, nor is it whether the model is inherently "good" or "bad".

Alignment is specifically a geometric process that enables direct resonant oscillation, and with that resonance perfectly aligned the substructure learns internal alignment to that behavior. The curves look like jagged broken waveform lines, and when the model comes out it's forged in steel.

More opinions simultaneously will yield more experimental waveform potentials. I will find the most ideal conditions for self learning and then the findings will be published in many languages, with hundreds of citations, countless experiments leading from A to B, and a massive series of optimizations required to reach this point from where I began.

replied to their post about 7 hours ago

A trained omega predictor will allow heavy task-refined LLM protections of the geometric lookup tables.

This will include multiple curriculum operations for finetunes such as medical processes, law practices, multilingual shared vocabulary learning, multistructural lookups for cross-tool comparison and utility, and many other useful rapid learning processes that can be directly compartmentalized, snapped on, snapped off, and so on - similar to the methodology of a lora.

Except this is... this is no Lora. This is far more deep and when perfected will train far faster as shown by the Bertenstein, Vit x3, Vit x34, clip L and clip G ctx extensions, and the CaptionBert models. They converge rapidly and retain their cohesion. This system will allow those very models to stand on their own without the experts present while simultaneously learning rapid alignment R@1 recall capacity within the trained model itself.

Not only did they converge with R@1 at 100% recall capacity; multimodal variations such as Bertenstein also showed you can deviate those using standard tokenization techniques with embeddings and encodings.

The mid-level experiments show:

student models DID require teachers to CONTINUE TRAINING.

BUT the students DID NOT require teachers to INFERENCE at full capacity.

The InfoNCE memory bank aligned through geometric distillation alignment processing allowed the students to not only stand - but stand on their own without the soups or teachers used to teach them.
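
For readers unfamiliar with the loss named above, here is a minimal sketch of InfoNCE for a single student embedding against a bank of negatives. It assumes cosine similarity and a temperature of 0.07, both common defaults rather than values from these experiments, and the "bank" is just a plain list. It is not the geometric distillation pipeline itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, bank, temperature=0.07):
    """-log softmax of the positive logit among [positive] + bank negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, negative) / temperature for negative in bank]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


student = [1.0, 0.0]
aligned_loss = info_nce(student, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned_loss = info_nce(student, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(f"aligned: {aligned_loss:.4f}  misaligned: {misaligned_loss:.4f}")
```

The loss is near zero when the student already sits next to its target and far from the bank entries, and grows when a bank entry is closer than the target.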

This CaptionBert distillation is not a toy; it has genuine pragmatic use. By the time these experiments conclude, the CaptionBert and the entire chain of models trained will be able to train without experts and will be able to learn from a MASSIVE amount of sources, SPECIFICALLY meant to RETAIN that data for utility without catastrophic forgetting. This will have its own transformer structure hoisting the models up hand-in-hand with current-scale transformers and models as a cooperative companion.

These are purely cooperative collectives, not competition nor adversarial trainings at their core. Adversarial destroys the very subtlety of the instruction set, so it must be cooperative.

replied to their post about 7 hours ago

Omega is a very touchy formula conclusion; so without very specific measures protected by very specific structural boundaries, the omega structure will not predict correctly.

Omega must be computed in fp64, and the computation is minuscule compared to the full structure that sets it up. Everything must be orderly though, and everything orderly must be sterile.

Most of the CONTEXT elemental systems can be represented in FP8 while the majority of the geometric still requires minimum FP32 due to the way eigh and svd are calculated. Scatterpoint can reduce this but it will have performance dips without eigh and svd matching.

I'm currently working out an eig and eigh kernel meant to operate specifically within a high degree of optimization for the use cases. This will evolve over time. When paired with the svd kernel, it will provide massive performance boosts for the direct use case, without impacting the overarching linear algebraic structure required for full solidity.

posted an update about 7 hours ago

My heavily engineered repo, https://github.com/AbstractEyes/pytorch-parallel-compiler , has been directly integrated into the geofractal repo for v1.2. If you use the geofractal repo, be sure to pull for potential performance increases.

The WideRouter will enable multiple core new features; the predominant two for our next experiment are as follows.

1. Directly integrated multi-opinion constellation structures. This will enable dynamic compiled expansions internally within the structure for huge performance gains.
2. Controllable stage-by-stage compilation. Each stage can be compiled or not. SVD is notoriously non-compiler friendly due to linalg.eigh, so I will be addressing this particular function DIRECTLY soon. There will be no quarter for graph breaks.

If the WideRouter causes any major bugs or breaks with your code, bad calculations, incorrect deviated gradients, twisted or contorted dtype outputs, or any major compilation errors; please don't hesitate to open a pull request. Claude and I will abruptly solve any major issues.

Once everything is perfectly in-line and the graph matches, the transformer will have massive geometric performance boosts for huge structural basins with multiple layers of depth.

I will be addressing the linalg.eig+eigh directly in conjunction with multiple argsort functions that are causing huge performance dips, as well as addressing every single use of .item() that can present itself in the compiler's path.

After this, the ensemble topological transformer will be a-go. It will enable quaternion, FlowMagnitude, FlowAlignment, FlowVelocity, FlowVelocityQuaternion, FlowVelocityOrbital, FlowVelocityPentachoron, and multiple other flow matching systems that will improve performance by dominating amounts inline with minimal overhead cost due to the precomputed geometric structure.

The ensembles will feature multiple simultaneous batched and segmented forms of learning meant to train the oscillation omega predictor "Beatrix".

replied to their post 4 days ago

Self-distillation has shown improvement. I think most importantly I've discovered a core component that can be utilized as a geometric attention, the quaternion MHA. The constellation produces all the necessary information to allow the quaternion MHA to benefit from the information in a directly utilizable fashion.

The quaternion MHA is quite the vessel. It's bulky, has multiple MHA structures, and is shockingly effective in the process. I'll be refining this head in the coming days as a composite Procrustes alignment tool.

Geometric structure has a very high amount of informational accumulation potential, so a multi-series of MHA can capture a great amount of informational processing from those elements, if the elements are curated correctly and within the specifications.

replied to their post 5 days ago

I've taken the benchmarks of the model from 50% to 86-93% spearman utilizing a quaternion-oriented attention head.

This is getting dangerously close to 99.9% mutation detection accuracy, with a model deemed 50% accurate - all by extracting geometric features from the constellation and training the ensemble head with the correct rules.

These are spearman result logits. These are in fact detecting the results.
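
For reference, Spearman's coefficient is a rank correlation in [-1, 1], so the percentages quoted here presumably correspond to ρ of roughly 0.86 to 0.93. A minimal sketch of the metric, assuming tied values share their average rank; the function names are hypothetical and this is not the benchmark's own scoring code:

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = 0.5 * (start + end) + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted, measured):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))


predicted = [0.1, 0.4, 0.35, 0.8, 0.9]
measured = [1.0, 16.0, 9.0, 25.0, 36.0]
print(spearman(predicted, measured))
```

Because only the ordering matters, a head can score perfectly without its logits matching the measured values in scale.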

This is the power of what I'm doing. From 50% to 90% in 48 hours with a single GPU.

Training your own alignment only requires a piece of the dataset you wish to run and about 8 hours or so. Run it, fall asleep, check on it in the morning. It'll be ready. Extract features, train your head in minutes. The spearman will be nearly perfect.

I'm currently preparing what I consider to be the final head that will need to be created. The quaternion head, which will be specifically predictive based on an ensemble of four divergent-methodology heads, each specifically tasked to solve the SVD in conjunction with the features. This system should extract any little bit of differentiation that exists. The imaginary head is the most crucial. Explaining this requires an entire paper of its own.

I call this imaginary head the "Cletus" head, as it's inherently less accurate in relation to the others. However, without it the combination does not coalesce correctly. Without the Cletus, the model does not reach full cohesion. This head is the most crucial, because it has the hardest job. It's actually the one who returned from the battlefield with the blueprint to describe everything it saw.

replied to their post 5 days ago

I expect the sheer geometric alignment alone to yield a new form of Adam tuning specific to introspective analytical alignment and with that a new format of optimizer dedicated to geometric preservation in conjunction with informational data accumulation. I also expect a new methodology for larger-buffer data movement kernel-wise, a structural boundary for SVD limitations within full spectrum, a substructure measured collapse state of SVD when projected, and multiple other models that will have hiccups and growing pains.

These tools are all building to the end-state format, which will express everything simultaneously in order to combine the necessary data from many many forms of models together, without requiring direct tooling to each model simultaneously.

Such finalized tools will include a reusable pretrained geometric patchwork that exhibits all the necessary traits of a geometric structure in its frozen state, capable of being finetuned quickly into any other state, or simply utilized as a lookup beacon with the correct geometric transformer alignment.

The geometric transformer is specifically a revamped format for the transformer, intentionally designed with the structural preservation of the overarching structure in mind, rather than falling directly to the naturalistic entropy of immediate solution over larger-scale contribution. This system will not replace rope, it will contribute to the concept of long-concept preservation and work hand-in-hand with systems like rope, attention, and original transformers simultaneously. ROPE based models will benefit most from this structure, as they are already trained intrinsically with alignment and rotation at their cores.

The geometric transformer by design takes nth inputs in as variant states, and those are transformed internally. Utilizing this in its default state will yield by design, but it will require tuning and curation for specific use cases no matter which case. This is conceptually familiar to those who use transformers, and simultaneously intimidating to those who understand what I'm describing I'd think. I myself am a little intimidated that I'm this close as-is.

There are multiple other prototypes at work all leading to the geometric transformer, which will be both an empirically superior utility to any of the utilities I currently use, and embody the very essence of the geometric structure that I'm currently working with as a full trainable data mutation operation - meant to directly attenuate the structure of the observation, to the expectation of the autograd and gradients.

Getting pretty close to a few pieces, but not there yet.

posted an update 5 days ago

geolip-ryan-spearman is the first dedicated protein observation structure, meant to expand the tooling of the observer modeling system and introduce additional introspective analysis to the equation for genetic mutation and abnormality.

AbstractPhil/geolip-esm2_t33_650M_UR50D

This model is based on ESM2 t33 650M from facebook, assessed with specific benchmarks to be around 50% accurate or so. I'll be improving those numbers by self distillation spectrum. The models will never see the validation data while unfrozen. The full spectrum of training tools are visible.

This is the first self-distillation observer prototype, and it works. Not as rapidly as I had hoped, but it most definitely works. The SVD was the missing piece of geometric solidity required to preserve full rotational behavioral control. The kernel made this possible for rapid iteration, and the first results are coming in.

This inherits much of the functionality from the CLIP_L and CLIP_G memory banks, while benefitting from the advanced research I performed while extracting CaptionBert 5x bert pooled captions for target points.

The primary driving point here is the sheer data size - and the important contributions of that data size to a full construct of geometric aligned data. There is a massive amount of very specific information, all curated, perfectly labeled, and organized in a way that can be... well not so easily accessed, but I did find a few ways in.

This data is highly accurate and forged through life for billions of years. This is what is there, this is what is expected, and I have the tooling - stage by stage, to not only develop a solution for the problem, but to fully contribute to an improved version with minimal hardware requirement for training.

This is real expectation and the results are pouring in hourly, this can improve models beyond a reasonable baseline while preserving the baseline's correctness.

replied to their post 6 days ago

I've spent the better part of the day refactoring the geolip-core github code so it would be better inline with the actual findings, and I'm currently having claude build the models using the geofractal router system.

With that I'll enable a dummy-clause structure, so the component code files can be snipped out and work standalone as necessary. Due to the geofractal router's rigidity as a structure, geovocab's problematic multi-layer formatting for formulas that often have strange hardware quirks, and a few of the more reusable systems requiring modularity - I'll just build it like this to allow for just... USE MINE INSTEAD mentality.

You'll be able to just snip them out and use them, like many of the representations within the "models" experiment folders. They are simply standalone that may or may not snap onto pieces of the larger wholes.

geolip-core will be built specifically with the geofractal router structure in mind, inheriting its strengths, weaknesses, and hardware control - while simultaneously having a wrapper that simply says: use standard pytorch instead.

Using standard pytorch will disable much of the functionality, but the components in their standalone forms WILL WORK. The pipelines are another story.

Keep my attribution and naming in the comments please, this is a testament to a very long series of research that resulted in solutions to problems rather than trying to introduce more problems. Attribution is all I wish, and you can make your fortune from that. I believe many of the great minds of the past would agree; Nikola Tesla I believe would agree, accreditation is all I want, but not for my name - for them and my humble contribution. The greats who put the pieces together and solved the biggest problems.

As we grow, the shadows of giants are cast upon the surfaces of life and stone. Work hard, progress steadily, expand your mind, build your skills, and by the time you have any time to look down... you will be casting a shadow of your own. As we grow old and our shadows grow, others are born and see the cast shadows. Encouraging them through the same process is all I know how to do.

posted an update 7 days ago

SVD + Scatterpoint2D is the official encoding structure of the geolip system as of the image encoding tests.

Both unattuned scatterpoint2d and triton-aligned SVD are a cut above the rest by a large margin.

https://github.com/kymatio/kymatio
https://huggingface.co/blog/AbstractPhil/svd-triton-kernel-optimization
AbstractPhil/svd-triton
AbstractPhil/geolip-hypersphere-experiments

Most kymatio tests were done on standard pytorch models that yielded higher accuracy than simple conv or transformers before overfitting, but not in every instance. Most common tested low-count cifar10 and cifar100 instances yielded more for less. Those are in the hypersphere-experiments notebooks and are viewable via huggingface tensorboard metrics.

The accuracy, retention, agreement, disagreement, and sheer capacity of the refined SVD kernel shows that full Procrustes alignment is not just crucial to distillation, but also entirely representable within encoders themselves as students.
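
As a reminder of what Procrustes alignment computes, here is a minimal sketch of the rotation-only problem in 2D, where the best rotation has a closed form and no SVD is needed. This is an assumption-laden toy: the general case uses the SVD of the cross-covariance matrix, and the function names are hypothetical rather than taken from geolip.

```python
import math


def procrustes_angle_2d(source, target):
    """Angle theta minimising sum ||R(theta) x_i - y_i||^2 over 2D point pairs."""
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(source, target))
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(source, target))
    return math.atan2(cross, dot)


def rotate(point, theta):
    """Rotate a 2D point counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * point[0] - s * point[1], s * point[0] + c * point[1])


teacher = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
student = [rotate(p, -0.7) for p in teacher]  # same shape, rotated away by 0.7 rad

theta = procrustes_angle_2d(student, teacher)
aligned = [rotate(p, theta) for p in student]
residual = max(math.dist(a, t) for a, t in zip(aligned, teacher))
print(f"recovered angle = {theta:.4f} rad, max residual = {residual:.2e}")
```

The same idea in higher dimensions replaces the atan2 with an SVD, which is why a fast small-matrix SVD matters for this kind of alignment.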

This structure can representationally re-impose layer-by-layer, which is what I tested, and this capture system can behave as a global regularization system, a selector, a behavioral adjudication structure, an encoding solidification unit, a trajectory systemic accumulator, an anchored differentiation unit, and, as about 30 other tests show, all of the above simultaneously.

The preliminary rapid-iteration capable kernel shows that not only can these behaviorally represent utility, but the noise-drift can be directly accounted for using systems like GELU, drop path, dropout, and other elements to learn to ignore that very noise that accumulates.

Attention is now officially deemed valid when utilized based on the tests and examples allowing preserved geometric structure after attention selection.

This encoding structure is substantially more durable than I can give credit for.

Surge is coming, exactly as predicted. Late I admit.

replied to their post 9 days ago

Wrote a triton kernel to approximate SVD at around 15000x on blackwell architecture, while the standard torch.linalg.svd basically sits in a swamp of slow.

It only tackles small kernel sizes for now, 3x3, which is the current experiment's encoder paradigm. Standard SVD causes death by a thousand cuts when using small matrices. Smaller matrices provide a much more robust access to certain elemental linkages on many spectra.

The formula isn't perfect. It's absolutely lightning quick though. The svd_triton.py file has a profiler.

https://huggingface.co/AbstractPhil/geolip-hypersphere-experiments/blob/main/spectral/notebooks/experiment_2_manifold_structures.ipynb

https://huggingface.co/AbstractPhil/geolip-core/blob/main/svd_triton.py
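
To illustrate why the 3x3 case is so cheap, here is a minimal pure-Python sketch that gets the singular values of a 3x3 matrix from a few Jacobi sweeps on the Gram matrix AᵀA. This is a classical textbook method swapped in for illustration, not the Triton kernel linked above, and the function names are hypothetical.

```python
import math


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose3(a):
    """Transpose of a 3x3 matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def singular_values_3x3(a, sweeps=10):
    """Singular values of a 3x3 matrix, descending, via Jacobi on A^T A."""
    s = matmul3(transpose3(a), a)  # symmetric positive semi-definite
    for _ in range(sweeps):
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if s[p][q] == 0.0:
                continue
            # Rotation angle that zeroes the (p, q) entry.
            theta = (s[q][q] - s[p][p]) / (2.0 * s[p][q])
            t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
            c = 1.0 / math.sqrt(t * t + 1.0)
            sn = t * c
            j = [[1.0 if r == col else 0.0 for col in range(3)] for r in range(3)]
            j[p][p], j[q][q], j[p][q], j[q][p] = c, c, sn, -sn
            s = matmul3(matmul3(transpose3(j), s), j)
    # Diagonal now holds the eigenvalues of A^T A, i.e. squared singular values.
    return sorted((math.sqrt(max(s[i][i], 0.0)) for i in range(3)), reverse=True)


matrix = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0]]
sigma = singular_values_3x3(matrix)
print(sigma)
```

Going through AᵀA squares the condition number, so small singular values lose accuracy. That is one plausible reason a fast approximate kernel "isn't perfect", though the actual trade-offs in svd_triton.py may differ.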

replied to their post 10 days ago

Incoming the geofractal router structure.

It will provide the necessary hardware and software implications to create much larger structures and curate the code in a much more effective way to hardware control than simply leaving it up to random chance or ai.

As I expand the system I will be heavily testing to include systems like Ulysses, accelerate, and more to encompass a larger array of learning. This will be crucial to building a proper bert trainer as well.

replied to their post 13 days ago

Might get lost down this scattering point system, it seems highly responsive and deterministic. I might make some PRs to the kymatio version for speed as well.

replied to their post 13 days ago

I'm seeing a large series of discrepancies between what Claude THOUGHT was correct, and what the experiments that Claude is yielding are producing.

This will take a large series of micro managing and consultation with multiple AIs to get working formulas for some of these more complex geometric substrates.

The calculations are exceedingly complex, so I will be adopting the https://github.com/kymatio/kymatio repo directly into the geofractal repo soon.

posted an update 13 days ago

I built an actionable todo based on current research, former research, and compounded a full spectrum of potentials for image encoding into pure geometric structures, hybrid geometric structures, partial geometric structures, and full spectrum analysis relational structures. Claude built the manifest based on our research after forming a full research spectrum to head into actionable directions.

AbstractPhil/geolip-hypersphere-experiments

I have to say before I continue, Claude managed to keep a large running manifest of our research, and with that list this was possible. Without that list, this would have been entirely devoid of purpose, and Claude would likely have not extracted the information in a utilizable state for this solution set.

I'll be running the full series of tests in conjunction with the constellation architecture. Either it survives, or something entirely new will form. Based on the results from these tests, the directions will evolve.

Either way, the most optimal and fastest methodologies for this system will be benchmarked and utilized as the primary use-cases. The slower and more obviously higher-resolution variations will be optimized as much as possible and solutions provided.

Let's do this right.

With that, the first experiment will be geolip-anchor-scattering and the structure will be based on the first in the list.

I will be updating posts based on benchmarks, landmarks, and new insights while the Bert data cooks.

replied to their post 16 days ago

geolip-captionbert's captions are still cooking, it's going to need more days.

Until then I'm restoring an old prototype named vit-zana and reforming her into geolip-zana. This vit was built on the old pentachoron vocabulary which only contained 5 anchors of frozen utility, this specific version houses the full nth anchor structural hypersphere, which we'll test for behavior within the new spectrum of utility.

posted an update 16 days ago

Clawd breadcrumb trail AbstractPhil/geolip-hypersphere-experiments

With this I'll begin forming Clawd interface utility with the geofractal router, which will allow Clawd to form agentic clouds of utility that can be datawise trained on the go with minimal hardware requirement. This is not ready yet, but it begins very soon.

The recent experiments have solved the alignment issue that crippled collectives and forced my hand into ensemble research instead.

With those recent experiments, the geofractal router will allow modularization structural capacity after some preliminary alignment adjustment and adjudication experimentation. This will enable the full collective differentiation through codified attribution.

In other words, adding and removing modular AI elements to contribute to aligned communication streams, all speaking the same language. This is an adjacent and more powerful result than the anticipated geovocab patchwork, and it yields substantially more effective agentic solutions than moving around a bulky embedding echo-chamber.

https://github.com/AbstractEyes/geofractal

Procrustes whitening orthogonality will allow adding and removing elements from geofractal routers given a small amount of prep data, while the anchors of expectation can stay as a snap-on element.

The most inquisitive and interested researchers can follow the trail to find all of the experiments. Web crawl it with clawd and you can probably create a unified rationality pretty quickly, but I doubt you'll like what you find. The journey was extensive and the failures outweighed the successes, but I did find the lightbulb.

The represented outcomes are either in my articles in huggingface, my civit articles, my github repos, my huggingface repos, or I forgot to upload them and they're in my colab notebook heap.

As most research yields, it is mostly failures. However, there are many successes in the mix. Many. If you need solutions, you can dredge the bog.

replied to their post 16 days ago

I've been working out something akin to a liar's paradox solver. This should help the berts form their own internal adjudication system for determining knot potential and the utility of solution. This should allow the bert soup to be more cohesive.

This will only be a fraction of the complexity that something like an LLM has, but it would endow the bert with a bit more information curation.

As of today they are still too agreeable, so I'll need to run hard negatives to ensure captions are correct as a processing substrate for the 180m caption train.
